
ENH: Support TCK streamlines file format #486

Merged · 32 commits · Apr 18, 2017

Conversation

@MarcCote (Contributor) commented Sep 13, 2016

This PR adds support for the MRtrix streamlines file format (TCK) to the Streamlines API (#391).

Eventually, in a future PR, I will add functions that allow easy conversion between the different formats. For now, going from TRK to TCK is easy. See
https://gist.github.com/MarcCote/ea6842cc4c3950f7596fc3c8a0be0154

Converting from TCK to TRK requires additional information from a reference anatomy. See
https://gist.github.com/MarcCote/ebf0ae83a9202eb96d1dfbb949cd98af

EDIT: Added links to some Gist code for streamlines conversion.
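For reference, the TRK-to-TCK direction boils down to something like this minimal sketch (assuming the nibabel.streamlines load/save entry points from the Streamlines API; the filenames are made up, and the TCK-to-TRK direction additionally needs a reference anatomy, as in the second gist):

```python
import nibabel as nib

# Load a TRK file; the Streamlines API returns coordinates in RAS+ mm.
trk = nib.streamlines.load('bundle.trk')

# Save the same tractogram as TCK; the output format is picked from the extension.
nib.streamlines.save(trk.tractogram, 'bundle.tck')
```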

@coveralls

Coverage decreased (-0.01%) to 95.934% when pulling ce2a2e9 on MarcCote:streamlines_tck into ec4567f on nipy:master.

@codecov-io commented Sep 13, 2016

Codecov Report

Merging #486 into master will increase coverage by 0.2%.
The diff coverage is 98.14%.


@@            Coverage Diff            @@
##           master     #486     +/-   ##
=========================================
+ Coverage   94.05%   94.25%   +0.2%     
=========================================
  Files         166      177     +11     
  Lines       22044    24273   +2229     
  Branches     2345     2604    +259     
=========================================
+ Hits        20733    22878   +2145     
- Misses        877      920     +43     
- Partials      434      475     +41
Impacted Files Coverage Δ
nibabel/streamlines/tests/test_trk.py 100% <ø> (ø) ⬆️
nibabel/streamlines/tractogram.py 96.67% <ø> (+0.02%) ⬆️
nibabel/streamlines/tractogram_file.py 100% <100%> (ø) ⬆️
nibabel/info.py 100% <100%> (ø) ⬆️
nibabel/streamlines/tests/test_tck.py 100% <100%> (ø)
nibabel/streamlines/tests/test_streamlines.py 98.7% <100%> (+3.33%) ⬆️
nibabel/streamlines/__init__.py 100% <100%> (ø) ⬆️
nibabel/streamlines/trk.py 95.27% <100%> (+0.01%) ⬆️
nibabel/streamlines/tests/test_tractogram_file.py 72.09% <66.66%> (-3.77%) ⬇️
nibabel/streamlines/tck.py 98.88% <98.88%> (ø)
... and 34 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@coveralls

Coverage decreased (-0.01%) to 95.934% when pulling 6f3ab59 on MarcCote:streamlines_tck into ec4567f on nipy:master.

@coveralls

Coverage increased (+0.02%) to 95.969% when pulling 7822a12 on MarcCote:streamlines_tck into ec4567f on nipy:master.

@coveralls

Coverage increased (+0.02%) to 95.969% when pulling e49a25b on MarcCote:streamlines_tck into ec4567f on nipy:master.

@coveralls

Coverage increased (+0.02%) to 95.971% when pulling 8ca88bd on MarcCote:streamlines_tck into ec4567f on nipy:master.

@coveralls

Coverage increased (+0.02%) to 95.971% when pulling 87c1d5a on MarcCote:streamlines_tck into ec4567f on nipy:master.

@MarcCote (Contributor, Author)

@jchoude @arnaudbore @matthew-brett @FrancoisRheaultUS @samuelstjean
This is not WIP anymore; is anyone up for a review? Thanks.

@samuelstjean (Contributor)

I can have a quick look after the ISMRM deadline.

@coveralls

Coverage increased (+0.02%) to 95.971% when pulling 0035495 on MarcCote:streamlines_tck into ec4567f on nipy:master.


@arnaudbore commented Nov 4, 2016

So far:

  • Load -> Check
  • Save -> Check
  • Conversion (tck->trk and trk->tck) -> Check

Be really careful with the software that you use to visualise your streamlines.

Everything works perfectly (great job @MarcCote).
I think it would be great to have input from @jdtournier to check whether we are missing something. Can you also check the world space and the associated reference?

Again, great job, very helpful!

@jdtournier

Just had a very brief look into this (why did you have to pick the week before ISMRM deadline?!?). I can't verify whether your handling of the coordinate system matches the expectations of the format, but yes, the positions are expected to be given in world/scanner/real coordinates in mm units, assuming the standard NIfTI coordinate system. Assuming that's what you mean by the 'world space' (RAS+?), then that should be fine.

Only one minor thing to add: we tend to set the offset to the data such that it guarantees memory alignment for 32-bit floats when accessed via memory-mapping, just in case that affects performance (a lot of CPU optimisations rely on memory alignment). That means it's set to the lowest multiple of 4 bytes that exceeds your computed offset. It doesn't look like that's what you're doing?

All good otherwise, but like I said, I've only skimmed through the code...
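In case it is useful, here is a minimal sketch of the 4-byte alignment rule described above (the helper name is illustrative, not MRtrix or nibabel code):

```python
def aligned_offset(offset, alignment=4):
    """Smallest multiple of `alignment` strictly greater than `offset`."""
    return (offset // alignment + 1) * alignment

assert aligned_offset(67) == 68
assert aligned_offset(68) == 72  # an already-aligned offset is still bumped past the header
```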

@MarcCote (Contributor, Author) commented Nov 7, 2016

@arnaudbore @jdtournier: Thank you both for your time (especially with the ISMRM deadline closing in!).

@jdtournier: You are correct, there is no memory alignment right now, but I'll fix that and ping you once it is done (after the ISMRM deadline).

@jdtournier: Is there any convention in MRtrix to save information alongside streamlines and streamlines' coordinates? For instance, we might want to keep the mean FA (or a bunch of metrics) for each streamline, an ID/name of the bundle a streamline belongs to, or simply a different RGB color for each point of the streamlines. In Trackvis/.trk those are known as scalars and properties, but in this API we use the terms data_per_point and data_per_streamline, respectively.

@jdtournier: Regarding the 'world space' (RAS+), I'm using the naming convention from http://nipy.org/nibabel/coordinate_systems.html#naming-reference-spaces. So, yes, the coordinates are in mm units and follow the standard NIfTI coordinate system. It is also worth mentioning that the streamline coordinate (0,0,0) refers to the center of the voxel, whereas in Trackvis/.trk it refers to the corner of the voxel. Is that how streamlines are displayed in MRView?

When saving a TCK file, I make sure the streamlines are in RAS+ mm (world/scanner/real coordinates). See https://github.com/nipy/nibabel/pull/486/files#diff-14a106e7412b371d3c333051564ccd3eR205
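To illustrate the API terms mentioned above, here is a rough sketch (values are made up) of attaching per-point and per-streamline data to a Tractogram whose coordinates are already in RAS+ mm:

```python
import numpy as np
from nibabel.streamlines import Tractogram

# One streamline with three points, already expressed in world (RAS+ mm) space.
points = [np.array([[0., 0., 0.], [1., 1., 1.], [2., 2., 2.]], dtype=np.float32)]

tractogram = Tractogram(points,
                        data_per_streamline={'mean_FA': [np.array([0.5])]},
                        data_per_point={'colors': [np.array([[255, 0, 0]] * 3,
                                                            dtype=np.float32)]},
                        affine_to_rasmm=np.eye(4))  # identity: points already in RAS+ mm
```

The TCK format itself stores only the coordinates, so any per-point or per-streamline data would have to live elsewhere (see the discussion of MRtrix's separate scalar/weight files below).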

@jdtournier commented Nov 7, 2016

OK, good to hear wrt the alignment issue. It's not essential though, it's just that it might make a difference in terms of performance, but I have to admit that so far, I don't think I've noticed any particular issues on that front. It's probably more relevant for images.

As to saving per-vertex/streamline information, this is something that we discussed extensively a while back. We opted for storing this information separately in associated files, either track scalar files (for the per-vertex case, e.g. as might be produced by tcksample) or straight ASCII files (for the per-streamline case, e.g. as used for the SIFT2 weights). We figured that maintaining compatibility with the simple existing format, and the flexibility of separating the two types of information, were sufficient reasons not to complicate the format (for example, we might want to have multiple sets of per-vertex scalars for the same streamlines, in which case storing everything in a single file means needlessly duplicating most of its contents). So if you want to add support for importing per-vertex/streamline scalars, that's how we handle it in MRtrix3...

Just checked your definitions, and that matches my expectations. However, whether the origin of the coordinate system corresponds to the centre or corner of the voxel is unrelated to the streamlines storage - that really is about the image storage convention. On that front, we also assume that the voxel position at (0,0,0) corresponds to the centre of the voxel, that's the standard adopted in both DICOM and NIfTI, and that's how images are displayed in MRView. But by design, in MRtrix, the streamlines are stored in scanner coordinates to make them independent of whatever image was used to generate them. So I'm slightly puzzled by your question. In MRtrix, the world coordinate (0,0,0) is the origin of the world/scanner coordinate system, irrespective of whether you're talking about streamlines or images, and whatever image you might be talking about. A given image's transformation matrix is what tells you how to figure out the position of its voxel centres in the world/scanner coordinate system. But it is absolutely not the case that the "streamline coordinate (0,0,0) refers to the center of the voxel" - it refers to the isocentre of the scanner, which will rarely be aligned with any particular voxel. Just making sure we're on the same page...?

@coveralls

Coverage increased (+0.07%) to 96.016% when pulling 3aba8bb on MarcCote:streamlines_tck into ec4567f on nipy:master.

@matthew-brett (Member)

Where are we with this one?

Needs a rebase and PEP8 fixes at some point.

@matthew-brett (Member) left a comment

Sorry - I got confused in the reading code, would you mind explaining a bit?

@@ -0,0 +1,440 @@
from __future__ import division
Member

Add module level docstring describing format and pointing to format docs.

----------
tractogram : :class:`Tractogram` object
Tractogram that will be contained in this :class:`TckFile`.

Member

No blank space between parameter defs

with Opener(fileobj) as f:
    magic_number = f.fobj.readline()
    f.seek(-len(magic_number), os.SEEK_CUR)
    return magic_number.strip() == cls.MAGIC_NUMBER
Member

Maybe nicer to dedent this line.

except StopIteration:
    # Empty tractogram
    header[Field.NB_STREAMLINES] = 0
    # Overwrite header with updated one.
Member

make this into a local function def finish_hdr(): etc, call here and at the end of the method, to avoid code duplication?


Parameters
----------
fileobj : file-like object
Member

I guess this file must be open for writing as binary? Note in the docstring?

Contributor Author

Good point.

offset += 1 # +1, we need one more character for that new digit.

fileobj.write(asbytes(str(offset) + "\n"))
fileobj.write(asbytes(b"END\n"))
Member

Already bytes.

raise HeaderError("Missing 'file' attribute in TCK header.")

# Set endianness and _dtype attributes in the header.
hdr[Field.ENDIANNESS] = '<'
Member

How about:

hdr[Field.ENDIANNESS] = '>' if hdr['datatype'].endswith('BE') else '<'


i = 0

while not eof or not np.all(np.isinf(pts)):
Member

I don't understand the not np.all(np.isinf(pts)) part of this - can you explain?

Contributor Author

In the TCK format, the end of the file is marked by an inf inf inf triple. In my code, the variable eof indicates whether there is any data left to read (i.e. to put in the buffer) from the file.
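A tiny self-contained illustration of those sentinel triples (made-up values, not the PR's code):

```python
import numpy as np

pts = np.array([[0., 0., 0.], [1., 1., 1.],   # points of the first streamline
                [np.nan, np.nan, np.nan],      # NaN triple: end of a streamline
                [np.inf, np.inf, np.inf]],     # Inf triple: end of the file
               dtype='<f4')

print(np.all(np.isinf(pts[-1])))  # True: only the EOF marker remains
```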

Member

Add a comment here with that information (about inf, inf, inf)?

" might be corrupted.")
raise DataError(msg)

# Otherwise read a bit more.
Member

What will happen if eof == True and np.all(np.isinf(pts)) == True?

Contributor Author

Then the while loop should terminate. It means the only thing that is left in the buffer is the Inf triple which indicates the end of the TCK file. Did I miss something?

Member

No I don't think you missed anything - I see now that we'll go back to the while condition, then exit the loop.

eof = len(bytes_read) == 0

# Read floats.
pts = np.frombuffer(buff, dtype=dtype)
Member

Sorry for my confusion, but won't this code re-cast the first block of data over and over, for each iteration of the while loop?

Contributor Author

Only a small portion will get re-cast; it corresponds to the last sequence of points that is only partially contained in buff.
In short, the code does the following (sketched below):

  1. Read a chunk of data from the file (a buffer).
  2. Determine how many complete points we've read and locate the streamline separators (i.e. the [NaN, NaN, NaN] triples).
  3. Yield points, one streamline at a time.
  4. Flush the beginning of buff (that is, up to the last NaN triple we've detected).
  5. Go to step 1 (concatenating the new chunk to what remains of the current buffer) until we detect an Inf triple (or there is nothing left to read).
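For readers following along, here is a rough, hypothetical sketch of that loop (not the actual PR code, which also validates the header and detects corrupted files):

```python
import numpy as np

def iter_streamlines(fileobj, buffer_size=4 * 1024 * 1024):
    """Yield one (N, 3) float32 array per streamline from an open TCK data section."""
    point_size = 3 * 4                                # three little-endian float32 per point
    buff = b''
    eof = False
    while not eof:
        chunk = fileobj.read(buffer_size)             # 1. read a chunk
        eof = len(chunk) == 0
        buff += chunk
        n_bytes = (len(buff) // point_size) * point_size
        pts = np.frombuffer(buff[:n_bytes], dtype='<f4').reshape(-1, 3)
        breaks = np.flatnonzero(np.isnan(pts[:, 0]))  # 2. locate the NaN separators
        start = 0
        for idx in breaks:                            # 3. yield complete streamlines
            yield pts[start:idx].copy()
            start = idx + 1
        buff = buff[start * point_size:]              # 4. flush what was consumed
        if len(pts) > start and np.all(np.isinf(pts[start])):
            break                                     # 5. Inf triple: end of file
```

Usage would be roughly: open the file in binary mode, seek past the header (to the offset given by the file field), then iterate.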

Member

Ah - I see - you are dropping the read part of buff at the end of the loop.

@matthew-brett (Member)

@MarcCote - where are we with this one?

@MarcCote (Contributor, Author)

@matthew-brett sorry for the delay. I'll have more time in the following weeks :).

@coveralls

Coverage increased (+0.1%) to 96.162% when pulling 2df7518 on MarcCote:streamlines_tck into 9b23a32 on nipy:master.

@jchoude left a comment

Thanks @MarcCote and sorry for the delay. I mainly checked the tck code, not the tests. I had other comments that I began to enter, but @matthew-brett was quicker than me :) I deleted them; I hope they don't show up as duplicates.



def create_empty_header():
    ''' Return an empty compliant TCK header. '''

Shouldn't those be double quotes? """


lines = []
lines.append(header[Field.MAGIC_NUMBER])
lines.append("count: {0:010}".format(header[Field.NB_STREAMLINES]))

What will happen if someone tries to store more than 10M streamlines in this file? Not a very frequent case for now, but some papers suggest we need to do more than that.

I know that the issue is that you want to avoid shifting the whole content of the file if you allow adding streamlines to the file. Maybe this should be adjusted in a followup PR.

Contributor Author

Actually it would need more than 10 billion streamlines. If that is an issue, we can simply bump this number up.


Oops, my bad. I think we're safe for now :)

raise HeaderError("Missing 'datatype' attribute in TCK header.")

if not hdr['datatype'].startswith('Float32'):
    msg = ("TCK only supports float32 dtype but 'dataype: {}' was"

dataype -> datatype

hdr[Field.MAGIC_NUMBER] = magic_number

# Check integrity of TCK header.
if 'datatype' not in hdr:

Is this really mandatory? Might not be the best comparison, but for example your old TractConverter tool only wrote the count field. All other fields were discarded.

I think it is useful to have it, but should you really reject streamlines created in another tool that don't have it?

@MarcCote (Contributor, Author) Mar 30, 2017

Good point, I could raise a warning and try loading them using Float32LE, i.e. the only dtype supported right now anyway.


Yes, I think that would be better.
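Something along those lines might look like the following sketch (the helper name is hypothetical, not the PR's code):

```python
import warnings

def check_datatype(hdr):
    """Warn and assume little-endian float32 when the 'datatype' field is absent."""
    if 'datatype' not in hdr:
        warnings.warn("Missing 'datatype' attribute in TCK header; assuming 'Float32LE'.")
        hdr['datatype'] = 'Float32LE'
    return hdr
```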

" specified in the header.").format(hdr['datatype'])
raise HeaderError(msg)

if 'file' not in hdr:

Same question here: is this really mandatory?

Contributor Author

Good point. Now, raising a warning instead.

Contributor Author

Thinking about this one, the file field indicates the offset where the data starts. In the case where the field was not provided, should I assume the data starts directly after END?


I would guess that's the best way to go, yes. Use it if it's there; otherwise, use that heuristic.
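A rough sketch of that fallback heuristic (hypothetical helper, not the PR's code):

```python
def guess_data_offset(header_bytes):
    """Fallback when the 'file' field is absent: data starts right after the END line."""
    end_marker = b'END\n'
    return header_bytes.index(end_marker) + len(end_marker)

hdr = b'mrtrix tracks\ncount: 0000000001\nEND\n'
assert guess_data_offset(hdr) == len(hdr)  # nothing else in this toy header
```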

@coveralls

Coverage increased (+0.1%) to 96.161% when pulling 029f853 on MarcCote:streamlines_tck into 9b23a32 on nipy:master.

@coveralls

Coverage increased (+0.2%) to 96.173% when pulling 4d93676 on MarcCote:streamlines_tck into 9b23a32 on nipy:master.

@MarcCote (Contributor, Author)

Thanks for the review @jchoude. @matthew-brett I've addressed your last comments and increased test coverage.

@coveralls

Coverage increased (+0.2%) to 96.214% when pulling c6117fe on MarcCote:streamlines_tck into 9b23a32 on nipy:master.


@matthew-brett (Member) left a comment

Last couple of changes.

MEGABYTE = 1024 * 1024


def create_empty_header():
Member

What do you think about my suggestion to make this a method of the TckFile class?

See: #486 (comment)

Parameters
----------
fileobj : string or file-like object
If string, a filename; otherwise an open file-like object
Member

Add something like a file-like object opened for binary reads, pointing to ...

@MarcCote (Contributor, Author) commented Apr 6, 2017 via email

@matthew-brett (Member)

Sure, streamlines phrasing is fine.

@coveralls

Coverage increased (+0.2%) to 96.206% when pulling bac044b on MarcCote:streamlines_tck into 9b23a32 on nipy:master.

@MarcCote (Contributor, Author) commented Apr 7, 2017

@matthew-brett I moved create_empty_header into TractogramFile and modified TrkFile accordingly. I modified the docstring of Tractogram and LazyTractogram to better explain the convention used for streamlines coordinates. Let me know if there is anything else.

  • Move the creation of the empty streamline header into the kinda-abstract base class. Adjust the Trk file header creation accordingly.
  • When making the default trk header structured array, make a numpy array scalar instead of a 1D array. Adjust the code accordingly. Take opportunity to rename header to st_arr to make clear it is not the 'header' of `self.header` (which is a dict).
  • TractogramFile now implements create_empty_header
@matthew-brett (Member) left a comment

A few comments. See also MarcCote#13

Parameters
----------
fileobj : string or file-like object
If string, a filename; otherwise an open file-like object
Member

For some reason I can't see the modification in this interface, or the code on my local machine - any idea what's going on?

Parameters
----------
fileobj : string or file-like object
If string, a filename; otherwise an open file-like object
Member

"open file-like object in binary mode" or similar as above.

Contributor Author

I don't know what happened, but you are right, some docstrings were not specifying binary mode. This should be fixed by commit 0d0ece1.

@coveralls

Coverage increased (+0.2%) to 96.206% when pulling 0d0ece1 on MarcCote:streamlines_tck into 9b23a32 on nipy:master.

@coveralls

Coverage increased (+0.2%) to 96.21% when pulling d2e08da on MarcCote:streamlines_tck into 9b23a32 on nipy:master.

@matthew-brett merged commit 3d59146 into nipy:master Apr 18, 2017
@matthew-brett (Member)

Thanks for your patience - in it goes.

@MarcCote deleted the streamlines_tck branch April 18, 2017 21:24
@anoushkrit mentioned this pull request Jun 20, 2022