Large read datasets (especially alongside a large countmin table) may exceed device memory. However, because each read is independent, the reads can happily be loaded asynchronously in blocks.
This change will require:
- Remove the read interleaving, and loading into shared. A sync block load can be used to load all the reads for a warp into shared directly and more simply.
- Pin the host memory in the DeviceReads class.
- Add a loadBlock method to DeviceReads which loads up to a specified size using memcpy async on a non-default stream.
- Launch the kernel on a second non-default stream.
- Iterate through loadBlock, waiting on kernel completion.
The block size will be (device memory - size of countmin table) / 2 - epsilon
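The block-size rule above can be sketched on the host side as follows. This is a minimal sketch, not the project's code: `block_bytes`, `split_blocks`, and the idea of expressing epsilon as a byte count are all assumptions made for illustration, and a real implementation would query free device memory at runtime.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: bytes available for ONE read buffer, following
// (device memory - size of countmin table) / 2 - epsilon.
// Dividing by 2 leaves room for two buffers, so one can be filled
// while the kernel works on the other. Returns 0 if nothing fits.
std::size_t block_bytes(std::size_t device_bytes,
                        std::size_t countmin_bytes,
                        std::size_t epsilon_bytes) {
    if (countmin_bytes >= device_bytes) return 0;
    const std::size_t half = (device_bytes - countmin_bytes) / 2;
    return half > epsilon_bytes ? half - epsilon_bytes : 0;
}

// Hypothetical helper: split n_reads into half-open [start, end) ranges
// of at most reads_per_block reads each. Reads are independent, so any
// contiguous split is valid.
std::vector<std::pair<std::size_t, std::size_t>>
split_blocks(std::size_t n_reads, std::size_t reads_per_block) {
    std::vector<std::pair<std::size_t, std::size_t>> blocks;
    if (reads_per_block == 0) return blocks;
    for (std::size_t start = 0; start < n_reads; start += reads_per_block) {
        blocks.emplace_back(start, std::min(start + reads_per_block, n_reads));
    }
    // Postcondition: the last block ends exactly at the final read.
    assert(blocks.empty() || blocks.back().second == n_reads);
    return blocks;
}
```

Dividing `block_bytes(...)` by the per-read footprint would give the `reads_per_block` argument; each resulting range is what one loadBlock call would transfer.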
Iterating over k-mer lengths is a bit irritating with block loading. If each block is only to be loaded once, then the signs for all k-mer lengths need to be maintained on the device (fine), and so does a countmin table for each of them (not fine). To start with, let's just try keeping the iteration as-is, loading the buffer on and off for every k-mer length, and see how that copes.
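The loop order proposed here (k-mer length outermost, read blocks innermost) can be sketched as a plain schedule. This is an illustrative sketch only: `Step` and `schedule` are hypothetical names, the alternation between two buffers is an assumption based on the "/ 2" in the block-size rule, and no device calls are made.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical record of one unit of work: which k-mer length is being
// counted, which read block is processed, and which of the two device
// buffers holds that block.
struct Step {
    int k;
    std::size_t block;
    int buffer;
};

// Build the order of work when the k-mer iteration is kept as-is:
// every read block is re-loaded for every k-mer length.
std::vector<Step> schedule(const std::vector<int>& kmer_lengths,
                           std::size_t n_blocks) {
    std::vector<Step> steps;
    for (int k : kmer_lengths) {
        // Assumption: the countmin table is reused for each k, so only
        // one table has to be resident on the device at a time.
        for (std::size_t b = 0; b < n_blocks; ++b) {
            // Alternate buffers so block b+1 can be copied while block b
            // is being processed.
            steps.push_back({k, b, static_cast<int>(b % 2)});
        }
    }
    return steps;
}
```

The cost of this ordering is `kmer_lengths.size() * n_blocks` host-to-device transfers instead of `n_blocks`, which is the trade-off to measure before trying anything more elaborate.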