Allow partial processing of read datasets #63

johnlees · 2021-06-22T13:48:43Z

For large read datasets, the reads (especially together with a large countmin table) may not fit in device memory. However, they can be loaded asynchronously in blocks, because each read is processed independently.

This change will require:

Remove the read interleaving and the loading into shared memory. A sync block load can be used to load all the reads for a warp directly into shared memory, which is simpler.
Pin the host memory in the DeviceReads class.
Add a loadBlock method to DeviceReads which loads reads up to a specified size using an async memcpy (cudaMemcpyAsync) on a non-default stream.
Launch the kernel on a second non-default stream.
Iterate through the blocks with loadBlock, waiting on kernel completion between blocks.
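Because reads are independent, any contiguous split of the dataset is valid. The steps above can be sketched on the host side in plain C++, assuming a simple greedy split by byte budget. `ReadBlock`, `split_into_blocks` and `process_in_blocks` are hypothetical names for illustration, not the DeviceReads/loadBlock API, and no CUDA calls are made: the per-block callback stands in for the async copy on one stream and the kernel launch on another.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical: a contiguous range of reads [begin, end) and its size in bytes.
struct ReadBlock {
  std::size_t begin;
  std::size_t end;
  std::size_t bytes;
};

// Greedily groups consecutive reads into blocks of at most max_bytes.
// Because reads are independent, any contiguous split is valid.
// A single read longer than max_bytes is given a block of its own.
std::vector<ReadBlock> split_into_blocks(const std::vector<std::string>& reads,
                                         std::size_t max_bytes) {
  std::vector<ReadBlock> blocks;
  ReadBlock current{0, 0, 0};
  for (std::size_t i = 0; i < reads.size(); ++i) {
    const std::size_t len = reads[i].size();
    // Close the current block if adding this read would exceed the budget.
    if (current.end > current.begin && current.bytes + len > max_bytes) {
      blocks.push_back(current);
      current = ReadBlock{i, i, 0};
    }
    current.end = i + 1;
    current.bytes += len;
  }
  if (current.end > current.begin) {
    blocks.push_back(current);
  }
  return blocks;
}

// Host-side loop. In a GPU version, each iteration would be an async copy
// of the block on one stream followed by a kernel launch on another; here
// the per-block work is a caller-supplied function. Returns the block count.
template <typename ProcessFn>
std::size_t process_in_blocks(const std::vector<std::string>& reads,
                              std::size_t max_bytes, ProcessFn process) {
  const std::vector<ReadBlock> blocks = split_into_blocks(reads, max_bytes);
  for (const ReadBlock& block : blocks) {
    process(block);
  }
  return blocks.size();
}
```

For example, reads of lengths 4, 3, 2 and 8 with a 7-byte budget split into three blocks: the first two reads (7 bytes), the third read (2 bytes), and the oversized fourth read on its own.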

The block size will be (device memory - size of countmin table) / 2 - epsilon.
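The formula can be written out as a small function. This is a sketch under assumptions: the division by two is read here as leaving room for two read buffers (one being copied while the other is processed), and `epsilon_bytes` is a caller-chosen safety margin, since the issue gives no value for it. `read_block_size` is a hypothetical name. In CUDA, the device memory figure could be obtained from `cudaMemGetInfo`.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper implementing:
//   block size = (device memory - countmin table size) / 2 - epsilon
// Returns 0 if the countmin table (plus margin) leaves no room for reads.
std::size_t read_block_size(std::size_t device_mem_bytes,
                            std::size_t countmin_bytes,
                            std::size_t epsilon_bytes) {
  if (countmin_bytes >= device_mem_bytes) {
    return 0;  // the countmin table alone does not fit
  }
  // Assumed: halve the remaining memory so two read buffers fit at once.
  const std::size_t half = (device_mem_bytes - countmin_bytes) / 2;
  return half > epsilon_bytes ? half - epsilon_bytes : 0;
}
```

For example, with 16 GiB of device memory, a 4 GiB countmin table and a 256 MiB margin, each block is 12 GiB / 2 - 256 MiB = 5888 MiB.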

johnlees · 2021-06-28T16:47:27Z

Iterating over k-mer lengths is a bit awkward with this approach. If the buffer is loaded only once, then the signs for all k-mers need to be kept on the device (fine), as well as the countmin table (not fine). To start with, let's keep the iteration as it is, reload the buffer for every k-mer length, and see how well that performs.
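The "reload per k-mer length" loop order can be sketched in plain C++ to show its cost and correctness. This is an assumption-laden illustration: an exact `std::unordered_map` stands in for the countmin table, blocks are fixed counts of reads rather than byte budgets, and `count_block` / `count_all_k` are hypothetical names. Only one table (for the current k) exists at a time, and every block is streamed once per k, so the number of block loads is (number of k-mer lengths) × (number of blocks).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <unordered_map>
#include <vector>

// Exact counts, used as a hypothetical stand-in for the countmin table.
using KmerCounts = std::unordered_map<std::string, unsigned>;

// Counts all k-mers of length k in reads[begin, end) into counts.
void count_block(const std::vector<std::string>& reads, std::size_t begin,
                 std::size_t end, std::size_t k, KmerCounts& counts) {
  for (std::size_t r = begin; r < end; ++r) {
    const std::string& read = reads[r];
    for (std::size_t i = 0; i + k <= read.size(); ++i) {
      ++counts[read.substr(i, k)];
    }
  }
}

// Outer loop over k-mer lengths, inner loop re-streaming every block.
// Records the number of distinct k-mers per k and returns the total
// number of block loads performed.
std::size_t count_all_k(const std::vector<std::string>& reads,
                        const std::vector<std::size_t>& kmer_lengths,
                        std::size_t reads_per_block,
                        std::map<std::size_t, std::size_t>& distinct) {
  if (reads_per_block == 0) {
    return 0;  // avoid an infinite loop on an empty block size
  }
  std::size_t loads = 0;
  for (std::size_t k : kmer_lengths) {
    KmerCounts table;  // only the table for the current k is alive
    for (std::size_t begin = 0; begin < reads.size(); begin += reads_per_block) {
      const std::size_t end = std::min(begin + reads_per_block, reads.size());
      ++loads;  // in a GPU version: async copy of this block to the device
      count_block(reads, begin, end, k, table);
    }
    distinct[k] = table.size();
  }
  return loads;
}
```

Because reads are independent, streaming in small blocks gives the same counts as processing the whole dataset at once; only the number of copies changes.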

johnlees added the enhancement (New feature or request) label Jun 22, 2021

johnlees self-assigned this Jun 22, 2021

johnlees mentioned this issue Jun 30, 2021 in Use a buffer to hold read data on the GPU #64 (Merged)

johnlees closed this as completed in #64 Jul 22, 2021
