-
-
Notifications
You must be signed in to change notification settings - Fork 100
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Mosdepth fails when run with less than 8GB RAM on ~50MB CRAM #245
Comments
It should require only ~ 1GB of memory, perhaps 2. Do you see the same memory use if you remove --by and --thresholds ? |
Hi @brentp, I did some tests and this is the median mosdepth physical memory usage:
So there is a difference, but both values are way outside the 1-2GB range. Any thoughts? I'm not familiar with Greetings, |
When running mosdepth on a ~280mb BAM file (hg38) using a ~2gb When running mosdepth, the "Max mem used" slowly builds up to that and then stops increasing (perhaps when being done with chromosome 1). Considering whether I should chunk the bed file to reduce the RAM usage? Might increase running time to have multiple calls though. Note that I am using a fork of mosdepth, but I don't think I've changed anything relevant to RAM usage in the standard usecase. |
that sounds excessive! can you verify that mosdepth from the releases here has the same memory usage? |
@brentp I just installed mosdepth via bioconda into a new environment. It uses the same amount of RAM, unfortunately. The files are available online as part of a preprint, so it should be easy for you to reproduce: The BED file is available here: https://zenodo.org/records/14215762 The test BAM file is available here: https://zenodo.org/records/13909979 Something like the following: $ mosdepth --by whole_genome.mappable.binned_10bp.bed.gz --threads 1 --mapq 20 --min-frag-len 100 --max-frag-len 220 --no-per-base <OUTPUT_PATH> IC38.hg38.downsampled.aligned.sorted.markdup.bam |
ok. just a quick point--does |
Something on that scale, yes! Would I need to be sure I always have the same 10bp intervals, as I use a lot of other reference information for the same bins. But if that behavior is stable, I could perhaps recompute everything to use the mosdepth bins. Would be a lot of work but if it heavily reduces memory usage, it could be the best approach. |
yes, |
Perfect, thanks! It spent 5min and < 4gb, compared to 15min and 42gb. That would probably be worth rerunning everything for! |
phew! glad to hear it! I am surprised that the bed file is so bad. I am looking into it. |
Great! I'm lucky all intervals are 10bp, but of course it would be great to have similar performance for bed files, if possible :-) |
EDIT: Disregard this post. I tried disabling the sorting in proc bed_to_table(bed: string): TableRef[string, seq[region_t]] =
var bed_regions = newTable[string, seq[region_t]]()
var kstr = kstring_t(l:0, m: 0, s: nil)
var hf = hts_open(cstring(bed), "r")
while hts_getline(hf, cint(10), addr kstr) > 0:
if kstr.s[0] == 't' and ($kstr.s).startswith("track "):
continue
if kstr.s[0] == '#':
continue
var v = bed_line_to_region($kstr.s)
if v == nil: continue
bed_regions.mgetOrPut(v.chrom, new_seq[region_t]()).add(v)
# ---- DISABLED THIS SORTING ----
# since it is read into mem, can also well sort.
#for chrom, ivs in bed_regions.mpairs:
# sort(ivs, proc (a, b: region_t): int = int(a.start) - int(b.start))
hts.free(kstr.s)
return bed_regions I should say, I haven't checked that the output is the same, but my bedfile is already sorted so I assume it should be. I can make a PR if you want it (e.g., with the --sort option). EDIT: I guess this could be considered a breaking change. In that case, is it better with a |
hmm. good sleuthing! this should not increase the memory so there should be a way we can improve this without a flag. I'll have a look this week. |
so removing that section does not change the memory use for me! are you sure that's the issue? |
Hmm.. Let me just check that it properly called the version I had changed. |
ahhh, damnit, I was still testing with I don't really get it though. Aren't you just iterating over the intervals similarly to what Btw., unrelated but I was thinking of adding an option (in my fork at least) for calculating fragment midpoint coverage, so it can be used for things like cfDNA profiling of transcription factor binding sites. I noticed that any positions in-between paired reads are not counted in the coverage. It seems like that is the default for coverage, but wouldn't it be easy to calculate the actual coverage using something like |
for bed, it reads everything into memory and nim strings use 64 bits for length, 64 bits for capacity, and n-bytes for the actual data. what you describe for fragment coverage could work under --fast-mode, where we ignore cigar operations. |
The memory use seems to grow over time, as if it's increasing as it moves through the chromosome or something. So, unless it's incredibly slow at reading / uncompressing the bed file, it may not be from the initial loading into memory?
Right, so if it's a proper pair and the |
yes, I'm open to a PR. please keep me posted. I'm picky about adding features and don't want you to do a bunch of work only to run into my hesitations |
I fully understand the need for having only essential functionality from a maintenance and usability perspective. That's why I have a fork with the functionality I need. This seems generally relevant though. |
If I quit just after calling |
Hi @brentp,
Mosdepth is using more memory than expected when ran with a .bed file on a ~50MB .cram file:
We are running mosdepth 0.3.8 using a singularity apptainer image build from
docker://quay.io/biocontainers/mosdepth:0.3.8--hd299d5a_0
I was expecting mosdepth to be able to run with much less memory based on the numbers reported in README and your comment in #161 (comment). Running with 16GB resolves the issue.
Data to reproduce the issue:
https://download.molgeniscloud.org/downloads/vip/_dev/github/mosdepth/
https://download.molgeniscloud.org/downloads/vip/resources/GRCh38/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.gz
https://download.molgeniscloud.org/downloads/vip/images/mosdepth-0.3.8.sif
Is this the behavior that you would expect?
Best regards,
@dennishendriksen
The text was updated successfully, but these errors were encountered: