
Mosdepth fails when run with less than 8GB RAM on ~50MB CRAM #245

dennishendriksen opened this issue Dec 9, 2024 · 21 comments

Hi @brentp,

Mosdepth is using more memory than expected when run with a .bed file on a ~50 MB .cram file:

    local args=()
    args+=("--threads"  "1")
    # enabling/disabling this line doesn't make a difference
    # args+=("--no-per-base")
    args+=("--by" "nanopore.bed")
    args+=("--thresholds" "1,5,10,15,20,30,50,100")
    args+=("--fasta" "<cut>/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.gz")
    args+=("mosdepth")
    args+=("vip_fam0_HG002.cram")
    ${CMD_MOSDEPTH} "${args[@]}"

We are running mosdepth 0.3.8 using a Singularity/Apptainer image built from docker://quay.io/biocontainers/mosdepth:0.3.8--hd299d5a_0.

I was expecting mosdepth to be able to run with much less memory, based on the numbers reported in the README and your comment in #161. Running with 16 GB resolves the issue.

Data to reproduce the issue:
https://download.molgeniscloud.org/downloads/vip/_dev/github/mosdepth/
https://download.molgeniscloud.org/downloads/vip/resources/GRCh38/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.gz
https://download.molgeniscloud.org/downloads/vip/images/mosdepth-0.3.8.sif

Is this the behavior that you would expect?

Best regards,
@dennishendriksen

brentp (Owner) commented Dec 13, 2024

It should require only ~1 GB of memory, perhaps 2. Do you see the same memory use if you remove --by and --thresholds?
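For a comparable peak-RSS number, GNU time can be used if it is installed, e.g.:

$ /usr/bin/time -v <mosdepth command> 2>&1 | grep 'Maximum resident'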

dennishendriksen (Author) commented Dec 16, 2024

Hi @brentp,

I did some tests and this is the median mosdepth physical memory usage:

8.3GB   # with    --by and --thresholds
6.8GB   # without --by and --thresholds

So there is a difference, but both values are way outside the 1-2 GB range. Any thoughts? I'm not familiar with Nim, but I was wondering whether this could be related to a memory-management setting?

Greetings,
@dennishendriksen

LudvigOlsen (Contributor):

When running mosdepth on a ~280 MB BAM file (hg38) using a ~2 GB .bed.gz file (~8 GB uncompressed) in --by (and setting --mapq 20 --min-frag-len 100 --max-frag-len 220 --no-per-base), my system uses >40 GB of memory.

When running mosdepth, the "Max mem used" slowly builds up to that level and then stops increasing (perhaps once it is done with chromosome 1).

I'm considering whether I should chunk the BED file to reduce the RAM usage, although splitting it into multiple calls might increase the running time.

Note that I am using a fork of mosdepth, but I don't think I've changed anything relevant to RAM usage in the standard use case.

brentp (Owner) commented Jan 9, 2025

That sounds excessive! Can you verify that mosdepth from the releases here has the same memory usage?

LudvigOlsen (Contributor):

@brentp I just installed mosdepth via bioconda into a new environment. It uses the same amount of RAM, unfortunately.

The files are available online as part of a preprint, so it should be easy for you to reproduce:

The BED file is available here: https://zenodo.org/records/14215762
It is called "whole_genome.mappable.binned_10bp.bed.gz"

The test BAM file is available here: https://zenodo.org/records/13909979

Something like the following:

$ mosdepth --by whole_genome.mappable.binned_10bp.bed.gz --threads 1 --mapq 20 --min-frag-len 100 --max-frag-len 220 --no-per-base <OUTPUT_PATH> IC38.hg38.downsampled.aligned.sorted.markdup.bam

brentp (Owner) commented Jan 9, 2025

OK, just a quick point: does whole_genome.mappable.binned_10bp.bed.gz have, for example, 24.9 million 10 bp intervals for chromosome 1? That would be the reason. I do not optimize for BED files of this size, so that is likely the issue.
You can also use --by 10 and then filter to the mappable regions afterwards.
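For example (illustrative only; bedtools and a BED of mappable regions are assumed, and mosdepth writes the per-window output to <prefix>.regions.bed.gz when --by is used):

$ bedtools intersect -u -a <prefix>.regions.bed.gz -b mappable_regions.bed > mappable_bins.bed

(-f 1.0 could be added to keep only bins fully contained in a mappable region.)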

LudvigOlsen (Contributor):

Something on that scale, yes! Would --by 10 have a much lower memory requirement?

I need to be sure I always have the same 10 bp intervals, as I use a lot of other reference information for the same bins. But if that behavior is stable, I could perhaps recompute everything to use the mosdepth bins. It would be a lot of work, but if it heavily reduces memory usage, it could be the best approach.

brentp (Owner) commented Jan 9, 2025

Yes, --by 10 should have a lower memory requirement, and it is consistent: it will always create 0..10, 10..20, etc.
I am downloading your test data on a slow connection so it will take a while, but you could try --by 10.
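That is, the command above with the BED file swapped for a fixed window size:

$ mosdepth --by 10 --threads 1 --mapq 20 --min-frag-len 100 --max-frag-len 220 --no-per-base <OUTPUT_PATH> IC38.hg38.downsampled.aligned.sorted.markdup.bam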

LudvigOlsen (Contributor):

Perfect, thanks! It spent 5 min and <4 GB, compared to 15 min and 42 GB. That would probably be worth rerunning everything for!

brentp (Owner) commented Jan 9, 2025

Phew! Glad to hear it! I am surprised that the BED file is so bad. I am looking into it.

LudvigOlsen (Contributor):

Great! I'm lucky that all my intervals are 10 bp, but of course it would be great to have similar performance for BED files, if possible :-)

LudvigOlsen (Contributor) commented Jan 13, 2025

EDIT: Disregard this post.


I tried disabling the sorting in bed_to_table(). It seems to have done the trick (3.14 GB RAM, 4 min). If you document clearly that the BED file must already be sorted (or perhaps add a --sort flag?), shouldn't it be fine to disable this? @brentp

proc bed_to_table(bed: string): TableRef[string, seq[region_t]] =
  var bed_regions = newTable[string, seq[region_t]]()
  var kstr = kstring_t(l:0, m: 0, s: nil)
  var hf = hts_open(cstring(bed), "r")
  while hts_getline(hf, cint(10), addr kstr) > 0:
    if kstr.s[0] == 't' and ($kstr.s).startswith("track "):
      continue
    if kstr.s[0] == '#':
      continue
    var v = bed_line_to_region($kstr.s)
    if v == nil: continue
    bed_regions.mgetOrPut(v.chrom, new_seq[region_t]()).add(v)

  #  ---- DISABLED THIS SORTING ----
  # since it is read into mem, can also well sort.
  #for chrom, ivs in bed_regions.mpairs:
  #    sort(ivs, proc (a, b: region_t): int = int(a.start) - int(b.start))

  hts.free(kstr.s)
  return bed_regions

I should say, I haven't checked that the output is the same, but my BED file is already sorted, so I assume it should be.
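(A quick check would be something like cmp <(zcat old.regions.bed.gz) <(zcat new.regions.bed.gz), assuming the two runs wrote to different output prefixes.)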

I can make a PR if you want it (e.g., with the --sort option).


EDIT: I guess this could be considered a breaking change. In that case, would a --dont-sort option be better? And what should the default be, given that the cost of sorting here (instead of beforehand) is so much higher (assuming we run on more than one sample)?

brentp (Owner) commented Jan 13, 2025

Hmm, good sleuthing! This should not increase the memory, so there should be a way we can improve it without a flag. I'll have a look this week.

brentp (Owner) commented Jan 15, 2025

So removing that section does not change the memory use for me! Are you sure that's the issue?

LudvigOlsen (Contributor):

Hmm.. Let me just check that it properly called the version I had changed.

LudvigOlsen (Contributor):

ahhh, damnit, I was still testing with --by 10... Sorry about that. Afternoon findings are unreliable, I guess :D

I don't really get it though. Aren't you just iterating over the intervals similarly to what --by 10 does?

Btw, unrelated, but I was thinking of adding an option (in my fork at least) for calculating fragment midpoint coverage, so it can be used for things like cfDNA profiling of transcription factor binding sites. I noticed that positions in between paired reads are not counted in the coverage. That seems to be the standard definition of coverage, but wouldn't it be easy to calculate the actual fragment coverage using something like arr[min(rec.start, rec.matepos)] += 1; arr[min(rec.start, rec.matepos) + rec.isize] -= 1; and then skipping the mate read? (This approach could be selectable via a flag, of course.)
I don't want to clutter this issue with that, though, so if it's an interesting discussion, should I open a separate issue for it? There might be a good reason not to count the middle positions that I'm not aware of.
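A minimal, self-contained sketch of that difference-array idea (plain Nim, no hts-nim; the Frag type and field names are just stand-ins for the real record fields, and the rec.start == rec.matepos case still needs deduplication):

type
  Frag = object      # simplified stand-in for a paired-end record
    start: int       # leftmost position of this read
    matePos: int     # leftmost position of its mate
    isize: int       # template (fragment) length as reported in the BAM

proc addFragment(depth: var seq[int32], rec: Frag) =
  # count the whole fragment span once using the difference-array trick;
  # only the leftmost read of each pair contributes, the mate is skipped
  if rec.isize <= 0 or rec.start > rec.matePos: return
  let stop = min(rec.start + rec.isize, depth.len)
  depth[rec.start] += 1
  if stop < depth.len: depth[stop] -= 1

when isMainModule:
  var depth = newSeq[int32](1_000)
  addFragment(depth, Frag(start: 100, matePos: 300, isize: 260))
  for i in 1 ..< depth.len:   # prefix sum -> per-base fragment coverage
    depth[i] += depth[i-1]
  echo depth[150], " ", depth[400]   # 1 and 0: inside vs. outside the 100..360 fragment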

brentp (Owner) commented Jan 15, 2025

For --by with a BED file, it reads everything into memory, and Nim strings use 64 bits for the length, 64 bits for the capacity, and n bytes for the actual data.
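As a rough back-of-envelope (the per-interval cost is an assumption, not a measurement): an ~8 GB uncompressed BED at roughly 25-30 bytes per line is on the order of 300 million intervals; if each in-memory region costs somewhere around 100-130 bytes (the chrom string header and data, the start/stop fields, plus allocator/GC overhead), that already lands in the 30-40 GB range, which matches the numbers reported above.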

What you describe for fragment coverage could work under --fast-mode, where we ignore CIGAR operations.

LudvigOlsen (Contributor) commented Jan 15, 2025

The memory use seems to grow over time, as if it is increasing as it moves through the chromosome or something. So, unless it is incredibly slow at reading/uncompressing the BED file, it may not be from the initial loading into memory?

> What you describe for fragment coverage could work under --fast-mode, where we ignore CIGAR operations.

Right, so if it's a proper pair and the rec.start position is before rec.matepos, then count it. (What is the difference between .matepos and .mate_pos, by the way? I was only able to find mate_pos in the docs.)
I will think about what to do when rec.start == rec.matepos, so we only need to track those reads for deduplication. I will make a PR to move the discussion to, and then you can always reject the idea if it isn't meaningful.

brentp (Owner) commented Jan 15, 2025

Yes, I'm open to a PR; please keep me posted. I'm picky about adding features and don't want you to do a bunch of work only to run into my hesitations.

LudvigOlsen (Contributor) commented Jan 15, 2025

I fully understand the need for having only essential functionality from a maintenance and usability perspective. That's why I have a fork with the functionality I need. This seems generally relevant though.

LudvigOlsen (Contributor):

If I quit just after calling bed_to_table(), it reaches a maximum memory usage of 39.45 GB. Nothing I tried reduced this, though (my Nim knowledge is also very limited). Perhaps some per-chromosome chunking is necessary?
