ref(node): refactor contextlines to use readline #12221
Conversation
Nice! Is it possible to do a basic benchmark for comparison here? Maybe something like an express server whose stack traces run through huge files?
It seems that `createReadStream` error handling was only backported to Node 16 and not 14, which would explain the unit test failure on 14. I am tempted to add an `fs.stat` here to check if the file exists, but it feels like a very poor fix...
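A minimal sketch of handling the failure on the stream itself rather than with an `fs.stat` pre-check; the path and fallback behavior below are illustrative, not the PR's actual code:

```ts
import { createReadStream } from 'fs';

// Hypothetical example: attach the error handler directly to the stream,
// since an fs.stat pre-check would race against the file disappearing
// between the check and the open.
const stream = createReadStream('/path/to/source-file.js');
stream.on('error', () => {
  // Treat an unreadable file as "no context lines available" and move on,
  // instead of letting the error crash event processing.
});
```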
If anyone wants to give this another look, feel free to. I still need to run benchmarks and confirm the tradeoff before merging (sometime this week).
It just occurred to me that we should store the result of the file read. I'll run benchmarks after that change.
dev-packages/node-integration-tests/suites/public-api/captureException/catched-error/test.ts
damn this is so clean 🔥
tyty @JonasBa for helping us with this!
Seems like dev is broken - the failures are on span tests, which are unrelated. @AbhiPrasad on a side note, I think we should add some simple heuristic where we short circuit adding context frames past a certain line or column number. For example, a colno past a few hundred is very likely minified code, and anything past a few thousand lines is likely bundled.
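A sketch of what that short circuit could look like; the cutoff values below are placeholders, not numbers decided in this thread:

```ts
// Illustrative cutoffs - the thread suggests deriving real values from data.
const MAX_CONTEXT_COLNO = 1_000; // past this, the frame is likely minified code
const MAX_CONTEXT_LINENO = 10_000; // past this, the file is likely a large bundle

interface StackFrame {
  lineno?: number;
  colno?: number;
}

// Skip reading context lines for frames that point deep into what is
// almost certainly generated code.
function shouldSkipContextLines(frame: StackFrame): boolean {
  return (
    (frame.colno !== undefined && frame.colno > MAX_CONTEXT_COLNO) ||
    (frame.lineno !== undefined && frame.lineno > MAX_CONTEXT_LINENO)
  );
}
```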
I wonder if we can get avg/p75/p95 of line/col length by analyzing sentry errors, and then decide a number based on that.
@AbhiPrasad I just ran some CPU benchmarks between 1201eb2d7 and 2bf1b24ce. The large file was roughly 64KB, ~4k lines, while the medium one was only 500 lines (the others were just simple init-SDK-and-throw cases). In the medium and large file benchmarks, the error was thrown from the last few lines to force the worst-case scenario where we need to scan an entire file. As is, there is no limit on the upper bound, but I think it'd be reasonable to set one (even if it's something like 10k lines). The benchmarks were run using hyperfine with a warmup of 3 cycles and a minimum of 40 iterations.
Something to note is that in a lot of the benchmarks (both previous and current) hyperfine was warning about statistical outliers - you can see that the ranges for each benchmark vary quite a bit and seem to have no correlation with the size of the file we are reading. My takeaway is that either my benchmarks are wrong, or the readline approach is not significantly worse in terms of CPU usage than mapping the entire file to memory, while giving us reduced memory usage. I did try to measure the memory usage, but it seemed to vary only within a range of a few megabytes, which makes it similarly unreliable. Maybe one last thing to test is to throw errors from a few large files and see how it impacts memory usage, wdyt? I also think that we should set some reasonably high max line (maybe at 10k) to avoid cases where we might be reading a very large bundled file.
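For reference, one crude way to observe peak memory in a benchmark run; the sampling interval and reporting are arbitrary, and as noted above this kind of measurement is only reliable to within a few megabytes:

```ts
// Sample RSS periodically and report the peak on exit. This is a rough
// sketch for benchmarking, not something to ship in the SDK.
let peakRss = process.memoryUsage().rss;
const sampler = setInterval(() => {
  peakRss = Math.max(peakRss, process.memoryUsage().rss);
}, 10);

process.on('exit', () => {
  clearInterval(sampler);
  console.log(`peak rss: ${(peakRss / 1024 / 1024).toFixed(1)} MiB`);
});
```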
These benchmark results make sense to me because I wouldn't have expected the readline approach to be considerably faster, especially when the errors are towards the end of the file (i.e. the worst case). Do the v8 debugger "Record Allocation profile" results look considerably different between this branch and develop? I've never looked into the node readline code before, but now that I'm looking at the API, in the worst-case scenario I'm doubtful it ends up allocating less than reading the entire file to a string and splitting. Since there's no way to pass string slice references in JavaScript, every line readline emits is a newly allocated string anyway. To make it faster and allocate less you'd need to skip to the first required line through the raw stream and only then slice the buffer and allocate strings for the required lines.
Yeah exactly, though the goal here is to reduce peak memory usage without impacting CPU time. I would expect to see fs spike to allocate the entire file while readline performs smaller allocations, somewhat similar to what is described in https://betterprogramming.pub/a-memory-friendly-way-of-reading-files-in-node-js-a45ad0cc7bb6, which also indicates that fs.read could be the next option if we want to optimize the individual string allocations. The worst case here is that we end up reading a large minified file - then none of this applies and we just map the entire string (we could avoid that with some sensible max colno, past which we just give up on reading such files).
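For contrast, roughly what the eager approach being replaced looks like; the function name and slicing here are illustrative:

```ts
import { readFileSync } from 'fs';

// Illustrative version of the eager approach: the whole file is allocated
// as one string, and split('\n') then allocates a second copy as per-line
// strings - the memory doubling mentioned in the PR description.
function getLinesEagerly(path: string, start: number, end: number): string[] {
  const lines = readFileSync(path, 'utf8').split('\n');
  return lines.slice(Math.max(start - 1, 0), end);
}
```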
This PR has already achieved that by using `readline`, and I think that's enough to merge this PR! To reduce the allocations in the worst cases, we'd want a lower-level `fs.read`-based approach.
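A sketch of that lower-level idea: count newline bytes in raw chunks, copy out only the byte slices covering the requested lines, and decode them once at the end. All names are illustrative; this is not code from the PR:

```ts
import { createReadStream } from 'fs';

// Hypothetical fs-level reader: lines outside [start, end] are never turned
// into strings; decoding happens once at the end, so multi-byte UTF-8
// characters split across chunk boundaries are still handled correctly.
function readLineRange(path: string, start: number, end: number): Promise<string[]> {
  return new Promise((resolve, reject) => {
    const stream = createReadStream(path);
    const kept: Buffer[] = [];
    let lineNo = 1; // line number the next byte belongs to

    const finish = (): string[] => {
      const text = Buffer.concat(kept).toString('utf8');
      return text === '' ? [] : text.replace(/\n$/, '').split('\n');
    };

    stream.on('error', reject);
    stream.on('end', () => resolve(finish()));
    stream.on('data', (chunk: Buffer) => {
      let offset = 0;
      while (offset < chunk.length) {
        const nl = chunk.indexOf(0x0a, offset); // next '\n' byte in this chunk
        const sliceEnd = nl === -1 ? chunk.length : nl + 1;
        if (lineNo >= start && lineNo <= end) {
          // Copy the slice so we don't retain the whole chunk buffer.
          kept.push(Buffer.from(chunk.subarray(offset, sliceEnd)));
        }
        if (nl === -1) break; // rest of the chunk continues the current line
        lineNo++;
        if (lineNo > end) {
          stream.destroy(); // we have everything we need; stop reading
          resolve(finish());
          return;
        }
        offset = sliceEnd;
      }
    });
  });
}
```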
You nailed it :) It would be super interesting to see how well that performs (though I won't be tackling this anytime soon, I think). I'm going to try and see if we have any internal services that we can monitor once this PR goes out.
@JonasBa there is https://github.com/getsentry/chartcuterie/ and https://github.com/getsentry/sentry-docs but we don't have the best infra metrics monitoring for them re: CPU/memory
…file handles (#14995) The ContextLines integration uses readable streams to read files more memory-efficiently when attaching source context to outgoing events. See more details here: #12221. Unfortunately, if we don't explicitly destroy the stream after creating and using it, it won't get closed, even when we remove the readline interface that uses the stream (and which actually does the reading of files). To fix this, we adjust the resolve logic when getting file context to destroy the stream, since at that point we are done with the readline interface anyway.
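A minimal sketch of the fix described in that commit message, assuming a promise-based read loop like the one in this PR; the names are illustrative:

```ts
import { createReadStream } from 'fs';
import { createInterface } from 'readline';

// Illustrative version of the fix: closing the readline interface alone does
// not close the underlying file stream, so the resolve path destroys it
// explicitly to release the file handle.
function readContext(path: string, start: number, end: number): Promise<string[]> {
  return new Promise(resolve => {
    const stream = createReadStream(path);
    const rl = createInterface({ input: stream });
    const lines: string[] = [];
    let lineNo = 0;
    let settled = false;

    function done(result: string[]): void {
      if (settled) return;
      settled = true;
      rl.close();
      stream.destroy(); // without this, the fd stays open after rl.close()
      resolve(result);
    }

    stream.on('error', () => done([]));
    rl.on('line', line => {
      lineNo++;
      if (lineNo >= start && lineNo <= end) lines.push(line);
      if (lineNo >= end) done(lines);
    });
    rl.on('close', () => done(lines)); // EOF before reaching `end`
  });
}
```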
Refactors the contextlines integration to use readline instead of reading the entire file into memory (we were also doubling the memory because we split the file via `.split('\n')`).

The current implementation is not leveraging the cache to optimize file reads, so I need to reintroduce that. Something to note is that the caching behavior will become per frame location vs per file, which I think makes more sense, as we can only assume that this particular stack frame might throw again in the future (as opposed to anything from the file it is in).
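A sketch of what per-frame-location caching could look like; the Map-based LRU and the size limit below are illustrative, not the SDK's actual cache:

```ts
// Cache context lines keyed by file path plus line number, rather than
// caching whole file contents. Eviction below is a simple insertion-order
// LRU approximation for illustration.
const MAX_CACHE_ENTRIES = 100;
const contextCache = new Map<string, string[]>();

function cacheKey(file: string, lineno: number): string {
  return `${file}:${lineno}`;
}

function setCached(file: string, lineno: number, lines: string[]): void {
  if (contextCache.size >= MAX_CACHE_ENTRIES) {
    // Maps iterate in insertion order, so the first key is the oldest entry.
    const oldest = contextCache.keys().next().value;
    if (oldest !== undefined) contextCache.delete(oldest);
  }
  contextCache.set(cacheKey(file, lineno), lines);
}

function getCached(file: string, lineno: number): string[] | undefined {
  return contextCache.get(cacheKey(file, lineno));
}
```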