ref(spans): Add spans buffer v2 #85856
Conversation
see getsentry/streaming-planning#18

The current process-spans consumer assumes that each span has a segment ID. In the new world we need to construct segments and correlate spans purely based on their parent-child relationship + timeouts.

Build a new redis-based spans buffer, patch it into the existing consumer behind a CLI flag.
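For orientation, here is a minimal in-memory sketch of the behaviour described above (correlating spans into segments purely via parent pointers plus a flush timeout). The class and method names are hypothetical; this is a toy model, not the Redis-backed implementation from this PR:

```python
import time


class SpansBufferSketch:
    """Toy model only: segments keyed by a locally-known root span."""

    def __init__(self, timeout: float = 60.0):
        self.timeout = timeout
        # root span_id -> (first-seen timestamp, collected payloads)
        self.segments: dict[str, tuple[float, list[bytes]]] = {}
        # span_id -> root it was attached to, so later children find their segment
        self.redirects: dict[str, str] = {}

    def add_span(self, span_id: str, parent_span_id: str | None, payload: bytes) -> None:
        # Without a segment ID we can only follow the parent pointer: attach to
        # the parent's segment if we've seen it, otherwise start a new one.
        # (A parent arriving after its children is not handled here, mirroring
        # the edge case discussed in the review below.)
        if parent_span_id is None:
            root = span_id
        else:
            root = self.redirects.get(parent_span_id, parent_span_id)
        _, payloads = self.segments.setdefault(root, (time.time(), []))
        payloads.append(payload)
        self.redirects[span_id] = root

    def flush_due_segments(self) -> list[list[bytes]]:
        # Segments are emitted once their timeout has elapsed.
        now = time.time()
        due = [r for r, (ts, _) in self.segments.items() if now - ts >= self.timeout]
        return [self.segments.pop(r)[1] for r in due]
```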
Codecov Report
Attention: Patch coverage is ✅ All tests successful. No failed tests found.

Additional details and impacted files:

```
@@            Coverage Diff             @@
##           master    #85856       +/-   ##
===========================================
+ Coverage    33.16%    87.74%    +54.57%
===========================================
  Files         8314      9834      +1520
  Lines       463026    556594     +93568
  Branches     21939     21939
===========================================
+ Hits        153582    488373    +334791
+ Misses      309013     67790    -241223
  Partials       431       431
```
Here's some high level feedback; mostly questions I thought of as I read the code.
hey @mcannizz, this wasn't supposed to be reviewable yet. the next step for us is to test it in the sandbox, but there's a lot of stuff to clean up in the code. i'll address your comments partially though.
Some other questions:
This code doesn't seem to deal with the fact that a parent might arrive after a child. In that case the child won't be consolidated into the parent set in Redis. Is that something that will be addressed later/not at all?
you're right -- there are a few other edge cases that are not properly handled. right now we're mostly concerned with performance, and I don't think those cases will change our perf profile.
```python
parent_span_id=val.get("parent_span_id"),
project_id=val["project_id"],
payload=payload.value,
# TODO: validate, this logic may not be complete.
```
Indeed, however we can follow up with a dedicated PR for this.
- We shouldn't use `span.op` nor access `sentry_tags` in here; instead, check the span's kind (Relay PR).
- We need to call out that after flushing, the buffer (or flusher?) checks whether the parent span is in a different project, otherwise treating it the same as `is_remote is True`.
ahh i couldn't find any code in relay-master that showed how to use those attributes, didn't know this was still WIP.
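For reference, a hypothetical sketch of the check described in the comment above. The helper name and the `is_remote` / `project_id` fields are assumptions for illustration, not code from this PR or from Relay:

```python
def is_segment_boundary(span: dict, parent_project_id: int | None) -> bool:
    # A span marked as remote starts a new segment: its parent lives in
    # another service.
    if span.get("is_remote"):
        return True
    # The follow-up check called out above: a parent in a different project
    # would be treated the same as is_remote being True.
    if parent_project_id is not None and parent_project_id != span["project_id"]:
        return True
    return False
```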
src/sentry/spans/buffer_v2.py (Outdated)
```python
payload_nums.append(len(payloads))
for payload in payloads:
    span_id = rapidjson.loads(payload)["span_id"]
```
Would it be possible to avoid this load by getting the span_id from somewhere else?
we don't store individual span IDs in redis, only payloads, so when we flush we eventually have to get it out of the payload one way or another.
src/sentry/spans/buffer_v2.py (Outdated)

```python
# redis-cluster-py sentrywide. this probably leaves a bit of
# perf on the table as we send the full lua sourcecode with every span.
p.eval(
    add_buffer_script.script,
```
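As an aside, the perf cost mentioned in the quoted comment (sending the full Lua source with every call) is what plain redis-py avoids via server-side script caching. The snippet below is a generic sketch of that mechanism, not something this PR does, and the script body is a placeholder:

```python
import redis

r = redis.Redis()

# Placeholder Lua body; the real add_buffer_script is not shown here.
ADD_BUFFER_LUA = "return redis.call('SADD', KEYS[1], ARGV[1])"

# register_script wraps SCRIPT LOAD / EVALSHA: only the SHA1 travels per call,
# with an automatic fallback to EVAL if the server's script cache was flushed.
add_buffer = r.register_script(ADD_BUFFER_LUA)
add_buffer(keys=["span-buf:some-parent"], args=["span-payload"])
```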
We are highly likely to find spans of the same trace in a batch. This means we could pre-process them in memory and minimize the number of mutations in Redis. For example, we should build partial trees in memory, then insert redirect keys for the top-most parent known at the current time.
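A rough sketch of that batch pre-processing idea, under the assumption that each span in the batch carries `trace_id`, `span_id`, `parent_span_id`, and `payload` fields (names hypothetical):

```python
from collections import defaultdict


def group_batch_by_known_root(batch: list[dict]) -> dict[tuple[str, str], list[bytes]]:
    by_trace: dict[str, list[dict]] = defaultdict(list)
    for span in batch:
        by_trace[span["trace_id"]].append(span)

    grouped: dict[tuple[str, str], list[bytes]] = defaultdict(list)
    for trace_id, spans in by_trace.items():
        parent_of = {s["span_id"]: s["parent_span_id"] for s in spans}
        for span in spans:
            # Climb parent pointers as far as this batch allows; the last
            # reachable ancestor is the "top-most parent known at the current
            # time" that a redirect key would point at.
            node = span["span_id"]
            while parent_of.get(node) in parent_of and parent_of[node] != node:
                node = parent_of[node]
            root = parent_of.get(node) or node
            grouped[(trace_id, root)].append(span["payload"])
    return grouped
```

Each (trace, root) group could then be written with a single mutation plus a redirect key, instead of one script invocation per span; whether that saving is worth doing the tree work in Python is the trade-off weighed in the reply below.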
I generally don't believe that reducing the amount of work done in Redis will pay off if we pay the cost for the same CPU-bound work executed in Python instead. So while Redis is a scaling bottleneck, I do believe it's just more cost-efficient to run things like tree construction within Redis, and scaling out Redis to the degree we need it is actually not a problem.
Of course I say this without having tested it rigorously, but I don't think we should spend time on that. If this consumer were Rust then I would've been all-in on the local buffer approach.