processor_sampling: new trace sampling processor #10029
Open: edsiper wants to merge 22 commits into master from processor_trace_sampling
+3,319
−40
This patch introduces a new trace sampling processor designed with a pluggable architecture, allowing easy extension to support multiple sampling strategies and backends.

The initial implementation includes basic probabilistic sampling, with future patches planned to add additional sampling methods such as rate-limiting, latency-based, and tail-based sampling.

The probabilistic sampler can be configured as follows:

    pipeline:
      inputs:
        - name: opentelemetry
          port: 4318

          processors:
            traces:
              - name: sampling
                type: probabilistic
                debug: true
                rules:
                  sampling_percentage: 40

      outputs:
        - name: stdout
          match: '*'

In this configuration:

- debug mode (debug: true) is enabled, allowing detailed logging of sampling decisions.
- sampling_percentage: 40 ensures that 40% of traces are retained, while the rest are discarded.
- traces that pass sampling will be forwarded to the stdout output for visibility.

    Fluent Bit v4.0.0
    * Copyright (C) 2015-2024 The Fluent Bit Authors
    * Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
    * https://fluentbit.io

    ______ _                  _    ______ _ _             ___  _____
    |  ___| |                | |   | ___ (_) |           /   ||  _  |
    | |_  | |_   _  ___ _ __ | |_  | |_/ /_| |_  __   __/ /| || |/' |
    |  _| | | | | |/ _ \ '_ \| __| | ___ \ | __| \ \ / / /_| ||  /| |
    | |   | | |_| |  __/ | | | |_  | |_/ / | |_   \ V /\___  |\ |_/ /
    \_|   |_|\__,_|\___|_| |_|\__| \____/|_|\__|   \_/     |_(_)___/

    [2025/02/28 16:46:00] [ info] [fluent bit] version=4.0.0, commit=0e885e2d60, pid=778903
    [2025/02/28 16:46:00] [ info] [storage] ver=1.5.2, type=memory, sync=normal, checksum=off, max_chunks_up=128
    [2025/02/28 16:46:00] [ info] [simd    ] disabled
    [2025/02/28 16:46:00] [ info] [cmetrics] version=0.9.9
    [2025/02/28 16:46:00] [ info] [ctraces ] version=0.6.0
    [2025/02/28 16:46:00] [ info] [input:opentelemetry:opentelemetry.0] initializing
    [2025/02/28 16:46:00] [ info] [input:opentelemetry:opentelemetry.0] storage_strategy='memory' (memory only)
    [2025/02/28 16:46:00] [ info] [input:opentelemetry:opentelemetry.0] listening on 0.0.0.0:4318
    [2025/02/28 16:46:00] [ info] [processor:sampling:sampling.0] initializing probabilistic sampling processor
    [2025/02/28 16:46:00] [ info] [sp] stream processor started
    [2025/02/28 16:46:00] [ info] [output:stdout:stdout.0] worker #0 started

    🔍 Debug sampling 'probabilistic' (0x779068027940): before
    ┌─────────────────────────────────────────────────────────────────┐
    │ trace_id=5b8efff798038103d269b633813fc60c                       │
    ├─────────────────────────────────────────────────────────────────┤
    │ spans:                                                          │
    │ ├── id=eee19b7ec3c1b174 name=I'm a server span                  │
    │ ├── id=eee19b7ec3c1b175 name=Child span of server span          │
    │ ├── id=eee19b7ec3c1b176 name=Database query                     │
    └─────────────────────────────────────────────────────────────────┘
    ┌─────────────────────────────────────────────────────────────────┐
    │ trace_id=6a9dfff798038103d269b633813fc60d                       │
    ├─────────────────────────────────────────────────────────────────┤
    │ spans:                                                          │
    │ ├── id=fff19b7ec3c1b174 name=A span in another trace            │
    └─────────────────────────────────────────────────────────────────┘
    ┌─────────────────────────────────────────────────────────────────┐
    │ trace_id=7c8efff798038103d269b633813fc60e                       │
    ├─────────────────────────────────────────────────────────────────┤
    │ spans:                                                          │
    │ ├── id=0000000000000000 name=Slow request                       │
    └─────────────────────────────────────────────────────────────────┘
    ┌─────────────────────────────────────────────────────────────────┐
    │ trace_id=8d9efff798038103d269b633813fc60f                       │
    ├─────────────────────────────────────────────────────────────────┤
    │ spans:                                                          │
    │ ├── id=0000000000000000 name=High traffic span                  │
    │ ├── id=0000000000000000 name=Load testing event                 │
    └─────────────────────────────────────────────────────────────────┘
    ┌─────────────────────────────────────────────────────────────────┐
    │ trace_id=9a1bfff798038103d269b633813fc610                       │
    ├─────────────────────────────────────────────────────────────────┤
    │ spans:                                                          │
    │ ├── id=0000000000000000 name=Faulty transaction                 │
    │ ├── id=0000000000000000 name=Database rollback                  │
    └─────────────────────────────────────────────────────────────────┘

    🔍 Debug sampling 'probabilistic' (0x779068027940): after
    ┌─────────────────────────────────────────────────────────────────┐
    │ trace_id=6a9dfff798038103d269b633813fc60d                       │
    ├─────────────────────────────────────────────────────────────────┤
    │ spans:                                                          │
    │ ├── id=fff19b7ec3c1b174 name=A span in another trace            │
    └─────────────────────────────────────────────────────────────────┘
    ┌─────────────────────────────────────────────────────────────────┐
    │ trace_id=7c8efff798038103d269b633813fc60e                       │
    ├─────────────────────────────────────────────────────────────────┤
    │ spans:                                                          │
    │ ├── id=0000000000000000 name=Slow request                       │
    └─────────────────────────────────────────────────────────────────┘

    |-------------------- RESOURCE SPAN --------------------|
      resource:
         - attributes:
                - service.name: 'other.service'
         - dropped_attributes_count: 0
         - schema_url: ""
      [scope_span]
        instrumentation scope:
            - name                    : other.library
            - version                 : 2.0.0
            - dropped_attributes_count: 0
            - attributes: undefined
        schema_url: ""
        [spans]
             [span #0 'A span in another trace']
                 - trace_id                : 6a9dfff798038103d269b633813fc60d
                 - span_id                 : fff19b7ec3c1b174
                 - parent_span_id          : undefined
                 - kind                    : 2 (server)
                 - start_time              : 1544712660000000000
                 - end_time                : 1544712662000000000
                 - dropped_attributes_count: 0
                 - dropped_events_count    : 0
                 - dropped_links_count     : 0
                 - trace_state             : (null)
                 - status:
                     - code                : 0
                 - attributes: none
                 - events: none
                 - [links]

    |-------------------- RESOURCE SPAN --------------------|
      resource:
         - attributes:
                - service.name: 'latency.test.service'
         - dropped_attributes_count: 0
         - schema_url: ""
      [scope_span]
        instrumentation scope:
            - name                    : latency.test.library
            - version                 : 3.0.0
            - dropped_attributes_count: 0
            - attributes: undefined
        schema_url: ""
        [spans]
             [span #0 'Slow request']
                 - trace_id                : 7c8efff798038103d269b633813fc60e
                 - span_id                 : 0000000000000000
                 - parent_span_id          : undefined
                 - kind                    : 2 (server)
                 - start_time              : 1544712660000000000
                 - end_time                : 1544712675000000000
                 - dropped_attributes_count: 0
                 - dropped_events_count    : 0
                 - dropped_links_count     : 0
                 - trace_state             : (null)
                 - status:
                     - code                : 0
                 - attributes: none
                 - events: none
                 - [links]

Signed-off-by: Eduardo Silva <[email protected]>
Signed-off-by: Eduardo Silva <[email protected]>
Until now, the processor callback for traces only received the incoming CTraces context, which processors were expected to modify in place. This patch changes the function prototype by adding a new optional argument that sets a new output CTraces context.

Behavior on return:

- If the CTraces output context is NULL, the remaining processor units stop right away. The assumption is that the processor plugin either buffered or discarded the context, so no extra processing is needed.
- If the CTraces output context is different from the incoming CTraces context, it overrides the original context (the original context is destroyed).

Signed-off-by: Eduardo Silva <[email protected]>
Signed-off-by: Eduardo Silva <[email protected]>
…output Signed-off-by: Eduardo Silva <[email protected]>
Signed-off-by: Eduardo Silva <[email protected]>
Signed-off-by: Eduardo Silva <[email protected]>
Signed-off-by: Eduardo Silva <[email protected]>
Signed-off-by: Eduardo Silva <[email protected]>
Signed-off-by: Eduardo Silva <[email protected]>
Signed-off-by: Eduardo Silva <[email protected]>
Signed-off-by: Eduardo Silva <[email protected]>
Signed-off-by: Eduardo Silva <[email protected]>
Signed-off-by: Eduardo Silva <[email protected]>
a05ad05 to e7b5317
For the tail sampling type, this commit adds a new 'latency' conditional that selects spans based on their duration (end time - start time) by matching specific thresholds:

- threshold_ms_low : specifies the lower latency threshold. Traces with a duration <= this value will be sampled.
- threshold_ms_high: specifies the upper latency threshold. Traces with a duration >= this value will be sampled.

Note that the thresholds are set in milliseconds.

Usage:

    pipeline:
      inputs:
        - name: opentelemetry
          port: 4318

          processors:
            traces:
              - name: sampling
                type: tail
                sampling_settings:
                  decision_wait: 5s
                conditions:
                  - type: latency
                    threshold_ms_low: 200
                    threshold_ms_high: 3000

This tail-based sampling configuration waits 5 seconds before making a decision. It samples traces based on latency, capturing short traces of 200ms or less and long traces of 3000ms or more. Traces between 200ms and 3000ms are not sampled unless another condition applies.

Signed-off-by: Eduardo Silva <[email protected]>
This commit introduces the string_attribute conditional to the sampling processor, allowing traces to be sampled based on specific span or resource attributes. Users can define key-value filters like http.method=POST to selectively capture relevant traces:

    pipeline:
      inputs:
        - name: opentelemetry
          port: 4318

          processors:
            traces:
              - name: sampling
                type: tail
                sampling_settings:
                  decision_wait: 5s
                conditions:
                  - type: string_attribute
                    key: "http.method"
                    values: ["GET"]
                  - type: string_attribute
                    key: "service.name"
                    values: ["payment-processing"]

      outputs:
        - name: stdout
          match: '*'

Signed-off-by: Eduardo Silva <[email protected]>
This patch introduces the match_type property for the string_attribute conditional. It accepts the values 'strict' (default) and 'exists'.

Usage:

    pipeline:
      inputs:
        - name: opentelemetry
          port: 4318

          processors:
            traces:
              - name: sampling
                type: tail
                sampling_settings:
                  decision_wait: 5s
                conditions:
                  - type: string_attribute
                    match_type: strict
                    key: "http.method"
                    values: ["GET"]
                  - type: string_attribute
                    match_type: exists
                    key: "service.name"

Signed-off-by: Eduardo Silva <[email protected]>
Signed-off-by: Eduardo Silva <[email protected]>
…onal

This commit introduces support for the numeric_attribute conditional in the sampling processor, allowing traces to be sampled based on numeric attribute values. Users can define min and max thresholds.

Usage:

    pipeline:
      inputs:
        - name: opentelemetry
          port: 4318

          processors:
            traces:
              - name: sampling
                type: tail
                sampling_settings:
                  decision_wait: 2s
                conditions:
                  - type: numeric_attribute
                    key: "http.status_code"
                    min_value: 400
                    max_value: 504

      outputs:
        - name: stdout
          match: '*'

Signed-off-by: Eduardo Silva <[email protected]>
Signed-off-by: Eduardo Silva <[email protected]>
Adds a new conditional that samples only the traces whose number of spans falls within a specific range. The following configuration options are available:

- min_spans: minimum number of spans expected in the trace
- max_spans: maximum number of spans allowed in the trace

Usage:

    pipeline:
      inputs:
        - name: opentelemetry
          port: 4318

          processors:
            traces:
              - name: sampling
                type: tail
                sampling_settings:
                  decision_wait: 2s
                conditions:
                  - type: span_count
                    min_spans: 3
                    max_spans: 5

Signed-off-by: Eduardo Silva <[email protected]>
This commit introduces support for the trace_state conditional in the sampling processor, allowing traces to be sampled based on metadata stored in the W3C trace_state field.

Configuration:

- values: Defines a list of key-value pairs to match against the trace_state. A trace is sampled if any of the specified values exist in the trace_state. Matching follows OR logic, meaning at least one value must be present for sampling to occur.

Example:

    pipeline:
      inputs:
        - name: opentelemetry
          port: 4318

          processors:
            traces:
              - name: sampling
                type: tail
                sampling_settings:
                  decision_wait: 2s
                conditions:
                  - type: trace_state
                    values: [debug=false, priority=high]

      outputs:
        - name: stdout
          match: '*'

Signed-off-by: Eduardo Silva <[email protected]>
This PR introduces a new trace sampling processor designed with a pluggable architecture, allowing easy extension to support multiple sampling strategies and backends.
Samplers
- probabilistic (head sampling)
- tail (tail sampling)
Head Sampling
For head sampling needs, it provides a probabilistic sampler.
Configuration example
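A configuration sketch for the probabilistic sampler, taken from the first commit message of this PR. The `rules` key is the one that commit message shows. The tail sampler commits use `sampling_settings` instead, so treat the exact key name in the final version as an assumption.

```yaml
pipeline:
  inputs:
    - name: opentelemetry
      port: 4318

      processors:
        traces:
          - name: sampling
            type: probabilistic
            debug: true                  # log the traces seen before and after sampling
            rules:                       # key as written in the commit message (assumption for the final version)
              sampling_percentage: 40    # retain about 40% of traces, discard the rest

  outputs:
    - name: stdout
      match: '*'
```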
Tail Sampling
This sampler supports conditionals to sample traces if any of their spans meet a specific condition.
Condition: status_code
Samples traces based on span status codes (OK, ERROR, UNSET)
example:
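A hedged sketch of a status_code condition. The commit messages in this PR do not show this condition, so the `status_codes` option name and its list form are assumptions. Only the condition type and the accepted codes (OK, ERROR, UNSET) come from the description above.

```yaml
pipeline:
  inputs:
    - name: opentelemetry
      port: 4318

      processors:
        traces:
          - name: sampling
            type: tail
            sampling_settings:
              decision_wait: 5s
            conditions:
              - type: status_code
                status_codes: [ERROR]    # assumed option name; keeps traces with a span in ERROR status

  outputs:
    - name: stdout
      match: '*'
```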
Condition: latency
Samples traces based on span duration (end time minus start time). threshold_ms_low captures short traces and threshold_ms_high captures long traces; both thresholds are set in milliseconds.
example:
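The configuration below follows the usage shown in the commit message that adds the latency condition. The first threshold is written as `threshold_ms_low`, matching the option description, since the commit message repeats `threshold_ms_high` twice.

```yaml
pipeline:
  inputs:
    - name: opentelemetry
      port: 4318

      processors:
        traces:
          - name: sampling
            type: tail
            sampling_settings:
              decision_wait: 5s          # wait 5 seconds before deciding
            conditions:
              - type: latency
                threshold_ms_low: 200    # sample traces lasting 200ms or less
                threshold_ms_high: 3000  # sample traces lasting 3000ms or more
```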
This tail-based sampling configuration waits 5 seconds before making a decision. It samples traces based on latency, capturing short traces of 200ms or less and long traces of 3000ms or more. Traces between 200ms and 3000ms are not sampled unless another condition applies.
Condition: string_attribute
The string_attribute conditional allows traces to be sampled based on specific span or resource attributes. Users can define key-value filters (e.g., http.method=POST) to selectively capture relevant traces.
The match_type property accepts strict (the default), which ensures exact value matching, and exists, which checks whether the attribute is present regardless of its value (note that the string type is enforced).
example:
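The configuration below follows the usage shown in the commit message that introduces `match_type`. It combines one strict condition and one exists condition.

```yaml
pipeline:
  inputs:
    - name: opentelemetry
      port: 4318

      processors:
        traces:
          - name: sampling
            type: tail
            sampling_settings:
              decision_wait: 5s
            conditions:
              - type: string_attribute
                match_type: strict       # the value must match exactly
                key: "http.method"
                values: ["GET"]
              - type: string_attribute
                match_type: exists       # the attribute only needs to be present
                key: "service.name"
```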
Condition: numeric_attribute
Allows traces to be sampled based on numeric attribute values. Users can define min and max thresholds.
strict matches exact values (default), while exists checks if the attribute is present, regardless of its value.
example:
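The configuration below follows the usage shown in the commit message that adds the numeric_attribute condition. It samples traces whose `http.status_code` attribute falls between 400 and 504.

```yaml
pipeline:
  inputs:
    - name: opentelemetry
      port: 4318

      processors:
        traces:
          - name: sampling
            type: tail
            sampling_settings:
              decision_wait: 2s
            conditions:
              - type: numeric_attribute
                key: "http.status_code"
                min_value: 400           # lower bound of the range
                max_value: 504           # upper bound of the range

  outputs:
    - name: stdout
      match: '*'
```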
Condition: boolean_attribute
The boolean_attribute sampling policy filters traces based on a boolean attribute’s value (e.g., true or false). This allows selecting traces based on flags like error indicators or debug modes.
Accepted values: true or false.
strict matches exact values (default), while exists checks if the attribute is present, regardless of its value.
example:
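A hedged sketch of a boolean_attribute condition. The commit messages in this PR do not show this condition, so the `key` and `value` option names are assumptions made by analogy with string_attribute, and the attribute name `debug.enabled` is hypothetical.

```yaml
pipeline:
  inputs:
    - name: opentelemetry
      port: 4318

      processors:
        traces:
          - name: sampling
            type: tail
            sampling_settings:
              decision_wait: 2s
            conditions:
              - type: boolean_attribute
                key: "debug.enabled"     # hypothetical attribute name; `key` is an assumed option
                value: true              # assumed option name; accepts true or false

  outputs:
    - name: stdout
      match: '*'
```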
Condition: span_count
Samples traces that contain a number of spans within a configurable range.
example:
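The configuration below follows the usage shown in the commit message that adds the span_count condition. It samples traces that contain between 3 and 5 spans.

```yaml
pipeline:
  inputs:
    - name: opentelemetry
      port: 4318

      processors:
        traces:
          - name: sampling
            type: tail
            sampling_settings:
              decision_wait: 2s
            conditions:
              - type: span_count
                min_spans: 3             # minimum number of expected spans
                max_spans: 5             # maximum number of spans in the trace
```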
Condition: trace_state
Uses a conditional on the trace_state field, allowing traces to be sampled based on metadata stored in the W3C trace_state field.
example:
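The configuration below follows the usage shown in the commit message that adds the trace_state condition. Per that message, matching follows OR logic, so a trace is sampled if at least one listed value is present in its trace_state.

```yaml
pipeline:
  inputs:
    - name: opentelemetry
      port: 4318

      processors:
        traces:
          - name: sampling
            type: tail
            sampling_settings:
              decision_wait: 2s
            conditions:
              - type: trace_state
                values: [debug=false, priority=high]   # any one match is enough

  outputs:
    - name: stdout
      match: '*'
```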
Test
Manual test
Using this JSON file, trace_sampling_extended.json, try with curl:
curl -X POST -H "Content-Type: application/json" -d @trace_sampling_extended.json -i localhost:4318/v1/traces
Fluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.