Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add stack trace collection support #4269

Closed
wants to merge 18 commits into from

Conversation

vpellan
Copy link
Contributor

@vpellan vpellan commented Jan 9, 2025

What does this PR do?

This PR adds stack trace collection support using meta_struct

Motivation:

Stack trace collection is required for exploit prevention.

Change log entry

Yes. Fix stack trace not being with RASP SQLi signals

Additional Notes:

How to test the change?

@github-actions github-actions bot added integrations Involves tracing integrations appsec Application Security monitoring product tracing labels Jan 9, 2025
@datadog-datadog-prod-us1
Copy link
Contributor

datadog-datadog-prod-us1 bot commented Jan 9, 2025

Datadog Report

Branch report: vpellan/stack-trace-collection
Commit report: a6ab36f
Test service: dd-trace-rb

❌ 9 Failed (0 Known Flaky), 21946 Passed, 1476 Skipped, 5m 27.74s Total Time

❌ Failed Tests (9)

This report shows up to 5 failed tests.

  • Datadog::AppSec::ActionsHandler.handle calls both generate_stack and block_request when both are present - rspec - Details

    Expand for error
     undefined method \`trace' for nil:NilClass
     Did you mean?  trap
     
     Failure/Error: service_entry_operation = context.trace || context.span
     
     NoMethodError:
       undefined method \`trace' for nil:NilClass
       Did you mean?  trap
     ./lib/datadog/appsec/stack_trace.rb:82:in \`add_stack_trace'
     ./lib/datadog/appsec/actions_handler.rb:23:in \`generate_stack'
     ...
    
  • Datadog::AppSec::ActionsHandler.handle calls both generate_stack and generate_schema when both are present - rspec - Details

    Expand for error
     undefined method \`trace' for nil:NilClass
     Did you mean?  trap
     
     Failure/Error: service_entry_operation = context.trace || context.span
     
     NoMethodError:
       undefined method \`trace' for nil:NilClass
       Did you mean?  trap
     ./lib/datadog/appsec/stack_trace.rb:82:in \`add_stack_trace'
     ./lib/datadog/appsec/actions_handler.rb:23:in \`generate_stack'
     ...
    
  • Datadog::AppSec::ActionsHandler.handle calls both generate_stack and redirect_request when both are present - rspec - Details

    Expand for error
     undefined method \`trace' for nil:NilClass
     Did you mean?  trap
     
     Failure/Error: service_entry_operation = context.trace || context.span
     
     NoMethodError:
       undefined method \`trace' for nil:NilClass
       Did you mean?  trap
     ./lib/datadog/appsec/stack_trace.rb:82:in \`add_stack_trace'
     ./lib/datadog/appsec/actions_handler.rb:23:in \`generate_stack'
     ...
    
  • Datadog::AppSec::ActionsHandler.handle calls generate_stack with action parameters - rspec - Details

    Expand for error
     undefined method \`trace' for nil:NilClass
     Did you mean?  trap
     
     Failure/Error: service_entry_operation = context.trace || context.span
     
     NoMethodError:
       undefined method \`trace' for nil:NilClass
       Did you mean?  trap
     ./lib/datadog/appsec/stack_trace.rb:82:in \`add_stack_trace'
     ./lib/datadog/appsec/actions_handler.rb:23:in \`generate_stack'
     ...
    
  • Datadog::Tracing::Span#to_hash is expected to eq {:error=>0, :meta=>{}, :metrics=>{}, :name=>"my.span", :parent_id=>0, :resource=>"my.span", :service=>nil, :span_id=>34, :span_links=>[], :trace_id=>12, :type=>nil} - rspec - Details

    Expand for error
     expected: {:error=>0, :meta=>{}, :metrics=>{}, :name=>"my.span", :parent_id=>0, :resource=>"my.span", :service=>nil, :span_id=>34, :span_links=>[], :trace_id=>12, :type=>nil}
          got: {:error=>0, :meta=>{}, :meta_struct=>{}, :metrics=>{}, :name=>"my.span", :parent_id=>0, :resource=>"my.span", :service=>nil, :span_id=>34, :span_links=>[], :trace_id=>12, :type=>nil}
     
     (compared using ==)
     
     Diff:
     @@ -1,5 +1,6 @@
      :error => 0,
      :meta => {},
     +:meta_struct => {},
     ...
    

@pr-commenter
Copy link

pr-commenter bot commented Jan 9, 2025

Benchmarks

Benchmark execution time: 2025-03-03 17:52:52

Comparing candidate commit d68d25b in PR branch vpellan/stack-trace-collection with baseline commit 20d026a in branch vpellan/meta-struct.

Found 0 performance improvements and 1 performance regressions! Performance is the same for 30 metrics, 2 unstable metrics.

scenario:profiler - gvl benchmark samples

  • 🟥 throughput [-948.107op/s; -940.005op/s] or [-8.158%; -8.088%]

@vpellan vpellan force-pushed the vpellan/stack-trace-collection branch from f79044e to 77e9969 Compare January 14, 2025 15:40
@vpellan vpellan force-pushed the vpellan/stack-trace-collection branch from 5d75fc3 to 255e242 Compare January 21, 2025 16:17
@@ -35,7 +35,7 @@ def self.subscribe(engine, context)
next unless result.match?

yield result
throw(:block, true) unless result.actions.empty?
throw(:block, true) if result.actions.include?('block')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why are we changing this everywhere?

@datadog-datadog-prod-us1
Copy link
Contributor

datadog-datadog-prod-us1 bot commented Jan 23, 2025

Datadog Report

Branch report: vpellan/stack-trace-collection
Commit report: 365dd2f
Test service: dd-trace-rb

✅ 0 Failed, 22135 Passed, 1476 Skipped, 5m 2.11s Total Time

@codecov-commenter
Copy link

codecov-commenter commented Jan 24, 2025

Codecov Report

Attention: Patch coverage is 94.38669% with 27 lines in your changes missing coverage. Please review.

Project coverage is 97.69%. Comparing base (65dd37b) to head (365dd2f).

Files with missing lines Patch % Lines
...atadog/appsec/actions_handler/stack_trace/frame.rb 31.57% 13 Missing ⚠️
.../appsec/actions_handler/stack_trace/representor.rb 35.29% 11 Missing ⚠️
lib/datadog/appsec/actions_handler/stack_trace.rb 94.87% 2 Missing ⚠️
spec/support/thread_backtrace_helpers.rb 96.96% 1 Missing ⚠️
Additional details and impacted files
@@                   Coverage Diff                   @@
##           vpellan/meta-struct    #4269      +/-   ##
=======================================================
- Coverage                97.71%   97.69%   -0.02%     
=======================================================
  Files                     1370     1377       +7     
  Lines                    83106    83583     +477     
  Branches                  4227     4254      +27     
=======================================================
+ Hits                     81209    81659     +450     
- Misses                    1897     1924      +27     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@vpellan vpellan force-pushed the vpellan/stack-trace-collection branch from 5dfecd6 to 7aa25ec Compare January 31, 2025 12:10
@vpellan vpellan changed the base branch from master to vpellan/meta-struct January 31, 2025 12:11
@vpellan vpellan force-pushed the vpellan/meta-struct branch 2 times, most recently from b3dc5c2 to 0836e11 Compare January 31, 2025 12:48
@vpellan vpellan changed the title Add meta_struct and stack trace collection Add stack trace collection support Jan 31, 2025
@vpellan vpellan force-pushed the vpellan/meta-struct branch 4 times, most recently from a771a6d to 65dd37b Compare February 6, 2025 12:31
@vpellan vpellan force-pushed the vpellan/stack-trace-collection branch from 7aa25ec to c49806e Compare February 7, 2025 16:43
@vpellan vpellan force-pushed the vpellan/stack-trace-collection branch from c49806e to 9f10909 Compare February 7, 2025 16:46
@vpellan vpellan force-pushed the vpellan/meta-struct branch from 65dd37b to 630c10d Compare February 19, 2025 14:50
@vpellan vpellan requested a review from a team as a code owner February 19, 2025 14:50
@vpellan vpellan force-pushed the vpellan/stack-trace-collection branch from 365dd2f to 8bae22c Compare February 19, 2025 15:28
@vpellan vpellan requested review from y9v and Strech February 20, 2025 10:30
@vpellan vpellan force-pushed the vpellan/stack-trace-collection branch from 8bae22c to 8aa9b65 Compare February 20, 2025 13:39
- Make metastruct extends Forwardable and forward methods to @Metastruct
-add self.empty for when creating an empty metastruct (used in SpanOperation
- Extract Thread:Backtrace:Location to hash convertion
- Add Noop class for nil metastruct
@vpellan vpellan force-pushed the vpellan/stack-trace-collection branch from 8aa9b65 to 094cc5e Compare February 21, 2025 14:32

private

def generate_stack_trace: (Integer max_depth, Float top_percent) -> (Hash[Symbol, String | Integer | Array[Hash[Symbol, String | Integer | nil]] | nil ])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the return a missing type alias? And with type you can have it recursive

module AppSec
module ActionsHandler
class StackTraceInMetastruct
type stack_trace = Hash[Symbol, String | nil | Array[Datadog::AppSec::ActionsHandler::StackTraceCollection::stack_frame]]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should it be opposite order

Suggested change
type stack_trace = Hash[Symbol, String | nil | Array[Datadog::AppSec::ActionsHandler::StackTraceCollection::stack_frame]]
type stack_trace = Hash[nil, Symbol, String | Array[Datadog::AppSec::ActionsHandler::StackTraceCollection::stack_frame]]

Comment on lines 40 to 44
locations = []
1000.times do
locations << self.locations.first
end
locations
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will that work?

Suggested change
locations = []
1000.times do
locations << self.locations.first
end
locations
1000.times.map { self.locations.first }

Comment on lines 1044 to 1070
subject(:set_stack_trace_max_depth_top_percent) do
settings.appsec.stack_trace.max_depth_top_percent = stack_trace_max_depth_top_percent
end

context 'when given a value' do
let(:stack_trace_max_depth_top_percent) { 50 }

before { set_stack_trace_max_depth_top_percent }

it { expect(settings.appsec.stack_trace.max_depth_top_percent).to eq(50) }
end

context 'when given a negative value' do
let(:stack_trace_max_depth_top_percent) { -1 }

before { set_stack_trace_max_depth_top_percent }

it { expect(settings.appsec.stack_trace.max_depth_top_percent).to eq(0) }
end

context 'when given a value higher than 100' do
let(:stack_trace_max_depth_top_percent) { 101 }

before { set_stack_trace_max_depth_top_percent }

it { expect(settings.appsec.stack_trace.max_depth_top_percent).to eq(100) }
end
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think all the examples could be simplified to examples like

context 'when given a value' do
  it 'sets max value to 0' do
    settings.appsec.stack_trace.max_depth_top_percent = -1
    expect(settings.appsec.stack_trace.max_depth_top_percent).to eq(0)
  end
end

end

context 'when stack trace is enabled and context contains only span' do
let(:context) { OpenStruct.new(span: span_op) }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use verified double i.e (instance_double) here

let(:exploit_category) { Datadog::AppSec::Ext::EXPLOIT_PREVENTION_EVENT_CATEGORY }

before do
Datadog.configure do |c|
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI if you go this way - you will need to clean the settings afterwards (see other tests where we do that)

Comment on lines 127 to 135
ThreadBacktraceHelper.locations_inside_nested_blocks.map.with_index do |location, index|
{
id: index,
text: location.to_s.encode('UTF-8'),
file: (location.absolute_path || location.path)&.encode('UTF-8'),
line: location.lineno,
function: location.label&.encode('UTF-8')
}
end
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TBH i thought that helper suppose to make things easier. I think here it's either missing something or else ...

@@ -97,4 +101,124 @@
end
end
end

describe '.generate_stack' do
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TBH this test suite look overly complicated and violates certain test practices and it would be easier to go over in 1:1 instead of chatting here.

Comment on lines 49 to 50
stack = context.trace.nil? ? span_stack : trace_stack
stack.push({ language: 'ruby', id: utf8_stack_id, frames: stack_frames })
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
stack = context.trace.nil? ? span_stack : trace_stack
stack.push({ language: 'ruby', id: utf8_stack_id, frames: stack_frames })
(context.trace || context.span).push(language: 'ruby', id: stack_id, frames: stack_frames)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does not work as we are using the push method defined in StackTraceInMetastruct

return if config.max_collect != 0 && span_stack.count + trace_stack.count >= config.max_collect

# Generate stacktrace
utf8_stack_id = action_params['stack_id'].encode('UTF-8') if action_params['stack_id']
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This variable will be nil if no params[stack_id] is given, right? I think the whole method will not work without stack_id, hence we always have it or of not - we should do nothing and report a telemetry or generate warning or so.

Because of that that line must be adjusted to

Suggested change
utf8_stack_id = action_params['stack_id'].encode('UTF-8') if action_params['stack_id']
stack_id = action_params['stack_id'].encode('UTF-8')

P.S it doesn't make sense to mention UTF-8 in the variable name because it's not important

Comment on lines 37 to 38
span_stack = ActionsHandler::StackTraceInMetastruct.create(context.span&.metastruct)
trace_stack = ActionsHandler::StackTraceInMetastruct.create(context.trace&.metastruct)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
span_stack = ActionsHandler::StackTraceInMetastruct.create(context.span&.metastruct)
trace_stack = ActionsHandler::StackTraceInMetastruct.create(context.trace&.metastruct)
span_stack = StackTraceInMetastruct.create(context.span&.metastruct)
trace_stack = StackTraceInMetastruct.create(context.trace&.metastruct)

module_function

def collect(max_depth:, top_percent:)
locations = (caller_locations || []).reject { |location| location.to_s.include?('lib/datadog') }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it should be converted into mutating to avoid allocation

Suggested change
locations = (caller_locations || []).reject { |location| location.to_s.include?('lib/datadog') }
locations = (caller_locations || [])
locations.reject! { |location| location.to_s.include?('lib/datadog') }

locations.map.with_index do |location, index|
{
id: index,
text: location.to_s.encode('UTF-8'),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a second time we convert location into the string, I think it makes sense to do it once at the beginning of the collection

return
end

config = Datadog.configuration.appsec.stack_trace
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should be closer to the guard-clause on line 39

config = Datadog.configuration.appsec.stack_trace

# Check that the sum of stack_trace count in trace and entry_span does not exceed configuration
span_stack = ActionsHandler::StackTraceInMetastruct.create(context.span&.metastruct)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we calling span metastruct a span_stack? Should we rather call it span_metastruct? Same one line below

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because StackTraceInMetastruct is specifically applying changes to '_dd.stack' field of metastruct. We give a metastruct as an input but we cannot do any changes to it other than on the stack traces

# Check that the sum of stack_trace count in trace and entry_span does not exceed configuration
span_stack = ActionsHandler::StackTraceInMetastruct.create(context.span&.metastruct)
trace_stack = ActionsHandler::StackTraceInMetastruct.create(context.trace&.metastruct)
return if config.max_collect != 0 && span_stack.count + trace_stack.count >= config.max_collect
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we want to avoid stack trace collection if the stack is longer than max_collect, or do we want to truncate the stack?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

max_collect is the maximum number of stack traces that we want in our trace, not the maximum number of frames in each stack trace (which is defined by max_depth). So if we already have the maximum number of stack traces in metastruct, we want to avoid collection.

Comment on lines +16 to +19
top_limit = (max_depth * top_percent / 100.0).round
bottom_limit = locations.size - (max_depth - top_limit)

locations.slice!(top_limit...bottom_limit)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is top_limit lower than bottom_limit?

What are we trying to do here? Get the top n items from the array? Do we really need to calculate both limits, or can we just slice n items from one end?

Copy link
Contributor Author

@vpellan vpellan Mar 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For example, if max_depth is 32 and top_percent is 75%, and we have more than 32 stack frames, we want to keep the 24th first stack traces, the 8th last, and drop everything in the middle. So what this method do is calculate the limits of the frames that we should drop, so we can just do a slice on the locations with that range

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

according to RFC, we have to take 2 slices - one from the top of the stack trace, and one from the bottom:

In the original RFC, some restrictions were placed on the layout of stack traces, namely:
Stack traces must contain a maximum of 32 frames.
When a stack trace is truncated, 24 frames should be from the bottom of the stack and 8 from the top.

The number of top and bottom frames on truncated stack traces can be considered a flexible requirement and it’s up to the library implementor to decide the most useful number of frames from each side of the stack for their language. In addition, since the user can configure the maximum number of frames through the use of DD_APPSEC_MAX_STACK_TRACE_DEPTH, the number of top and bottom frames for a truncated stack trace of arbitrary size must be computed dynamically based on the configuration value, with the simplest approach being a proportional one. 

Similarly, the default maximum number of frames might not be particularly useful in languages with notoriously deep stacks, for that reason, the requirement must be updated to ensure that each library can provide meaningful stack traces.

Given the above, the requirements can be amended as follows:
Stack traces must be limited to a maximum of 32 frames, or a more suitable higher value for the given language.
When a stack trace is truncated, libraries must collect some frames from both the top and bottom of the stack. As a recommendation, libraries may collect 75% of frames from the bottom of the stack and 25% from the top.

I think this is not what we are doing here?

Comment on lines +16 to +19
top_limit = (max_depth * top_percent / 100.0).round
bottom_limit = locations.size - (max_depth - top_limit)

locations.slice!(top_limit...bottom_limit)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

according to RFC, we have to take 2 slices - one from the top of the stack trace, and one from the bottom:

In the original RFC, some restrictions were placed on the layout of stack traces, namely:
Stack traces must contain a maximum of 32 frames.
When a stack trace is truncated, 24 frames should be from the bottom of the stack and 8 from the top.

The number of top and bottom frames on truncated stack traces can be considered a flexible requirement and it’s up to the library implementor to decide the most useful number of frames from each side of the stack for their language. In addition, since the user can configure the maximum number of frames through the use of DD_APPSEC_MAX_STACK_TRACE_DEPTH, the number of top and bottom frames for a truncated stack trace of arbitrary size must be computed dynamically based on the configuration value, with the simplest approach being a proportional one. 

Similarly, the default maximum number of frames might not be particularly useful in languages with notoriously deep stacks, for that reason, the requirement must be updated to ensure that each library can provide meaningful stack traces.

Given the above, the requirements can be amended as follows:
Stack traces must be limited to a maximum of 32 frames, or a more suitable higher value for the given language.
When a stack trace is truncated, libraries must collect some frames from both the top and bottom of the stack. As a recommendation, libraries may collect 75% of frames from the bottom of the stack and 25% from the top.

I think this is not what we are doing here?

text: text,
file: location.absolute_path || location.path,
line: location.lineno,
function: location.label
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RFC specifies separate class_name and function fields, and #label is containing both class name and function name. We have to parse it

@y9v y9v force-pushed the vpellan/meta-struct branch 2 times, most recently from 1fe22fc to 2e8289c Compare March 6, 2025 12:34
@vpellan
Copy link
Contributor Author

vpellan commented Mar 6, 2025

according to RFC, we have to take 2 slices - one from the top of the stack trace, and one from the bottom. I think this is not what we are doing here?

The slice! method will remove the part in the middle and locations will then contains the top part and the bottom part only.

@y9v
Copy link
Member

y9v commented Mar 6, 2025

according to RFC, we have to take 2 slices - one from the top of the stack trace, and one from the bottom. I think this is not what we are doing here?

The slice! method will remove the part in the middle and locations will then contains the top part and the bottom part only.

sorry, my bad, you are right

@y9v y9v force-pushed the vpellan/meta-struct branch from d580c5e to a70a07e Compare March 10, 2025 09:11
Base automatically changed from vpellan/meta-struct to master March 10, 2025 10:15
@y9v y9v closed this Mar 10, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
appsec Application Security monitoring product integrations Involves tracing integrations
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants