OTEP: Recording exceptions as log records #4333

Open: wants to merge 35 commits into main


lmolkova
Contributor

@lmolkova lmolkova commented Dec 10, 2024

Related to open-telemetry/semantic-conventions#1536

Changes

Recording exceptions as span events is problematic since it

  • ties recording exceptions to tracing/sampling
  • duplicates exceptions recorded by instrumented libraries on logs
  • does not leverage log features such as typical log filtering based on severity

This OTEP provides guidance on how to record exceptions using OpenTelemetry logs focusing on minimizing duplication and providing context to reduce the noise.
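
To make the severity point concrete, here is a minimal sketch that records exceptions as log records and filters them by severity. It uses Python's standard `logging` module as a stand-in for the OpenTelemetry Logs API; the logger name, the `ListHandler` class, and both functions are hypothetical and not taken from the OTEP.

```python
import logging


class ListHandler(logging.Handler):
    """Hypothetical in-memory handler so the records can be inspected."""

    def __init__(self):
        super().__init__()
        self.records = []

    def emit(self, record):
        self.records.append(record)


logger = logging.getLogger("otep_4333_sketch")  # hypothetical logger name
logger.setLevel(logging.WARNING)  # severity filter: drop records below WARNING
logger.propagate = False
handler = ListHandler()
logger.addHandler(handler)


def parse_port(raw):
    """A handled error: recorded at low severity, so the filter drops it."""
    try:
        return int(raw)
    except ValueError:
        logger.debug("invalid port %r, using default", raw, exc_info=True)
        return 8080


def fail_hard():
    """An unhandled error: recorded once at ERROR, then re-raised."""
    try:
        raise RuntimeError("backend unavailable")
    except RuntimeError:
        logger.error("request failed", exc_info=True)
        raise
```

Neither function needs an active or sampled span, and the handled error never reaches the handler because its severity is below the configured threshold.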

If accepted, the follow-up spec changes are expected to replace existing (stable) documents:


@lmolkova lmolkova changed the title OTEP: Recording exceptions and errors with OpenTelemetry OTEP: Recording exceptions as log based events Dec 10, 2024
@pellared
Member

pellared commented Dec 10, 2024

I think this is a related issue:

@tedsuo tedsuo added the OTEP OpenTelemetry Enhancement Proposal (OTEP) label Dec 12, 2024
@lmolkova lmolkova force-pushed the exceptions-on-logs-otep branch 2 times, most recently from b06a09f to 76c7d85 Compare December 17, 2024 17:30
@lmolkova lmolkova force-pushed the exceptions-on-logs-otep branch from db27087 to e9f38aa Compare January 3, 2025 01:42
@carlosalberto
Contributor

A small doubt:

If this instrumentation supports tracing, it should capture the error in the scope of the processing span.

Although I don't think it is called out explicitly, my understanding is that exceptions should now be reported as both 1) a Span.Event and 2) a Log/Event. In code, you would do this:

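currentSpan.recordException(e);  // existing span event API
logger.logRecordBuilder()
    .addException(e)             // illustrative method name from this comment, not a confirmed API
    .emit();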

Is this the case?

Contributor

@jsuereth jsuereth left a comment

Overall I'm very supportive. Just some nits and one mitigation I'd like to see called out/addressed.

@alexmojaki

And it's not within this OTEP directly, but @lmolkova did confirm in a comment on this OTEP (#4333 (comment)) that yes the plan is for SDK methods to only record the stacktrace in a child log-event instead of a span event.

Where? This points to your comment. Are you sure you mean "the plan is for SDK methods to only record [...] log-event instead of span"?

My comment essentially contains this:

# Get tracer via SDK only
with tracer.start_as_current_span("foo"):

Am I correct that the goal is [for the code above to] emit the following? The differences being:

  1. Span events are gone, and the stacktrace will only be found in a child event-log.

@lmolkova quoted the last line above and said "yes".

Now, tracer.start_as_current_span is a method of the Python API (the abstract API, not just the concrete SDK) but it's admittedly not part of the OTEL spec. But I would be surprised if the answer changes for this code:

try:
    raise something
except Exception as e:
    span.record_exception(e)

@pellared

This comment was marked as outdated.

@adriangb

I work with two backends: one prefers exceptions on span events, and the other only supports exceptions via logs. I'd love to know what I am missing.

This surprises me a bit. Logs were stabilized years after tracing and span events, so I am confused as to how these backends could have existed (or, in particular, supported error reporting, which seems like table stakes for observability) before logs were stable. I must be missing something. If backends that support OTEL logs but not OTEL traces, for non-legacy reasons, are indeed widely in use, you can certainly argue that there was no standard and this OTEP proposes one. My experience is that exceptions as span events were more of a de facto standard than exceptions as logs, but that may be biased.

I think it would be helpful to have some information on what the backends and SDKs currently do. In particular, if the SDKs for the most popular languages all behave a certain way out of the box, that is to me a de facto standard, even if it is not officially the only way to do it per the spec and other SDKs may do it differently. I understand that some pain and breakage may be necessary to move the project forward, but it's important to weigh the cost and benefit in terms of user impact.

ConvertLogRecordExceptionToSpanEvent

This is what I'm getting at with "mitigations are not simple": I imagine this might require configuring the logging SDK where it was previously unused, setting the ConvertLogRecordExceptionToSpanEvent config flag if it is even available / exposed, etc.
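
To illustrate what such a mitigation would have to do, here is a rough sketch of converting a log record that carries exception attributes into an `exception` span event on the span it is correlated with. Everything here is hypothetical: the `LogRecord` and `Span` classes and the `convert_log_exception_to_span_event` function are simplified stand-ins, not the SDK's types or the behavior of the actual config flag. Only the `exception.*` attribute names and the `exception` event name come from the existing semantic conventions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Attribute names defined by the exception semantic conventions.
EXCEPTION_KEYS = ("exception.type", "exception.message", "exception.stacktrace")


@dataclass
class LogRecord:
    """Hypothetical, simplified log record."""
    body: str
    attributes: Dict[str, str]
    span_id: Optional[str] = None


@dataclass
class Span:
    """Hypothetical, simplified span that is still recording."""
    span_id: str
    events: List[dict] = field(default_factory=list)


def convert_log_exception_to_span_event(record, active_spans):
    """Copy exception attributes from a log record onto its span.

    Returns True if an event was added, False if the record carries no
    exception attributes or its span is unknown or already ended.
    """
    exc_attrs = {k: record.attributes[k] for k in EXCEPTION_KEYS if k in record.attributes}
    if not exc_attrs or record.span_id is None:
        return False
    span = active_spans.get(record.span_id)
    if span is None:
        return False
    span.events.append({"name": "exception", "attributes": exc_attrs})
    return True
```

Even in this toy form, the conversion only works if the logging pipeline is configured and the span is still available when the log record is emitted, which is the setup cost described above.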

do not cover the complete range of real world scenarios a user is going to encounter

As an example of this, as far as I know there is currently no way to do tail sampling in the OTEL Collector on the basis of "keep the entire trace information for any trace that has a log that recorded an exception", whereas it is quite easy and documented to do "keep the entire trace information for any trace that has an exception span event" (as per the Tail Sampling Processor docs).
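
The difference between the two sampling rules can be sketched as follows. This is not the Collector's implementation; the dict shapes and both function names are assumptions made for illustration. The point is that the first predicate needs only the trace's own spans, while the second needs log data joined to the trace by trace ID.

```python
def trace_has_exception_span_event(spans):
    """Decidable from trace data alone: scan each span's events."""
    return any(
        event.get("name") == "exception"
        for span in spans
        for event in span.get("events", [])
    )


def trace_has_exception_log(trace_id, logs):
    """Requires a second signal: logs correlated with the trace by trace ID."""
    return any(
        log.get("trace_id") == trace_id
        and "exception.type" in log.get("attributes", {})
        for log in logs
    )
```

A sampler that only sees spans can evaluate the first rule but has no input for the second.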

github-actions bot commented Mar 5, 2025

This PR was marked stale due to lack of activity. It will be closed in 7 days.

@github-actions github-actions bot added the Stale label Mar 5, 2025
@lmolkova lmolkova removed the Stale label Mar 6, 2025
@github-actions github-actions bot added the Stale label Mar 13, 2025
@lmolkova lmolkova removed the Stale label Mar 13, 2025
@trask
Member

trask commented Mar 13, 2025

not stale, just currently blocked on #4430


github-actions bot commented Apr 2, 2025

This PR was marked stale due to lack of activity. It will be closed in 7 days.

@github-actions github-actions bot added the Stale label Apr 2, 2025
@lmolkova lmolkova removed the Stale label Apr 2, 2025
@pellared pellared changed the title OTEP: Recording exceptions as log based events OTEP: Recording exceptions as log records Apr 8, 2025
@github-actions github-actions bot added the Stale label Apr 16, 2025
@lmolkova lmolkova removed the Stale label Apr 16, 2025
@github-actions github-actions bot added the Stale label Apr 24, 2025
@lmolkova lmolkova removed the Stale label Apr 24, 2025
lmolkova added a commit that referenced this pull request Apr 25, 2025
Related to #4333, #4393, #4429, #4414

---------

Co-authored-by: Liudmila Molkova
Co-authored-by: Robert Pająk
Co-authored-by: jack-berg
Labels
changelog.opentelemetry.io OTEP OpenTelemetry Enhancement Proposal (OTEP)
Projects
Status: In progress