[Profiler] Implement interning API #917

Merged Mar 25, 2025 (29 commits)
670bda7
[Profiler] Implement interning API
danielsn Mar 10, 2025
33d4266
FFI API
danielsn Mar 10, 2025
17dba1f
c++ example
danielsn Mar 11, 2025
666f5d9
bulk string interning
danielsn Mar 12, 2025
d765dbf
add generations are equal api
danielsn Mar 12, 2025
47c5679
forgot to add file
danielsn Mar 12, 2025
c105ef0
Intern managed strings
danielsn Mar 12, 2025
fa37449
Add sample_start machinery, haven't wired it to the exporter yet
danielsn Mar 12, 2025
68fd98b
fix annoying 1.78 issue, and do more with the state machine
danielsn Mar 12, 2025
3af720d
Merge branch 'main' into dsn/r_and_d_week_mar_2024
danielsn Mar 12, 2025
6d3c4cd
fix to use the new slice based input
danielsn Mar 12, 2025
33380b9
Interned empty string, as Ivo requested
danielsn Mar 13, 2025
2a2be98
Merge branch 'main' into dsn/r_and_d_week_mar_2024
danielsn Mar 13, 2025
176b144
PR comment: rename long identifier
danielsn Mar 13, 2025
cb79525
PR comments, use nonnull instead of pointer
danielsn Mar 13, 2025
717cee1
YAGNI
danielsn Mar 13, 2025
d4bdb5c
Interned string constant
danielsn Mar 14, 2025
ecdd88f
Merge branch 'main' into dsn/r_and_d_week_mar_2024
danielsn Mar 20, 2025
f2318c7
fix const
danielsn Mar 20, 2025
693999f
fixups
danielsn Mar 20, 2025
b1c9534
typo
danielsn Mar 20, 2025
1dabeab
Update examples/ffi/profiles.c
danielsn Mar 20, 2025
a4afb69
Merge branch 'main' into dsn/r_and_d_week_mar_2024
danielsn Mar 21, 2025
b9df348
Fix exporter
danielsn Mar 21, 2025
8e8ab0f
profile_intern
danielsn Mar 21, 2025
f35e2df
kick ci
danielsn Mar 21, 2025
f781e60
fix warning on windows
danielsn Mar 24, 2025
65aacc2
Merge branch 'main' into dsn/r_and_d_week_mar_2024
danielsn Mar 24, 2025
73ac3d5
Merge branch 'main' into dsn/r_and_d_week_mar_2024
danielsn Mar 25, 2025
2 changes: 2 additions & 0 deletions ddcommon-ffi/src/lib.rs
Original file line number Diff line number Diff line change
@@ -15,6 +15,7 @@ pub mod handle;
pub mod option;
pub mod result;
pub mod slice;
pub mod slice_mut;
pub mod string;
pub mod tags;
pub mod timespec;
@@ -27,6 +28,7 @@ pub use handle::*;
pub use option::*;
pub use result::*;
pub use slice::{CharSlice, Slice};
pub use slice_mut::MutSlice;
pub use string::*;
pub use timespec::*;
pub use vec::Vec;
277 changes: 277 additions & 0 deletions ddcommon-ffi/src/slice_mut.rs
@@ -0,0 +1,277 @@
// Copyright 2021-Present Datadog, Inc. https://www.datadoghq.com/
// SPDX-License-Identifier: Apache-2.0

use core::slice;
use serde::ser::Error;
use serde::Serializer;
use std::borrow::Cow;
use std::fmt::{Debug, Display, Formatter};
use std::hash::{Hash, Hasher};
use std::marker::PhantomData;
use std::os::raw::c_char;
use std::ptr::NonNull;
use std::str::Utf8Error;

#[repr(C)]
#[derive(Copy, Clone)]
pub struct MutSlice<'a, T: 'a> {
/// Must be suitably aligned for the underlying type. The pointer is
/// allowed, but not recommended, to be null when the len is zero;
/// otherwise it must be non-null.
ptr: Option<NonNull<T>>,

/// The number of elements (not bytes) that `.ptr` points to. Must be less
/// than or equal to [isize::MAX].
len: usize,
_marker: PhantomData<&'a mut [T]>,
}

impl<'a, T: 'a> core::ops::Deref for MutSlice<'a, T> {
type Target = [T];

fn deref(&self) -> &Self::Target {
self.as_slice()
}
}

impl<T: Debug> Debug for MutSlice<'_, T> {
fn fmt(&self, f: &mut Formatter<'_>) -> std::fmt::Result {
self.as_slice().fmt(f)
}
}

/// Use to represent strings -- should be valid UTF-8.
pub type CharMutSlice<'a> = MutSlice<'a, c_char>;

/// Use to represent bytes -- does not need to be valid UTF-8.
pub type ByteMutSlice<'a> = MutSlice<'a, u8>;

#[inline]
fn is_aligned<T>(ptr: NonNull<T>) -> bool {
ptr.as_ptr() as usize % std::mem::align_of::<T>() == 0
}

pub trait AsBytes<'a> {
fn as_bytes(&self) -> &'a [u8];

#[inline]
fn try_to_utf8(&self) -> Result<&'a str, Utf8Error> {
std::str::from_utf8(self.as_bytes())
}

fn try_to_string(&self) -> Result<String, Utf8Error> {
Ok(self.try_to_utf8()?.to_string())
}

#[inline]
fn try_to_string_option(&self) -> Result<Option<String>, Utf8Error> {
Ok(Some(self.try_to_string()?).filter(|x| !x.is_empty()))
}

#[inline]
fn to_utf8_lossy(&self) -> Cow<'a, str> {
String::from_utf8_lossy(self.as_bytes())
}

#[inline]
/// # Safety
/// Must only be used when the underlying data was already confirmed to be utf8.
unsafe fn assume_utf8(&self) -> &'a str {
std::str::from_utf8_unchecked(self.as_bytes())
}
}

impl<'a> AsBytes<'a> for MutSlice<'a, u8> {
fn as_bytes(&self) -> &'a [u8] {
self.as_slice()
}
}

impl<'a, T: 'a> MutSlice<'a, T> {
/// Creates a valid empty slice (len=0, ptr is non-null).
// TODO, this can be const once MSRV >= 1.85
#[must_use]
pub fn empty() -> Self {
Self {
ptr: Some(NonNull::dangling()),
len: 0,
_marker: PhantomData,
}
}

/// # Safety
/// Uphold the same safety requirements as [std::slice::from_raw_parts_mut].
/// However, it is allowed but not recommended to provide a null pointer
/// when the len is 0.
// TODO, this can be const once MSRV >= 1.85
pub unsafe fn from_raw_parts(ptr: *mut T, len: usize) -> Self {
Self {
ptr: NonNull::new(ptr),
len,
_marker: PhantomData,
}
}

// TODO, this can be const once MSRV >= 1.85
pub fn new(slice: &mut [T]) -> Self {
Self {
ptr: NonNull::new(slice.as_mut_ptr()),
len: slice.len(),
_marker: PhantomData,
}
}

pub fn as_mut_slice(&mut self) -> &'a mut [T] {
if let Some(ptr) = self.ptr {
// Crashing immediately is likely better than ignoring these.
assert!(is_aligned(ptr));
assert!(self.len <= isize::MAX as usize);
unsafe { slice::from_raw_parts_mut(ptr.as_ptr(), self.len) }
} else {
// Crashing immediately is likely better than ignoring this.
assert_eq!(self.len, 0);
&mut []
}
}

pub fn as_slice(&self) -> &'a [T] {
if let Some(ptr) = self.ptr {
// Crashing immediately is likely better than ignoring these.
assert!(is_aligned(ptr));
assert!(self.len <= isize::MAX as usize);
unsafe { slice::from_raw_parts(ptr.as_ptr(), self.len) }
} else {
// Crashing immediately is likely better than ignoring this.
assert_eq!(self.len, 0);
&[]
}
}

pub fn into_slice(self) -> &'a [T] {
self.as_slice()
}

pub fn into_mut_slice(mut self) -> &'a mut [T] {
self.as_mut_slice()
}
}

impl<T> Default for MutSlice<'_, T> {
fn default() -> Self {
Self::empty()
}
}

impl<'a, T> Hash for MutSlice<'a, T>
where
MutSlice<'a, T>: AsBytes<'a>,
{
fn hash<H: Hasher>(&self, state: &mut H) {
state.write(self.as_bytes())
}
}

impl<'a, T> serde::Serialize for MutSlice<'a, T>
where
MutSlice<'a, T>: AsBytes<'a>,
{
fn serialize<S>(&self, serializer: S) -> Result<S::Ok, S::Error>
where
S: Serializer,
{
serializer.serialize_str(self.try_to_utf8().map_err(Error::custom)?)
}
}

impl<'a, T> Display for MutSlice<'a, T>
where
MutSlice<'a, T>: AsBytes<'a>,
{
fn fmt(&self, f: &mut Formatter<'_>) -> std::fmt::Result {
write!(f, "{}", self.try_to_utf8().map_err(|_| std::fmt::Error)?)
}
}

impl<'a, T: 'a> From<&'a mut [T]> for MutSlice<'a, T> {
fn from(s: &'a mut [T]) -> Self {
MutSlice::new(s)
}
}

impl<'a, T> From<&'a mut Vec<T>> for MutSlice<'a, T> {
fn from(value: &'a mut Vec<T>) -> Self {
MutSlice::new(value)
}
}

impl<'a> From<&'a mut str> for MutSlice<'a, c_char> {
fn from(s: &'a mut str) -> Self {
// SAFETY: Rust strings meet all the invariants required.
unsafe { MutSlice::from_raw_parts(s.as_mut_ptr().cast(), s.len()) }
}
}

#[cfg(test)]
mod tests {
use super::*;
use std::ptr;

#[derive(Debug, Eq, PartialEq)]
struct Foo(i64);

#[test]
fn slice_from_foo() {
let mut raw = Foo(42);
let ptr = &mut raw as *mut _;
let mut slice = unsafe { MutSlice::from_raw_parts(ptr, 1) };

let expected: &[Foo] = &[raw];
let actual: &[Foo] = slice.as_mut_slice();

assert_eq!(expected, actual)
}

#[test]
fn test_iterator() {
let slice: &mut [i32] = &mut [1, 2, 3];
let slice = MutSlice::from(slice);

let mut iter = slice.iter();

assert_eq!(Some(&1), iter.next());
assert_eq!(Some(&2), iter.next());
assert_eq!(Some(&3), iter.next());
}

#[test]
fn test_null_len0() {
let mut null_len0: MutSlice<u8> = MutSlice {
ptr: None,
len: 0,
_marker: PhantomData,
};
assert_eq!(null_len0.as_mut_slice(), &[]);
}

#[should_panic]
#[test]
fn test_null_panic() {
let mut null_len0: MutSlice<u8> = MutSlice {
ptr: None,
len: 1,
_marker: PhantomData,
};
_ = null_len0.as_mut_slice();
}

#[should_panic]
#[test]
fn test_long_panic() {
let mut dangerous: MutSlice<u8> = MutSlice {
ptr: Some(ptr::NonNull::dangling()),
len: isize::MAX as usize + 1,
_marker: PhantomData,
};
_ = dangerous.as_mut_slice();
}
}
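The `Option<NonNull<T>>` field in `MutSlice` works at the FFI boundary because of Rust's guaranteed null-pointer optimization: `None` is represented as the null pointer, so the struct has the same layout as a plain `T *ptr; size_t len;` pair in C. A quick standalone check (not part of the diff) illustrates this:

```rust
use std::mem::size_of;
use std::ptr::NonNull;

fn main() {
    // The null-pointer optimization is guaranteed for Option<NonNull<T>>,
    // so it is pointer-sized and `None` maps to the C null pointer.
    assert_eq!(size_of::<Option<NonNull<u8>>>(), size_of::<*mut u8>());

    // NonNull::new returns None for a null pointer, which is how a
    // from_raw_parts-style constructor tolerates (ptr = NULL, len = 0).
    let p: *mut u8 = std::ptr::null_mut();
    assert!(NonNull::new(p).is_none());
    println!("layout check ok");
}
```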
4 changes: 4 additions & 0 deletions examples/ffi/CMakeLists.txt
@@ -30,6 +30,10 @@ add_executable(crashinfo crashinfo.cpp)
target_compile_features(crashinfo PRIVATE cxx_std_20)
target_link_libraries(crashinfo PRIVATE Datadog::Profiling)

add_executable(profile_intern profile_intern.cpp)
# needed for designated initializers
target_compile_features(profile_intern PRIVATE cxx_std_20)
target_link_libraries(profile_intern PRIVATE Datadog::Profiling)

if(CMAKE_CXX_COMPILER_ID MATCHES "MSVC")
target_compile_definitions(exporter PUBLIC _CRT_SECURE_NO_WARNINGS)
2 changes: 1 addition & 1 deletion examples/ffi/exporter.cpp
@@ -133,7 +133,7 @@ int main(int argc, char *argv[]) {
"\"platform\": {\"kernel\": \"Darwin Kernel 22.5.0\"}}");

auto res = ddog_prof_Exporter_set_timeout(exporter, 30000);
if (res.tag == DDOG_PROF_VOID_RESULT_ERR) {
if (res.tag == DDOG_VOID_RESULT_ERR) {
print_error("Failed to set the timeout", res.err);
ddog_Error_drop(&res.err);
return 1;
143 changes: 143 additions & 0 deletions examples/ffi/profile_intern.cpp
@@ -0,0 +1,143 @@
// Copyright 2024-Present Datadog, Inc. https://www.datadoghq.com/
// SPDX-License-Identifier: Apache-2.0

extern "C" {
#include <datadog/common.h>
#include <datadog/crashtracker.h>
#include <datadog/profiling.h>
}
#include <chrono>
#include <cstdint>
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <iostream>
#include <memory>
#include <optional>
#include <string>
#include <thread>
#include <vector>

static ddog_CharSlice to_slice_c_char(const char *s) { return {.ptr = s, .len = strlen(s)}; }
static ddog_CharSlice to_slice_c_char(const char *s, std::size_t size) {
return {.ptr = s, .len = size};
}
static ddog_CharSlice to_slice_string(std::string const &s) {
return {.ptr = s.data(), .len = s.length()};
}

static std::string to_string(ddog_CharSlice s) { return std::string(s.ptr, s.len); }

void print_error(const ddog_Error &err) {
auto charslice = ddog_Error_message(&err);
printf("%.*s\n", static_cast<int>(charslice.len), charslice.ptr);
}

#define CHECK_RESULT(typ, ok_tag) \
void check_result(typ result) { \
if (result.tag != ok_tag) { \
print_error(result.err); \
ddog_Error_drop(&result.err); \
exit(EXIT_FAILURE); \
} \
}

CHECK_RESULT(ddog_VoidResult, DDOG_VOID_RESULT_OK)

#define EXTRACT_RESULT(typ, uppercase) \
ddog_prof_##typ##Id extract_result(ddog_prof_##typ##Id_Result result) { \
if (result.tag != DDOG_PROF_##uppercase##_ID_RESULT_OK_GENERATIONAL_ID_##uppercase##_ID) { \
print_error(result.err); \
ddog_Error_drop(&result.err); \
exit(EXIT_FAILURE); \
} else { \
return result.ok; \
} \
}

EXTRACT_RESULT(Function, FUNCTION)
EXTRACT_RESULT(Label, LABEL)
EXTRACT_RESULT(LabelSet, LABEL_SET)
EXTRACT_RESULT(Location, LOCATION)
EXTRACT_RESULT(Mapping, MAPPING)
EXTRACT_RESULT(StackTrace, STACK_TRACE)
EXTRACT_RESULT(String, STRING)

void wait_for_user(std::string s) {
std::cout << s << std::endl;
getchar();
}

int main(void) {
const ddog_prof_ValueType wall_time = {
.type_ = to_slice_c_char("wall-time"),
.unit = to_slice_c_char("nanoseconds"),
};
const ddog_prof_Slice_ValueType sample_types = {&wall_time, 1};
const ddog_prof_Period period = {wall_time, 60};

ddog_prof_Profile_NewResult new_result = ddog_prof_Profile_new(sample_types, &period);
if (new_result.tag != DDOG_PROF_PROFILE_NEW_RESULT_OK) {
ddog_CharSlice message = ddog_Error_message(&new_result.err);
fprintf(stderr, "%.*s", (int)message.len, message.ptr);
ddog_Error_drop(&new_result.err);
exit(EXIT_FAILURE);
}

ddog_prof_Profile *profile = &new_result.ok;
auto root_function_name =
extract_result(ddog_prof_Profile_intern_string(profile, to_slice_c_char("{main}")));
auto root_file_name = extract_result(
ddog_prof_Profile_intern_string(profile, to_slice_c_char("/srv/example/index.php")));
auto root_mapping = extract_result(
ddog_prof_Profile_intern_mapping(profile, 0, 0, 0, root_file_name, INTERNED_EMPTY_STRING));
auto root_function = extract_result(ddog_prof_Profile_intern_function(
profile, root_function_name, INTERNED_EMPTY_STRING, root_file_name));
auto root_location = extract_result(ddog_prof_Profile_intern_location_with_mapping_id(
profile, root_mapping, root_function, 0, 0));
ddog_prof_Slice_LocationId locations = {.ptr = &root_location, .len = 1};
auto stacktrace = extract_result(ddog_prof_Profile_intern_stacktrace(profile, locations));

auto magic_label_key =
extract_result(ddog_prof_Profile_intern_string(profile, to_slice_c_char("magic_word")));
auto magic_label_val =
extract_result(ddog_prof_Profile_intern_string(profile, to_slice_c_char("abracadabra")));
auto magic_label =
extract_result(ddog_prof_Profile_intern_label_str(profile, magic_label_key, magic_label_val));

// Keep this id around, no need to reintern the same string over and over again.
auto counter_id =
extract_result(ddog_prof_Profile_intern_string(profile, to_slice_c_char("unique_counter")));

// wait_for_user("Press any key to start adding values ...");

std::chrono::time_point<std::chrono::system_clock> start = std::chrono::system_clock::now();
for (auto i = 0; i < 10000000; i++) {
auto counter_label = extract_result(ddog_prof_Profile_intern_label_num(profile, counter_id, i));
ddog_prof_LabelId label_array[2] = {magic_label, counter_label};
ddog_prof_Slice_LabelId label_slice = {.ptr = label_array, .len = 2};
auto labels = extract_result(ddog_prof_Profile_intern_labelset(profile, label_slice));

int64_t value = i * 10;
ddog_Slice_I64 values = {.ptr = &value, .len = 1};
int64_t timestamp = 3 + 800 * i;
check_result(ddog_prof_Profile_intern_sample(profile, stacktrace, values, labels, timestamp));
}
std::chrono::time_point<std::chrono::system_clock> end = std::chrono::system_clock::now();
std::chrono::duration<double> elapsed_seconds = end - start;
std::cout << "elapsed time: " << elapsed_seconds.count() << "s" << std::endl;

// wait_for_user("Press any key to reset and drop...");

ddog_prof_Profile_Result reset_result = ddog_prof_Profile_reset(profile);
if (reset_result.tag != DDOG_PROF_PROFILE_RESULT_OK) {
ddog_CharSlice message = ddog_Error_message(&reset_result.err);
fprintf(stderr, "%.*s", (int)message.len, message.ptr);
ddog_Error_drop(&reset_result.err);
}
ddog_prof_Profile_drop(profile);

// wait_for_user("Press any key to exit...");

return EXIT_SUCCESS;
}
2 changes: 0 additions & 2 deletions examples/ffi/profiles.c
@@ -66,8 +66,6 @@ int main(void) {
}
ddog_prof_Profile_drop(profile);

printf("Press any key to exit...");
getchar();

return 0;
}
18 changes: 9 additions & 9 deletions profiling-ffi/Cargo.toml
@@ -33,20 +33,20 @@ build_common = { path = "../build-common" }

[dependencies]
anyhow = "1.0"
data-pipeline-ffi = { path = "../data-pipeline-ffi", default-features = false, optional = true }
datadog-crashtracker-ffi = { path = "../crashtracker-ffi", default-features = false, optional = true}
datadog-library-config-ffi = { path = "../library-config-ffi", default-features = false, optional = true }
datadog-profiling = { path = "../profiling" }
hyper = { version = "1.6", features = ["http1", "client"] }
http-body-util = "0.1"
ddcommon = { path = "../ddcommon"}
ddcommon-ffi = { path = "../ddcommon-ffi", default-features = false }
ddtelemetry-ffi = { path = "../ddtelemetry-ffi", default-features = false, optional = true, features = ["expanded_builder_macros"] }
function_name = "0.3.0"
futures = { version = "0.3", default-features = false }
http-body-util = "0.1"
hyper = { version = "1.6", features = ["http1", "client"] }
libc = "0.2"
tokio-util = "0.7.1"
serde_json = { version = "1.0" }
futures = { version = "0.3", default-features = false }
symbolizer-ffi = { path = "../symbolizer-ffi", optional = true, default-features = false }
symbolic-demangle = { version = "12.8.0", default-features = false, features = ["rust", "cpp", "msvc"] }
symbolic-common = "12.8.0"
data-pipeline-ffi = { path = "../data-pipeline-ffi", default-features = false, optional = true }
datadog-library-config-ffi = { path = "../library-config-ffi", default-features = false, optional = true }
function_name = "0.3.0"
symbolic-demangle = { version = "12.8.0", default-features = false, features = ["rust", "cpp", "msvc"] }
symbolizer-ffi = { path = "../symbolizer-ffi", optional = true, default-features = false }
tokio-util = "0.7.1"
35 changes: 35 additions & 0 deletions profiling-ffi/cbindgen.toml
@@ -52,6 +52,41 @@ renaming_overrides_prefixing = true
"ManagedStringId" = "ddog_prof_ManagedStringId"
"StringWrapper" = "ddog_StringWrapper"
"StringWrapperResult" = "ddog_StringWrapperResult"
"VoidResult" = "ddog_VoidResult"

"CbindgenIsDumbStringId" = "ddog_prof_StringId"

"Slice_GenerationalIdLabelId" = "ddog_prof_Slice_LabelId"
"Slice_GenerationalIdLocationId" = "ddog_prof_Slice_LocationId"

"GenerationalId_FunctionId" = "ddog_prof_FunctionId"
"Result_GenerationalIdFunctionId" = "ddog_prof_FunctionId_Result"
"FunctionId" = "OpaqueFunctionId"

"GenerationalId_LabelId" = "ddog_prof_LabelId"
"Result_GenerationalIdLabelId" = "ddog_prof_LabelId_Result"
"LabelId" = "OpaqueLabelId"

"GenerationalId_LabelSetId" = "ddog_prof_LabelSetId"
"Result_GenerationalIdLabelSetId" = "ddog_prof_LabelSetId_Result"
"LabelSetId" = "OpaqueLabelSetId"

"GenerationalId_LocationId" = "ddog_prof_LocationId"
"Result_GenerationalIdLocationId" = "ddog_prof_LocationId_Result"
"LocationId" = "OpaqueLocationId"

"GenerationalId_MappingId" = "ddog_prof_MappingId"
"Result_GenerationalIdMappingId" = "ddog_prof_MappingId_Result"
"MappingId" = "OpaqueMappingId"

"GenerationalId_StackTraceId" = "ddog_prof_StackTraceId"
"Result_GenerationalIdStackTraceId" = "ddog_prof_StackTraceId_Result"
"StackTraceId" = "OpaqueStackTraceId"

"GenerationalId_StringId" = "ddog_prof_StringId"
"Result_GenerationalIdStringId" = "ddog_prof_StringId_Result"
"StringId" = "OpaqueStringId"


"HandleProfileExporter" = "ddog_prof_ProfileExporter"
"Handle_ProfileExporter" = "ddog_prof_ProfileExporter"
Original file line number Diff line number Diff line change
@@ -515,7 +515,7 @@ pub unsafe extern "C" fn ddog_prof_Profile_add(
.into()
}

unsafe fn profile_ptr_to_inner<'a>(
pub(crate) unsafe fn profile_ptr_to_inner<'a>(
profile_ptr: *mut Profile,
) -> anyhow::Result<&'a mut internal::Profile> {
match profile_ptr.as_mut() {
445 changes: 445 additions & 0 deletions profiling-ffi/src/profiles/interning_api.rs

Large diffs are not rendered by default.

5 changes: 5 additions & 0 deletions profiling-ffi/src/profiles/mod.rs
@@ -0,0 +1,5 @@
// Copyright 2021-Present Datadog, Inc. https://www.datadoghq.com/
// SPDX-License-Identifier: Apache-2.0

mod datatypes;
mod interning_api;
21 changes: 7 additions & 14 deletions profiling-ffi/src/string_storage.rs
@@ -2,6 +2,7 @@
// SPDX-License-Identifier: Apache-2.0

use anyhow::Context;
use datadog_profiling::api::ManagedStringId;
use datadog_profiling::collections::string_storage::ManagedStringStorage as InternalManagedStringStorage;
use ddcommon_ffi::slice::AsBytes;
use ddcommon_ffi::{CharSlice, Error, MaybeError, Slice, StringWrapperResult};
@@ -11,12 +12,6 @@ use std::num::NonZeroU32;
use std::sync::Arc;
use std::sync::Mutex;

#[derive(Copy, Clone, Debug, Eq, PartialEq)]
#[repr(C)]
pub struct ManagedStringId {
pub value: u32,
}

// A note about this being Copy:
// We're writing code for C with C semantics but with Rust restrictions still
// around. In terms of C, this is just a pointer with some unknown lifetime
@@ -74,7 +69,7 @@ pub unsafe extern "C" fn ddog_prof_ManagedStringStorage_intern(
) -> ManagedStringStorageInternResult {
// Empty strings always get assigned id 0, no need to check.
if string.is_empty() {
return anyhow::Ok(ManagedStringId { value: 0 }).into();
return anyhow::Ok(ManagedStringId::empty()).into();
}

(|| {
@@ -85,7 +80,7 @@ pub unsafe extern "C" fn ddog_prof_ManagedStringStorage_intern(
.map_err(|_| anyhow::anyhow!("string storage lock was poisoned"))?
.intern(string.try_to_utf8()?)?;

anyhow::Ok(ManagedStringId { value: string_id })
anyhow::Ok(ManagedStringId::new(string_id))
})()
.context("ddog_prof_ManagedStringStorage_intern failed")
.into()
@@ -126,11 +121,9 @@ pub unsafe extern "C" fn ddog_prof_ManagedStringStorage_intern_all(

for (output_id, input_str) in output_slice.iter_mut().zip(strings.iter()) {
let string_id = if input_str.is_empty() {
ManagedStringId { value: 0 }
ManagedStringId::empty()
} else {
ManagedStringId {
value: write_locked_storage.intern(input_str.try_to_utf8()?)?,
}
ManagedStringId::new(write_locked_storage.intern(input_str.try_to_utf8()?)?)
};
output_id.write(string_id);
}
@@ -302,7 +295,7 @@ mod tests {

// We're going to intern the same group of strings twice to make sure
// that we get the same ids.
let mut ids_rs1 = [ManagedStringId { value: 0 }; 2];
let mut ids_rs1 = [ManagedStringId::empty(); 2];
let ids1 = ids_rs1.as_mut_ptr();
let result = unsafe {
ddog_prof_ManagedStringStorage_intern_all(storage, strings, ids1.cast(), strings.len())
@@ -311,7 +304,7 @@ mod tests {
panic!("{err}");
}

let mut ids_rs2 = [ManagedStringId { value: 0 }; 2];
let mut ids_rs2 = [ManagedStringId::empty(); 2];
let ids2 = ids_rs2.as_mut_ptr();
let result = unsafe {
ddog_prof_ManagedStringStorage_intern_all(storage, strings, ids2.cast(), strings.len())
5 changes: 5 additions & 0 deletions profiling/src/api.rs
@@ -25,11 +25,16 @@ pub struct Period<'a> {
}

#[derive(Copy, Clone, Default, Debug, Eq, PartialEq, PartialOrd, Ord, Hash)]
#[repr(C)]
pub struct ManagedStringId {
pub value: u32,
}

impl ManagedStringId {
pub fn empty() -> Self {
Self::new(0)
}

pub fn new(value: u32) -> Self {
ManagedStringId { value }
}
2 changes: 1 addition & 1 deletion profiling/src/collections/identifiable/string_id.rs
@@ -4,7 +4,7 @@
use super::*;

#[derive(Copy, Clone, Default, Debug, Eq, PartialEq, PartialOrd, Ord, Hash)]
#[repr(transparent)]
#[repr(C)]
pub struct StringId(u32);

impl StringId {
2 changes: 1 addition & 1 deletion profiling/src/internal/function.rs
@@ -31,7 +31,7 @@ impl PprofItem for Function {
}

#[derive(Copy, Clone, Debug, Eq, PartialEq, Hash, PartialOrd, Ord)]
#[repr(transparent)]
#[repr(C)]
pub struct FunctionId(NonZeroU32);

impl Id for FunctionId {
4 changes: 2 additions & 2 deletions profiling/src/internal/label.rs
@@ -81,7 +81,7 @@ impl Item for Label {
}

#[derive(Copy, Clone, Debug, Eq, PartialEq, Hash, PartialOrd, Ord)]
#[repr(transparent)]
#[repr(C)]
pub struct LabelId(u32);

impl Id for LabelId {
@@ -129,7 +129,7 @@ impl Item for LabelSet {
}

#[derive(Copy, Clone, Debug, Eq, PartialEq, Hash)]
#[repr(transparent)]
#[repr(C)]
#[cfg_attr(test, derive(bolero::generator::TypeGenerator))]
pub struct LabelSetId(u32);

2 changes: 1 addition & 1 deletion profiling/src/internal/location.rs
@@ -38,7 +38,7 @@ impl PprofItem for Location {
}

#[derive(Copy, Clone, Debug, Eq, PartialEq, Hash, PartialOrd, Ord)]
#[repr(transparent)]
#[repr(C)]
pub struct LocationId(NonZeroU32);

impl Id for LocationId {
2 changes: 1 addition & 1 deletion profiling/src/internal/mapping.rs
@@ -47,7 +47,7 @@ impl PprofItem for Mapping {
}

#[derive(Copy, Clone, Debug, Eq, PartialEq, Hash, PartialOrd, Ord)]
#[repr(transparent)]
#[repr(C)]
pub struct MappingId(NonZeroU32);

impl Id for MappingId {
Original file line number Diff line number Diff line change
@@ -0,0 +1,54 @@
// Copyright 2025-Present Datadog, Inc. https://www.datadoghq.com/
// SPDX-License-Identifier: Apache-2.0

use std::sync::atomic::AtomicU64;

/// Opaque identifier for the profiler generation
#[derive(Clone, Copy, PartialEq, Eq)]
#[repr(C)]
pub struct Generation {
id: u64,
}

impl Generation {
const IMMORTAL: Self = Self { id: u64::MAX };

/// The only way to create a generation. Guaranteed to give a new value each time.
pub fn new() -> Self {
static COUNTER: AtomicU64 = AtomicU64::new(0);
Self {
id: COUNTER.fetch_add(1, std::sync::atomic::Ordering::SeqCst),
}
}
}
impl Default for Generation {
fn default() -> Self {
Self::new()
}
}

#[repr(C)]
pub struct GenerationalId<T: Copy> {
generation: Generation,
id: T,
}

impl<T: Copy> GenerationalId<T> {
pub fn get(&self, expected_generation: Generation) -> anyhow::Result<T> {
anyhow::ensure!(
self.generation == expected_generation || self.generation == Generation::IMMORTAL
);
Ok(self.id)
}

pub const fn new(id: T, generation: Generation) -> Self {
Self { id, generation }
}

pub const fn new_immortal(id: T) -> Self {
Self {
id,
generation: Generation::IMMORTAL,
}
}
}
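The generation mechanism above is what invalidates stale ids after a profile rotation. The following is a minimal standalone sketch (types simplified from the file above; not the crate's API): an id minted under one generation is rejected under another, while the immortal sentinel is always accepted.

```rust
use std::sync::atomic::{AtomicU64, Ordering::SeqCst};

#[derive(Clone, Copy, PartialEq, Eq, Debug)]
struct Generation { id: u64 }

impl Generation {
    // u64::MAX is reserved as the "immortal" sentinel.
    const IMMORTAL: Self = Self { id: u64::MAX };
    fn new() -> Self {
        static COUNTER: AtomicU64 = AtomicU64::new(0);
        Self { id: COUNTER.fetch_add(1, SeqCst) }
    }
}

struct GenerationalId<T: Copy> { generation: Generation, id: T }

impl<T: Copy> GenerationalId<T> {
    fn get(&self, expected: Generation) -> Result<T, &'static str> {
        if self.generation == expected || self.generation == Generation::IMMORTAL {
            Ok(self.id)
        } else {
            Err("id from a different profiler generation")
        }
    }
}

fn main() {
    let gen_a = Generation::new();
    let gen_b = Generation::new(); // e.g. after a profile reset
    let id = GenerationalId { generation: gen_a, id: 7u32 };
    assert_eq!(id.get(gen_a), Ok(7));    // same generation: valid
    assert!(id.get(gen_b).is_err());     // stale generation: rejected
    let immortal = GenerationalId { generation: Generation::IMMORTAL, id: 0u32 };
    assert_eq!(immortal.get(gen_b), Ok(0)); // immortal ids always resolve
    println!("generation checks ok");
}
```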
234 changes: 234 additions & 0 deletions profiling/src/internal/profile/interning_api/mod.rs
@@ -0,0 +1,234 @@
// Copyright 2025-Present Datadog, Inc. https://www.datadoghq.com/
// SPDX-License-Identifier: Apache-2.0

mod generational_ids;
pub use generational_ids::*;

use crate::api::ManagedStringId;
use crate::collections::identifiable::{Dedup, StringId};
use crate::internal::{
Function, FunctionId, Label, LabelId, LabelSet, LabelSetId, Location, LocationId, Mapping,
MappingId, Profile, Sample, StackTrace, StackTraceId, Timestamp,
};
use std::sync::atomic::Ordering::SeqCst;

impl Profile {
pub fn intern_function(
&mut self,
name: GenerationalId<StringId>,
system_name: GenerationalId<StringId>,
filename: GenerationalId<StringId>,
) -> anyhow::Result<GenerationalId<FunctionId>> {
let function = Function {
name: name.get(self.generation)?,
system_name: system_name.get(self.generation)?,
filename: filename.get(self.generation)?,
};
let id = self.functions.dedup(function);
Ok(GenerationalId::new(id, self.generation))
}

pub fn intern_label_num(
&mut self,
key: GenerationalId<StringId>,
val: i64,
unit: Option<GenerationalId<StringId>>,
) -> anyhow::Result<GenerationalId<LabelId>> {
let key = key.get(self.generation)?;
let unit = unit.map(|u| u.get(self.generation)).transpose()?;
let id = self.labels.dedup(Label::num(key, val, unit));
Ok(GenerationalId::new(id, self.generation))
}

pub fn intern_label_str(
&mut self,
key: GenerationalId<StringId>,
val: GenerationalId<StringId>,
) -> anyhow::Result<GenerationalId<LabelId>> {
let key = key.get(self.generation)?;
let val = val.get(self.generation)?;
let id = self.labels.dedup(Label::str(key, val));
Ok(GenerationalId::new(id, self.generation))
}

pub fn intern_labelset(
&mut self,
labels: &[GenerationalId<LabelId>],
) -> anyhow::Result<GenerationalId<LabelSetId>> {
let labels = labels
.iter()
.map(|l| l.get(self.generation))
.collect::<anyhow::Result<Vec<_>>>()?;
let labels = LabelSet::new(labels);
let id = self.label_sets.dedup(labels);
Ok(GenerationalId::new(id, self.generation))
}

pub fn intern_location(
&mut self,
mapping_id: Option<GenerationalId<MappingId>>,
function_id: GenerationalId<FunctionId>,
address: u64,
line: i64,
) -> anyhow::Result<GenerationalId<LocationId>> {
let location = Location {
mapping_id: mapping_id.map(|id| id.get(self.generation)).transpose()?,
function_id: function_id.get(self.generation)?,
address,
line,
};
let id = self.locations.dedup(location);
Ok(GenerationalId::new(id, self.generation))
}

pub fn intern_managed_string(
&mut self,
s: ManagedStringId,
) -> anyhow::Result<GenerationalId<StringId>> {
let id = self.resolve(s)?;
Ok(GenerationalId::new(id, self.generation))
}

pub fn intern_managed_strings(
&mut self,
s: &[ManagedStringId],
out: &mut [GenerationalId<StringId>],
) -> anyhow::Result<()> {
anyhow::ensure!(s.len() == out.len());
for i in 0..s.len() {
out[i] = self.intern_managed_string(s[i])?;
}
Ok(())
}

pub fn intern_mapping(
&mut self,
memory_start: u64,
memory_limit: u64,
file_offset: u64,
filename: GenerationalId<StringId>,
build_id: GenerationalId<StringId>,
) -> anyhow::Result<GenerationalId<MappingId>> {
let mapping = Mapping {
memory_start,
memory_limit,
file_offset,
filename: filename.get(self.generation)?,
build_id: build_id.get(self.generation)?,
};
let id = self.mappings.dedup(mapping);
Ok(GenerationalId::new(id, self.generation))
}

pub fn intern_sample(
&mut self,
stacktrace: GenerationalId<StackTraceId>,
values: &[i64],
labels: GenerationalId<LabelSetId>,
timestamp: Option<Timestamp>,
) -> anyhow::Result<()> {
// TODO: validate sample labels? Or should we do that when we make the label set?
anyhow::ensure!(
values.len() == self.sample_types.len(),
"expected {} sample types, but sample had {} sample types",
self.sample_types.len(),
values.len(),
);
let stacktrace = stacktrace.get(self.generation)?;
let labels = labels.get(self.generation)?;

self.observations
.add(Sample::new(labels, stacktrace), timestamp, values)
}

pub fn intern_stacktrace(
&mut self,
locations: &[GenerationalId<LocationId>],
) -> anyhow::Result<GenerationalId<StackTraceId>> {
let locations = locations
.iter()
.map(|l| l.get(self.generation))
.collect::<anyhow::Result<Vec<_>>>()?;
let stacktrace = StackTrace { locations };
let id = self.stack_traces.dedup(stacktrace);
Ok(GenerationalId::new(id, self.generation))
}

pub const INTERNED_EMPTY_STRING: GenerationalId<StringId> =
GenerationalId::new_immortal(StringId::ZERO);

pub fn intern_string(&mut self, s: &str) -> anyhow::Result<GenerationalId<StringId>> {
if s.is_empty() {
Ok(Self::INTERNED_EMPTY_STRING)
} else {
Ok(GenerationalId::new(self.intern(s), self.generation))
}
}

pub fn intern_strings(
&mut self,
s: &[&str],
out: &mut [GenerationalId<StringId>],
) -> anyhow::Result<()> {
anyhow::ensure!(s.len() == out.len());
for i in 0..s.len() {
out[i] = self.intern_string(s[i])?;
}
Ok(())
}

// Simple synchronization between samples and profile rotation/export.
// Interning a sample may require several calls to the profiler to intern intermediate values,
// which are not inherently atomic. Since these intermediate values are tied to a particular
// profiler generation, and are invalidated when the generation changes, some coordination must
// occur between sampling and profile rotation/export.
// When the generation changes, one of three things can happen:
// 1. The sample can be dropped.
// 2. The sample can be recreated and interned into the new profile.
// 3. The profile rotation should wait until the sampling operation is complete.
//
// This API provides a mechanism for in-flight samples to block rotation until they complete,
// and for new samples to detect that a rotation is in progress so they can back off instead of starting.
// There are probably better ways, and maybe we should have a notification mechanism.
// But for now this should be enough.
const FLAG: u64 = u32::MAX as u64;

/// Prevent any new samples from starting.
/// Returns the number of remaining samples.
pub fn sample_block(&mut self) -> anyhow::Result<u64> {
let current = self.active_samples.fetch_add(Self::FLAG, SeqCst);
if current >= Self::FLAG {
self.active_samples.fetch_sub(Self::FLAG, SeqCst);
}
Ok(current % Self::FLAG)
}

pub fn sample_end(&mut self) -> anyhow::Result<()> {
self.active_samples.fetch_sub(1, SeqCst);
Ok(())
}

pub fn sample_start(&mut self) -> anyhow::Result<()> {
let old = self.active_samples.fetch_add(1, SeqCst);
if old >= Self::FLAG {
self.active_samples.fetch_sub(1, SeqCst);
anyhow::bail!("Can't start sample, export in progress");
}
Ok(())
}

pub fn samples_active(&mut self) -> anyhow::Result<u64> {
let current = self.active_samples.load(SeqCst);
Ok(current % Self::FLAG)
}

pub fn samples_are_blocked(&mut self) -> anyhow::Result<bool> {
let current = self.active_samples.load(SeqCst);
Ok(current >= Self::FLAG)
}

pub fn samples_are_drained(&mut self) -> anyhow::Result<bool> {
let current = self.active_samples.load(SeqCst);
Ok(current % Self::FLAG == 0)
}
}
21 changes: 19 additions & 2 deletions profiling/src/internal/profile/mod.rs
@@ -4,6 +4,8 @@
#[cfg(test)]
mod fuzz_tests;

pub mod interning_api;

use self::api::UpscalingInfo;
use super::*;
use crate::api;
@@ -15,10 +17,11 @@ use crate::iter::{IntoLendingIterator, LendingIterator};
use crate::pprof::sliced_proto::*;
use crate::serializer::CompressedProtobufSerializer;
use anyhow::Context;
use interning_api::Generation;
use std::borrow::Cow;
use std::collections::HashMap;
use std::sync::Arc;
use std::sync::Mutex;
use std::sync::atomic::AtomicU64;
use std::sync::{Arc, Mutex};
use std::time::{Duration, SystemTime};

pub struct Profile {
@@ -30,8 +33,10 @@ pub struct Profile {
/// When profiles are reset, the period needs to be preserved. This
/// stores it in a way that does not depend on the string table.
owned_period: Option<owned_types::Period>,
active_samples: AtomicU64,
endpoints: Endpoints,
functions: FxIndexSet<Function>,
generation: interning_api::Generation,
labels: FxIndexSet<Label>,
label_sets: FxIndexSet<LabelSet>,
locations: FxIndexSet<Location>,
@@ -227,6 +232,10 @@ impl Profile {
Ok(())
}

pub fn get_generation(&self) -> anyhow::Result<Generation> {
Ok(self.generation)
}

pub fn resolve(&mut self, id: ManagedStringId) -> anyhow::Result<StringId> {
let non_empty_string_id = if let Some(valid_id) = NonZeroU32::new(id.value) {
valid_id
@@ -288,6 +297,12 @@ impl Profile {
/// Returns the previous Profile on success.
#[inline]
pub fn reset_and_return_previous(&mut self) -> anyhow::Result<Profile> {
let current_active_samples = self.sample_block()?;
anyhow::ensure!(
current_active_samples == 0,
"Can't rotate the profile, there are still active samples. Drain them and try again."
);

let mut profile = Profile::new_internal(
self.owned_period.take(),
self.owned_sample_types.take(),
@@ -663,8 +678,10 @@ impl Profile {
let mut profile = Self {
owned_period,
owned_sample_types,
active_samples: Default::default(),
endpoints: Default::default(),
functions: Default::default(),
generation: Generation::new(),
labels: Default::default(),
label_sets: Default::default(),
locations: Default::default(),
2 changes: 1 addition & 1 deletion profiling/src/internal/sample.rs
@@ -24,7 +24,7 @@ impl Sample {
}

#[derive(Copy, Clone, Debug, Eq, PartialEq, Hash)]
#[repr(transparent)]
#[repr(C)]
pub struct SampleId(u32);

impl SampleId {
2 changes: 1 addition & 1 deletion profiling/src/internal/stack_trace.rs
@@ -15,7 +15,7 @@ impl Item for StackTrace {
}

#[derive(Copy, Clone, Debug, Eq, PartialEq, Hash)]
#[repr(transparent)]
#[repr(C)]
#[cfg_attr(test, derive(bolero::generator::TypeGenerator))]
pub struct StackTraceId(u32);

