Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
feat(observability): API health checks + GraphQL #3514
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(observability): API health checks + GraphQL #3514
Changes from 112 commits
20ac3a2
c2cbbd0
97197d6
6c288cc
897bbe1
6b4ee1e
e2f7e7f
9cf5710
52d0ae6
01d4cb0
d9aa9ff
2b64d39
76c850f
2f1bb86
ccbab9b
b3f512c
f5ebec8
379ee14
691bbd8
41272c6
32da59d
dde38ab
18abf6e
8ae3fb3
1ca1a25
ad19a3a
496649c
c1ab928
5fe26f6
3278d14
14330c8
7e6ea67
c46039e
f0f4b11
3148415
c0bf90f
445f03b
0659541
ade1c8e
acf7458
8c54dba
2d87021
9c98df8
a6f7023
d58fb5f
24ae20e
a1ef3ee
ea192f5
d080f93
f4359d3
274f6ff
6588ed2
f9a9a3e
21f2a49
273cfc8
bfe3a2d
d67f40d
27f52ca
6db6ae5
5d2c83e
33860b6
979c09a
d9dc54e
3069e08
fd60004
4661aa5
6d44bf2
fcd52ea
9a8b3a9
5992a69
494b00f
cde8e77
f19a77a
c16e7cd
10c9235
f7da2b9
a120404
9ccd579
5cc20ea
fae8d1f
e007b4a
11b9805
2ad63ee
e4ff93f
c976bcd
cf7ed9c
0c150f2
cbf18f7
207ea58
025fe2f
e8fb167
0db1bbf
1bcf40c
ceab40f
b3e5aeb
77328f8
794e9a1
67df981
39f7f7b
edf7d81
db84b43
af88a75
14f755d
9caeced
865a39c
7a30886
951ec51
fd738c1
c7889eb
21be95f
8633879
cf60baa
542b328
337d9a5
37ef346
6c27584
c07eda0
4ae9402
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
Large diffs are not rendered by default.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps we can come up with a more descriptive name for this feature?
internal-events-api
or something?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's probably a better name for this than just 'api', although my starting intention is that this would become the de facto gateway into general UI <-> Vector observability and grow into more than just events.
One current use-case we're planning for, for example, is retrieving/validating Cargo.toml.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To conclude this point: In the absence of a better name, I'll keep this as
api
for now. I do think it'll grow to be more than simply metrics, so trying to keep the nomenclature in sync with its function might be a bit tough; this generalisation may be preferred.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm assuming this will eventually expose the actual health of the Vector instance by querying the health of the different active components?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably, yeah. Right now, this just asserts that the GraphQL handler is alive. There's also a
/health
that returns{"ok": true}
+ a 200 response.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could probably be simplified a bit with the
serde_json::json
macro. Something along the lines ofThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This macro balances Warp's
.or()
composition, for faster compilation times. For debug builds, it boxes the response for dynamic dispatch, to improve compilation further.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What kind of improvement can we expect from introducing this macro?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the issue that motivated it. Quote:
Thought it was worth a crack.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you test this on vector?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I would really like to see numbers for our use case before we add such a complex optimization.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently, only
events_processed
metrics are consumed. This will evolve to accommodate all metrics types, and will likely be consumed via specific queries rather than the generalisedget_metrics
subscription. I also imagine we'll be able to do additional filtering by params.Since the design of that is TBD and likely to be covered by RFCs, this is just a meantime demonstration that it works.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rather than treat the API as a sink, I thought it made sense to utilise the 'global' controller directly. This is called per-subscription, at the point of query, and returns a stream that's specific to that one subscription.
I'm using the
stream!
macro to yield the inner value when the value is of typeEvent::Metric
.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The main reason I'd caution against this is that it ties us pretty directly to the current implementation of metrics and doesn't give us a clear path to exposing non-metric data. The idea of making internal metrics a source is that they become less special (e.g. represented by the same data type as user metrics) and any improvements we make to vector's internal observability tools should help improve our users' ability to handle their own metrics.
Basically, internal metrics as a source was an intentional design decision so let's not sidestep it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This appears to be an infinite loop (
Interval::tick
cannot fail), so: