Consider optimizing pods call to Kubernetes API Server so that it becomes efficient and scalable #6676

JohnRusk · 2023-01-12T00:06:06Z

Is your feature request related to a problem? Please describe.
There are known issues with Fluent Bit on large Kubernetes clusters. The workaround is to enable the use_kubelet setting. But that's not enabled by default, and so users can run into scaling problems without understanding why.

Describe the solution you'd like
Optimization of the default behaviour (in which Fluent Bit calls the Kube API Server) so that the workaround becomes less necessary, and maybe unnecessary.

This might be easy. It depends on how fresh the listed data must be. Right now, it appears that Fluent Bit is using default query string parameters, and therefore the query is being served from etcd. This is expensive in terms of performance. However, if Fluent Bit instead indicated that it would accept data from the API server's cache, performance would be much better.

In particular, Fluent Bit already appears to set fieldSelector=spec.nodeName=.... in the query string when listing pods. Kubernetes has a special optimization that makes this query highly performant, but only when the data is served from the API server's cache. That optimized cache read path is exactly how all the kubelets in a large cluster can efficiently read pods, so it would be good if Fluent Bit were changed to use that path too. The only change necessary is to signal to Kubernetes that cached data will be accepted.
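To make the idea concrete, here is a minimal sketch of the query string, written in Python purely for illustration (Fluent Bit itself is written in C). It assumes the signal in question is `resourceVersion=0`, which the Kubernetes API documents as "any version is acceptable" and which allows the API server to answer from its cache. The function name `pod_list_path` and the node name `node-1` are hypothetical, not taken from Fluent Bit.

```python
from urllib.parse import urlencode


def pod_list_path(node_name: str, accept_cached: bool = True) -> str:
    """Build the request path for listing the pods scheduled on one node."""
    # The field selector restricts the list to pods on the given node.
    params = {"fieldSelector": f"spec.nodeName={node_name}"}
    if accept_cached:
        # Assumption: resourceVersion=0 is how the client signals that data
        # from the API server's cache is acceptable.
        params["resourceVersion"] = "0"
    return "/api/v1/pods?" + urlencode(params)


# Hypothetical node name, for illustration only.
print(pod_list_path("node-1"))
```

With `accept_cached=False` the path matches what the default behaviour appears to be today, so the only difference between the two requests is the extra `resourceVersion` parameter.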

As for whether it would be appropriate to use cached data, that's something only the Fluent Bit team can judge.

For more info, please see this new section of the K8s FAQ: https://github.com/kubernetes/community/blob/master/sig-scalability/configs-and-limits/faq.md#how-should-we-code-client-applications-to-improve-scalability. Points 4, 6, and 7 are the most relevant to Fluent Bit.

github-actions · 2023-04-12T01:55:33Z

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 5 days. Maintainers can add the exempt-stale label.

JohnRusk · 2023-04-12T04:50:54Z

Hey bot, please don't close this one.

github-actions · 2023-07-12T02:10:33Z

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 5 days. Maintainers can add the exempt-stale label.

github-actions · 2023-07-17T02:14:03Z

This issue was closed because it has been stalled for 5 days with no activity.

shuaich · 2024-04-05T22:16:26Z

Thanks, John, for proposing this. +1 on optimizing how pod labels are queried from the API server.

Querying the kubelet's '/pods' endpoint via its secure port (10250) is not always feasible due to security concerns. Granting permission to the nodes/proxy resource poses security risks.

One optimization direction is to use the list-watch mechanism. Unfortunately, there is no shared informer in the C client library. We can use an HTTP client or the Kubernetes C client with a callback to handle incremental pod events.
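As a rough illustration of what such a callback would do, here is a small sketch in Python (not C, and not Fluent Bit code). It relies on the documented watch stream format, where each line is a JSON object with a `type` and an `object` field. The function name `apply_watch_event`, the cache layout, and the sample pods are all hypothetical.

```python
import json


def apply_watch_event(cache: dict, line: str) -> None:
    """Apply one line of a Kubernetes watch stream to a local pod cache."""
    event = json.loads(line)
    meta = event["object"].get("metadata", {})
    key = (meta.get("namespace"), meta.get("name"))
    if event["type"] in ("ADDED", "MODIFIED"):
        # Keep only the labels, since that is what gets attached to logs.
        cache[key] = meta.get("labels", {})
    elif event["type"] == "DELETED":
        cache.pop(key, None)
    # Other event types (for example BOOKMARK or ERROR) are ignored here.


def make_event(kind: str, name: str, labels: dict) -> str:
    """Build a sample watch line for a pod in the default namespace."""
    meta = {"namespace": "default", "name": name, "labels": labels}
    return json.dumps({"type": kind, "object": {"metadata": meta}})


# Hypothetical event sequence standing in for a real watch stream.
pods = {}
for line in (
    make_event("ADDED", "web-0", {"app": "web"}),
    make_event("MODIFIED", "web-0", {"app": "web", "tier": "front"}),
    make_event("ADDED", "db-0", {"app": "db"}),
    make_event("DELETED", "db-0", {"app": "db"}),
):
    apply_watch_event(pods, line)
print(pods)
```

After the initial list, the client would only need to process these incremental events instead of re-listing all pods, which is the saving list-watch is meant to provide.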

Happy to chat and collaborate more on this proposal. @JohnRusk

shuaich · 2024-04-05T22:41:12Z

This is a good example of using k8s C client to query pod information with list-watch: https://github.com/kubernetes-client/c/blob/07648eda6118449de94354d9deb6611cdd19d4e6/examples/watch_list_pod/main.c

I am looking into whether it is feasible to use an HTTP client to do the same.

JohnRusk · 2024-04-07T22:12:15Z

@shuaich Oh, I did not realise that there was no shared informer in the C client library. That's inconvenient. Yes, I agree that finding a way to call list-watch sounds like a good idea.

Thanks for the suggestion that we collaborate. My C skills are almost non-existent, so I can't offer to help with code, I'm sorry. But I'm happy to chat here about ideas if needed.

JohnRusk mentioned this issue Jan 12, 2023

[filter_kube] Rely on /pods endpoint of kubelet #1948

Closed

github-actions bot added the Stale label Apr 12, 2023

github-actions bot removed the Stale label Apr 13, 2023

github-actions bot added the Stale label Jul 12, 2023

github-actions bot closed this as not planned Jul 17, 2023
