Feat : Add aggregation from kube service endpoints feature in metrics API scaler #6565

Open
wants to merge 2 commits into base: main
Conversation

julianguinard

@julianguinard julianguinard commented Feb 24, 2025

TL;DR

Add the ability for the metrics-api scaler to get metrics from all endpoint targets of a Kubernetes service (aggregateFromKubeServiceEndpoints: true metadata) and aggregate them using an average, sum, min, or max aggregation function (aggregationType metadata). This is handy in environments where no metric aggregation/scraping stack (e.g. Prometheus) has been set up, or where one simply doesn't want to use their own monitoring stack to fetch and serve metrics from customer workloads running in the customers' Kubernetes clusters, and prefers to leave responsibility for the metrics API up to the customer.
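
For illustration, here is a minimal standalone Go sketch of what the aggregation step described above could look like for the four aggregationType options; the function name, signature, and sample values are made up for this example and are not the actual PR code:

package main

import "fmt"

// aggregate is a hypothetical helper combining per-endpoint metric values
// according to an aggregation type (average, sum, min, max).
func aggregate(values []float64, aggregationType string) (float64, error) {
	if len(values) == 0 {
		return 0, fmt.Errorf("no endpoint values to aggregate")
	}
	sum, minV, maxV := 0.0, values[0], values[0]
	for _, v := range values {
		sum += v
		if v < minV {
			minV = v
		}
		if v > maxV {
			maxV = v
		}
	}
	switch aggregationType {
	case "average":
		return sum / float64(len(values)), nil
	case "sum":
		return sum, nil
	case "min":
		return minV, nil
	case "max":
		return maxV, nil
	default:
		return 0, fmt.Errorf("unknown aggregationType %q", aggregationType)
	}
}

func main() {
	// Example values only: three endpoints reporting different metric values.
	avg, _ := aggregate([]float64{16, 4, 10}, "average")
	fmt.Println(avg) // 10
}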

This PR comes from an issue encountered when the metrics-api scaler targets an internal Kubernetes service, such as the one used in the metrics_api E2E tests, that (unlike the test service) has more than one replica serving it, leading to inconsistent HPA average calculations.

The more pods a Kubernetes service used by the metrics API scaler has as targets, the less likely the scaler is to see the metrics of all those pods, leading to inconsistent HPA average metric computation and eventually to scaling issues, especially when the Kubernetes service has a "sticky" configuration.

Below are screenshots of the modified E2E metrics_api_test.go, with the aggregateFromKubeServiceEndpoints: true metadata addition commented out, being executed and failing: scale-in never occurs because the metrics from all 10 replicas behind the metrics server deployment are not all taken into account.

Screenshot from 2025-02-24 17-06-35

Logs from keda-operator show the same value (16) being returned to the metrics-api scaler by the Kubernetes service:
Screenshot from 2025-02-24 16-18-07

As a consequence, the test eventually fails because the scaled deployment is unable to scale out within the 3-minute timeframe:
Screenshot from 2025-02-24 16-19-00

Uncommenting the aggregateFromKubeServiceEndpoints: true metadata addition enables metric aggregation mode, and the same test passes this time:

Screenshot from 2025-02-24 16-22-14
Screenshot from 2025-02-24 16-22-07

Checklist

Helm PR : kedacore/charts#739
Docs PR : kedacore/keda-docs#1541

@julianguinard julianguinard force-pushed the add-metrics-api-aggregation-from-kube-service-feature branch 2 times, most recently from 742e19d to be170fd on February 25, 2025 09:57
@julianguinard julianguinard marked this pull request as ready for review February 25, 2025 10:31
@julianguinard julianguinard requested a review from a team as a code owner February 25, 2025 10:31
@julianguinard julianguinard changed the title from "Add aggregation from kube service endpoints feature in metrics API scaler" to "Feat : Add aggregation from kube service endpoints feature in metrics API scaler" on Feb 25, 2025
@zroubalik
Member

zroubalik commented Mar 5, 2025

/run-e2e metrics_api
Update: You can check the progress here

@julianguinard
Author

julianguinard commented Mar 6, 2025

/run-e2e metrics_api Update: You can check the progress here

@zroubalik thanks for triggering E2E, which ran fine here

@julianguinard julianguinard force-pushed the add-metrics-api-aggregation-from-kube-service-feature branch from be170fd to 172e504 on March 11, 2025 14:46
…s API (was not visible in E2E tests as the server run by ghcr.io/kedacore/tests-metrics-api seems to accommodate it anyway)

- metrics_api_test.go : fix getUpdateUrlsForAllMetricAllMetricsServerReplicas() to retry more transient failure cases & fail test after 5 failed retries

Signed-off-by: julian GUINARD <[email protected]>
@julianguinard julianguinard force-pushed the add-metrics-api-aggregation-from-kube-service-feature branch from 172e504 to 61ebd1f on March 11, 2025 15:02
@julianguinard
Author

@zroubalik FYI I added this commit to make the E2E test more resilient on clusters where launching all metrics server replicas takes longer than expected: we now sleep 1 second and retry up to 4 more times to validate the expected pods before failing.

It also fixes the formatting of each endpoint URL queried when aggregating metrics (a double "/" in the path, which did not trigger errors in E2E because the server run by ghcr.io/kedacore/tests-metrics-api seems to cope with it anyway).
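
For illustration only, here is one way such a double "/" can be avoided when joining an endpoint address with a configured path in Go, using net/url.JoinPath; the host, port, and path below are made-up example values, not taken from the PR:

package main

import (
	"fmt"
	"net/url"
)

func main() {
	// Hypothetical endpoint address and metadata path, for illustration only.
	endpointHost := "http://10.0.0.12:8080"
	metadataPath := "/api/value"

	// url.JoinPath (Go 1.19+) collapses duplicate "/" between segments,
	// so a leading "/" on the path does not produce "...:8080//api/value".
	u, err := url.JoinPath(endpointHost, metadataPath)
	if err != nil {
		panic(err)
	}
	fmt.Println(u) // http://10.0.0.12:8080/api/value
}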

@JorTurFer
Member

JorTurFer commented Mar 14, 2025

/run-e2e metrics_api
Update: You can check the progress here

@julianguinard
Author

/run-e2e metrics_api Update: You can check the progress here

@JorTurFer thanks for triggering the E2E tests, which all ran fine

waiting for reviews & opinions on this now :)

Comment on lines +515 to +528
if err != nil {
s.logger.Error(err, "Failed to get kubernetes endpoints urls from configured service URL. Falling back to querying url configured in metadata")
} else {
if len(endpointsUrls) == 0 {
s.logger.Error(err, "No endpoints URLs were given for the service name. Falling back to querying url configured in metadata")
} else {
aggregatedMetric, err := s.aggregateMetricsFromMultipleEndpoints(ctx, endpointsUrls)
if err != nil {
s.logger.Error(err, "No aggregated metrics could be computed from service endpoints. Falling back to querying url configured in metadata")
} else {
return aggregatedMetric, err
}
}
}
Member

I'm not sure we should fall back here; if a user wants to use the feature and it fails, the fallback can produce unexpected scaling behaviors.
