Scaling EsExecutors with core size 0 might starve work due to missing workers #124667

Open
mosche opened this issue Mar 12, 2025 · 1 comment
Labels: blocker, >bug, :Core/Infra/Core, Team:Core/Infra

Comments

@mosche
Contributor

mosche commented Mar 12, 2025

This was discovered in cases where masterService#updateTask had work enqueued but no worker to process it.

The root cause is a bug in EsExecutors. When the pool's core size is 0 and its max pool size is 1 (this is also possible, though less likely, with a higher max pool size), EsThreadPoolExecutor sometimes fails to add another worker to execute a task because the pool is already at its max size. This is expected: the initial rejection triggers ForceQueuePolicy, which queues the task instead. In rare cases, however, the single worker thread (or all worker threads) times out, based on keepAliveTime, at about the same time the new task is queued via ForceQueuePolicy. Unless more tasks are submitted later (which is not the case for masterService#updateTask), the task starves in the queue with no worker to process it.
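
The end state described above can be illustrated with plain java.util.concurrent. This is only a minimal sketch under stated assumptions: it does not use EsExecutors or ForceQueuePolicy and it does not reproduce the race itself. It places a task directly into the queue of a pool that currently has no workers, which models a task that was force-queued just as the last worker timed out. The class and method names (StarvedTaskSketch, demonstrate) are hypothetical.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration; not the Elasticsearch implementation.
final class StarvedTaskSketch {

    /** Returns { ranWithoutWorker, ranAfterAnotherSubmission }. */
    public static boolean[] demonstrate() throws InterruptedException {
        // Core size 0 and max size 1, as in the scenario described in this issue.
        ThreadPoolExecutor executor = new ThreadPoolExecutor(
            0, 1, 50, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>());
        try {
            CountDownLatch ran = new CountDownLatch(1);

            // Bypass execute() and enqueue directly. Assumption: this stands in
            // for a task that was force-queued after the last worker exited.
            executor.getQueue().put(ran::countDown);

            // No worker exists and nothing starts one, so the task just sits there.
            boolean ranWithoutWorker = ran.await(500, TimeUnit.MILLISECONDS);

            // A later submission makes execute() notice the empty pool and start
            // a worker, which then drains the queue, including the stuck task.
            executor.execute(() -> {});
            boolean ranAfterAnotherSubmission = ran.await(5, TimeUnit.SECONDS);

            return new boolean[] { ranWithoutWorker, ranAfterAnotherSubmission };
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        boolean[] result = demonstrate();
        System.out.println("ran without a worker: " + result[0]);
        System.out.println("ran after another submission: " + result[1]);
    }
}
```

The queued task only runs once an unrelated submission causes a worker to be started, which is why a queue that receives no further tasks can stay stuck indefinitely.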

The respective code in EsExecutors is old and unchanged. We were able to reproduce the bug on main using Java 21, 22, and 23, as well as on 8.0 using Java 17. The same is likely possible for older versions of ES.

It looks as if the bug is triggered more frequently with more recent JDK versions, but this might just be observation bias, as we weren't aware of the bug earlier.

elasticsearchmachine added the needs:triage label Mar 12, 2025
mosche added the blocker, :Core/Infra/Core, and >bug labels and removed the needs:triage label Mar 12, 2025
mosche self-assigned this Mar 12, 2025
elasticsearchmachine added the Team:Core/Infra label Mar 12, 2025
@elasticsearchmachine
Collaborator

Pinging @elastic/es-core-infra (Team:Core/Infra)
