Creating an alert with Slack notifications errors out due to rate-limiting in large Slack workspaces #32480
Comments
Hey there, @Usiel! I'm here to assist you with any bugs, questions, or contributions you have for Superset. Let's tackle this issue together! To address the rate-limiting issue when retrieving Slack channels in large workspaces, you can consider the following approaches:
These strategies can help mitigate the rate-limiting issue and improve the user experience when configuring Slack notifications in Superset.
@Usiel how many channels does your Slack workspace have? Unfortunately, some of this is limited by the Slack API itself, since it only allows 1k results per paginated response and, from what I found online, it may be rate-limited to about 20 requests per minute. I think we don't currently look at archived channels, but if we do, we can probably just add a flag; if we aren't doing that yet, a quick fix might be to set the flag to true according to the doc: https://api.slack.com/methods/conversations.list
We have around 20k channels, half of them archived. Interestingly, just now when I wanted to do some more probing, iterating all conversations took just a few seconds and only 16 requests. I can see in the logs that yesterday iterating all channels took 6 minutes and more than 200 (successful) requests, each with a distinct cursor (so these were not retried due to connectivity or similar). We saw this issue on 2 different Superset deployments in different datacenters owned by different teams (but connecting to the same Slack workspace). Assuming this was just a temporary issue with Slack's API, the more extensive changes listed under "Options" are probably not necessary. Imo, the improvement to Superset's
I had an hourly celery task running Superset's I'm OK to close this issue unless someone else encountered the same degradation. I will see about contributing the Slack rate-limit handler improvement soon-ish.
While investigating apache#32480 I noticed that Superset's slack client does not handle rate-limiting. Although the original issue may have been intermittent, I still believe it will be useful to handle rate-limiting to make Slack operations with Superset more robust. The rate-limit is easily triggered by 2 users trying to change a Slack alert with a large Slack workspace (multiple `conversations.list` requests with rate-limit 20 per minute), or lots of reports firing at the same time. The `RateLimitErrorRetryHandler` feature is available since slack_sdk>=3.9.0 (Superset requires at least 3.19.0).
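To illustrate what handling rate-limiting means here, below is a minimal sketch of the retry idea, not the actual patch. `RateLimited` and `call_with_rate_limit_retry` are hypothetical names made up for this example; in Superset the equivalent behavior would come from slack_sdk's `RateLimitErrorRetryHandler` rather than hand-written code.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical stand-in for a rate-limit (HTTP 429) error from Slack."""

    def __init__(self, retry_after: float) -> None:
        super().__init__(f"rate limited, retry after {retry_after}s")
        self.retry_after = retry_after


def call_with_rate_limit_retry(
    request: Callable[[], T],
    max_retry_count: int = 2,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `request`, waiting out rate-limit errors up to `max_retry_count` times."""
    attempt = 0
    while True:
        try:
            return request()
        except RateLimited as ex:
            if attempt >= max_retry_count:
                raise  # retries exhausted: surface the error to the caller
            attempt += 1
            sleep(ex.retry_after)  # wait as long as the server asked for
```

The `sleep` parameter is injectable only so the sketch can be exercised without real waiting. Note that the caller still blocks for the full wait time, so this makes requests succeed eventually rather than quickly.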
I encountered the problem again and realized my test (hourly celery task) inadvertently only requested public channels. I can reliably reproduce the issue when requesting both private and public channels. The bot is a member of just a single private channel. Iterating the full list again takes around 5-7 minutes. For further testing I ran the
Imo, this is an issue, I just don't know whether this affects many other Superset users :) I doubt Slack will optimize their API.
See apache#32480. This change introduces a cached `get_channels` function for Slack conversations. Large workspaces require 5-10 minutes to retrieve all conversations.
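A rough sketch of the caching idea, assuming a simple in-process time-to-live cache. The class name, the `ttl_seconds` default, and the injected `fetch` callable are all hypothetical and only serve the illustration; the linked change may well use Superset's own cache backend instead.

```python
import time
from typing import Callable, List, Optional


class CachedChannelList:
    """Hypothetical TTL cache around a slow channel-listing function."""

    def __init__(
        self,
        fetch: Callable[[], List[dict]],
        ttl_seconds: float = 3600.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._channels: List[dict] = []
        self._fetched_at: Optional[float] = None

    def get_channels(self, force: bool = False) -> List[dict]:
        """Return cached channels, refetching when stale or when `force` is set."""
        now = self._clock()
        expired = self._fetched_at is None or now - self._fetched_at >= self._ttl
        if force or expired:
            self._channels = self._fetch()  # the slow, multi-minute call
            self._fetched_at = now
        return self._channels
```

Only the first caller after expiry pays for the slow fetch; everyone else within the TTL gets the stored list.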
A potential fix for this might be #32529.
Thank you, yes, that looks promising and pretty much fixes it (well, not for the first user - they have to wait a few minutes 😆). I think we'd be golden if we exposed the caching function as a celery task, so it can run as part of the beat schedule. What do you think?
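For illustration, such a beat schedule entry could look roughly like the following in `superset_config.py`. This is only a sketch: the task name `hypothetical.warm_slack_channels` does not exist and stands in for whatever task would wrap the caching function, and the hourly interval is an arbitrary choice.

```python
from datetime import timedelta


class CeleryConfig:
    # Broker, imports and other settings are omitted from this sketch.
    beat_schedule = {
        "warm-slack-channel-cache": {
            # Hypothetical task name; no such task is confirmed to exist.
            "task": "hypothetical.warm_slack_channels",
            # Refresh well before users open the alert dialog.
            "schedule": timedelta(hours=1),
        },
    }


CELERY_CONFIG = CeleryConfig
```

With a periodic warm-up like this, the multi-minute fetch would happen in a worker instead of in the first user's request.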
Bug description
Preconditions: Be connected to a Slack workspace with thousands of channels and even more archived channels.
Actual: The request to `/api/v1/report/slack_channels/?...` errors out with a rate-limit error from the Slack client.
Expected: User can select a channel/group without much delay.
Screenshots/recordings
Superset version
4.1.1
Python version
3.11
Node version
Not applicable
Browser
Chrome
Additional context
Why?
From what I could see, this is due to Slack's implementation of the `conversations.list` endpoint. When the connected workspace has lots of archived channels, each cursor page returns far fewer conversations than specified in the request (using `limit=999`). This means we end up making a lot of requests to fully iterate all conversations, eventually running into the rate-limit.
I applied a diff to properly handle rate-limit errors. This allows the request to eventually finish (6 minutes for our workspace), but it's not a great user experience 😆 (It may still be useful to add the rate-limit handler to preempt other Slack API issues).
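The cursor iteration described above can be sketched as follows. `list_all_conversations` and the injected `fetch_page` callable are hypothetical and stand in for the real Slack client call; the response shape (`channels`, `response_metadata.next_cursor`) and the `limit`, `cursor` and `exclude_archived` parameters follow Slack's `conversations.list` documentation. Whether `exclude_archived` actually reduces the number of pages is an assumption, not something verified here.

```python
from typing import Callable, List, Tuple


def list_all_conversations(
    fetch_page: Callable[..., dict],
    limit: int = 999,
    exclude_archived: bool = True,
) -> Tuple[List[dict], int]:
    """Follow `next_cursor` until it is empty; return (channels, request count)."""
    channels: List[dict] = []
    requests = 0
    cursor = ""
    while True:
        page = fetch_page(
            cursor=cursor, limit=limit, exclude_archived=exclude_archived
        )
        requests += 1
        # A page may hold far fewer than `limit` items, so the number of
        # requests is not bounded by total_channels / limit.
        channels.extend(page["channels"])
        cursor = page.get("response_metadata", {}).get("next_cursor", "")
        if not cursor:
            return channels, requests
```

Counting requests this way makes the problem visible: the loop only ends when the cursor runs out, so short pages translate directly into more calls against the rate limit.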
Options