
Reduce build and test coverage to cope with Azure limits #1994

Merged: 2 commits merged into kernelci:main from minimal-config-azure-limits on Jul 10, 2023

Conversation

@gctucker (Contributor, PR author) commented on Jul 5, 2023:

As we're moving to the new Azure subscription, we're currently limited in terms of build and network bandwidth capacity. As such, reduce builds by disabling allmodconfig and cutting the standard tree coverage down to the minimal variants. Also reduce test coverage to only a couple of devices running each test plan, to minimise the downloads from storage.
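For reference, build and test coverage is driven by the YAML configuration files in this repository (the build configs and test configs). The sketch below is only a hypothetical illustration of the kind of change being described — it is not the actual diff from this PR, and the tree, variant and device names are made up:

```yaml
# Hypothetical sketch only -- not the diff from this PR; names are illustrative.
build_configs:
  example-tree:
    tree: example-tree
    branch: 'master'
    variants:
      gcc-10:
        build_environment: gcc-10
        architectures:
          x86_64:
            base_defconfig: 'x86_64_defconfig'
            # extra_configs: ['allmodconfig']  # disabled to cut build load

# Test side: keep only a couple of device types per test plan so fewer
# binaries get downloaded from Azure storage.
test_configs:
  - device_type: qemu_x86_64
    test_plans: ['baseline']
```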

@broonie (Member) commented on Jul 5, 2023 via email.

@nuclearcat (Member):
Azure CDN prices are almost the same as egress; the CDN just has better geographical proximity to users:
https://azure.microsoft.com/en-us/pricing/details/bandwidth/
Azure -> N. America/Europe: $0.087 per GB
https://azure.microsoft.com/en-us/pricing/details/cdn/
$0.081 per GB and higher, depending on the zone...

In my opinion, to reduce costs we have the following options:

1. (best) Hosting storage.*kernelci.org on a Hetzner server:
https://www.hetzner.com/dedicated-rootserver/sx64
€96.75, unmetered 1 Gbit. They have a 10 Gbit add-on and charge $1/TB ($0.001/GB) for usage over the allocated quota.
This might be complex due to the frontend/backend configuration, but maybe we can sort this out.

2. A caching proxy on this Hetzner server, so any file from storage.* is fetched only once. That might still reduce bandwidth a lot.

3. Cloudflare Pro + Cloudflare R2: https://www.cloudflare.com/products/r2/

Maybe there are more options, I am not sure...

@gctucker (Contributor, PR author) commented on Jul 5, 2023:

I really think this is an interim solution to avoid hitting some limits inadvertently, and tbh the kernel builds are probably costing us much more than the binary downloads. But we first need to bring the costs down to the bare minimum to ensure continuity of the main services, and there's no doubt we'll find a solution in the coming weeks to bring things back to normal. Making efficient use of the available resources is important in any case: aside from how much things cost, it will lead to better performance overall.

Let's see how things go with the changes in this PR and the full linux-next build on staging this weekend; we'll probably already be able to find a balance between the full set we had and this reduced one over the next week, even without changing how the infrastructure is set up. Then, if things go well, we'll end up with a more efficient config as well as a sustainable long-term amount of resources.

@nuclearcat (Member):
Actually, Azure (and other cloud services) is not feasible for the transfer volumes we have (unless you are a very rich corporation or a startup with a lot of funding); I believe optimizing storage and egress is required for the long term.

@broonie (Member) commented on Jul 6, 2023:

Ah, that's a shame with the cache service - AWS CloudFront in front of an S3 bucket is actually super cost effective - I'm not paying any bandwidth costs for the binaries I serve up to my lab for my CI (uploads to S3 are free, transfers from S3 to CloudFront are free, and even when I get out of the free tier on the CDN it's their cheapest bandwidth IIRC).

@nuclearcat (Member):
I am not sure of the exact numbers for our egress, because for example we have transfers of sources to GKS nodes (classified as egress), serving kernels to labs, etc., but looking at just the numbers on the production instance's eth0, it might be quite significant traffic: 5-8 TB/week.
Even AWS CloudFront is $0.080/GB.
It is really hard to beat good dedicated server/colo offers; we are in the zone where having such a server might be justified, as we are not earning anything by distributing these files.
Unfortunately I don't have any kind of logs to estimate the numbers, but if our bandwidth costs on Azure are over $2000 and we can save half, we can get a dedicated server with a 10G port from something like fdcservers (10G unmetered is below $1k) or Voxility ($6.26/TB).
Even if it's less, we can try to get away with Hetzner; they have unmetered 1 Gbit/s offers (which might be enough for us), or 10G offers at $1/TB.
If we really insist on staying on S3-compatible managed services, then maybe we should look at all the options, including Linode, DO, Vultr.
Linode:

Egress transfer is free up to 1 TB, then $0.01/GB; AWS egress transfer is $0.09/GB with the first 1 GB free; GCP egress transfer is $0.12/GB.

Vultr:

$10 per additional TB transferred (that's $0.01/GB)

(And they have a pretty significant free tier.)

But even with an external server generating the content, things might be complicated and egress charges will still be significant. For example, our build K8S clusters will upload binaries elsewhere, and that is still egress.
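For a rough sense of scale, using only the figures quoted above rather than measured data: 5-8 TB/week at roughly $0.08-0.09/GB works out to about $400-$720 per week in egress alone, i.e. somewhere in the $1,700-$3,100/month range, which lines up with the "over $2000" estimate and is in the same ballpark as the dedicated-server offers mentioned.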

@gctucker (Contributor, PR author) commented on Jul 6, 2023:

This PR is now producing the kind of discussions we need for kernelci/kernelci-api#9 ;)

@broonie (Member) commented on Jul 6, 2023:

Right, a CDN is definitely not an ideal solution once you get too far over the free tier - I was just thinking that they're really simple and non-invasive to enable, so if the pricing had worked out with Azure it might've helped mitigate things with little effort. It seems like there's not enough of a free/cheap tier with them to be relevant for us, sadly :(

gctucker added 2 commits on July 10, 2023 at 10:56
While we're transitioning the Azure resources to a new subscription,
we need to drastically reduce the build load in order to keep the
costs under control.  This is meant to be a temporary measure,
although some trees might need to stay on minimal variants permanently
as a general optimisation effort.

Signed-off-by: Guillaume Tucker <[email protected]>
While we're transitioning to the new Azure subscription, reduce the
test coverage to the bare minimum to minimise bandwidth usage in
downloads.  This is meant to be a short-term interim measure to keep
the costs under control until we have a new sustainable solution.

Signed-off-by: Guillaume Tucker <[email protected]>
@gctucker force-pushed the minimal-config-azure-limits branch from 5116230 to d90acbc on July 10, 2023 at 08:57
@gctucker (Contributor, PR author):
There were still a couple of allmodconfig builds left as an oversight; that's fixed now. Otherwise, the linux-next results from the weekend staging run are available here:
https://staging.kernelci.org/build/next/branch/master/kernel/next-20230707/

@gctucker (Contributor, PR author):
Nobody has replied to the email thread or mentioned any blocking issue with this PR, so it looks like it's all ready to go for today's production update.

@gctucker added this pull request to the merge queue on Jul 10, 2023
Merged via the queue into kernelci:main with commit 491a3db on Jul 10, 2023
@gctucker deleted the minimal-config-azure-limits branch on July 10, 2023 at 09:00