[Fargate/ECS] [Image caching]: provide image caching for Fargate. #696
Comments
@matthewcummings can you clarify which doc you're talking about ("The doc is horrific")? Can you also clarify which regions your Fargate tasks and your ECR images are in? |
@jtoberon can we have these kinds of things in every region? I generally use us-east-1 and us-west-2 these days. |
It seems better now https://docs.aws.amazon.com/AmazonECR/latest/userguide/vpc-endpoints.html. It has been updated from what I can see. However, it still feels like a leaky abstraction. I'd argue that I shouldn't need to know/think about S3 here. Nowhere else in the ECS/EKS/ECR ecosystem do we really see mention of S3. It would be great if the S3 details could be "abstracted away". |
Regarding regions, I'm really asking whether you're doing cross-region pulls. You're right: this is a leaky abstraction. The client (e.g. docker) doesn't care, but from a networking perspective you need to poke a hole to S3 right now. Regarding making all of this easier, we plan to build cross-region replication, and we plan to simplify the registry URL so that you don't have to think as much about which region you're pulling from. #140 has more details and some discussion. |
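For anyone wiring this up in a private subnet, the "hole to S3" mentioned above amounts to three VPC endpoints: two interface endpoints for ECR and a gateway endpoint for S3, where the image layers are stored. A minimal sketch, assuming the standard `com.amazonaws.<region>.<service>` naming; the function name `ecr_endpoint_services` is made up for illustration, and tasks that ship logs or read secrets may need further endpoints not listed here:

```shell
# Print the VPC endpoint service names a private-subnet ECR pull relies on.
# ecr.api and ecr.dkr are interface endpoints; s3 is a gateway endpoint.
# (Function name is hypothetical, for illustration only.)
ecr_endpoint_services() {
  region="$1"
  printf '%s\n' \
    "com.amazonaws.${region}.ecr.api" \
    "com.amazonaws.${region}.ecr.dkr" \
    "com.amazonaws.${region}.s3"
}

ecr_endpoint_services us-east-1
```

The S3 entry is the leaky part of the abstraction discussed here: the Docker client only ever talks to the registry URL, but the layer downloads are served from S3.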
Ha ha, thanks. Excuse my snarkiness. . . I am not doing cross-region pulls right now but that is something I may need to do. |
@jtoberon your call on whether this should be a separate request or folded into the other one. |
Wait, aren't you really asking for This was added (it seems) to ECS EC2 in 2018: Agent config docs. I get the impression Fargate does not give control over that, and does not have it set to |
@ronkorving yes, that's exactly what I've requested. I wasn't aware of the ECS/EC2 feature. . . thanks for pointing me to that. However, a Fargate option would be great. I'm going to update the request. |
much needed indeed this caching option for fargate |
I would like to upvote this feature too. |
How's this evolving? There are many use cases where what you need is just a Lambda with unrestricted access to a kernel / filesystem. Having Fargate with cached / hot images perfectly fits this use case. |
@jtoberon @samuelkarp I realize that this is a more involved feature to build than it was on ECS with EC2 since the instances are changing underneath across AWS accounts, but are you able to provide any timeline on if and when this image caching would be available in Fargate? Lambda eventually fixed this same cold start issue with the short-term cache. This request is for the direct analog in Fargate. Our use case: we run containers on-demand when our customers initiate an action and connect them to the container that we spin up. So, it's a real-time use case. Right now, we run these containers on ECS with EC2 and the launch times are perfectly acceptable (~1-3 seconds) because we cache the image on the EC2 box with We'd really like to move to Fargate but our testing shows our Fargate containers spend ~70 seconds in the We have to make some investments in the area soon so I am trying to get a sense for how much we should invest into optimizing our current EC2-based setup because we absolutely want to move to Fargate as soon as this cold start issue is resolved. As always, thank you for your communication. |
I wish Fargate could have some sort of caching. Due to missing environment variables, my task kept failing all weekend, and every restart meant the image was downloaded again from Docker Hub. In the end I was faced with horrible traffic usage, since Fargate had been deployed within a private VPC. |
@Brother-Andy For this use-case, I built cdk-ecr-sync which syncs specific images from DockerHub to ECR. Doesn't solve the caching part but might reduce your bill. |
Ditto on the feature. We use containers to spin-off cyber ranges for students. Usage can fluctuate from 0 to thousands, Fargate is the best solution for ease of management, but the launch time is a challenge even with ECR. Caching is a much-needed feature. |
+1 |
+1 |
Same here, I need to run multiple Fargate cross-region and it takes around a minute to pull the image. Once pulled, the task only takes 4 seconds to run. This completely stops us from using Fargate. |
We had the same problem: the Fargate task should take only 10 seconds to run, but it takes about a minute to pull the image :( |
Is it possible to use an EFS file system to store the image and have the task just run that image? Or is that the same problem of pulling from EFS to the VPS that hosts the container? |
Azure is solving this problem in their platform |
+1 we run a very large number of tasks and 1GB image. This would significantly speed up our deploys and would be a super helpful feature. We're considering moving to EC2 due to Fargate deployment slowness and this is one of the factors. |
Currently using Gitlab Runner Fargate driver which is great, except for the spinup time ~1-2 minutes for our image (> 1gb) because it has to pull it from ECS for every job. Not super great. Would really like to see some sort of image caching. |
@gregtws Can you confirm that it is really that fast on EC2 instances for ECS? We currently run on Fargate and it takes ~60s for our ~1GB images. From everything I have read so far, EC2 based ECS will not be much faster with scheduling, which is why I didn't consider this as a solution. If we can get from ~60s to below 10s, that would be awesome and a move to EC2 worthwhile. |
@benben We run containers on ECS backed by EC2 instances. We use the setting to use cached container images on the EC2 instances. ECR reports our container image to be ~1.5GB. When we run ECS tasks with these containers, they start up in 1 second or so when cached. It takes 60 seconds or so to download an image fresh to the EC2 instance. So, we download the image as part of our start-up script for EC2 so that by the time the EC2 instance is added to the ECS cluster, the image is already cached. |
@fitzn thank you for sharing! That sounds awesome. I was under the impression that most of the time is spent on AWS internal scheduling 🔮 but if this is the case, I'll definitely move things to EC2. Would you be able to share the script which downloads the images on EC2 startup? Thanks again! |
Zero excitement with this release. Quite the opposite. I was very disappointed when I understood what it was... I was misled by the fact the SOCI feature was posted in this issue, which is related to caching (something else entirely). |
I guess there are even people out there that would pay for a "real" caching solution with Fargate. How about a third type of capacity provider that has access to EFS backed file servers for the caching part? 🤔 I mean it seems to me that there is a huge architectural issue with offering a real pull through caching solution with Fargate. Maybe it would make things easier if the cache would only be needed for a rather limited amount of Fargate hosts. |
Yes. I just checked some of our ECS on EC2 services and I'm seeing sub 10s start to running when it hits a warm cache. The sampled container was ~1gb according to the ECR repo stats. The risk, of course, is that you randomly hit a cold cache, either because the image changed, it's a new instance (autoscaling), or the cache was purged (unusual). In your particular use case, YMMV since there are variables outside of your control, like whether you need to attach an ENI, your app's init time, etc. This is on a Nitro based system fwiw. |
We have a similar solution. When our EC2 images start we pull our base image, which is a chunky 20GB (AI/Robotics libs are big). Different images are built on top of that, so different layers may or may not then get pulled in automatically by the runtime. I can't share our code because of IP, but it boils down to something like the below inside cloud-init
|
@benben Yeah it's something like this:
# Configure caching of images
cat <<'EOF' >> /etc/ecs/ecs.config
ECS_CLUSTER=ourcluster
ECS_IMAGE_PULL_BEHAVIOR=once
ECS_ENGINE_TASK_CLEANUP_WAIT_DURATION=30m
ECS_IMAGE_CLEANUP_INTERVAL=24h
EOF
# Get the ECR sign-in credentials
curl -o docker-credential-ecr-login https://amazon-ecr-credential-helper-releases.s3.us-east-2.amazonaws.com/0.6.0/linux-amd64/docker-credential-ecr-login
chmod +x docker-credential-ecr-login
mv docker-credential-ecr-login /usr/bin/
# Tell Docker to use ECR login credentials
DockerDir="/home/ec2-user/.docker"
DockerConfig="${DockerDir}/config.json"
mkdir -p "$DockerDir"
# Single quotes avoid escaping the JSON; the file content is unchanged
echo '{ "credsStore": "ecr-login" }' > "$DockerConfig"
chmod a+rw "$DockerConfig"
# Download the image
# Optional: read "v1" from somewhere (or use the current date) to dynamically get a newer image version.
su -c "docker pull ourrepository.com/ourimage:v1" - ec2-user |
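If the start-up script can run more than once on the same instance, the pull can be skipped when the image is already cached. A rough sketch, not part of the script above: the function names and the `DOCKER_BIN` override are invented for illustration (the override only exists so the logic can be exercised without Docker installed), while `docker image inspect` and `docker pull` are the standard CLI commands:

```shell
# image_cached: succeed if the image is already in the local Docker cache.
# DOCKER_BIN is a hypothetical override; it defaults to the real docker CLI.
image_cached() {
  "${DOCKER_BIN:-docker}" image inspect "$1" >/dev/null 2>&1
}

# pull_if_missing: pull only when the local cache is cold.
pull_if_missing() {
  if image_cached "$1"; then
    echo "cached: $1"
  else
    "${DOCKER_BIN:-docker}" pull "$1" && echo "pulled: $1"
  fi
}

# Example (same placeholder image name as the script above):
#   pull_if_missing ourrepository.com/ourimage:v1
```

With `ECS_IMAGE_PULL_BEHAVIOR=once` the agent reuses the cached copy anyway, so this only saves the redundant pull in the start-up script itself.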
For everyone interested in SOCI: I added it to our github actions build pipeline and did not see any improvements. Here are the measurements after deployment:
I had to give up our zstd compression, which was another recommendation from AWS, since it is not compatible yet, and I had to jump through quite a few hoops to get there: SOCI is incompatible with the current Docker version in GitHub Actions, so I had to re-pull everything into containerd. |
@benben can you share some general details about the tasks that you tested with? The pulls times look very similar with and without SOCI which is unusual. It could be that SOCI isn't being used for some reason. (For example, SOCI won't be used if you have a logging container in your task that doesn't have a SOCI index. Currently Fargate requires that all images in the task have a SOCI index to use SOCI). Either way it would be helpful for more details so we can investigate this. |
Thank you for that info! I must have missed that in the docs. We have another container running to route logs with firelens. I added SOCI there too and now it worked.
PS: Sorry to everyone else for being slightly offtopic here. |
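The rule described above (every image in the task needs a SOCI index, sidecars included) can be written down as a small pre-deploy check. This is an illustrative sketch only: the `name=yes|no` input format and the function name are assumptions, and it does not query ECR or Fargate; you would have to supply the yes/no values yourself.

```shell
# soci_applies: succeed only if every "container=has_index" pair says "yes".
# Input format and function name are hypothetical, for illustration.
soci_applies() {
  for pair in "$@"; do
    case "${pair##*=}" in
      yes) ;;  # this container's image has a SOCI index
      *) echo "no SOCI index: ${pair%%=*}"; return 1 ;;
    esac
  done
  echo "all images indexed"
}

# A task whose log router lacks an index would not use SOCI at all:
soci_applies app=yes log-router=no || true
```

This mirrors the case in this thread: the application image was indexed, but the firelens log-router image was not, so the whole task fell back to a regular pull.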
A caching feature is indeed needed in Fargate for me as well! I have a streamlit app with yolo model. After trimming dependencies and compressing via zstd my docker container is circa 690MB on ECR and takes about 55s to run a task on ECS Fargate. |
If CDK can one line abstract SOCI away for private ECRs, I'm all for it as an intermediate solution 👍🏼 |
We really have the same problem with Fargate. it cost us a lot of money. so it will be awesome if it can be done for Fargate. |
For everyone here needing cross-region pull to fargate - please vote on: ECR to ECR pull-through cache: #2208. This will address all use cases, not only fargate. |
This feature will be great. I just started working in Fargate and am developing an on-demand HLS streaming solution using gstreamer and a SaaS repository that stores the original mp3 and mp4s. The image pull from ECR is the biggest bottleneck; otherwise the performance is terrific. I did a multi-stage build and SOCI helps a lot: my image pull is still around 15 seconds every time, but that is a huge improvement over the 50+ seconds I was having. |
big need |
drop some cache please |
Almost five years and still nothing. Big need. |
SOCI cut down my launch time by nearly 50%, but it still takes around 1 minute for the task to launch. There should be a better option for folks with large images. I have stripped down my image as much as possible. |
Interested! |
@fish-not-phish Hi, can you please share some details of what stack you use? |
EDIT: as @ronkorving mentioned, image caching is available for EC2 backed ECS. I've updated this request to be specifically for Fargate.
What do you want us to build?
I've deployed scheduled Fargate tasks and been clobbered with high data transfer fees pulling down the image from ECR. Additionally, configuring a VPC endpoint for ECR is not for the faint of heart. The doc is a bit confusing.
It would be a big improvement if there were a resource (network/host) local to the instance where my containers run which could be used to load my docker images.
Which service(s) is this request for?
Fargate and ECR.
Tell us about the problem you're trying to solve. What are you trying to do, and why is it hard?
I don't want to be charged for pulling a Docker image every time my scheduled Fargate task runs.
On that note the VPC endpoint doc should be better too.
Are you currently working around this issue?
This was for a personal project. I instead deployed an EC2 instance running a cron job, which is not my preference. I would prefer using Docker and the ECS/Fargate ecosystem.