
Enhancement: AWS S3 Support (with IRSA option) #2898

Open
1 task done
tip-dteller opened this issue May 29, 2024 · 10 comments · May be fixed by #6142
Assignees
Labels
✨ enhancement New feature or request

Comments

@tip-dteller

tip-dteller commented May 29, 2024

What features would you like to see added?

When generating images or even Python plots, images are stored to disk, which can be mounted to any location.
IconURL, however, points to something external, e.g. GitHub.
So when trying to use an S3 address, it inherently fails because the call never goes to AWS; instead it leaves the network entirely (a private cluster trying to reach a public endpoint).

This was observed in Kubernetes deployment of LibreChat.

Also, I believe that storing images on S3 would, for all intents and purposes, be better for historical retention.
If a pod or container suddenly goes down and the images aren't mapped properly, you'd simply lose them.

Can you please add support for IRSA?
I know this requires some code additions.

More details

Example Trust Policy:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Federated": "arn:aws:iam::{AWS_ACCOUNT}:oidc-provider/oidc.eks.us-east-1.amazonaws.com/id/{EKS_OIDC}"
            },
            "Action": "sts:AssumeRoleWithWebIdentity",
            "Condition": {
                "StringEquals": {
                    "oidc.eks.us-east-1.amazonaws.com/id/{EKS_OIDC}:sub": "system:serviceaccount:librechat:librechat",
                    "oidc.eks.us-east-1.amazonaws.com/id/{EKS_OIDC}:aud": "sts.amazonaws.com"
                }
            }
        }
    ]
}

Example Policy:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": [
                "s3:PutObject",
                "s3:GetObjectAcl",
                "s3:GetObject",
                "s3:ListBucket"
            ],
            "Resource": [
                "arn:aws:s3:::my-example-librechat-bucket/*",
                "arn:aws:s3:::my-example-librechat-bucket"
            ]
        }
    ]
}

Ideally, the user deploying this should have basic knowledge of K8s and how to utilize a ServiceAccount.
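For context, wiring IRSA to the pod comes down to an annotation on the ServiceAccount that the trust policy above references (`system:serviceaccount:librechat:librechat`). A minimal sketch; the role name `librechat-s3-role` is illustrative, and `{AWS_ACCOUNT}` stays a placeholder:

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: librechat
  namespace: librechat
  annotations:
    # IAM role whose trust policy matches the example above;
    # the EKS pod identity webhook injects the web identity token
    # and AWS_ROLE_ARN into pods using this ServiceAccount.
    eks.amazonaws.com/role-arn: arn:aws:iam::{AWS_ACCOUNT}:role/librechat-s3-role
```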

Which components are impacted by your request?

General

Pictures


Code of Conduct

  • I agree to follow this project's Code of Conduct
@tip-dteller tip-dteller added the ✨ enhancement New feature or request label May 29, 2024
@danny-avila
Owner

Firebase is already an option for images, to which you can apply similar policies, and S3 support is planned.

It would follow the same dependency injection pattern as Firebase.

Since the main concern seems to be the ability to store and retrieve images using AWS S3, which is planned, I'm renaming this issue as such. I will keep IRSA in mind as an authentication method to implement for this.

@danny-avila danny-avila changed the title Enhancement: IRSA Support Enhancement: AWS S3 Support (with IRSA option) May 29, 2024
@jameslamine
Contributor

Our company needs this for file uploads. S3 is generally what we use, and we don't support Firebase or Kubernetes persistent volumes.

@rubentalstra rubentalstra self-assigned this Mar 3, 2025
@rubentalstra
Collaborator

@tip-dteller @jameslamine please have a look at my PR. Will this work for you both? I'm happy to receive some feedback.

@d-teller

d-teller commented Mar 4, 2025

@rubentalstra looks good! Thank you!
Can you add support for AWS roles?
Specifying a secret and access key isn't very secure :)

@rubentalstra
Collaborator

@d-teller AWS ROLE? Can you explain it a little more?

@d-teller

d-teller commented Mar 4, 2025

@rubentalstra In AWS, apart from users with access keys and secret keys, you should utilize roles.
Roles have policies just like users do, but their token (the one they get from the AWS token service) is short-lived and can be refreshed every hour.
This is better than using a hardcoded secret key inside an env var.

Roles can be attached to EC2 machines, ECS, and K8s service accounts.

The code needs to be able to draw on that information.

Here's a Claude-generated example of how to use AWS roles in TypeScript:

import { AssumeRoleCommand, STSClient } from "@aws-sdk/client-sts";
import { S3Client, ListBucketsCommand } from "@aws-sdk/client-s3";

async function assumeRoleAndListBuckets() {
  // Initialize STS client
  const stsClient = new STSClient({ region: "us-east-1" });

  // Define parameters for assuming a role
  const params = {
    RoleArn: "arn:aws:iam::123456789012:role/example-role",
    RoleSessionName: "example-session",
    DurationSeconds: 3600, // 1 hour
  };

  try {
    // Assume the role
    const assumeRoleResponse = await stsClient.send(new AssumeRoleCommand(params));
    
    if (!assumeRoleResponse.Credentials) {
      throw new Error("Failed to obtain credentials");
    }

    // Create S3 client with temporary credentials from assumed role
    const s3Client = new S3Client({
      region: "us-east-1",
      credentials: {
        accessKeyId: assumeRoleResponse.Credentials.AccessKeyId!,
        secretAccessKey: assumeRoleResponse.Credentials.SecretAccessKey!,
        sessionToken: assumeRoleResponse.Credentials.SessionToken!,
      },
    });

    // Use the temporary credentials to list S3 buckets
    const bucketsResponse = await s3Client.send(new ListBucketsCommand({}));
    console.log("Buckets:", bucketsResponse.Buckets);

    return bucketsResponse.Buckets;
  } catch (error) {
    console.error("Error:", error);
    throw error;
  }
}

// Call the function
assumeRoleAndListBuckets()
  .then((buckets) => console.log(`Found ${buckets?.length || 0} buckets`))
  .catch((err) => console.error("Operation failed:", err));

This example demonstrates:

  1. Using AWS SDK v3 for TypeScript
  2. Assuming a role using STS (Security Token Service)
  3. Using temporary credentials from the assumed role to perform an S3 operation

@rubentalstra
Collaborator

rubentalstra commented Mar 4, 2025

@d-teller thank you for the explanation. But are these roles not assigned to the machine itself, rather than the application?

This is what I have written so far for the IRSA and non-IRSA paths.

The docs I've written: https://github.com/LibreChat-AI/librechat.ai/blob/fc0ce22a53f5b87732ea2b3b1a08571b87a54b83/pages/docs/configuration/cdn/s3.mdx

const { S3Client } = require('@aws-sdk/client-s3');
const { logger } = require('~/config');

let s3 = null;

/**
 * Initializes and returns an instance of the AWS S3 client.
 *
 * If AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY are provided, they will be used.
 * Otherwise, the AWS SDK's default credentials chain (including IRSA) is used.
 *
 * @returns {S3Client|null} An instance of S3Client if the region is provided; otherwise, null.
 */
const initializeS3 = () => {
  if (s3) {
    return s3;
  }

  const region = process.env.AWS_REGION;
  if (!region) {
    logger.error('[initializeS3] AWS_REGION is not set. Cannot initialize S3.');
    return null;
  }

  const accessKeyId = process.env.AWS_ACCESS_KEY_ID;
  const secretAccessKey = process.env.AWS_SECRET_ACCESS_KEY;

  if (accessKeyId && secretAccessKey) {
    s3 = new S3Client({
      region,
      credentials: { accessKeyId, secretAccessKey },
    });
    logger.info('[initializeS3] S3 initialized with provided credentials.');
  } else {
    // When using IRSA, credentials are automatically provided via the IAM Role attached to the ServiceAccount.
    s3 = new S3Client({ region });
    logger.info('[initializeS3] S3 initialized using default credentials (IRSA).');
  }

  return s3;
};

module.exports = { initializeS3 };

@tip-dteller
Author

tip-dteller commented Mar 4, 2025

@rubentalstra roles offer a way to communicate with the AWS STS service so that you get temporary credentials; hence roles are assigned to EC2 machines (via the instance profile option), to ECS via the execution role, and to K8s via IRSA.
So it seems from your code that if the access and secret keys are not supplied, it falls back to the default chain, with IRSA being one of its providers.
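That selection behavior (explicit keys win, otherwise the SDK's default provider chain resolves credentials, with IRSA's web identity token as one of its sources) can be sketched as a pure function. This is an illustrative mock, not LibreChat code; the function name is made up:

```javascript
// Hypothetical sketch of the credential-selection logic discussed above.
// If static keys are present in the environment, they are used directly;
// otherwise the AWS SDK's default provider chain resolves credentials at
// request time (env vars, then the web identity token + AWS_ROLE_ARN that
// the EKS pod identity webhook injects for IRSA, then instance profile, ...).
function resolveCredentialMode(env) {
  if (env.AWS_ACCESS_KEY_ID && env.AWS_SECRET_ACCESS_KEY) {
    return 'static-keys';
  }
  // No explicit keys: defer to the SDK default chain (covers IRSA).
  return 'default-chain';
}

console.log(resolveCredentialMode({ AWS_ACCESS_KEY_ID: 'AKIA...', AWS_SECRET_ACCESS_KEY: 'x' }));
// → static-keys
console.log(resolveCredentialMode({ AWS_REGION: 'us-east-1' }));
// → default-chain
```

In the real SDK, constructing `S3Client({ region })` without a `credentials` option triggers exactly this fallback to the default chain.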

@chemeris

chemeris commented Mar 5, 2025

Do I understand correctly that with this PR we'll be able to get Code Interpreter to read/write files from/to S3, so we could e.g. store input CSVs for Code Interpreter and output plots from it in S3?
Will we be able to reference uploaded files from S3 in the chat context?
And will we be able to read/write these files from MCP tools?

Sorry for the many questions; I'm trying to understand whether this PR covers everything we're looking for.

@danny-avila
Owner

Do I understand correctly that with this PR we'll be able to get Code Interpreter to read/write files from/to S3, so we could e.g. store input CSV's for Code Interpreter and output plots from it in S3?

Yes, but let me clarify. For the Code Interpreter to read files, they need to be uploaded to its internal ephemeral storage. The actual file is stored locally (currently with Firebase as the only alternative, but with this update, also S3) so it can be re-uploaded once those file references expire in the API's ephemeral storage. Outputs from the API are then stored locally/Firebase/S3.

Will we be able to reference uploaded files from S3 in the chat context?

In general, only images are currently referenced as-is (statically) in the chat context. When uploading to File Search (RAG API), the file is not kept; it is transformed into vectors. I already explained the Code Interpreter behavior.

And will we be able to read/write these files from MCP tools?

No. MCP tools have an opinionated way of updating files, which mainly corresponds to MCP resources.


In general, this update mainly concerns storage of images and Code Interpreter files for re-upload, and eventually other file types that would otherwise be stored locally by default, as there are upcoming features to support general file storage for extended use cases such as parsing text and uploading files directly to LLM providers.
