
Support external OIDC identity provider #1483

Closed
bseenu opened this issue Jul 15, 2021 · 23 comments

Comments

@bseenu

bseenu commented Jul 15, 2021

Is your request related to a new offering from AWS?

Yes, AWS now supports external OIDC identity providers: https://aws.amazon.com/blogs/containers/introducing-oidc-identity-provider-authentication-amazon-eks/

https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/eks_identity_provider_config

Is your request related to a problem? Please describe.

I am associating the OIDC identity provider as a post step after the cluster is created, and it is taking a long time.
ref: aws/containers-roadmap#1438

Describe the solution you'd like.

The external OIDC provider should be associated with the EKS cluster at build time, so that the API server does not need to restart to associate the provider.

@daroga0002
Contributor

daroga0002 commented Aug 31, 2021

Looking into the docs, this uses the resource https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/eks_identity_provider_config

so you can achieve this outside the module with just:

resource "aws_eks_identity_provider_config" "example" {
  cluster_name = module.eks_cluster.cluster_id

  oidc {
    client_id                     = "your client_id"
    identity_provider_config_name = "example"
    issuer_url                    = "your issuer_url"
  }
}

I don't feel that adding this complexity to the module is required, as this can be done really easily outside it.

@antonbabenko @barryib any feelings here?

@bseenu
Author

bseenu commented Aug 31, 2021

Yes, that is what I am doing now. The problem is that it takes 25-35 minutes to associate and disassociate the identity provider, increasing the overall time to build and tear down a new cluster.

@daroga0002
Contributor

How would embedding this into the module influence those timings?

@bseenu
Author

bseenu commented Aug 31, 2021

Currently this is done as a post step after the cluster is built, causing the API server to restart. I was wondering if we can hook this identity provider config into the cluster build rather than doing it as a post step.

@daroga0002
Contributor

daroga0002 commented Sep 1, 2021

I don't see any option (in the web UI) to add this during creation, so it seems you must first have the cluster control plane available and then attach it.

I don't think there will be any way to make it faster, as this seems to be an AWS limitation.

Do you have any other, faster scenario in the web UI or the CLI which configures this during the cluster build?

@daroga0002
Contributor

I also checked the AWS API (https://docs.aws.amazon.com/eks/latest/APIReference/API_CreateCluster.html and https://docs.aws.amazon.com/eks/latest/APIReference/API_AssociateIdentityProviderConfig.html), and as far as I can tell there is no way to do this other than as initially proposed.

@antonbabenko
Member

For the time being, I don't see us adding even more resources like aws_eks_identity_provider_config into this module, even if it potentially offers a better experience for end users. There is already a lot of magic in this module which we need to improve first (see #635, for example).

@bseenu I agree with this comment - #1483 (comment) - you should be able to accomplish this outside of the module.

Closing this issue.

@ashishjullia

I'm trying to achieve something like this; let me know if it is possible with this module in 2022. @antonbabenko

data "tls_certificate" "eks" {
  url = aws_eks_cluster.demo.identity[0].oidc[0].issuer
}

resource "aws_iam_openid_connect_provider" "eks" {
  client_id_list  = ["sts.amazonaws.com"]
  thumbprint_list = [data.tls_certificate.eks.certificates[0].sha1_fingerprint]
  url             = aws_eks_cluster.demo.identity[0].oidc[0].issuer
}

resource "aws_iam_policy" "test-policy" {
  name   = "test-cluster-autoscaler-policy"
  policy = jsonencode({
    "Version" : "2012-10-17",
    "Statement" : [
      {
        "Sid" : "VisualEditor0",
        "Effect" : "Allow",
        "Action" : [
          "autoscaling:SetDesiredCapacity",
          "autoscaling:TerminateInstanceInAutoScalingGroup"
        ],
        "Resource" : "*",
        "Condition" : {
          "StringEquals" : {
            "aws:ResourceTag/k8s.io/cluster-autoscaler/${var.eks_cluster_name}" : "owned"
          }
        }
      },
      {
        "Sid" : "VisualEditor1",
        "Effect" : "Allow",
        "Action" : [
          "autoscaling:DescribeAutoScalingInstances",
          "autoscaling:DescribeAutoScalingGroups",
          "ec2:DescribeLaunchTemplateVersions",
          "autoscaling:DescribeTags",
          "autoscaling:DescribeLaunchConfigurations"
        ],
        "Resource" : "*"
      }
    ]
  })
}

resource "aws_iam_role" "test_oidc" {
  assume_role_policy = data.aws_iam_policy_document.test_oidc_assume_role_policy.json
  name               = "test-oidc"
}

resource "aws_iam_role_policy_attachment" "test_attach" {
  role       = aws_iam_role.test_oidc.name
  policy_arn = aws_iam_policy.test-policy.arn
}

data "aws_iam_policy_document" "test_oidc_assume_role_policy" {
  statement {
    actions = ["sts:AssumeRoleWithWebIdentity"]
    effect  = "Allow"

    condition {
      test     = "StringEquals"
      variable = "${replace(aws_iam_openid_connect_provider.eks.url, "https://", "")}:sub"
      values   = ["system:serviceaccount:kube-system:cluster-autoscaler"]
    }

    principals {
      identifiers = [aws_iam_openid_connect_provider.eks.arn]
      type        = "Federated"
    }
  }
}

@bryantbiggs
Member

@ashishjullia

terraform-aws-eks/main.tf

Lines 168 to 185 in 7d3c714

data "tls_certificate" "this" {
  count = local.create && var.enable_irsa ? 1 : 0

  url = aws_eks_cluster.this[0].identity[0].oidc[0].issuer
}

resource "aws_iam_openid_connect_provider" "oidc_provider" {
  count = local.create && var.enable_irsa ? 1 : 0

  client_id_list  = distinct(compact(concat(["sts.${local.dns_suffix}"], var.openid_connect_audiences)))
  thumbprint_list = concat([data.tls_certificate.this[0].certificates[0].sha1_fingerprint], var.custom_oidc_thumbprints)
  url             = aws_eks_cluster.this[0].identity[0].oidc[0].issuer

  tags = merge(
    { Name = "${var.cluster_name}-eks-irsa" },
    var.tags
  )
}

@ashishjullia

ashishjullia commented May 31, 2022

Thanks @bryantbiggs, I was able to replicate this, but the problem with the latest EKS module version appears when I try to execute it with the following code.
The main variables to take into consideration are:

create_iam_role = true
create_cloudwatch_log_group = false
iam_role_use_name_prefix = false
module "eks" {
  source = "terraform-aws-modules/eks/aws"
  version      = "18.21.0"
  cluster_name = var.eks_cluster_name
  cluster_version = var.kubernetes_version
  enable_irsa = true
  subnet_ids = module.vpc.private_subnets
  cluster_endpoint_private_access = true
  vpc_id = module.vpc.vpc_id
  eks_managed_node_group_defaults = {
    instance_types    = var.node_instance_type
    disk_size   = var.node_ami_disk_size
    ami_type = var.node_ami_type
  }

  eks_managed_node_groups = {
    nodes = {
      desired_size = var.nodes_desired_capacity
      max_size     = var.nodes_max_capacity
      min_size     = var.nodes_min_capacity
      instance_types = var.node_instance_type
      capacity_type = var.node_capacity_type

      k8s_labels = {
        Environment = "on_demand"
        GithubRepo  = "terraform-aws-eks"
        GithubOrg   = "terraform-aws-modules"
      }
      additional_tags = {
        ExtraTag = "on_demand-node"
      }
    }

    create_iam_role = true
    create_cloudwatch_log_group = false
    iam_role_use_name_prefix = false
  }
}

data "aws_eks_cluster" "cluster" {
  name = module.eks.cluster_id
}

data "aws_eks_cluster_auth" "cluster" {
  name = module.eks.cluster_id
}

data "aws_iam_openid_connect_provider" "cluster_oidc_arn" {
  arn = module.eks.oidc_provider_arn
}
data "aws_iam_openid_connect_provider" "cluster_oidc_url" {
  url = module.eks.oidc_provider
}

I'm getting the following error:

bash-4.2# terraform apply

│ Error: expected length of name_prefix to be in the range (1 - 38), got create_cloudwatch_log_group-eks-node-group-

│ with module.eks.module.eks_managed_node_group["create_cloudwatch_log_group"].aws_iam_role.this[0],
│ on .terraform/modules/eks/modules/eks-managed-node-group/main.tf line 435, in resource "aws_iam_role" "this":
│ 435: name_prefix = var.iam_role_use_name_prefix ? "${local.iam_role_name}-" : null



│ Error: expected length of name_prefix to be in the range (1 - 38), got iam_role_use_name_prefix-eks-node-group-

│ with module.eks.module.eks_managed_node_group["iam_role_use_name_prefix"].aws_iam_role.this[0],
│ on .terraform/modules/eks/modules/eks-managed-node-group/main.tf line 435, in resource "aws_iam_role" "this":
│ 435: name_prefix = var.iam_role_use_name_prefix ? "${local.iam_role_name}-" : null

I struggled a lot with the parent eks module as well as with the child eks-managed-node-group module, but I can't figure this out.

@bryantbiggs
Member

I think you have an indentation issue

module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "18.21.0"

  cluster_name = var.eks_cluster_name
  cluster_version = var.kubernetes_version
  enable_irsa = true
  subnet_ids = module.vpc.private_subnets
  cluster_endpoint_private_access = true
  vpc_id = module.vpc.vpc_id

  # This should be at cluster level
  create_cloudwatch_log_group = false
  
  # I don't know if you meant this for the cluster IAM role or the node IAM role, moved to cluster level
  iam_role_use_name_prefix = false

  eks_managed_node_group_defaults = {
    instance_types    = var.node_instance_type
    disk_size   = var.node_ami_disk_size
    ami_type = var.node_ami_type
  }

  eks_managed_node_groups = {
    nodes = {
      desired_size = var.nodes_desired_capacity
      max_size     = var.nodes_max_capacity
      min_size     = var.nodes_min_capacity
      instance_types = var.node_instance_type
      capacity_type = var.node_capacity_type

      k8s_labels = {
        Environment = "on_demand"
        GithubRepo  = "terraform-aws-eks"
        GithubOrg   = "terraform-aws-modules"
      }
      additional_tags = {
        ExtraTag = "on_demand-node"
      }
    }
  }
}

@ashishjullia

Thanks, @bryantbiggs for the quick resolution here, it was my bad that I hadn't double-checked the indentation of the block.

But again I have another problem. While using the following:

data "aws_iam_openid_connect_provider" "cluster_oidc_url" {
  url = module.eks.cluster_oidc_issuer_url
}

I'm getting the following error on terraform apply:

│ Error: error finding IAM OIDC Provider by URL (https://oidc.eks.us-east-1.amazonaws.com/id/xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx): not found

│ with data.aws_iam_openid_connect_provider.cluster_oidc_url,
│ on eks-cluster.tf line 84, in data "aws_iam_openid_connect_provider" "cluster_oidc_url":
│ 84: data "aws_iam_openid_connect_provider" "cluster_oidc_url" {

I checked in the AWS console and the same URL is shown under "OpenID Connect provider URL".

Please let me know what I'm missing here.

@bryantbiggs
Member

I would look it up using the ARN instead:

data "aws_iam_openid_connect_provider" "this" {
  arn = module.eks.oidc_provider_arn
}

@ashishjullia

ashishjullia commented May 31, 2022

@bryantbiggs Oh I see, but I'm trying to achieve something like:

resource "aws_iam_policy" "test-policy" {
  name   = "test-cluster-autoscaler-policy"
  policy = jsonencode({
    "Version" : "2012-10-17",
    "Statement" : [
      {
        "Sid" : "VisualEditor0",
        "Effect" : "Allow",
        "Action" : [
          "autoscaling:SetDesiredCapacity",
          "autoscaling:TerminateInstanceInAutoScalingGroup"
        ],
        "Resource" : "*",
        "Condition" : {
          "StringEquals" : {
            "aws:ResourceTag/k8s.io/cluster-autoscaler/${var.eks_cluster_name}" : "owned"
          }
        }
      },
      {
        "Sid" : "VisualEditor1",
        "Effect" : "Allow",
        "Action" : [
          "autoscaling:DescribeAutoScalingInstances",
          "autoscaling:DescribeAutoScalingGroups",
          "ec2:DescribeLaunchTemplateVersions",
          "autoscaling:DescribeTags",
          "autoscaling:DescribeLaunchConfigurations"
        ],
        "Resource" : "*"
      }
    ]
  })
}

resource "aws_iam_role" "test_oidc" {
  assume_role_policy = data.aws_iam_policy_document.test_oidc_assume_role_policy.json
  name               = "test-oidc"
}

resource "aws_iam_role_policy_attachment" "test_attach" {
  role       = aws_iam_role.test_oidc.name
  policy_arn = aws_iam_policy.test-policy.arn
}

data "aws_iam_policy_document" "test_oidc_assume_role_policy" {
  statement {
    actions = ["sts:AssumeRoleWithWebIdentity"]
    effect  = "Allow"

    condition {
      test     = "StringEquals"
      variable = "${data.aws_iam_openid_connect_provider.cluster_oidc_arn.arn}:sub"
      values   = ["system:serviceaccount:kube-system:cluster-autoscaler"]
    }

    principals {
      identifiers = [data.aws_iam_openid_connect_provider.cluster_oidc_url.url]
      type        = "Federated"
    }
  }
}

output "test_policy_arn" {
    value = aws_iam_role.test_oidc.arn
}

And "data sources" as:

data "aws_iam_openid_connect_provider" "cluster_oidc_arn" {
  arn = module.eks.oidc_provider_arn
}

data "aws_iam_openid_connect_provider" "cluster_oidc_url" {
  url = module.eks.cluster_oidc_issuer_url
}

Please let me know what I'm missing here.
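
(A likely fix, sketched here under the assumption that the module exposes the oidc_provider and oidc_provider_arn outputs referenced elsewhere in this thread: the condition variable needs the issuer URL without the https:// scheme, and the Federated principal needs the provider ARN. Both come directly from the module, so the failing data sources can be dropped entirely.)

data "aws_iam_policy_document" "test_oidc_assume_role_policy" {
  statement {
    actions = ["sts:AssumeRoleWithWebIdentity"]
    effect  = "Allow"

    condition {
      test     = "StringEquals"
      # oidc_provider is the issuer URL with the https:// prefix already stripped
      variable = "${module.eks.oidc_provider}:sub"
      values   = ["system:serviceaccount:kube-system:cluster-autoscaler"]
    }

    principals {
      # The Federated principal must be the provider ARN, not its URL
      identifiers = [module.eks.oidc_provider_arn]
      type        = "Federated"
    }
  }
}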

@bryantbiggs
Member

Why not use https://github.com/terraform-aws-modules/terraform-aws-iam/tree/master/modules/iam-role-for-service-accounts-eks

module "cluster_autoscaler_irsa_role" {
  source      = "terraform-aws-modules/iam/aws//modules/iam-role-for-service-accounts-eks"
  version     = "~> 5.0"

  role_name                        = "cluster-autoscaler"
  attach_cluster_autoscaler_policy = true
  cluster_autoscaler_cluster_ids   = [module.eks.cluster_id]

  oidc_providers = {
    ex = {
      provider_arn               = module.eks.oidc_provider_arn
      namespace_service_accounts = ["kube-system:cluster-autoscaler"]
    }
  }

  tags = local.tags
}

@ashishjullia

@bryantbiggs beautiful ❤️

Now, my only questions are:

  1. Can we deploy Cluster Autoscaler and Metrics Server using this module?
  2. If not, what's the best way to achieve these two?

@bryantbiggs
Member

This module does not provision cluster-level resources (the aws-auth configmap is the only cluster-level management performed by this module). You can wrap the Helm charts with the Terraform Helm provider, like https://github.com/clowdhaus/eks-reference-architecture/blob/e2a8cdc6405a4eff69bb6119ec57dc729f6ab8f1/karpenter/us-east-1/karpenter.tf#L37, or you can use a GitOps-based approach with something like ArgoCD or Flux (which would be the recommended way).
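
(For the Helm-provider route, a minimal sketch of what that wrapping can look like. The repository URL and value names are assumptions based on the upstream cluster-autoscaler chart, and the service-account role ARN is taken from an IRSA module like the one suggested above.)

resource "helm_release" "cluster_autoscaler" {
  name       = "cluster-autoscaler"
  namespace  = "kube-system"
  repository = "https://kubernetes.github.io/autoscaler"
  chart      = "cluster-autoscaler"

  set {
    name  = "autoDiscovery.clusterName"
    value = module.eks.cluster_id
  }

  set {
    # Annotate the chart's service account with the IRSA role created earlier
    name  = "rbac.serviceAccount.annotations.eks\\.amazonaws\\.com/role-arn"
    value = module.cluster_autoscaler_irsa_role.iam_role_arn
  }
}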

@ashishjullia

@bryantbiggs thanks a ton for the help.

I have one more query regarding cluster version updates. While using Terraform and the EKS module to change an existing cluster to a newer Kubernetes version, I'm not getting the expected behaviour.
[screenshot]
As the screenshot shows, when I applied the changes for the cluster version update (the IAM user has the UpdateCluster permission attached), instead of reporting "waiting for update" or "updating...", Terraform just threw an error; yet when I checked the AWS console, the cluster was in "updating" status and its version was updated.

The problem here is Terraform's output on this particular activity.

Please let me know if you can provide me with an explanation for this.

@bryantbiggs
Member

You are probably missing some describe-type permissions. Terraform has to poll and describe the cluster/resources while it waits for the update to complete.

@ashishjullia

@bryantbiggs Oh, I have the following permissions set for the IAM user.
[screenshot]

I'm not sure whether the rest (on the left) are also required; the problem is there isn't any definitive list of the required permissions at all.

@bryantbiggs
Member

Right, and DescribeUpdate is probably necessary for Terraform to keep checking whether the update has succeeded (otherwise it will hit the timeout set in the provider and error out if the update doesn't progress within the window).
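
(As a sketch, these are the read-only EKS actions that usually cover Terraform's polling during an upgrade. The action names are from the EKS API reference, but since the exact minimal set isn't documented exhaustively, treat this as a starting point rather than a definitive policy.)

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "eks:DescribeCluster",
        "eks:DescribeUpdate",
        "eks:ListUpdates"
      ],
      "Resource": "*"
    }
  ]
}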

@ashishjullia

Got it, thanks a lot for the clarification. @bryantbiggs

@github-actions

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Nov 11, 2022