Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tls Passthrough support #643

Merged
merged 10 commits into from
Jun 7, 2024
Merged

Conversation

zijun726911
Copy link
Contributor

@zijun726911 zijun726911 commented May 24, 2024

Initial version cherry picked from: https://github.com/liwenwu-amazon/aws-application-networking-k8-publics/tree/tls-route-support-mar20
User experiences about TLSRoute, TargetGroupPolicy of this PR are same with the liwenwu-amazon:tls-route-support-mar20 branch.

main branch to liwenwu-amazon:tls-route-support-mar20 branch diff: zijun726911#3

TLS Passthrough Behaviors Summary

  • Users can create Kind:TLSRoute k8s resource, the controller will under the hood create VPC Lattice Service with TLS_PASSTHROUGH protocol listener and TCP type TargetGroup.
  • For cross cluster use case, users can create Kind:ServiceExport k8s resource and create the Kind:TargetGroupPolicy resource to specify the TargetGroup protocol to TCP. The controller will under the hood create VPC Lattice TCP TargetGroup
  • By default, controller created TCP TargetGroups has healthcheck==disabled
  • Same as old TargetGroupPolicy use cases, users can create the Kind:TargetGroupPolicy to override the healthcheck config for either Service (backendref by TLSRoute) or ServiceExport
  • For the TargetGroupPolicy with protocol==TCP && protocolVersion!=nil, it will be rejected.
  • The controller created TCP Target Groups have tag: application-networking.k8s.aws/ProtocolVersion:""

Changes

The controller logic changes:

  • Make the e2e code path (route_controller/serviceexport_controller --> model_build_xxxxx --> xxx_synthesizer --> target_group_manager) support the TLSRoute, and the ServiceExport with protocol:TCP TargetGroupPolicy.
  • In the rule_synthesizer.go, skips to create rules for TLSRoute since Lattice TLS_PASSTHROUGH listener can only have one defaultAction and without any addition rules. In the listener_synthesizer.go, fill the lattice listener's defaultAction to point to the referred TCP TargetGroups.
  • To support Tls passthrough, both rule_synthesizer and listener_synthesizer need the ResolveRuleTgIds() function (originally resided in the rule_synthesizer.go), so I move the ResolveRuleTgIds() and findSvcExportTG() to the target_group_manager.go to avoid copy paste same code, and the rule_synthesizer and listener_synthesizer can both use the same s.tgManager.ResolveRuleTgIds(ctx, rule, r.stack)
  • Encapsulate TLS_PATHTHROUGH listener related logic in the listenerSynthesizer.getLatticeListenerDefaultAction()
  • Move the unit test Test_ResolveRuleTgIds() to the target_group_manager_test.go
  • Set healthcheck to disabled for TCP targetGroup by default (for both target_group_manager Create() and Update() methods). If the user specify the healthcheck setting by TargetGroupPolicy, the default healthcheck setting will be overridden. (Disable healthcheck by default for TCP TargetGroup is also the default VPC Lattice API default behavior) https://docs.aws.amazon.com/vpc-lattice/latest/ug/target-group-health-checks.html
  • Since VPC Lattice TCP TargetGroup don't have ProtocolVersion, for the TCP TargetGroup, set the tag application-networking.k8s.aws/ProtocolVersion:"". In the (t *TargetGroupSpec) Validate() remove the ProtocolVersion from requiredFields.
  • Improvement for targetgrouppolicy with targetRef ServiceExport, now, the targetgrouppolicy_controller Watch() the ServiceExport resource change

e2e test changes

  • Add new automation e2e tests to test TLSRoute and ServiceExport with protocol:TCP TargetGroupPolicy

CRDs && Image build && helm chart related changes

  • Added tlsroutes create, delete, get, list, patch, update, watch permissions for the controller serviceAccount AuthZ permissions
  • Added gateway.networking.k8s.io_tlsroutes.yaml, included the tlsroutes CRDs in the helm chart and deploy.yaml

Testing

  • Following tls-passthrough.md steps to set up the single cluster e2e connectivity (TLSRoute backendref Service) and cross-cluster e2e connectivity (TLSRoute backendref ServiceImport + ServiceExport + TargetGroupPolicy) both work as expected.
  • 2 new automation e2e tests can pass (one test TLSRoute and the other one test the TLSRoute+ ServiceImport+ ServiceExport + TargetGroupPolicy).
  • Ran the whole e2e test suite, all test can pass (skiped the RAM share test)
[SynchronizedAfterSuite]
/Volumes/workspace/aws-application-networking-k8s/test/suites/integration/suite_test.go:72
{“level”:“info”,“ts”:“2024-06-04T01:32:32.262-0700",“caller”:“test/framework.go:264",“msg”:“Deleting objects: *v1.Gateway/test-gateway, *v1.Pod/grpc-runner”}
{“level”:“info”,“ts”:“2024-06-04T01:32:32.312-0700",“caller”:“test/framework.go:282",“msg”:“Waiting for NotFound, objects: *v1.Gateway/test-gateway, *v1.Pod/grpc-runner”}
[SynchronizedAfterSuite] PASSED [31.091 seconds]
------------------------------
Ran 66 of 69 Specs in 2537.133 seconds
SUCCESS! -- 66 Passed | 0 Failed | 0 Pending | 3 Skipped

Changes for the new revision: #643 (comment)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@zijun726911 zijun726911 changed the title Tls passthrough support [Draft] Tls passthrough support May 24, 2024
@@ -28,33 +30,29 @@ func (t *latticeServiceModelBuildTask) extractListenerInfo(
if err != nil {
Copy link
Contributor Author

@zijun726911 zijun726911 May 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to make the extractListenerInfo() code logic more easy to follow (by reversing the parentRef.SectionName != nil condition and removing the found := false flag and return the function earlier ).

and override protocol string to "TLS_PASSTHROUGH" when create the lattice listener for k8s Gateway with section.TLS.Mode==Passthrough

@zijun726911 zijun726911 force-pushed the tls-passthrough-support branch 11 times, most recently from 94336dd to eefb72a Compare May 28, 2024 19:34
@zijun726911 zijun726911 force-pushed the tls-passthrough-support branch from eefb72a to 595fff8 Compare May 28, 2024 19:39
Copy link
Contributor

@liwenwu-amazon liwenwu-amazon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you move non TLS-passthrough related change into a separate CR? thanks

@@ -26,7 +26,7 @@ jobs:
pip install mkdocs-material mike
- name: Build
run: |
mike deploy 1.0.5 latest --update-aliases --push
mike deploy v1.0.6-rc1 latest --update-aliases --push
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These changes seems not related to TLS passthrough feature. Can you separate them into different CR? thanks

Copy link
Contributor Author

@zijun726911 zijun726911 May 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one is a little bit special, need to bump up the doc version here. otherwise the the doc version 1.0.5 will include the tls-passthrough.md, but the the controller v1.0.5 actually don't support the TLSRoute.

https://www.gateway-api-controller.eks.aws.dev/1.0.5/

What I am thinking is we can improve the whole Github Action (CI pipeline) publish-doc.yaml (in the future), for example, main branch code change trigger the mike deploy dev latest, and branch release-xxxxx trigger the mike deploy v1.x.x --push

@zijun726911 zijun726911 changed the title [Draft] Tls passthrough support [Draft] Tls Passthrough support May 28, 2024
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -26,7 +26,7 @@ jobs:
pip install mkdocs-material mike
- name: Build
run: |
mike deploy 1.0.5 latest --update-aliases --push
mike deploy v1.0.6-rc1 latest --update-aliases --push
Copy link
Contributor Author

@zijun726911 zijun726911 May 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one is a little bit special, need to bump up the doc version here. otherwise the the doc version 1.0.5 will include the tls-passthrough.md, but the the controller v1.0.5 actually don't support the TLSRoute.

https://www.gateway-api-controller.eks.aws.dev/1.0.5/

What I am thinking is we can improve the whole Github Action (CI pipeline) publish-doc.yaml (in the future), for example, main branch code change trigger the mike deploy dev latest, and branch release-xxxxx trigger the mike deploy v1.x.x --push

@zijun726911 zijun726911 marked this pull request as ready for review May 28, 2024 20:29
@zijun726911 zijun726911 requested a review from erikfuller May 28, 2024 20:29
Copy link
Contributor

@liwenwu-amazon liwenwu-amazon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@zijun726911 zijun726911 changed the title [Draft] Tls Passthrough support Tls Passthrough support May 29, 2024
Copy link
Contributor

@erikfuller erikfuller left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for taking on this massive change @zijun726911 and thanks to @liwenwu-amazon for the initial logic. I've added quite a few comments - please reach out if anything needs clarification.

I understand these changes all go together, but it would be really helpful to have this PR broken up. I'd love to see something like:

  1. Refactoring (no logic changes)
  2. New code/logic changes
  3. Documentation (can even break this up if needed)

I'm OK reviewing 1+2 together, but if we could decouple documentation that would be really helpful. In theory, the docs for this should all be additive functionality anyway.

Comment on lines 148 to 153
//+kubebuilder:rbac:groups=gateway.networking.k8s.io,resources=grpcroutes;httproutes,verbs=get;list;watch;create;update;patch;delete
//+kubebuilder:rbac:groups=gateway.networking.k8s.io,resources=grpcroutes/status;httproutes/status,verbs=get;update;patch
//+kubebuilder:rbac:groups=gateway.networking.k8s.io,resources=grpcroutes/finalizers;httproutes/finalizers,verbs=update

// +kubebuilder:rbac:groups=gateway.networking.k8s.io,resources=tlsroutes;grpcroutes;httproutes,verbs=get;list;watch;create;update;patch;delete
// +kubebuilder:rbac:groups=gateway.networking.k8s.io,resources=tlsroutes/status;grpcroutes/status;httproutes/status,verbs=get;update;patch
// +kubebuilder:rbac:groups=gateway.networking.k8s.io,resources=tlsroutes/finalizers;grpcroutes/finalizers;httproutes/finalizers,verbs=update
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we actually use these kubebuilder directives?

Copy link
Contributor Author

@zijun726911 zijun726911 May 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The +kubebuilder:rbac annotation instructs the controller-gen rbac command to generate the ClusterRole, however our Makefile don't run the controller-gen rbac, I think they are useless, will remove them

https://book.kubebuilder.io/reference/controller-gen

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah that's what I thought. Just wanted to double check. Thanks for checking.

Comment on lines 75 to 97
defaultAction := vpclattice.RuleAction{
FixedResponse: &vpclattice.FixedResponseAction{
StatusCode: aws.Int64(404),
},
}

if modelListener.Spec.Protocol == vpclattice.ListenerProtocolTlsPassthrough {
// Fill the defaultAction tgs for TLS_PASSTHROUGH lattice listener
var latticeTGs []*vpclattice.WeightedTargetGroup
for _, modelTG := range defaultActionTgs {
latticeTG := vpclattice.WeightedTargetGroup{
TargetGroupIdentifier: aws.String(modelTG.LatticeTgId),
Weight: aws.Int64(modelTG.Weight),
}
latticeTGs = append(latticeTGs, &latticeTG)
}
d.log.Debugf("For TLS_PASSTHROUGH listener, forward to default target groups %v", latticeTGs)
defaultAction = vpclattice.RuleAction{
Forward: &vpclattice.ForwardAction{
TargetGroups: latticeTGs,
},
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you refactor this to a method? Something like getDefaultAction or similar. Alternatively, maybe just the logic in the if block. It actually looks pretty similar to the logic in rule_manager.go, maybe see if it's possible to combine?

SectionName: lo.ToPtr(gwv1.SectionName("tls")),
}},
},
Hostnames: []gwv1.Hostname{"tls.test.com"},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's use something that isn't a real domain here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed in the next revision.

Expect(*tgSummary.VpcIdentifier).To(Equal(os.Getenv("CLUSTER_VPC_ID")))
Expect(*tgSummary.Protocol).To(Equal("TCP"))
if tg.Config.HealthCheck != nil {
Expect(*tg.Config.HealthCheck.Enabled).To(BeFalse())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we already need a TG policy here, would it be a good idea to include an enabled health check to make sure that's being set up properly?

}

targetsV1 := testFramework.GetTargets(ctx, tgSummary, deployment1)
Expect(*tgSummary.Port).To(BeEquivalentTo(80))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't this be 443?

Copy link
Contributor Author

@zijun726911 zijun726911 Jun 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This 80 is come from here:

The Target Group port is kind of useless, only the lattice targets' port are useful and guide vpc lattice to route traffic to that port. i.e, :

apiVersion: gateway.networking.k8s.io/v1alpha2
kind: TLSRoute
metadata:
  name: rate-tls-passthrough
spec:
  hostnames:
    - tls-rate.my-test.com
  parentRefs:
    - name: my-hotel-tls-passthrough
      sectionName: tls
  rules:
    - backendRefs:
        - name: tls-rate1
          kind: Service
          port: 443 <-------- This port + k8s service ports intersection define to the lattice target port

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we just drop this Expect() then since we don't really care?

pod := pods[0]

Eventually(func(g Gomega) {
cmd := fmt.Sprintf("curl -k https://tls.test.com:444 --resolve tls.test.com:444:%s", dnsIP[0])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we have an explicit check for port 444 on the listener?

Expect(*tg.Config.HealthCheck.Enabled).To(BeFalse())
}
targetsV1 := testFramework.GetTargets(ctx, tgSummary, deployment1)
Expect(*tgSummary.Port).To(BeEquivalentTo(80))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

443?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, should we drop the Expect()?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

protocolVersion:
description: The protocol version used when performing health
checks on targets. Defaults to HTTP/1.

Hc config protocol version needs to be changed too, it doesn't have a default for tcp tg, customers need to specify both health check protocol and the protocol version

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will remove the sentence "Defaults to HTTP/1." in the next revision

@erikfuller
Copy link
Contributor

One other question around TLSRouteRule/BackendRef from the spec: https://github.com/kubernetes-sigs/gateway-api/blob/main/apis/v1alpha2/tlsroute_types.go#L103

	// BackendRefs defines the backend(s) where matching requests should be
	// sent. If unspecified or invalid (refers to a non-existent resource or
	// a Service with no endpoints), the rule performs no forwarding; if no
	// filters are specified that would result in a response being sent, the
	// underlying implementation must actively reject request attempts to this
	// backend, by rejecting the connection or returning a 500 status code.
	// Request rejections must respect weight; if an invalid backend is
	// requested to have 80% of requests, then 80% of requests must be rejected
	// instead.

Just curious what our current approach does with invalid BackendRefs. I think this is probably a less vital part of the spec to honour, especially given it's still experiemental, but would be good to describe the expected behaviour here.

@zijun726911
Copy link
Contributor Author

One other question around TLSRouteRule/BackendRef from the spec: https://github.com/kubernetes-sigs/gateway-api/blob/main/apis/v1alpha2/tlsroute_types.go#L103

	// BackendRefs defines the backend(s) where matching requests should be
	// sent. If unspecified or invalid (refers to a non-existent resource or
	// a Service with no endpoints), the rule performs no forwarding; if no
	// filters are specified that would result in a response being sent, the
	// underlying implementation must actively reject request attempts to this
	// backend, by rejecting the connection or returning a 500 status code.
	// Request rejections must respect weight; if an invalid backend is
	// requested to have 80% of requests, then 80% of requests must be rejected
	// instead.

Just curious what our current approach does with invalid BackendRefs. I think this is probably a less vital part of the spec to honour, especially given it's still experiemental, but would be good to describe the expected behaviour here.

I will document this part. actually the current behavior is, for example for this TLSRoute:

apiVersion: gateway.networking.k8s.io/v1alpha2
kind: TLSRoute
metadata:
  name: rate-tls-passthrough
spec:
  hostnames:
    - tls-rate.my-test.com
  parentRefs:
    - name: my-hotel-tls-passthrough
      sectionName: tls
  rules:
    - backendRefs:
        - name: tls-rate1
          kind: Service
          port: 443
          weight: 10
        - name: tls-rate2
          kind: ServiceImport
          port: 443
          weight: 90

If the tls-rate2 targetGroup is not ready, it will block the traffic for both tls-rate1 and tls-rate2

( base on the ResolveRuleTgIds() logic

func (s *defaultTargetGroupManager) ResolveRuleTgIds(ctx context.Context, modelRule *model.Rule, stack core.Stack) error {
)

@zijun726911 zijun726911 force-pushed the tls-passthrough-support branch from febd9cf to 5bc2283 Compare June 1, 2024 23:08
@zijun726911 zijun726911 force-pushed the tls-passthrough-support branch 2 times, most recently from ff999a2 to 9b3f323 Compare June 4, 2024 08:16
@zijun726911 zijun726911 force-pushed the tls-passthrough-support branch from 9b3f323 to 6f3d2aa Compare June 4, 2024 08:23
@zijun726911 zijun726911 force-pushed the tls-passthrough-support branch from e63fa8b to 2b19a8a Compare June 5, 2024 01:30
Copy link

@Mengli-Shu Mengli-Shu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

only reviewed doc changes for lattice tls passthrough behaviour

Copy link
Contributor

@erikfuller erikfuller left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks again @zijun726911, for all your work on this. I just have a couple remaining requests to work through then I think we'll be all set.

@@ -80,6 +89,45 @@ func (l *listenerSynthesizer) Synthesize(ctx context.Context) error {
return nil
}

func (l *listenerSynthesizer) getLatticeListenerDefaultAction(ctx context.Context, modelListenerProtocol string) (
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at this again it seems like this logic is out of place in the synthesizer.

The default action should be part of the model listener spec. Then, we would set the default action inside model_build_listener.go.

In Synthesize() above, we would resolve target group Ids for the default action. Finally, we would read the default action off the modelListener and convert to Lattice format inside the listener_manager.go#Upsert method. Upsert would then not need an additional argument for the default action.

Can you please have a look at this approach and see if it's feasible?

Copy link
Contributor Author

@zijun726911 zijun726911 Jun 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your suggestion! I did some changes in the new revision and that did make the code cleaner:

  1. Added the field DefaultAction in the modelListener.ListenerSpec

    type Listener struct {
    core.ResourceMeta `json:"-"`
    Spec ListenerSpec `json:"spec"`
    Status *ListenerStatus `json:"status,omitempty"`
    }
    type ListenerSpec struct {
    StackServiceId string `json:"stackserviceid"`
    K8SRouteName string `json:"k8sroutename"`
    K8SRouteNamespace string `json:"k8sroutenamespace"`
    Port int64 `json:"port"`
    Protocol string `json:"protocol"`
    DefaultAction *DefaultAction `json:"defaultaction"`
    }
    type DefaultAction struct {
    FixedResponseStatusCode *int64 `json:"fixedresponsestatuscode"`
    Forward *RuleAction `json:"forward"`
    }

  2. Completely skip to create stackRule for TLS_PASSTHROUGH gwListener. i.e, completely don't call latticeServiceModelBuildTask.buildRules() for TLS_PASSTHROUGH gwListener, therefore removing the "long comments" in the buildRules() and therefore removing the seemly confusing code l.tgManager.ResolveRuleTgIds(ctx, stackRules[0], l.stack) (changed to l.tgManager.ResolveRuleTgIds(ctx, stackListener.Spec.DefaultAction.Forward, l.stack) instead). And since we don't create any stackRule for TLS_PASSTHROUGH gwListener, I also remove the TlsPassthrough skipping logic in the (r *ruleSynthesizer) createOrUpdateRules()

    if modelListener.Spec.Protocol == vpclattice.ListenerProtocolTlsPassthrough {
    t.log.Debugf("Skip building rules for TLS_PASSTHROUGH listener %s, since lattice TLS_PASSTHROUGH listener can only have listener defaultAction and without any other rule", modelListener.ID())
    continue
    }

  3. Added the (t *latticeServiceModelBuildTask) getListenerDefaultAction in the model_build_listener

    func (t *latticeServiceModelBuildTask) getListenerDefaultAction(ctx context.Context, modelListenerProtocol string) (
    *model.DefaultAction, error,
    ) {
    if modelListenerProtocol != vpclattice.ListenerProtocolTlsPassthrough {
    return &model.DefaultAction{
    FixedResponseStatusCode: aws.Int64(defaultActionFixedResponseStatusCode),
    }, nil
    }
    if len(t.route.Spec().Rules()) != 1 {
    return nil, fmt.Errorf("TLS_PASSTHROUGH listener can only have one rule for lattice listener default action, but got %d rules", len(t.route.Spec().Rules()))
    }
    modelRouteRule := t.route.Spec().Rules()[0]
    ruleTgList, err := t.getTargetGroupsForRuleAction(ctx, modelRouteRule)
    if err != nil {
    return nil, err
    }
    return &model.DefaultAction{
    Forward: &model.RuleAction{
    TargetGroups: ruleTgList,
    },
    }, nil
    }
    (Added unit test to cover build TLS_PASSTHROUGH listener)

  4. Changed the logic of (l *listenerSynthesizer) getLatticeListenerDefaultAction(

    func (l *listenerSynthesizer) getLatticeListenerDefaultAction(ctx context.Context, stackListener *model.Listener) (
    *vpclattice.RuleAction, error,
    ) {
    if stackListener.Spec.DefaultAction == nil ||
    (stackListener.Spec.DefaultAction.FixedResponseStatusCode == nil && stackListener.Spec.DefaultAction.Forward == nil) {
    return nil, fmt.Errorf("invalid listener default action, must be either fixed response or forward")
    }
    if stackListener.Spec.DefaultAction.FixedResponseStatusCode != nil {
    return &vpclattice.RuleAction{
    FixedResponse: &vpclattice.FixedResponseAction{
    StatusCode: stackListener.Spec.DefaultAction.FixedResponseStatusCode,
    },
    }, nil
    }
    // If the listener DefaultAction is not fixed response, for example for TLS_PASSTHROUGH listener, it must be a forward action, fill the forward action target group ids for it
    if err := l.tgManager.ResolveRuleTgIds(ctx, stackListener.Spec.DefaultAction.Forward, l.stack); err != nil {
    return nil, fmt.Errorf("failed to resolve rule tg ids, err = %v", err)
    }
    var latticeTGs []*vpclattice.WeightedTargetGroup
    for _, modelTG := range stackListener.Spec.DefaultAction.Forward.TargetGroups {
    latticeTG := vpclattice.WeightedTargetGroup{
    TargetGroupIdentifier: aws.String(modelTG.LatticeTgId),
    Weight: aws.Int64(modelTG.Weight),
    }
    latticeTGs = append(latticeTGs, &latticeTG)
    }
    l.log.Debugf("DefaultAction Forward target groups: %v", latticeTGs)
    return &vpclattice.RuleAction{
    Forward: &vpclattice.ForwardAction{
    TargetGroups: latticeTGs,
    },
    }, nil
    , removed the seemly confusing part l.tgManager.ResolveRuleTgIds(ctx, stackRules[0], l.stack), change to l.tgManager.ResolveRuleTgIds(ctx, stackListener.Spec.DefaultAction.Forward, l.stack);() instead.

  5. Still keep the defaultAction argument for the listenerManager.Upsert() , since in the listenerSynthesizer, it needs to convert the model.ListenerDefaultAction into the vpcLattice.ListenerDefaultAction, and pass the vpcLattice.ListenerDefaultAction to listenerManager.Upsert() to "determine" (needToUpdateDefaultAction()) whether the lattice.UpdateListener() need to be called.

Ran the full test suite for the new revision(skip the RAM test):

• [257.448 seconds]
------------------------------
[SynchronizedAfterSuite]
/Volumes/workspace/aws-application-networking-k8s/test/suites/integration/suite_test.go:72
{“level”:“info”,“ts”:“2024-06-05T23:08:51.716-0700",“caller”:“test/framework.go:264",“msg”:“Deleting objects: *v1.Gateway/test-gateway, *v1.Pod/grpc-runner”}
{“level”:“info”,“ts”:“2024-06-05T23:08:51.758-0700",“caller”:“test/framework.go:282",“msg”:“Waiting for NotFound, objects: *v1.Gateway/test-gateway, *v1.Pod/grpc-runner”}
[SynchronizedAfterSuite] PASSED [31.193 seconds]
------------------------------
Ran 66 of 69 Specs in 2412.270 seconds
SUCCESS! -- 66 Passed | 0 Failed | 0 Pending | 3 Skipped
--- PASS: TestIntegration (2412.27s)
PASS

)

var routeTypeToFinalizer = map[core.RouteType]string{
core.HttpRouteType: "httproute.k8s.aws/resources",
core.GrpcRouteType: "grpcroute.k8s.aws/resources",
core.TlsRouteType: "tlsroute.k8s.aws/resource",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The others are "resources" but this one is "resource"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed in the new revisions

Comment on lines 79 to 83

// For TLSRoute, although in here we create a lattice listener rule with PathMatchValue = "/", but it will not be used in the rule_synthesizer,
// Since vpc lattice doesn't support any rule matching for TLS_PASSTHROUGH listener,
// the rule_synthesizer will skip updating the rule while the listener_synthesizer will fill the listener with the default action route to the referenced TCP target groups
// https://docs.aws.amazon.com/vpc-lattice/latest/ug/tls-listeners.html
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason I'd like to change is that right now we have a few comment lines explaining why the code should drop through to the else for TLS passthrough. A multi-line explanation in comments can be a warning sign we don't have the code structured correctly. I'd like us to rather take an approach that is explicit, intuitive to follow, and needs no justifying or explanatory comments. I think what I proposed (or something like it) would do the trick.

Ideally we would redefine the interfaces so there were separate L4 and L7 interfaces, and there would be no Matches() at all on TCPRoute. This would be a larger change and is probably OK to skip for now, but we should not be relying on the empty list behavior here.

}

targetsV1 := testFramework.GetTargets(ctx, tgSummary, deployment1)
Expect(*tgSummary.Port).To(BeEquivalentTo(80))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we just drop this Expect() then since we don't really care?

Expect(*tg.Config.HealthCheck.Enabled).To(BeFalse())
}
targetsV1 := testFramework.GetTargets(ctx, tgSummary, deployment1)
Expect(*tgSummary.Port).To(BeEquivalentTo(80))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, should we drop the Expect()?

@zijun726911 zijun726911 force-pushed the tls-passthrough-support branch from a6916b0 to cd6072c Compare June 6, 2024 16:50
@zijun726911
Copy link
Contributor Author

The reason I'd like to change is that right now we have a few comment lines explaining why the code should drop through to the else for TLS passthrough. A multi-line explanation in comments can be a warning sign we don't have the code structured correctly. I'd like us to rather take an approach that is explicit, intuitive to follow, and needs no justifying or explanatory comments. I think what I proposed (or something like it) would do the trick.

Ideally we would redefine the interfaces so there were separate L4 and L7 interfaces, and there would be no Matches() at all on TCPRoute. This would be a larger change and is probably OK to skip for now, but we should not be relying on the empty list behavior here.

Changed in the new revision. please check: #643 (comment)

Should we just drop this Expect() then since we don't really care?

Removed assertion for the tg port in the new revision.

@zijun726911 zijun726911 force-pushed the tls-passthrough-support branch from cd6072c to 036e2a3 Compare June 6, 2024 17:39
Copy link
Contributor

@erikfuller erikfuller left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a couple small changes and we should be all set.

@@ -80,6 +89,45 @@ func (l *listenerSynthesizer) Synthesize(ctx context.Context) error {
return nil
}

func (l *listenerSynthesizer) getLatticeListenerDefaultAction(ctx context.Context, stackListener *model.Listener) (
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The conversion logic from model -> Lattice type typically lives in each component's manager. Here, it should only resolve the target group Ids.

Please move getDefaultAction logic inside listener_manager. The logic should assume the target group Ids have already been resolved. However, it needs to handling the case where the target group ID is set to model.InvalidBackendRefTgId. This can happen when the backendRef points to something that doesn't exist. Note we should have a test specifically for InvalidBackendRefTgId handling, which I don't think we do yet since I don't see it referenced here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Working on in. Will do the exactly you describe here in the new revision. Will mimic the r *defaultRuleManager) buildLatticeRule() to get the listener defaultAction

func (r *defaultRuleManager) buildLatticeRule(modelRule *model.Rule) (*vpclattice.GetRuleOutput, error) {
gro := vpclattice.GetRuleOutput{
IsDefault: aws.Bool(false),
Priority: aws.Int64(modelRule.Spec.Priority),
}
httpMatch := vpclattice.HttpMatch{}
updateMatchFromRule(&httpMatch, modelRule)
gro.Match = &vpclattice.RuleMatch{HttpMatch: &httpMatch}
// check if we have at least one valid target group
var hasValidTargetGroup bool
for _, tg := range modelRule.Spec.Action.TargetGroups {
if tg.LatticeTgId != model.InvalidBackendRefTgId {
hasValidTargetGroup = true
break
}
}
if hasValidTargetGroup {
var latticeTGs []*vpclattice.WeightedTargetGroup
for _, ruleTg := range modelRule.Spec.Action.TargetGroups {
// skip any invalid TGs - eventually VPC Lattice may support weighted fixed response
// and this logic can be more in line with the spec
if ruleTg.LatticeTgId == model.InvalidBackendRefTgId {
continue
}
latticeTG := vpclattice.WeightedTargetGroup{
TargetGroupIdentifier: aws.String(ruleTg.LatticeTgId),
Weight: aws.Int64(ruleTg.Weight),
}
latticeTGs = append(latticeTGs, &latticeTG)
}
gro.Action = &vpclattice.RuleAction{
Forward: &vpclattice.ForwardAction{
TargetGroups: latticeTGs,
},
}
} else {
r.log.Debugf("There are no valid target groups, defaulting to 404 Fixed response")
gro.Action = &vpclattice.RuleAction{
FixedResponse: &vpclattice.FixedResponseAction{
StatusCode: aws.Int64(404),
},
}
}
gro.Name = aws.String(fmt.Sprintf("k8s-%d-rule-%d", modelRule.Spec.CreateTime.Unix(), modelRule.Spec.Priority))
return &gro, nil

Copy link
Contributor Author

@zijun726911 zijun726911 Jun 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sent the new revision can you take a look? Did the changes exactly as you describe above. Thank you! b4dfd4d

Note we should have a test specifically for InvalidBackendRefTgId handling

The test cases: Test_ListenerManager_getLatticeListenerDefaultAction_TLS_PASSTHROUGH_Listener covered that.

Also ran the full e2e test suite against new revision

[SynchronizedAfterSuite]
/Volumes/workspace/aws-application-networking-k8s/test/suites/integration/suite_test.go:72
{“level”:“info”,“ts”:“2024-06-07T01:26:52.316-0700",“caller”:“test/framework.go:264",“msg”:“Deleting objects: *v1.Gateway/test-gateway, *v1.Pod/grpc-runner”}
{“level”:“info”,“ts”:“2024-06-07T01:26:52.357-0700",“caller”:“test/framework.go:282",“msg”:“Waiting for NotFound, objects: *v1.Gateway/test-gateway, *v1.Pod/grpc-runner”}
[SynchronizedAfterSuite] PASSED [31.041 seconds]
------------------------------
Ran 66 of 69 Specs in 2407.814 seconds
SUCCESS! -- 66 Passed | 0 Failed | 0 Pending | 3 Skipped
--- PASS: TestIntegration (2407.82s)
PASS
ok 	[github.com/aws/aws-application-networking-k8s/test/suites/integration](http://github.com/aws/aws-application-networking-k8s/test/suites/integration)	2408.753s

Comment on lines 109 to 111
if err := l.tgManager.ResolveRuleTgIds(ctx, stackListener.Spec.DefaultAction.Forward, l.stack); err != nil {
return nil, fmt.Errorf("failed to resolve rule tg ids, err = %v", err)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bring this block into the Synthesize() method.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will change in the new revision

Comment on lines 95 to 98
if stackListener.Spec.DefaultAction == nil ||
(stackListener.Spec.DefaultAction.FixedResponseStatusCode == nil && stackListener.Spec.DefaultAction.Forward == nil) {
return nil, fmt.Errorf("invalid listener default action, must be either fixed response or forward")
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should rather validate this when we add the listener to the stack in listener.go#NewListener().

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will mimic the func (t *TargetGroupSpec) Validate(), and put that validation in the new function func (spec *ListenerSpec) Validate()

Copy link
Contributor

@erikfuller erikfuller left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a couple very small changes.

@@ -41,3 +60,24 @@ func NewListener(stack core.Stack, spec ListenerSpec) (*Listener, error) {

return listener, nil
}

func (spec *ListenerSpec) Validate() error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

Copy link
Contributor

@erikfuller erikfuller left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks so much for all your hard work on this!

if stackListener.Spec.Protocol == vpclattice.ListenerProtocolTlsPassthrough {
return nil, fmt.Errorf("TLSRoute %s/%s must have at least one valid backendRef target group", stackListener.Spec.K8SRouteNamespace, stackListener.Spec.K8SRouteName)
} else {
return nil, fmt.Errorf("unreachable code, since the defaultAction for non-TLS_PASSTHROUGH listener is always the FixedResponse 404")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Famous last words 😉

@erikfuller erikfuller merged commit d3fa856 into aws:main Jun 7, 2024
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants