Change threshold for detecting a deployment failure

eagletmt · eagletmt · commit 12e27259dc2f · 2019-04-10T11:32:22.000+09:00
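Previously, hako failed a deployment whenever a task launched by the
deployment stopped while the deployment was still in progress. That
check wrongly failed the deployment when the container instance a new
task was assigned to changed to DRAINING status during the deployment.
So I changed the threshold as a mitigation: the deployment is now
treated as failing only when the number of started tasks exceeds twice
the primary deployment's desired count. This is not a complete solution,
though. Task diagnostics are also fetched in batches of 100 task IDs per
describe_tasks call.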
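A minimal sketch of the two changes in the diff below, kept separate from the ECS client so it can be run on its own. The names `deployment_failing?`, `task_id_batches`, and `DESCRIBE_TASKS_BATCH_SIZE` are hypothetical and do not exist in hako; only the comparison and the slice size of 100 are taken from the diff.

```ruby
# Hypothetical helpers illustrating the new threshold and the batching.
# They are not part of hako; the real logic lives inline in wait_for_ready.

# Slice size used in the diff for describe_tasks calls.
DESCRIBE_TASKS_BATCH_SIZE = 100

# New check: fail only when more than twice the desired count of tasks
# have been started. The old check was `desired_count < started_task_count`.
def deployment_failing?(desired_count, started_task_count)
  desired_count * 2 < started_task_count
end

# Split task IDs into arrays of at most batch_size elements,
# mirroring @started_task_ids.each_slice(100) in the diff.
def task_id_batches(task_ids, batch_size = DESCRIBE_TASKS_BATCH_SIZE)
  task_ids.each_slice(batch_size).to_a
end
```

With a desired count of 2, one replaced task (3 started tasks) would have failed the old check but passes the new one; the deployment is reported as failing only from the fifth started task onward.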
diff --git a/lib/hako/schedulers/ecs.rb b/lib/hako/schedulers/ecs.rb
@@ -946,10 +946,12 @@ def wait_for_ready(service)
           Hako.logger.debug "  latest_event_id=#{latest_event_id}, deployments=#{s.deployments}"
           no_active = s.deployments.all? { |d| d.status != 'ACTIVE' }
           primary = s.deployments.find { |d| d.status == 'PRIMARY' }
-          if primary.desired_count < @started_task_ids.size
+          if primary.desired_count * 2 < @started_task_ids.size
             Hako.logger.error('Some started tasks are stopped. It seems new deployment is failing to start')
-            ecs_client.describe_tasks(cluster: service.cluster_arn, tasks: @started_task_ids).tasks.each do |task|
-              report_task_diagnostics(task)
+            @started_task_ids.each_slice(100) do |task_ids|
+              ecs_client.describe_tasks(cluster: service.cluster_arn, tasks: task_ids).tasks.each do |task|
+                report_task_diagnostics(task)
+              end
             end
             return false
           end