Commit 12e2725

committed Apr 10, 2019
Change threshold for detecting a deployment failure
Previously, hako failed a deployment whenever a task launched by that deployment stopped while the deployment was still in progress. This wrongly failed the deployment when the container instance that a new task was assigned to changed to DRAINING status during the deployment. So I decided to raise the threshold as a mitigation: the deployment is now treated as failing only when the number of started tasks exceeds twice the desired count. This is not a complete solution, though.
1 parent 3148a4d commit 12e2725
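
To make the threshold change concrete, here is a minimal sketch of the old and new conditions side by side. The helper `deployment_failing?` and the task IDs are hypothetical and exist only for illustration; in hako the comparison is written inline in `wait_for_ready`.

```ruby
# Hypothetical helper mirroring the comparison in wait_for_ready; not part of hako.
# multiplier: 1 reproduces the old rule, multiplier: 2 the new one.
def deployment_failing?(desired_count, started_task_ids, multiplier: 2)
  desired_count * multiplier < started_task_ids.size
end

desired = 3
# Assumed scenario: one of the three new tasks was replaced after its
# container instance went DRAINING, so four tasks were started in total.
started = %w[task-a task-b task-c task-d]

old_rule = deployment_failing?(desired, started, multiplier: 1)
new_rule = deployment_failing?(desired, started)

puts "old rule fails the deployment: #{old_rule}"
puts "new rule fails the deployment: #{new_rule}"
```

Under the new rule a single replaced task no longer trips the check, but a deployment whose tasks keep stopping still does once more than `desired_count * 2` tasks have been started.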

1 file changed: +5 -3 lines changed

lib/hako/schedulers/ecs.rb
@@ -946,10 +946,12 @@ def wait_for_ready(service)
           Hako.logger.debug " latest_event_id=#{latest_event_id}, deployments=#{s.deployments}"
           no_active = s.deployments.all? { |d| d.status != 'ACTIVE' }
           primary = s.deployments.find { |d| d.status == 'PRIMARY' }
-          if primary.desired_count < @started_task_ids.size
+          if primary.desired_count * 2 < @started_task_ids.size
             Hako.logger.error('Some started tasks are stopped. It seems new deployment is failing to start')
-            ecs_client.describe_tasks(cluster: service.cluster_arn, tasks: @started_task_ids).tasks.each do |task|
-              report_task_diagnostics(task)
+            @started_task_ids.each_slice(100) do |task_ids|
+              ecs_client.describe_tasks(cluster: service.cluster_arn, tasks: task_ids).tasks.each do |task|
+                report_task_diagnostics(task)
+              end
             end
             return false
           end
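
The diff also switches to `each_slice(100)`. The commit message does not say why, but the ECS DescribeTasks API accepts at most 100 task IDs per call, and with the doubled threshold the list of started tasks can exceed that limit when the desired count is 50 or more. The sketch below shows the batching pattern on its own; `describe_in_batches` is a hypothetical stand-in that only records batch sizes instead of calling AWS.

```ruby
# Assumed limit: DescribeTasks accepts up to 100 task IDs per request.
MAX_TASKS_PER_CALL = 100

# Hypothetical stand-in for the describe_tasks loop in the diff.
# Yields each batch to the caller and returns the size of every batch.
def describe_in_batches(task_ids, batch_size: MAX_TASKS_PER_CALL)
  batch_sizes = []
  task_ids.each_slice(batch_size) do |batch|
    batch_sizes << batch.size
    yield batch if block_given?
  end
  batch_sizes
end

# Made-up task IDs for illustration.
task_ids = (1..250).map { |i| "task-#{i}" }
batch_sizes = describe_in_batches(task_ids)

puts batch_sizes.inspect
```

In the real code the block body would be the `ecs_client.describe_tasks(cluster: ..., tasks: batch)` call followed by `report_task_diagnostics` for each returned task.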