This repository was archived by the owner on Jan 8, 2019. It is now read-only.

Allow fair_share_queue overrides #1285

Merged: 1 commit, Oct 9, 2018


Contributor

@macmaster macmaster commented Oct 3, 2018

These changes allow a node attribute to override or define fair scheduler queue properties.

Tested on both the VM and dev hardware cluster.

Purpose:
Quickly deploying a release to existing hardware clusters can be challenging.
We'd like to avoid cutting a new release just to push out small, urgent compute budget changes.
These changes let us override the budgets codified in tenant recipes with environment attributes.

Usage:
Override the node['bcpc']['hadoop']['yarn']['fair_scheduler_queue'] attribute in the environment JSON.
The data structure takes the following form:

node['bcpc']['hadoop']['yarn']['fair_scheduler_queue'] = {
  queue_name[string]: {
    parent_resource: [chef_resource_string],
    minResources: [string ~= /(\d+)mb,? (\d+)vcores/],
    maxResources: [string ~= /(\d+)mb,? (\d+)vcores/],
    maxRunningApps: [integer],
    maxAMShare: [float],
    weight: [float],
    schedulingPolicy: [string ~= /(fair|fifo|drf)/i],
    aclSubmitApps: [string (csv)],
    aclAdministerApps: [string (csv)],
    minSharePreemptionTimeout: [integer],
    fairSharePreemptionTimeout: [integer],
    fairSharePreemptionThreshold: [float]
  },
  ...
}
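
As a rough illustration of the shape above, here is a minimal validation sketch in plain Ruby. It is not code from this PR or from the cookbook: `parse_resources`, `queue_errors`, and both pattern constants are hypothetical names. The resource pattern treats the comma as optional, since the example comment writes "1300mb, 5vcores".

```ruby
# Hypothetical validator for the attribute structure described above.
# None of these names come from the cookbook; they are illustrative only.
RESOURCE_PATTERN = /\A(\d+)\s*mb,?\s*(\d+)\s*vcores\z/i
POLICY_PATTERN = /\A(fair|fifo|drf)\z/i

# Parse a resource string into megabytes and vcores.
# Returns nil when the string does not match the expected form.
def parse_resources(value)
  match = RESOURCE_PATTERN.match(value.to_s.strip)
  return nil unless match

  { mb: match[1].to_i, vcores: match[2].to_i }
end

# Collect human-readable problems for a single queue definition.
# Only a subset of the documented keys is checked in this sketch.
def queue_errors(name, props)
  errors = []
  %w[minResources maxResources].each do |key|
    next unless props.key?(key)

    errors << "#{name}: #{key} is not '<n>mb, <n>vcores'" unless parse_resources(props[key])
  end
  if props.key?('schedulingPolicy') && props['schedulingPolicy'].to_s !~ POLICY_PATTERN
    errors << "#{name}: schedulingPolicy must be fair, fifo, or drf"
  end
  %w[maxRunningApps minSharePreemptionTimeout fairSharePreemptionTimeout].each do |key|
    next unless props.key?(key)

    errors << "#{name}: #{key} must be an integer" unless props[key].is_a?(Integer)
  end
  errors
end
```

A check like this could run before the attribute is handed to the queue resources, so that a malformed budget string fails early instead of producing a broken scheduler configuration.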

See the second comment below for an example.

Reference:

  1. fair_share_queue resource
  2. fair_share_queue helper methods
  3. yarn_schedulers recipe
  4. fair_scheduler reference

@macmaster
Contributor Author

hadoopy

@macmaster
Contributor Author

Test-Laptop.json:
....
        "yarn": {
          "fair_scheduler_queue": {
            "user": { },
            "groups": { },
            "rmacmaster": {
              "minResources": "1300mb, 5vcores",
              "parent_resource": "fair_share_queue[groups]"
            },
            "chef-bach": {
              "minResources": "2600mb, 10vcores",
              "parent_resource": "fair_share_queue[groups]"
            }
          },
...
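
To make the parent/child relationship in this example concrete, the following sketch nests the queues by their `parent_resource` and prints them in the style of a Fair Scheduler allocation file. This is not how the cookbook renders its configuration; `parent_name`, `render_queue`, and `render_allocations` are hypothetical helpers, and the sketch assumes each property key maps directly to an element of the same name. XML escaping is left out for brevity.

```ruby
require 'json'

# Hypothetical helper: pull the queue name out of a Chef resource string
# such as "fair_share_queue[groups]". Returns nil for top-level queues.
def parent_name(props)
  match = /\Afair_share_queue\[(.+)\]\z/.match(props['parent_resource'].to_s)
  match && match[1]
end

# Render one queue and, recursively, every queue that names it as parent.
# Returns an array of indented lines.
def render_queue(name, queues, indent = 1)
  pad = '  ' * indent
  lines = ["#{pad}<queue name=\"#{name}\">"]
  queues[name].each do |key, value|
    next if key == 'parent_resource'

    lines << "#{pad}  <#{key}>#{value}</#{key}>"
  end
  queues.each do |child, props|
    lines.concat(render_queue(child, queues, indent + 1)) if parent_name(props) == name
  end
  lines << "#{pad}</queue>"
end

# Render all top-level queues (those without a parent) inside <allocations>.
def render_allocations(queues)
  roots = queues.select { |_, props| parent_name(props).nil? }.keys
  body = roots.flat_map { |name| render_queue(name, queues) }
  (['<allocations>'] + body + ['</allocations>']).join("\n")
end

# The queue definitions from the Test-Laptop.json excerpt above.
queues = JSON.parse(<<~JSON)
  {
    "user": {},
    "groups": {},
    "rmacmaster": {
      "minResources": "1300mb, 5vcores",
      "parent_resource": "fair_share_queue[groups]"
    },
    "chef-bach": {
      "minResources": "2600mb, 10vcores",
      "parent_resource": "fair_share_queue[groups]"
    }
  }
JSON

puts render_allocations(queues)
```

In the output, "rmacmaster" and "chef-bach" appear inside the "groups" queue, while "user" and "groups" sit at the top level.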

@macmaster macmaster requested a review from cbaenziger October 4, 2018 14:14
@aespinosa
Collaborator

This is the type of change for which I would like to see ChefSpec tests laid out.

@aespinosa aespinosa removed the Tested label Oct 8, 2018
@aespinosa aespinosa self-requested a review October 8, 2018 06:30
@aespinosa
Collaborator

Can you expound on the need to have it available in the environment? Is this a hack so that we won't need to cut releases?

@vt0r
Member

vt0r commented Oct 8, 2018

> Can you expound on the need to have it available in the environment? Is this a hack so that we won't need to cut releases?

As far as I understand, yes. This is one of the last remaining items we write tenant recipes for in the wrapper (gross), and we'd like to stop doing that if possible.

@macmaster
Contributor Author

> > Can you expound on the need to have it available in the environment? Is this a hack so that we won't need to cut releases?
>
> As far as I understand, yes. This is one of the last remaining items we write tenant recipes for in the wrapper (gross), and we'd like to stop doing that if possible.

Yes, this is a hack to avoid cutting releases when changing compute queues. I added some documentation in the PR description to elaborate on this.

Managing the queues through tenant recipes can remain the preferred approach because these recipes provide a single source of truth across environments. However, environment overrides give us the flexibility to quickly turn around compute queue changes while the corresponding tenant recipe changes are pending in the next release.
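
A small sketch of the intended precedence, in plain Ruby: recipe-defined budgets act as the baseline, and any property set in the environment wins for that queue. The merge logic and the recipe values here are assumptions for illustration only (this conversation does not show how the cookbook combines the two), and `effective_queues` is a hypothetical name.

```ruby
# Budgets as a tenant recipe might define them (hypothetical values).
recipe_queues = {
  'groups' => {},
  'rmacmaster' => {
    'minResources' => '1000mb, 4vcores',
    'weight' => 1.0,
    'parent_resource' => 'fair_share_queue[groups]'
  }
}

# An urgent change carried in the environment JSON
# (budget values taken from the Test-Laptop.json example).
environment_queues = {
  'rmacmaster' => { 'minResources' => '1300mb, 5vcores' },
  'chef-bach' => {
    'minResources' => '2600mb, 10vcores',
    'parent_resource' => 'fair_share_queue[groups]'
  }
}

# Merge per queue, then per property: environment values override recipe
# values, and queues that exist only in the environment are added as-is.
def effective_queues(recipe, environment)
  recipe.merge(environment) do |_name, recipe_props, env_props|
    recipe_props.merge(env_props)
  end
end

result = effective_queues(recipe_queues, environment_queues)
```

Here "rmacmaster" keeps its recipe-defined weight and parent but picks up the new minimum budget, and "chef-bach" is defined entirely by the environment.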

@vt0r vt0r dismissed aespinosa’s stale review October 9, 2018 21:30

Dismissing Allan's review, as the change requested was made in the description.

@vt0r vt0r merged commit 72a8cc7 into bloomberg:master Oct 9, 2018
3 participants