This example will demonstrate how to integrate Zeus with Capriccio, a drifting sentiment analysis dataset.
You can search for `# ZEUS` in `train.py` for noteworthy places that require modification from conventional training scripts. Parts relevant to using Capriccio are also marked with `# CAPRICCIO`.
**Usages**

- Zeus
- Extra
While our paper is about optimizing the batch size and power limit over multiple recurrences of the job, it is also possible to use just `ZeusDataLoader` to JIT-profile and optimize the power limit.
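In code, the `# ZEUS`-marked integration follows roughly the pattern below. This is a sketch of the documented `ZeusDataLoader` usage, not the exact contents of `train.py`; the import path and method names (`epochs()`, `report_metric()`) are assumptions here, so check the class reference for the authoritative signatures.

```python
# Sketch of a ZeusDataLoader-based training loop. `train_set`, `eval_set`, and
# `val_metric` are placeholders; the import path and method names are assumed.
from zeus.run import ZeusDataLoader

# ZEUS
# The loader constructed with max_epochs acts as the train dataloader and
# drives JIT profiling and selection of the GPU power limit.
train_loader = ZeusDataLoader(train_set, batch_size=128, max_epochs=10)
eval_loader = ZeusDataLoader(eval_set, batch_size=128)

for epoch in train_loader.epochs():
    for batch in train_loader:
        ...  # forward/backward pass and optimizer step
    for batch in eval_loader:
        ...  # compute the validation metric on this slice
    # Report the validation metric so training can stop at ZEUS_TARGET_METRIC.
    train_loader.report_metric(val_metric)
```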
- Generate Capriccio, following the instructions in Capriccio's README.md.
- If you're not using our Docker image, install `zeus` and build the power monitor, following Installing and Building.
- Install python dependencies for this example:
    ```sh
    pip install -r requirements.txt
    ```
`ZeusDataLoader` interfaces with the outside world via environment variables. Check out the class reference for details.

Only `ZEUS_TARGET_METRIC` is required; other environment variables below show their default values when omitted.
```sh
export ZEUS_TARGET_METRIC="0.84"    # Stop training when target val metric is reached
export ZEUS_LOG_DIR="zeus_log"      # Directory to store profiling logs
export ZEUS_JOB_ID="zeus"           # Used to distinguish recurrences, so not important
export ZEUS_COST_THRESH="inf"       # Kill training when cost (Equation 2) exceeds this
export ZEUS_ETA_KNOB="0.5"          # Knob to trade off energy and time (Equation 2)
export ZEUS_MONITOR_PATH="/workspace/zeus/zeus_monitor/zeus_monitor"  # Path to power monitor
export ZEUS_PROFILE_PARAMS="10,40"  # warmup_iters,profile_iters for each power limit
export ZEUS_USE_OPTIMAL_PL="True"   # Whether to actually use the optimal power limit found
```
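For context, the cost referenced by `ZEUS_COST_THRESH` and `ZEUS_ETA_KNOB` is Equation 2 of the Zeus paper, which (paraphrased from the paper; see the paper for the exact formulation) trades off energy-to-accuracy (ETA) and time-to-accuracy (TTA) as

$$
\text{Cost} = \eta \cdot \text{ETA} + (1 - \eta) \cdot \text{MaxPower} \cdot \text{TTA},
$$

where $\eta$ is `ZEUS_ETA_KNOB` and MaxPower is the maximum GPU power limit.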
```sh
python train.py \
    --zeus \
    --data_dir data \
    --slice_number 9 \
    --model_name_or_path bert-base-uncased \
    --batch_size 128
```
This example shows how to integrate `ZeusDataLoader` and drive batch size and power optimizations with `ZeusMaster`.
- Generate Capriccio, following the instructions in Capriccio's README.md.
- If you're not using our Docker image, install `zeus` and build the power monitor, following Installing and Building.
- Install python dependencies for this example:
    ```sh
    pip install -r requirements.txt
    ```
```sh
# All arguments shown below are default values.
python run_zeus.py \
    --seed 123 \
    --b_0 128 \
    --lr_0 4.00e-7 \
    --b_min 8 \
    --b_max 128 \
    --num_recurrence 38 \
    --eta_knob 0.5 \
    --beta_knob 2.0 \
    --target_metric 0.84 \
    --max_epochs 10 \
    --window_size 10
```
You can use Zeus's `ProfileDataLoader` to profile the power and time consumption of training.
- Generate Capriccio, following the instructions in Capriccio's README.md.
- If you're not using our Docker image, install `zeus` and build the power monitor, following Installing and Building.
- Install python dependencies for this example:
    ```sh
    pip install -r requirements.txt
    ```
`ProfileDataLoader` interfaces with the outside world via environment variables. Check out its class reference for details.

Only `ZEUS_LOG_PREFIX` is required; other environment variables below show their default values when omitted.
```sh
export ZEUS_LOG_PREFIX="capriccio"  # Filename prefix for power and time log files
export ZEUS_MONITOR_SLEEP_MS="100"  # Milliseconds to sleep after sampling power
export ZEUS_MONITOR_PATH="/workspace/zeus/zeus_monitor/zeus_monitor"  # Path to power monitor
```
```sh
python train.py \
    --profile \
    --data_dir ../../capriccio/data \
    --slice_number 9 \
    --model_name_or_path bert-base-uncased \
    --batch_size 128
```
A CSV file of timestamped momentary power draw of the first GPU (index 0) will be written to `capriccio+gpu0.power.csv`. At the same time, a CSV file recording the epoch number, split (`train` or `eval`), and time consumption in seconds will be written to `capriccio.time.csv`.
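To sanity-check these logs, something like the sketch below works, assuming `pandas` is installed. The filenames follow `ZEUS_LOG_PREFIX="capriccio"`; exact column names are not assumed, so the snippet just prints what is there.

```python
# Peek at the profiling output written by ProfileDataLoader.
import pandas as pd

power = pd.read_csv("capriccio+gpu0.power.csv")  # timestamped power draw of GPU 0
times = pd.read_csv("capriccio.time.csv")        # per-epoch train/eval time in seconds

print(power.head())  # inspect the power samples and their column names
print(times.head())  # inspect epoch number, split, and time columns
print(f"{len(power)} power samples, {len(times)} time records")
```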
`train.py` can also be used to fine-tune a pretrained language model on one slice of Capriccio, without using Zeus at all.
- Generate Capriccio, following the instructions in Capriccio's README.md.
- If you're not using our Docker image, install PyTorch separately:
    ```sh
    conda install -c pytorch pytorch==1.10.1
    ```
- Install python dependencies for this example:
    ```sh
    pip install -r requirements.txt
    ```
```sh
python train.py \
    --data_dir data \
    --slice_number 9 \
    --model_name_or_path bert-base-uncased \
    --batch_size 128
```