
Harmony

This repository contains the source code implementation of the following papers:

  • Harmony: Overcoming the Hurdles of GPU Memory Capacity to Train Massive DNN Models on Commodity Servers (VLDB'22)

  • Doing More with Less: Training Large DNN Models on Commodity Servers for the Masses (HotOS'21)

This work was done as part of Microsoft Research's Project Fiddle. This source code is available under the MIT License.

Directory Structure

  • harmony: the Harmony source code, with detailed instructions, various example scripts, and previous results.

  • model_lib: the model library, containing model code that is not included in PyTorch, such as the transformer library from Hugging Face.

  • util_lib: the customized utility library.

Setup

The easiest way to run Harmony is to use NVIDIA's standard PyTorch container (nvcr.io/nvidia/pytorch:20.03-py3), which satisfies most dependencies. It can be launched by:

./launch.sh
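
If you prefer to start the container by hand, launch.sh roughly amounts to a docker run invocation along these lines (a sketch only; the mount paths and extra flags are assumptions, not taken from the actual script):

# Sketch: approximates what launch.sh likely does; adjust mounts to your setup.
# Requires Docker 19.03+ with NVIDIA GPU support.
docker run --gpus all -it --rm \
    --ipc=host \
    -v "$(pwd)":/workspace/harmony \
    -v /data:/data \
    nvcr.io/nvidia/pytorch:20.03-py3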

Once inside the container, the remaining dependencies can be satisfied by running:

./install.sh

Note:

  • Harmony was developed with Python 3.6.9, PyTorch 1.5.0a0, CUDA 10.1.243, cuDNN 7.6.3, NCCL 2.4.8, NVIDIA driver 418, and Ubuntu 18.04.3 LTS.

  • Harmony was developed with NVIDIA GPUs.

  • Harmony does not modify the PyTorch library and may remain portable across different versions.

Dataset

  • GLUE (including MRPC): It can be downloaded by running this script and unpacked to the directory /data/glue/MRPC.

  • WikiText-2 and WikiText-103: They can be downloaded from here and unpacked to the directories /data/wikitext-2-tokens and /data/wikitext-103-tokens, respectively (see the sketch after this list).

  • ImageNet: The ImageNet ILSVRC 2012 dataset can be downloaded by running this script and unpacked to the directory /data/imagenet/.
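
As a concrete example, WikiText-2 can be fetched and placed at the expected path as follows (a sketch; the download URL is the historical Salesforce mirror and may have moved, and the target layout is assumed from the bullet above):

# Sketch: download WikiText-2 and unpack it to the directory Harmony expects.
# The URL is the historical Salesforce/MetaMind mirror; verify it is still live.
mkdir -p /data
wget -P /tmp https://s3.amazonaws.com/research.metamind.io/wikitext/wikitext-2-v1.zip
unzip /tmp/wikitext-2-v1.zip -d /tmp
mv /tmp/wikitext-2 /data/wikitext-2-tokens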

End-to-end Workflow

The end-to-end workflow of Harmony can be illustrated by the figure below:

(Figure: the end-to-end Harmony workflow, from decomposition through profiling and scheduling to the runtime.)

For example, to run BERT-Large with Harmony, we can go through the following steps (combined into a single driver script at the end of this section):

Decompose model into per-layer code

cd harmony/1_decomposer/bert_thomwolf && ./run_bert_large.sh

Profile each layer

cd ../../2_profiler/bert_thomwolf && ./run_bert_large.sh

Search the best schedule

cd ../../3_scheduler && ./run_bert_large.sh

Run the best schedule

cd ../4_runtime/bert_thomwolf && ./run_bert_large.sh
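
The four stages can also be chained in a single driver script run from the repository root (a sketch based on the per-stage scripts above; each subshell keeps its directory change local):

#!/bin/bash
# Sketch: run the full Harmony pipeline for BERT-Large, aborting on the first failure.
set -e
(cd harmony/1_decomposer/bert_thomwolf && ./run_bert_large.sh)  # 1. per-layer code
(cd harmony/2_profiler/bert_thomwolf && ./run_bert_large.sh)    # 2. layer profiles
(cd harmony/3_scheduler && ./run_bert_large.sh)                 # 3. best schedule
(cd harmony/4_runtime/bert_thomwolf && ./run_bert_large.sh)     # 4. training run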

Code of Conduct

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.

License

Copyright (c) Microsoft Corporation. All rights reserved.

Licensed under the MIT License.

Reference

If you find the code helpful, citing our papers would be appreciated : )

@article{VLDB22Harmony,
    title = {{Harmony: Overcoming the Hurdles of GPU Memory Capacity to Train Massive DNN Models on Commodity Servers}}, 
    author = {Youjie Li and Amar Phanishayee and Derek Murray and Jakub Tarnawski and Nam Sung Kim},
    journal = {The 48th International Conference on Very Large Databases (VLDB'22)},
    year = {2022},
    address = {Sydney, Australia},
    month = sep
}

@inproceedings{HotOS21Harmony,
    title = {{Doing More with Less: Training Large DNN Models on Commodity Servers for the Masses}},
    author = {Youjie Li and Amar Phanishayee and Derek Murray and Nam Sung Kim},
    booktitle = {Workshop on Hot Topics in Operating Systems (HotOS'21)},
    year = {2021},
    address = {Ann Arbor, MI, USA},
    month = jun
}
