Commit ed3f2ca

added harmony
1 parent cf29d48 commit ed3f2ca

4,106 files changed (+161364 -3 lines changed)

.gitattributes

+36
@@ -0,0 +1,36 @@
+# Auto detect text files and perform LF normalization
+* text=lf
+
+# Custom for Visual Studio
+*.cs diff=csharp
+
+# Standard to msysgit
+*.doc diff=astextplain
+*.DOC diff=astextplain
+*.docx diff=astextplain
+*.DOCX diff=astextplain
+*.dot diff=astextplain
+*.DOT diff=astextplain
+*.pdf diff=astextplain
+*.PDF diff=astextplain
+*.rtf diff=astextplain
+*.RTF diff=astextplain
+
+# Declare files that will always have Windows CRLF line endings on checkout.
+*.sln text eol=crlf
+*.bat text eol=crlf
+*.cmd text eol=crlf
+
+# Declare files that will always have UNIX LF line endings on checkout.
+*.sh text eol=lf
+*.rb text eol=lf
+
+# Denote all files that are truly binary and should not be modified.
+*.png binary
+*.jpg binary
+*.tar binary
+*.tgz binary
+*.gz binary
+*.zip binary
+*.exe binary
+*.snappy binary
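
Review note on the first rule: gitattributes(5) documents `text` as set, unset, or `auto`, and says any other value is treated as if `text` were unspecified. If that reading is right, `* text=lf` may not normalize anything, and `* text=auto` was probably intended. The sketch below is a hypothetical checker, not part of the commit; it only splits on whitespace and does not handle quoted patterns.

```python
# Sketch: flag `text=<value>` settings that Git does not recognise.
# Assumption (from gitattributes(5)): the only recognised string value is "auto".
RECOGNISED_TEXT_VALUES = {"auto"}


def unrecognised_text_values(gitattributes: str):
    """Return (pattern, value) pairs whose `text=<value>` is not recognised."""
    findings = []
    for raw in gitattributes.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        pattern, *attrs = line.split()
        for attr in attrs:
            name, sep, value = attr.partition("=")
            # A bare `text` (no "=") means "set" and is valid.
            if name == "text" and sep and value not in RECOGNISED_TEXT_VALUES:
                findings.append((pattern, value))
    return findings


sample = """\
# Auto detect text files and perform LF normalization
* text=lf
*.sln text eol=crlf
*.sh text eol=lf
*.png binary
"""
print(unrecognised_text_values(sample))
```

Only the catch-all rule is flagged; the `eol=` rules pair a bare `text` with an explicit line ending, which is the documented form.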

.gitignore

+200
@@ -0,0 +1,200 @@
+########### For Python.
+# Byte-compiled / optimized / DLL files
+__pycache__/
+*.py[cod]
+*$py.class
+
+# C extensions
+*.so
+
+# Distribution / packaging
+.Python
+build/
+develop-eggs/
+dist/
+downloads/
+eggs/
+.eggs/
+lib/
+lib64/
+parts/
+sdist/
+var/
+wheels/
+share/python-wheels/
+*.egg-info/
+.installed.cfg
+*.egg
+MANIFEST
+
+# PyInstaller
+# Usually these files are written by a python script from a template
+# before PyInstaller builds the exe, so as to inject date/other infos into it.
+*.manifest
+*.spec
+
+# Installer logs
+pip-log.txt
+pip-delete-this-directory.txt
+
+# Unit test / coverage reports
+htmlcov/
+.tox/
+.nox/
+.coverage
+.coverage.*
+.cache
+nosetests.xml
+coverage.xml
+*.cover
+*.py,cover
+.hypothesis/
+.pytest_cache/
+cover/
+
+# Translations
+*.mo
+*.pot
+
+# Django stuff:
+# *.log
+local_settings.py
+db.sqlite3
+db.sqlite3-journal
+
+# Flask stuff:
+instance/
+.webassets-cache
+
+# Scrapy stuff:
+.scrapy
+
+# Sphinx documentation
+docs/_build/
+
+# PyBuilder
+.pybuilder/
+target/
+
+# Jupyter Notebook
+.ipynb_checkpoints
+
+# IPython
+profile_default/
+ipython_config.py
+
+# pyenv
+# For a library or package, you might want to ignore these files since the code is
+# intended to run in multiple environments; otherwise, check them in:
+# .python-version
+
+# pipenv
+# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
+# However, in case of collaboration, if having platform-specific dependencies or dependencies
+# having no cross-platform support, pipenv may install dependencies that don't work, or not
+# install all needed dependencies.
+#Pipfile.lock
+
+# PEP 582; used by e.g. github.com/David-OConnor/pyflow
+__pypackages__/
+
+# Celery stuff
+celerybeat-schedule
+celerybeat.pid
+
+# SageMath parsed files
+*.sage.py
+
+# Environments
+.env
+.venv
+env/
+venv/
+ENV/
+env.bak/
+venv.bak/
+
+# Spyder project settings
+.spyderproject
+.spyproject
+
+# Rope project settings
+.ropeproject
+
+# mkdocs documentation
+/site
+
+# mypy
+.mypy_cache/
+.dmypy.json
+dmypy.json
+
+# Pyre type checker
+.pyre/
+
+# pytype static type analyzer
+.pytype/
+
+# Cython debug symbols
+cython_debug/
+
+
+########### For C/C++
+# Prerequisites
+*.d
+
+# Object files
+*.o
+*.ko
+*.obj
+*.elf
+
+# Linker output
+*.ilk
+*.map
+*.exp
+
+# Precompiled Headers
+*.gch
+*.pch
+
+# Libraries
+*.lib
+*.a
+*.la
+*.lo
+
+# Shared objects (inc. Windows DLLs)
+*.dll
+*.so
+*.so.*
+*.dylib
+
+# Executables
+*.exe
+*.out
+*.app
+*.i*86
+*.x86_64
+*.hex
+
+# Debug files
+*.dSYM/
+*.su
+*.idb
+*.pdb
+
+# Kernel Module Compile Results
+*.mod*
+*.cmd
+.tmp_versions/
+modules.order
+Module.symvers
+Mkfile.old
+dkms.conf
+
+############# For Mac
+*.DS_Store
+
+############# For logs
+dumps/
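
Review note: two of the kernel-module patterns are broader than their heading suggests. `*.cmd` also matches Windows command scripts, which `.gitattributes` in this same commit configures with `eol=crlf`, and `*.mod*` matches any basename containing `.mod`. The sketch below checks a few basenames with Python's `fnmatchcase`. This only approximates gitignore matching for slash-free patterns (it ignores negation, anchoring, and directory-only rules), and the filenames are hypothetical examples, not files known to be in the repository. `git check-ignore -v <path>` gives the authoritative answer.

```python
from fnmatch import fnmatchcase

# A few slash-free patterns copied from the .gitignore above.
PATTERNS = ["*.py[cod]", "*.so", "*.so.*", "*.i*86", "*.cmd", "*.mod*"]


def matching_patterns(filename: str, patterns=PATTERNS):
    """Return the patterns that match `filename` under plain glob rules."""
    return [p for p in patterns if fnmatchcase(filename, p)]


# Hypothetical basenames chosen to exercise the patterns.
for name in ["layer.pyc", "libfoo.so.1", "build.cmd", "bert.modeling.py", "model.py"]:
    print(name, matching_patterns(name))
```

Under these glob rules a file such as `bert.modeling.py` would be caught by `*.mod*`, while `model.py` would not, because the pattern requires a literal `.mod` in the name.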

LICENSE → LICENSE.txt

+4 -2
@@ -1,6 +1,8 @@
 MIT License
 
-Copyright (c) 2022 msr-fiddle
+Copyright (c) 2020 - present Microsoft Corporation
+
+All rights reserved.
 
 Permission is hereby granted, free of charge, to any person obtaining a copy
 of this software and associated documentation files (the "Software"), to deal
@@ -18,4 +20,4 @@ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
 AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
 OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
-SOFTWARE.
+SOFTWARE.

Overview3.jpg

263 KB

README.md

+109 -1
@@ -1 +1,109 @@
-# harmony
+# Harmony
+
+This repository contains the source code implementation of the following papers:
+
+- "[Harmony: Overcoming the hurdles of GPU memory capacity to train massive DNN models on commodity servers](https://www.microsoft.com/en-us/research/publication/harmony-overcoming-the-hurdles-of-gpu-memory-capacity-to-train-massive-dnn-models-on-commodity-servers/)", which appeared at VLDB 2022.
+
+- "[Doing more with less: Training large DNN models on commodity servers for the masses](https://www.microsoft.com/en-us/research/publication/doing-more-with-less-training-large-dnn-models-on-commodity-servers-for-the-masses/)", which appeared at HotOS 2021.
+
+This work was done as part of Microsoft Research's [Project Fiddle](https://aka.ms/msr-fiddle). This source code is available under the [MIT License](./LICENSE.txt).
+
+## Directory Structure
+
+- `harmony`: the Harmony source code, with detailed instructions, example scripts, and previous results.
+
+- `model_lib`: the model library, containing model code that is not included in PyTorch, such as the transformer library from [huggingface](https://huggingface.co/).
+
+- `util_lib`: the customized utility library.
+
+## Setup
+
+The easiest way to run Harmony is to use NVIDIA's standard container (nvcr.io/nvidia/pytorch:20.03-py3), which satisfies most dependencies. It can be launched by:
+
+```bash
+./launch.sh
+```
+
+Once inside the container, the remaining dependencies can be satisfied by running:
+
+```bash
+./install.sh
+```
+
+### Note:
+
+- Harmony was developed in the environment of Python 3.6.9, PyTorch 1.5.0a0, CUDA 10.1.243, cuDNN 7.6.3, NCCL 2.4.8, Nvidia driver 418, Ubuntu 18.04.3 LTS.
+
+- Harmony was developed with Nvidia GPUs.
+
+- Harmony does not modify the PyTorch library and may remain portable to different versions.
+
+## Dataset
+
+- GLUE (including MRPC): It can be downloaded by running [this script](https://gist.github.com/W4ngatang/60c2bdb54d156a41194446737ce03e2e) and unpacked to a directory `/data/glue/MRPC`.
+
+- WikiText-2 and WikiText-103: They can be downloaded from [here](https://blog.salesforceairesearch.com/the-wikitext-long-term-dependency-language-modeling-dataset/) and unpacked to the directories `/data/wikitext-2-tokens` and `/data/wikitext-103-tokens`.
+
+- ImageNet: The ImageNet ILSVRC 2012 dataset can be downloaded by running [this script](https://github.com/msr-fiddle/pipedream/blob/pipedream/scripts/download_imagenet.py) and unpacked to a directory `/data/imagenet/`.
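
Review note: the Dataset section fixes four absolute paths, so a quick pre-flight check can catch a missing unpack step before a long run. The sketch below is hypothetical and not part of the commit; `missing_dataset_dirs` and its `root` parameter are illustrative names, with `root` added only so the check can be pointed at a staging directory.

```python
from pathlib import Path

# Directories named in the Dataset section of the README.
EXPECTED_DIRS = [
    "/data/glue/MRPC",
    "/data/wikitext-2-tokens",
    "/data/wikitext-103-tokens",
    "/data/imagenet",
]


def missing_dataset_dirs(root: str = "/", expected=EXPECTED_DIRS):
    """Return the expected directories that do not exist under `root`."""
    base = Path(root)
    # Strip the leading "/" so each path is resolved relative to `root`.
    return [d for d in expected if not (base / d.lstrip("/")).is_dir()]


if __name__ == "__main__":
    for d in missing_dataset_dirs():
        print(f"missing: {d}")
```

The check only verifies that the directories exist; it does not validate their contents.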
+
+## End-to-end Workflow
+
+The end-to-end workflow of Harmony is illustrated in the figure below:
+
+<img src="Overview3.jpg" alt="drawing" width="80%"/>
+
+For example, to run BERT-Large with Harmony, go through the following steps:
+
+### Decompose model into per-layer code
+```bash
+cd harmony/1_decomposer/bert_thomwolf && ./run_bert_large.sh
+```
+
+### Profile each layer
+```bash
+cd ../../2_profiler/bert_thomwolf && ./run_bert_large.sh
+```
+
+### Search the best schedule
+```bash
+cd ../../3_scheduler && ./run_bert_large.sh
+```
+
+### Run the best schedule
+```bash
+cd ../4_runtime/bert_thomwolf && ./run_bert_large.sh
+```
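
Review note: the four steps use relative `cd` commands, so each one depends on the shell still being in the previous stage's directory. A sketch of the same sequence expressed from the repository root follows. The stage directories and script name are taken from the README; the `run_pipeline` wrapper and its `dry_run` flag are hypothetical and not part of Harmony.

```python
import subprocess
from pathlib import Path

# (stage directory relative to the repository root, script), from the README steps.
STAGES = [
    ("harmony/1_decomposer/bert_thomwolf", "./run_bert_large.sh"),
    ("harmony/2_profiler/bert_thomwolf", "./run_bert_large.sh"),
    ("harmony/3_scheduler", "./run_bert_large.sh"),
    ("harmony/4_runtime/bert_thomwolf", "./run_bert_large.sh"),
]


def run_pipeline(repo_root: str = ".", dry_run: bool = True):
    """Return the planned (cwd, script) steps; execute them unless dry_run.

    Execution stops at the first failing stage because check=True raises.
    """
    plan = [(str(Path(repo_root) / d), script) for d, script in STAGES]
    if not dry_run:
        for cwd, script in plan:
            subprocess.run([script], cwd=cwd, check=True)
    return plan


# Dry run: print the equivalent shell commands without executing anything.
for cwd, script in run_pipeline():
    print(f"cd {cwd} && {script}")
```

Running each stage with an explicit working directory keeps a failed stage from leaving later stages in the wrong place, at the cost of a small wrapper.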
+
+## Code of Conduct
+
+This project has adopted the [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/). For more information see the [Code of Conduct FAQ](https://opensource.microsoft.com/codeofconduct/faq/) or contact [[email protected]](mailto:[email protected]) with any additional questions or comments.
+
+
+## License
+
+Copyright (c) Microsoft Corporation. All rights reserved.
+
+Licensed under the [MIT License](./LICENSE.txt).
+
+## Reference
+
+If you find the code helpful, citing our papers would be appreciated : )
+```bibtex
+@article{VLDB22Harmony,
+title = {{Harmony: Overcoming the Hurdles of GPU Memory Capacity to Train Massive DNN Models on Commodity Servers}},
+author = {Youjie Li and Amar Phanishayee and Derek Murray and Jakub Tarnawski and Nam Sung Kim},
+journal = {The 48th International Conference on Very Large Databases (VLDB'22)},
+year = {2022},
+address = {Sydney, Australia},
+month = sep
+}
+
+@inproceedings{HotOS21Harmony,
+title = {{Doing More with Less: Training Large DNN Models on Commodity Servers for the Masses}},
+author = {Youjie Li and Amar Phanishayee and Derek Murray and Nam Sung Kim},
+booktitle = {Workshop on Hot Topics in Operating Systems (HotOS’21)},
+year = {2021},
+address = {Ann Arbor, MI, USA},
+month = jun
+}
+```
