
Commit 51970e3

ahmedo42 and vfdev-5 authored

transformers example (#1656)

* transformers example
* fixed docs and imports
* changed name
* Updated example

Co-authored-by: vfdev-5 <[email protected]>

1 parent abe1ddd commit 51970e3

File tree

6 files changed, +608 -0 lines changed

+106
@@ -0,0 +1,106 @@
# Transformers Example with Ignite

In this example, we show how to use _Ignite_ to fine-tune a transformer model:

- on 1 or more GPUs or TPUs
- compute training/validation metrics
- log learning rate, metrics, etc.
- save the best model weights (a sketch of this wiring is shown after the configuration list below)

Configurations:

- [x] single GPU
- [x] multiple GPUs on a single node
- [x] TPUs on Colab

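The snippet below is a minimal, self-contained sketch of how these pieces are usually wired in Ignite. A toy model, random data and the `/tmp/checkpoints` path stand in for the transformer, the real dataset and the output directory; this is not the actual `main.py` code. An evaluator computes validation metrics and a `Checkpoint` handler keeps the best weights.

```python
# Minimal sketch: a toy model and random data stand in for the transformer and real dataset.
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset
from ignite.engine import Engine, Events
from ignite.metrics import Accuracy
from ignite.handlers import Checkpoint, DiskSaver, global_step_from_engine

model = nn.Linear(8, 1)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
criterion = nn.BCEWithLogitsLoss()
data = TensorDataset(torch.randn(64, 8), torch.randint(0, 2, (64, 1)).float())
train_loader = val_loader = DataLoader(data, batch_size=16)


def train_step(engine, batch):
    model.train()
    x, y = batch
    optimizer.zero_grad()
    loss = criterion(model(x), y)
    loss.backward()
    optimizer.step()
    return loss.item()


def eval_step(engine, batch):
    model.eval()
    with torch.no_grad():
        x, y = batch
        # return (y_pred, y) so attached metrics can consume the output
        return torch.sigmoid(model(x)).round(), y


trainer = Engine(train_step)
evaluator = Engine(eval_step)
Accuracy().attach(evaluator, "accuracy")


@trainer.on(Events.EPOCH_COMPLETED)
def run_validation(_):
    evaluator.run(val_loader)


# keep only the weights with the best validation accuracy
best_model_handler = Checkpoint(
    {"model": model},
    DiskSaver("/tmp/checkpoints", require_empty=False),
    n_saved=1,
    score_name="accuracy",
    score_function=lambda engine: engine.state.metrics["accuracy"],
    global_step_transform=global_step_from_engine(trainer),
)
evaluator.add_event_handler(Events.COMPLETED, best_model_handler)

trainer.run(train_loader, max_epochs=2)
```
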
## Requirements:

- pytorch-ignite: `pip install pytorch-ignite`
- [transformers](https://github.com/huggingface/transformers): `pip install transformers`
- [datasets](https://github.com/huggingface/datasets): `pip install datasets`
- [tqdm](https://github.com/tqdm/tqdm/): `pip install tqdm`
- [tensorboardx](https://github.com/lanpa/tensorboard-pytorch): `pip install tensorboardX`
- [python-fire](https://github.com/google/python-fire): `pip install fire`
- Optional: [clearml](https://github.com/allegroai/clearml): `pip install clearml`

Alternatively, install all the requirements using `pip install -r requirements.txt`.

## Usage:

Run the example on a single GPU:

```bash
python main.py run
```

If needed, adjust the batch size to your GPU device with the `--batch_size` argument.

For details on accepted arguments:

```bash
python main.py run -- --help
```

### Distributed training

#### Single node, multiple GPUs

Let's start training on a single node with 2 GPUs:

```bash
# using torch.distributed.launch
python -u -m torch.distributed.launch --nproc_per_node=2 --use_env main.py run --backend="nccl"
```

or

```bash
# spawning the processes from within the script
python -u main.py run --backend="nccl" --nproc_per_node=2
```

##### Using [Horovod](https://horovod.readthedocs.io/en/latest/index.html) as distributed backend

Please make sure Horovod is installed before running.

Let's start training on a single node with 2 GPUs:

```bash
# horovodrun
horovodrun -np=2 python -u main.py run --backend="horovod"
```

or

```bash
# spawning the processes from within the script
python -u main.py run --backend="horovod" --nproc_per_node=2
```

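Whether the workers are created by an external launcher (`torch.distributed.launch`, `horovodrun`) or spawned internally via `--nproc_per_node`, the script typically relies on `ignite.distributed.Parallel` to set up the processes. Below is a minimal sketch of that pattern with illustrative names (`training`, `config`); it is not the exact `main.py` code.

```python
# Minimal sketch of the ignite.distributed.Parallel pattern (illustrative names).
import fire
import ignite.distributed as idist


def training(local_rank, config):
    # local_rank is injected by idist; the device follows the chosen backend
    device = idist.device()
    print(f"Rank {idist.get_rank()}/{idist.get_world_size()} runs on {device}")
    # ... build model/optimizer, wrap with idist.auto_model / idist.auto_dataloader, train ...


def run(backend=None, nproc_per_node=None, **spawn_kwargs):
    config = {"batch_size": 32}
    # backend=None -> plain single-process run;
    # backend="nccl"/"horovod"/"xla-tpu" + nproc_per_node -> workers are spawned here;
    # under torch.distributed.launch or horovodrun, the existing workers are reused.
    with idist.Parallel(backend=backend, nproc_per_node=nproc_per_node, **spawn_kwargs) as parallel:
        parallel.run(training, config)


if __name__ == "__main__":
    fire.Fire({"run": run})
```
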
#### Colab or Kaggle kernels, on 8 TPUs

```python
# setup TPU environment
import os
assert os.environ['COLAB_TPU_ADDR'], 'Make sure to select TPU from Edit > Notebook settings > Hardware accelerator'
```

```bash
VERSION = "nightly"
!curl -q https://raw.githubusercontent.com/pytorch/xla/master/contrib/scripts/env-setup.py -o pytorch-xla-env-setup.py
!python pytorch-xla-env-setup.py --version $VERSION > /dev/null
```

```python
from main import run
run(backend="xla-tpu", nproc_per_node=8)
```

## ClearML fileserver

If a `ClearML` server is used (i.e. the `--with_clearml` argument is passed), artifact upload must be configured by
modifying the `ClearML` configuration file `~/clearml.conf` generated by `clearml-init`. According to the
[documentation](https://allegro.ai/clearml/docs/docs/examples/reporting/artifacts.html), the `output_uri` argument can be
set via `sdk.development.default_output_uri` to the fileserver URI. If the server is self-hosted, the `ClearML` fileserver URI is
`http://localhost:8081`.

For more details, see https://allegro.ai/clearml/docs/docs/examples/reporting/artifacts.html

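For example, with a self-hosted server the relevant part of `~/clearml.conf` might look like the following (a sketch only; adjust the URI to your own fileserver):

```
sdk {
    development {
        # upload artifacts (e.g. model checkpoints) to the ClearML fileserver
        default_output_uri: "http://localhost:8081"
    }
}
```
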
+34
@@ -0,0 +1,34 @@
import torch


class TransformerDataset(torch.utils.data.Dataset):
    def __init__(self, texts, labels, tokenizer, max_length):
        self.texts = texts
        self.labels = labels
        self.tokenizer = tokenizer
        self.max_length = max_length

    def __getitem__(self, idx):
        # normalize whitespace before tokenization
        text = str(self.texts[idx])
        text = " ".join(text.split())
        inputs = self.tokenizer.encode_plus(
            text, None, add_special_tokens=True, max_length=self.max_length, truncation=True
        )

        ids = inputs["input_ids"]
        token_type_ids = inputs["token_type_ids"]
        mask = inputs["attention_mask"]
        # pad all sequences on the right up to max_length
        padding_length = self.max_length - len(ids)

        ids = ids + ([0] * padding_length)
        mask = mask + ([0] * padding_length)
        token_type_ids = token_type_ids + ([0] * padding_length)
        return {
            "input_ids": torch.tensor(ids, dtype=torch.long),
            "attention_mask": torch.tensor(mask, dtype=torch.long),
            "token_type_ids": torch.tensor(token_type_ids, dtype=torch.long),
            "label": torch.tensor(self.labels[idx], dtype=torch.float),
        }

    def __len__(self):
        return len(self.labels)
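
A small usage sketch for `TransformerDataset` follows. The tokenizer checkpoint, the sample texts/labels and the `dataset` module name are assumptions for illustration, not part of this commit.

```python
# Illustrative usage only; names below are assumptions, not part of the commit.
from torch.utils.data import DataLoader
from transformers import AutoTokenizer

from dataset import TransformerDataset  # assuming the class above is saved as dataset.py

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
texts = ["a great movie", "a terrible movie"]
labels = [1.0, 0.0]

ds = TransformerDataset(texts, labels, tokenizer, max_length=32)
loader = DataLoader(ds, batch_size=2, shuffle=True)

batch = next(iter(loader))
print(batch["input_ids"].shape)       # torch.Size([2, 32])
print(batch["attention_mask"].shape)  # torch.Size([2, 32])
print(batch["label"])                 # tensor with the two labels
```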
