
Continual Transformer Encoder #317


Merged: 49 commits, Dec 8, 2022

Commits:
- 168b1f3 Add CoTransEnc learner, tests, and docs (LukasHedegaard, Sep 26, 2022)
- eaff172 Fix continual_transformer_encoder name (LukasHedegaard, Oct 5, 2022)
- 596b649 Fix usage of layers parameter (LukasHedegaard, Oct 5, 2022)
- 4cb11b9 Add cotransenc benchmark script (LukasHedegaard, Oct 5, 2022)
- d2acf0a Merge branch 'develop' into continual-transformer (LukasHedegaard, Oct 5, 2022)
- 7c929cb Add learned pos enc option and fix device (LukasHedegaard, Oct 5, 2022)
- 681e16e Merge branch 'continual-transformer' of https://github.com/opendr-eu/… (LukasHedegaard, Oct 5, 2022)
- 53d0504 Remove obsolete assertion (LukasHedegaard, Oct 5, 2022)
- f955c5f Merge branch 'develop' into continual-transformer (ad-daniel, Oct 11, 2022)
- cd77810 Merge branch 'develop' into continual-transformer (ad-daniel, Oct 13, 2022)
- c2ef73a Merge branch 'develop' into continual-transformer (ad-daniel, Oct 14, 2022)
- e205008 Merge branch 'develop' into continual-transformer (tsampazk, Oct 24, 2022)
- f092968 Merge branch 'develop' into continual-transformer (tsampazk, Oct 24, 2022)
- 99d85c9 Merge branch 'develop' into continual-transformer (ad-daniel, Oct 25, 2022)
- 884490b Minor formatting adjustements (ad-daniel, Oct 25, 2022)
- 88d753f Merge branch 'develop' into continual-transformer (LukasHedegaard, Dec 2, 2022)
- 7f73f09 Update docs/reference/activity-recognition.md (LukasHedegaard, Dec 2, 2022)
- d60d866 Update docs/reference/activity-recognition.md (LukasHedegaard, Dec 2, 2022)
- 21cda8a Update docs/reference/activity-recognition.md (LukasHedegaard, Dec 2, 2022)
- 7b2c22c Update docs/reference/activity-recognition.md (LukasHedegaard, Dec 2, 2022)
- dc6863f Update docs/reference/activity-recognition.md (LukasHedegaard, Dec 2, 2022)
- b74d56d Update projects/python/perception/activity_recognition/benchmark/READ… (LukasHedegaard, Dec 2, 2022)
- b94919e Update CoTransEnc docs (LukasHedegaard, Dec 2, 2022)
- e87330b Merge branch 'continual-transformer' of https://github.com/opendr-eu/… (LukasHedegaard, Dec 2, 2022)
- 27771a1 Update docs index.md (LukasHedegaard, Dec 2, 2022)
- 9b32c80 Update cotransenc requirements (LukasHedegaard, Dec 2, 2022)
- 810b8de Fix cotransenc ort inference (LukasHedegaard, Dec 2, 2022)
- 2cdc136 Merge branch 'develop' into continual-transformer (ad-daniel, Dec 2, 2022)
- 63fe202 Hardcode dependency versions again (LukasHedegaard, Dec 6, 2022)
- 56ec936 Merge branch 'develop' into continual-transformer (ad-daniel, Dec 6, 2022)
- c476bf0 Revert "Hardcode dependency versions again" (ad-daniel, Dec 6, 2022)
- 7e38e94 Replace sklearn with scikit-learn (ad-daniel, Dec 6, 2022)
- 5c10e95 Update CHANGELOG.md (LukasHedegaard, Dec 7, 2022)
- 3e3ee0a Add continual transformer encoder demo (LukasHedegaard, Dec 7, 2022)
- 33a020a Add demo README (LukasHedegaard, Dec 7, 2022)
- b16d7a4 Update projects/python/perception/activity_recognition/demos/continua… (LukasHedegaard, Dec 8, 2022)
- 08ccb6a Update projects/python/perception/activity_recognition/demos/continua… (LukasHedegaard, Dec 8, 2022)
- cbdf177 Remove unused class (LukasHedegaard, Dec 8, 2022)
- 48d8d12 Merge branch 'continual-transformer' of https://github.com/opendr-eu/… (LukasHedegaard, Dec 8, 2022)
- a14dca8 Rename arguments to x (LukasHedegaard, Dec 8, 2022)
- d887776 Update docs/reference/continual-transformer-encoder.md (LukasHedegaard, Dec 8, 2022)
- 29ae9be Update docs/reference/continual-transformer-encoder.md (LukasHedegaard, Dec 8, 2022)
- d07d6e0 Update src/opendr/perception/activity_recognition/README.md (LukasHedegaard, Dec 8, 2022)
- 092bb00 Fix typo: Sinus -> Sinusoidal (LukasHedegaard, Dec 8, 2022)
- e2eb352 Merge branch 'continual-transformer' of https://github.com/opendr-eu/… (LukasHedegaard, Dec 8, 2022)
- bf752a3 Update src/opendr/perception/activity_recognition/continual_transform… (LukasHedegaard, Dec 8, 2022)
- 62c6bce Fix typo (LukasHedegaard, Dec 8, 2022)
- c213f80 Merge branch 'continual-transformer' of https://github.com/opendr-eu/… (LukasHedegaard, Dec 8, 2022)
- 19c2b48 Merge branch 'develop' into continual-transformer (tsampazk, Dec 8, 2022)
CHANGELOG.md: 1 addition, 0 deletions

@@ -5,6 +5,7 @@ Released on December, XX, 2022.

- New Features:
- Added YOLOv5 as an inference-only tool ([#360](https://github.com/opendr-eu/opendr/pull/360)).
- Added Continual Transformer Encoders ([#317](https://github.com/opendr-eu/opendr/pull/317)).

## Version 1.1.1
Released on June, 30th, 2022.
docs/reference/activity-recognition.md: 4 additions, 7 deletions

@@ -146,7 +146,6 @@ Parameters:
Path to metadata file in json format or to weights path.



#### `X3DLearner.optimize`
```python
X3DLearner.optimize(self, do_constant_folding)
@@ -215,8 +214,6 @@ Parameters:
```




#### References
<a name="x3d" href="https://arxiv.org/abs/2004.04730">[1]</a> X3D: Expanding Architectures for Efficient Video Recognition,
[arXiv](https://arxiv.org/abs/2004.04730).
@@ -398,7 +395,6 @@ Inherited from [X3DLearner](/src/opendr/perception/activity_recognition/x3d/x3d_learner.py)
```



#### Performance Evaluation

TABLE-1: Input shapes, prediction accuracy on Kinetics 400, floating point operations (FLOPs), parameter count and maximum allocated memory of activity recognition learners at inference.
@@ -426,7 +422,7 @@ TABLE-2: Speed (evaluations/second) of activity recognition learner inference on


TABLE-3: Throughput (evaluations/second) of activity recognition learner inference on various computational devices.
The largest fitting power of two was used as batch size for each device.
| Model | CPU | TX2 | Xavier | RTX 2080 Ti |
| ------- | ----- | ---- | ------ | ----------- |
| X3D-L | 0.22 | 0.21 | 1.73 | 3.55 |
@@ -438,7 +434,7 @@ The largest fitting power of two was used as batch size for each device.
| CoX3D-S | 11.60 | 8.22 | 64.91 | 196.54 |


TABLE-4: Energy (Joules) of activity recognition learner inference on embedded devices.
| Model | TX2 | Xavier |
| ------- | ------ | ------ |
| X3D-L | 187.89 | 23.54 |
@@ -468,5 +464,6 @@ Model inference works as expected.


#### References
<a name="x3d" href="https://arxiv.org/abs/2004.04730">[2]</a> X3D: Expanding Architectures for Efficient Video Recognition,
[arXiv](https://arxiv.org/abs/2004.04730).

docs/reference/continual-transformer-encoder.md: 211 additions, 0 deletions (new file)

@@ -0,0 +1,211 @@
## Continual Transformer Encoder module


### Class CoTransEncLearner
Bases: `engine.learners.Learner`

The *CoTransEncLearner* class provides a Continual Transformer Encoder learner, which can be used for time-series processing of user-provided features.
This module was originally proposed by Hedegaard et al. in "Continual Transformers: Redundancy-Free Attention for Online Inference", 2022 (https://arxiv.org/abs/2201.06268).

The [CoTransEncLearner](src/opendr/perception/activity_recognition/continual_transformer_decoder/continual_transformer_decoder_learner.py) class has the following public methods:

#### `CoTransEncLearner` constructor

```python
CoTransEncLearner(self, lr, iters, batch_size, optimizer, lr_schedule, network_head, num_layers, input_dims, hidden_dims, sequence_len, num_heads, dropout, num_classes, positional_encoding_learned, checkpoint_after_iter, checkpoint_load_iter, temp_path, device, loss, weight_decay, momentum, drop_last, pin_memory, num_workers, seed)
```

Constructor parameters:

- **lr**: *float, default=1e-2*\
Learning rate during optimization.
- **iters**: *int, default=10*\
Number of epochs to train for.
- **batch_size**: *int, default=64*\
Dataloader batch size.
- **optimizer**: *str, default="sgd"*\
Name of optimizer to use ("sgd" or "adam").
- **lr_schedule**: *str, default=""*\
Schedule for training the model.
- **network_head**: *str, default="classification"*\
Head of network (only "classification" is currently available).
- **num_layers**: *int, default=1*\
Number of Transformer Encoder layers (1 or 2).
- **input_dims**: *int, default=1024*\
Input dimensions per token.
- **hidden_dims**: *int, default=1024*\
Hidden projection dimension.
- **sequence_len**: *int, default=64*\
Length of token sequence to consider.
- **num_heads**: *int, default=8*\
Number of attention heads.
- **dropout**: *float, default=0.1*\
Dropout probability.
- **num_classes**: *int, default=22*\
Number of classes to predict among.
- **positional_encoding_learned**: *bool, default=False*\
Whether to use a learned positional encoding instead of a fixed sinusoidal one.
- **checkpoint_after_iter**: *int, default=0*\
Unused parameter.
- **checkpoint_load_iter**: *int, default=0*\
Unused parameter.
- **temp_path**: *str, default=""*\
Path in which to store temporary files.
- **device**: *str, default="cuda"*\
Name of computational device ("cpu" or "cuda").
- **loss**: *str, default="cross_entropy"*\
Loss function used during optimization.
- **weight_decay**: *float, default=1e-4*\
Weight decay used for optimization.
- **momentum**: *float, default=0.9*\
Momentum used for optimization.
- **drop_last**: *bool, default=True*\
Drop last data point if a batch cannot be filled.
- **pin_memory**: *bool, default=False*\
Pin memory in dataloader.
- **num_workers**: *int, default=0*\
Number of workers in dataloader.
- **seed**: *int, default=123*\
Random seed.


#### `CoTransEncLearner.fit`
```python
CoTransEncLearner.fit(self, dataset, val_dataset, epochs, steps)
```

This method is used for training the algorithm on a training dataset and validating on a validation dataset.

Parameters:
- **dataset**: *Dataset*\
Training dataset.
- **val_dataset**: *Dataset, default=None*\
Validation dataset. If None is given, validation steps are skipped.
- **epochs**: *int, default=None*\
Number of epochs. If None is supplied, `self.iters` is used.
- **steps**: *int, default=None*\
Number of training steps to conduct. If None, this is determined by epochs.


#### `CoTransEncLearner.eval`
```python
CoTransEncLearner.eval(self, dataset, steps)
```
This method is used to evaluate a trained model on an evaluation dataset.
Returns a dictionary containing evaluation statistics.

Parameters:
- **dataset**: *Dataset*\
Dataset on which to evaluate the model.
- **steps**: *int, default=None*\
Number of validation batches to evaluate. If None, all batches are evaluated.


#### `CoTransEncLearner.infer`
```python
CoTransEncLearner.infer(self, x)
```

This method is used to perform classification of a time-series.
Returns an `engine.target.Category` object, which holds the predicted category.

Parameters:
- **x**: *Union[Timeseries, Vector, torch.Tensor]*\
Either a single time instance (Vector) or a Timeseries. x can also be passed as a torch.Tensor.
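
A minimal continual-inference sketch (a toy configuration with random features standing in for the output of an upstream feature extractor; sample shapes follow the benchmark script further down):

```python
import torch
from opendr.perception.activity_recognition import CoTransEncLearner

learner = CoTransEncLearner(
    batch_size=1, device="cpu", input_dims=8, hidden_dims=32,
    sequence_len=64, num_heads=8, num_classes=4,
)

# Continual inference: feed one time-step at a time.
# Each call returns an engine.target.Category prediction.
for _ in range(64):  # one full sequence_len window of steps
    prediction = learner.infer(torch.randn(1, 8))  # (batch, input_dims) per step
print(prediction.data)
```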


#### `CoTransEncLearner.save`
```python
CoTransEncLearner.save(self, path)
```

Save model weights and metadata to path.
Provided with the path "/my/path/name" (absolute or relative), it creates the "name" directory, if it does not already exist.
Inside this folder, the model is saved as "model_name.pth" and the metadata file as "name.json".
If the files already exist, their names are versioned with a suffix.

If `self.optimize` was run previously, it saves the optimized ONNX model in a similar fashion with an ".onnx" extension.

Parameters:
- **path**: *str*\
Directory in which to save model weights and metadata.


#### `CoTransEncLearner.load`
```python
CoTransEncLearner.load(self, path)
```

This method is used to load a previously saved model from its saved folder.

Parameters:
- **path**: *str*\
Path to metadata file in JSON format, or to the weights file.


#### `CoTransEncLearner.optimize`
```python
CoTransEncLearner.optimize(self, do_constant_folding)
```

Optimize model execution. This is accomplished by saving to the ONNX format and loading the optimized model.

Parameters:
- **do_constant_folding**: *bool, default=False*\
If True, the constant-folding optimization is applied to the model during ONNX export.
Constant folding replaces some of the ops that have all-constant inputs with pre-computed constant nodes.


#### Examples

* **Fit model**.

```python
from opendr.perception.activity_recognition import CoTransEncLearner
from opendr.perception.activity_recognition.datasets import DummyTimeseriesDataset

learner = CoTransEncLearner(
batch_size=2,
device="cpu",
input_dims=8,
hidden_dims=32,
sequence_len=64,
num_heads=8,
num_classes=4,
)
train_ds = DummyTimeseriesDataset(
sequence_len=64, num_sines=8, num_datapoints=128
)
val_ds = DummyTimeseriesDataset(
sequence_len=64, num_sines=8, num_datapoints=128, base_offset=128
)
learner.fit(dataset=train_ds, val_dataset=val_ds, steps=2)
learner.save('./saved_models/trained_model')
```

* **Evaluate model**.

```python
from opendr.perception.activity_recognition import CoTransEncLearner
from opendr.perception.activity_recognition.datasets import DummyTimeseriesDataset

learner = CoTransEncLearner(
batch_size=2,
device="cpu",
input_dims=8,
hidden_dims=32,
sequence_len=64,
num_heads=8,
num_classes=4,
)
test_ds = DummyTimeseriesDataset(
sequence_len=64, num_sines=8, num_datapoints=128, base_offset=256
)
results = learner.eval(test_ds) # Dict with accuracy and loss
```
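
* **Optimize and save model**.

A minimal sketch of the ONNX path, assuming the same toy configuration as in the examples above; per the method docs, `optimize` exports to ONNX and loads the optimized model, and a subsequent `save` also stores the ".onnx" file:

```python
from opendr.perception.activity_recognition import CoTransEncLearner

learner = CoTransEncLearner(
    batch_size=2,
    device="cpu",
    input_dims=8,
    hidden_dims=32,
    sequence_len=64,
    num_heads=8,
    num_classes=4,
)
learner.optimize()  # export to ONNX and load the optimized model
learner.save('./saved_models/optimized_model')  # also stores the ONNX model
```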


#### References
<a name="cotransenc" href="https://arxiv.org/abs/2201.06268">[3]</a> Continual Transformers: Redundancy-Free Attention for Online Inference,
[arXiv](https://arxiv.org/abs/2201.06268).
docs/reference/index.md: 3 additions, 2 deletions

@@ -31,9 +31,10 @@ Neither the copyright holder nor any applicable licensor will be liable for any
- pose estimation:
- [lightweight_open_pose Module](lightweight-open-pose.md)
- activity recognition:
- [activity_recognition Module](activity-recognition.md)
- action recognition:
- [skeleton_based_action_recognition](skeleton-based-action-recognition.md)
- [x3d Module](activity-recognition.md#class-x3dlearner)
- [continual x3d Module](activity-recognition.md#class-cox3dlearner)
- [continual transformer encoder Module](continual-transformer-encoder.md)
- speech recognition:
- [matchboxnet Module](matchboxnet.md)
- [edgespeechnets Module](edgespeechnets.md)
@@ -25,4 +25,10 @@ X3D
CoX3D
```bash
./benchmark_cox3d.py
```

CoTransEnc
```bash
./benchmark_cotransenc.py
```
NB: The CoTransEnc benchmark covers various configurations of the Continual Transformer Encoder module only; it does not include any feature extraction you might want to apply beforehand.
@@ -0,0 +1,89 @@
# Copyright 2020-2022 OpenDR European Project
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.


import torch
import yaml
from opendr.perception.activity_recognition import CoTransEncLearner

from pytorch_benchmark import benchmark
import logging
from typing import List, Union
from opendr.engine.target import Category
from opendr.engine.data import Image

logger = logging.getLogger("benchmark")
logging.basicConfig()
logger.setLevel("DEBUG")


def benchmark_cotransenc():
    temp_dir = "./projects/python/perception/activity_recognition/benchmark/tmp"
    num_runs = 100
    batch_size = 1

    for num_layers in [1, 2]:  # --------- A few plausible hparams ----------
        for (input_dims, sequence_len) in [(1024, 32), (2048, 64), (4096, 64)]:
            print(
                f"==== Benchmarking CoTransEncLearner (l{num_layers}-d{input_dims}-t{sequence_len}) ===="
            )
            learner = CoTransEncLearner(
                device="cuda" if torch.cuda.is_available() else "cpu",
                temp_path=temp_dir + f"/{num_layers}_{input_dims}_{sequence_len}",
                num_layers=num_layers,
                input_dims=input_dims,
                hidden_dims=input_dims // 2,
                sequence_len=sequence_len,
                num_heads=input_dims // 128,
                batch_size=batch_size,
            )
            learner.optimize()

            sample = torch.randn(1, input_dims)

            # Warm-up continual inference not needed for optimized version:
            # for _ in range(sequence_len - 1):
            #     learner.infer(sample)

            def get_device_fn(*args):
                nonlocal learner
                return next(learner.model.parameters()).device

            def transfer_to_device_fn(
                sample: Union[torch.Tensor, List[Category], List[Image]],
                device: torch.device,
            ):
                if isinstance(sample, torch.Tensor):
                    return sample.to(device=device)

                assert isinstance(sample, Category)
                return Category(
                    prediction=sample.data,
                    confidence=sample.confidence.to(device=device),
                )

            results1 = benchmark(
                model=learner.infer,
                sample=sample,
                num_runs=num_runs,
                get_device_fn=get_device_fn,
                transfer_to_device_fn=transfer_to_device_fn,
                batch_size=batch_size,
                print_fn=print,
            )
            print(yaml.dump({"learner.infer": results1}))


if __name__ == "__main__":
    benchmark_cotransenc()