
Continual Transformer Encoder #317


Merged: 49 commits, Dec 8, 2022

Commits:
- 168b1f3 Add CoTransEnc learner, tests, and docs (LukasHedegaard, Sep 26, 2022)
- eaff172 Fix continual_transformer_encoder name (LukasHedegaard, Oct 5, 2022)
- 596b649 Fix usage of layers parameter (LukasHedegaard, Oct 5, 2022)
- 4cb11b9 Add cotransenc benchmark script (LukasHedegaard, Oct 5, 2022)
- d2acf0a Merge branch 'develop' into continual-transformer (LukasHedegaard, Oct 5, 2022)
- 7c929cb Add learned pos enc option and fix device (LukasHedegaard, Oct 5, 2022)
- 681e16e Merge branch 'continual-transformer' of https://github.com/opendr-eu/… (LukasHedegaard, Oct 5, 2022)
- 53d0504 Remove obsolete assertion (LukasHedegaard, Oct 5, 2022)
- f955c5f Merge branch 'develop' into continual-transformer (ad-daniel, Oct 11, 2022)
- cd77810 Merge branch 'develop' into continual-transformer (ad-daniel, Oct 13, 2022)
- c2ef73a Merge branch 'develop' into continual-transformer (ad-daniel, Oct 14, 2022)
- e205008 Merge branch 'develop' into continual-transformer (tsampazk, Oct 24, 2022)
- f092968 Merge branch 'develop' into continual-transformer (tsampazk, Oct 24, 2022)
- 99d85c9 Merge branch 'develop' into continual-transformer (ad-daniel, Oct 25, 2022)
- 884490b Minor formatting adjustements (ad-daniel, Oct 25, 2022)
- 88d753f Merge branch 'develop' into continual-transformer (LukasHedegaard, Dec 2, 2022)
- 7f73f09 Update docs/reference/activity-recognition.md (LukasHedegaard, Dec 2, 2022)
- d60d866 Update docs/reference/activity-recognition.md (LukasHedegaard, Dec 2, 2022)
- 21cda8a Update docs/reference/activity-recognition.md (LukasHedegaard, Dec 2, 2022)
- 7b2c22c Update docs/reference/activity-recognition.md (LukasHedegaard, Dec 2, 2022)
- dc6863f Update docs/reference/activity-recognition.md (LukasHedegaard, Dec 2, 2022)
- b74d56d Update projects/python/perception/activity_recognition/benchmark/READ… (LukasHedegaard, Dec 2, 2022)
- b94919e Update CoTransEnc docs (LukasHedegaard, Dec 2, 2022)
- e87330b Merge branch 'continual-transformer' of https://github.com/opendr-eu/… (LukasHedegaard, Dec 2, 2022)
- 27771a1 Update docs index.md (LukasHedegaard, Dec 2, 2022)
- 9b32c80 Update cotransenc requirements (LukasHedegaard, Dec 2, 2022)
- 810b8de Fix cotransenc ort inference (LukasHedegaard, Dec 2, 2022)
- 2cdc136 Merge branch 'develop' into continual-transformer (ad-daniel, Dec 2, 2022)
- 63fe202 Hardcode dependency versions again (LukasHedegaard, Dec 6, 2022)
- 56ec936 Merge branch 'develop' into continual-transformer (ad-daniel, Dec 6, 2022)
- c476bf0 Revert "Hardcode dependency versions again" (ad-daniel, Dec 6, 2022)
- 7e38e94 Replace sklearn with scikit-learn (ad-daniel, Dec 6, 2022)
- 5c10e95 Update CHANGELOG.md (LukasHedegaard, Dec 7, 2022)
- 3e3ee0a Add continual transformer encoder demo (LukasHedegaard, Dec 7, 2022)
- 33a020a Add demo README (LukasHedegaard, Dec 7, 2022)
- b16d7a4 Update projects/python/perception/activity_recognition/demos/continua… (LukasHedegaard, Dec 8, 2022)
- 08ccb6a Update projects/python/perception/activity_recognition/demos/continua… (LukasHedegaard, Dec 8, 2022)
- cbdf177 Remove unused class (LukasHedegaard, Dec 8, 2022)
- 48d8d12 Merge branch 'continual-transformer' of https://github.com/opendr-eu/… (LukasHedegaard, Dec 8, 2022)
- a14dca8 Rename arguments to x (LukasHedegaard, Dec 8, 2022)
- d887776 Update docs/reference/continual-transformer-encoder.md (LukasHedegaard, Dec 8, 2022)
- 29ae9be Update docs/reference/continual-transformer-encoder.md (LukasHedegaard, Dec 8, 2022)
- d07d6e0 Update src/opendr/perception/activity_recognition/README.md (LukasHedegaard, Dec 8, 2022)
- 092bb00 Fix typo: Sinus -> Sinusoidal (LukasHedegaard, Dec 8, 2022)
- e2eb352 Merge branch 'continual-transformer' of https://github.com/opendr-eu/… (LukasHedegaard, Dec 8, 2022)
- bf752a3 Update src/opendr/perception/activity_recognition/continual_transform… (LukasHedegaard, Dec 8, 2022)
- 62c6bce Fix typo (LukasHedegaard, Dec 8, 2022)
- c213f80 Merge branch 'continual-transformer' of https://github.com/opendr-eu/… (LukasHedegaard, Dec 8, 2022)
- 19c2b48 Merge branch 'develop' into continual-transformer (tsampazk, Dec 8, 2022)
CHANGELOG.md: 1 addition, 0 deletions

@@ -5,6 +5,7 @@ Released on December, XX, 2022.

- New Features:
- Added YOLOv5 as an inference-only tool ([#360](https://github.com/opendr-eu/opendr/pull/360)).
- Added Continual Transformer Encoders ([#317](https://github.com/opendr-eu/opendr/pull/317)).

## Version 1.1.1
Released on June, 30th, 2022.
docs/reference/activity-recognition.md: 4 additions, 7 deletions

@@ -146,7 +146,6 @@ Parameters:
Path to metadata file in json format or to weights path.



#### `X3DLearner.optimize`
```python
X3DLearner.optimize(self, do_constant_folding)
@@ -215,8 +214,6 @@ Parameters:
```




#### References
<a name="x3d" href="https://arxiv.org/abs/2004.04730">[1]</a> X3D: Expanding Architectures for Efficient Video Recognition,
[arXiv](https://arxiv.org/abs/2004.04730).
@@ -398,7 +395,6 @@ Inherited from [X3DLearner](/src/opendr/perception/activity_recognition/x3d/x3d_learner.py)
```



#### Performance Evaluation

TABLE-1: Input shapes, prediction accuracy on Kinetics 400, floating point operations (FLOPs), parameter count and maximum allocated memory of activity recognition learners at inference.
@@ -426,7 +422,7 @@ TABLE-2: Speed (evaluations/second) of activity recognition learner inference on


TABLE-3: Throughput (evaluations/second) of activity recognition learner inference on various computational devices.
The largest fitting power of two was used as batch size for each device.
| Model | CPU | TX2 | Xavier | RTX 2080 Ti |
| ------- | ----- | ---- | ------ | ----------- |
| X3D-L | 0.22 | 0.21 | 1.73 | 3.55 |
@@ -438,7 +434,7 @@ The largest fitting power of two was used as batch size for each device.
| CoX3D-S | 11.60 | 8.22 | 64.91 | 196.54 |


TABLE-4: Energy (Joules) of activity recognition learner inference on embedded devices.
| Model | TX2 | Xavier |
| ------- | ------ | ------ |
| X3D-L | 187.89 | 23.54 |
@@ -468,5 +464,6 @@ Model inference works as expected.


#### References
<a name="x3d" href="https://arxiv.org/abs/2004.04730">[2]</a> X3D: Expanding Architectures for Efficient Video Recognition,
[arXiv](https://arxiv.org/abs/2004.04730).

docs/reference/continual-transformer-encoder.md: 211 additions, 0 deletions (new file)

@@ -0,0 +1,211 @@
## Continual Transformer Encoder module


### Class CoTransEncLearner
Bases: `engine.learners.Learner`

The *CoTransEncLearner* class provides a Continual Transformer Encoder learner, which can be used for time-series processing of user-provided features.
This module was originally proposed by Hedegaard et al. in "Continual Transformers: Redundancy-Free Attention for Online Inference", 2022 (https://arxiv.org/abs/2201.06268).

The [CoTransEncLearner](src/opendr/perception/activity_recognition/continual_transformer_decoder/continual_transformer_decoder_learner.py) class has the following public methods:

#### `CoTransEncLearner` constructor

```python
CoTransEncLearner(self, lr, iters, batch_size, optimizer, lr_schedule, network_head, num_layers, input_dims, hidden_dims, sequence_len, num_heads, dropout, num_classes, positional_encoding_learned, checkpoint_after_iter, checkpoint_load_iter, temp_path, device, loss, weight_decay, momentum, drop_last, pin_memory, num_workers, seed)
```

Constructor parameters:

- **lr**: *float, default=1e-2*\
Learning rate during optimization.
- **iters**: *int, default=10*\
Number of epochs to train for.
- **batch_size**: *int, default=64*\
Dataloader batch size.
- **optimizer**: *str, default="sgd"*\
Name of optimizer to use ("sgd" or "adam").
- **lr_schedule**: *str, default=""*\
Schedule for training the model.
- **network_head**: *str, default="classification"*\
Head of network (only "classification" is currently available).
- **num_layers**: *int, default=1*\
Number of Transformer Encoder layers (1 or 2).
- **input_dims**: *int, default=1024*\
Input dimensions per token.
- **hidden_dims**: *int, default=1024*\
Hidden projection dimension.
- **sequence_len**: *int, default=64*\
Length of token sequence to consider.
- **num_heads**: *int, default=8*\
Number of attention heads.
- **dropout**: *float, default=0.1*\
Dropout probability.
- **num_classes**: *int, default=22*\
Number of classes to predict among.
- **positional_encoding_learned**: *bool, default=False*\
Whether to use a learned positional encoding instead of a fixed sinusoidal one.
- **checkpoint_after_iter**: *int, default=0*\
Unused parameter.
- **checkpoint_load_iter**: *int, default=0*\
Unused parameter.
- **temp_path**: *str, default=""*\
Path in which to store temporary files.
- **device**: *str, default="cuda"*\
Name of computational device ("cpu" or "cuda").
- **loss**: *str, default="cross_entropy"*\
Loss function used during optimization.
- **weight_decay**: *float, default=1e-4*\
Weight decay used for optimization.
- **momentum**: *float, default=0.9*\
Momentum used for optimization.
- **drop_last**: *bool, default=True*\
Drop last data point if a batch cannot be filled.
- **pin_memory**: *bool, default=False*\
Pin memory in dataloader.
- **num_workers**: *int, default=0*\
Number of workers in dataloader.
- **seed**: *int, default=123*\
Random seed.


#### `CoTransEncLearner.fit`
```python
CoTransEncLearner.fit(self, dataset, val_dataset, epochs, steps)
```

This method is used for training the algorithm on a training dataset and validating on a validation dataset.

Parameters:
- **dataset**: *Dataset*\
Training dataset.
- **val_dataset**: *Dataset, default=None*\
Validation dataset. If None is given, validation steps are skipped.
- **epochs**: *int, default=None*\
Number of epochs. If None is supplied, `self.iters` is used.
- **steps**: *int, default=None*\
Number of training steps to conduct. If None, this is determined by epochs.


#### `CoTransEncLearner.eval`
```python
CoTransEncLearner.eval(self, dataset, steps)
```
This method is used to evaluate a trained model on an evaluation dataset.
Returns a dictionary containing evaluation statistics.

Parameters:
- **dataset**: *Dataset*\
Dataset on which to evaluate the model.
- **steps**: *int, default=None*\
Number of validation batches to evaluate. If None, all batches are evaluated.


#### `CoTransEncLearner.infer`
```python
CoTransEncLearner.infer(self, x)
```

This method is used to perform classification of a time-series.
Returns an `engine.target.Category` object, which holds the predicted category.

Parameters:
- **x**: *Union[Timeseries, Vector, torch.Tensor]*\
Either a single time instance (Vector) or a Timeseries. x can also be passed as a torch.Tensor.
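
A minimal continual-inference sketch (a toy configuration with random features standing in for the output of an upstream feature extractor; sample shapes follow the benchmark script further down):

```python
import torch
from opendr.perception.activity_recognition import CoTransEncLearner

learner = CoTransEncLearner(
    batch_size=1, device="cpu", input_dims=8, hidden_dims=32,
    sequence_len=64, num_heads=8, num_classes=4,
)

# Continual inference: feed one time-step at a time.
# Each call returns an engine.target.Category prediction.
for _ in range(64):  # one full sequence_len window of steps
    prediction = learner.infer(torch.randn(1, 8))  # (batch, input_dims) per step
print(prediction.data)
```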


#### `CoTransEncLearner.save`
```python
CoTransEncLearner.save(self, path)
```

Save model weights and metadata to path.
Provided with the path "/my/path/name" (absolute or relative), it creates the "name" directory, if it does not already exist.
Inside this folder, the model is saved as "model_name.pth" and the metadata file as "name.json".
If the files already exist, their names are versioned with a suffix.

If `self.optimize` was run previously, it saves the optimized ONNX model in a similar fashion with an ".onnx" extension.

Parameters:
- **path**: *str*\
Directory in which to save model weights and metadata.


#### `CoTransEncLearner.load`
```python
CoTransEncLearner.load(self, path)
```

This method is used to load a previously saved model from its saved folder.

Parameters:
- **path**: *str*\
Path to metadata file in JSON format, or to the weights file.


#### `CoTransEncLearner.optimize`
```python
CoTransEncLearner.optimize(self, do_constant_folding)
```

Optimize model execution. This is accomplished by saving to the ONNX format and loading the optimized model.

Parameters:
- **do_constant_folding**: *bool, default=False*\
If True, the constant-folding optimization is applied to the model during ONNX export.
Constant folding replaces some of the ops that have all-constant inputs with pre-computed constant nodes.


#### Examples

* **Fit model**.

```python
from opendr.perception.activity_recognition import CoTransEncLearner
from opendr.perception.activity_recognition.datasets import DummyTimeseriesDataset

learner = CoTransEncLearner(
batch_size=2,
device="cpu",
input_dims=8,
hidden_dims=32,
sequence_len=64,
num_heads=8,
num_classes=4,
)
train_ds = DummyTimeseriesDataset(
sequence_len=64, num_sines=8, num_datapoints=128
)
val_ds = DummyTimeseriesDataset(
sequence_len=64, num_sines=8, num_datapoints=128, base_offset=128
)
learner.fit(dataset=train_ds, val_dataset=val_ds, steps=2)
learner.save('./saved_models/trained_model')
```

* **Evaluate model**.

```python
from opendr.perception.activity_recognition import CoTransEncLearner
from opendr.perception.activity_recognition.datasets import DummyTimeseriesDataset

learner = CoTransEncLearner(
batch_size=2,
device="cpu",
input_dims=8,
hidden_dims=32,
sequence_len=64,
num_heads=8,
num_classes=4,
)
test_ds = DummyTimeseriesDataset(
sequence_len=64, num_sines=8, num_datapoints=128, base_offset=256
)
results = learner.eval(test_ds) # Dict with accuracy and loss
```
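
* **Optimize and save model**.

A minimal sketch of the ONNX path, assuming the same toy configuration as in the examples above; per the method docs, `optimize` exports to ONNX and loads the optimized model, and a subsequent `save` also stores the ".onnx" file:

```python
from opendr.perception.activity_recognition import CoTransEncLearner

learner = CoTransEncLearner(
    batch_size=2,
    device="cpu",
    input_dims=8,
    hidden_dims=32,
    sequence_len=64,
    num_heads=8,
    num_classes=4,
)
learner.optimize()  # export to ONNX and load the optimized model
learner.save('./saved_models/optimized_model')  # also stores the ONNX model
```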


#### References
<a name="cotransenc" href="https://arxiv.org/abs/2201.06268">[3]</a> Continual Transformers: Redundancy-Free Attention for Online Inference,
[arXiv](https://arxiv.org/abs/2201.06268).
docs/reference/index.md: 3 additions, 2 deletions

@@ -31,9 +31,10 @@ Neither the copyright holder nor any applicable licensor will be liable for any
- pose estimation:
- [lightweight_open_pose Module](lightweight-open-pose.md)
- activity recognition:
- [activity_recognition Module](activity-recognition.md)
- action recognition:
- [skeleton_based_action_recognition](skeleton-based-action-recognition.md)
- [x3d Module](activity-recognition.md#class-x3dlearner)
- [continual x3d Module](activity-recognition.md#class-cox3dlearner)
- [continual transformer encoder Module](continual-transformer-encoder.md)
- speech recognition:
- [matchboxnet Module](matchboxnet.md)
- [edgespeechnets Module](edgespeechnets.md)
@@ -25,4 +25,10 @@ X3D
CoX3D
```bash
./benchmark_cox3d.py
```

CoTransEnc
```bash
./benchmark_cotransenc.py
```
NB: The CoTransEnc benchmark covers various configurations of the Continual Transformer Encoder module only; it does not include any feature extraction you might want to apply beforehand.
@@ -0,0 +1,89 @@
# Copyright 2020-2022 OpenDR European Project
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.


import torch
import yaml
from opendr.perception.activity_recognition import CoTransEncLearner

from pytorch_benchmark import benchmark
import logging
from typing import List, Union
from opendr.engine.target import Category
from opendr.engine.data import Image

logger = logging.getLogger("benchmark")
logging.basicConfig()
logger.setLevel("DEBUG")


def benchmark_cotransenc():
    temp_dir = "./projects/python/perception/activity_recognition/benchmark/tmp"
    num_runs = 100
    batch_size = 1

    for num_layers in [1, 2]:  # --------- A few plausible hparams ----------
        for (input_dims, sequence_len) in [(1024, 32), (2048, 64), (4096, 64)]:
            print(
                f"==== Benchmarking CoTransEncLearner (l{num_layers}-d{input_dims}-t{sequence_len}) ===="
            )
            learner = CoTransEncLearner(
                device="cuda" if torch.cuda.is_available() else "cpu",
                temp_path=temp_dir + f"/{num_layers}_{input_dims}_{sequence_len}",
                num_layers=num_layers,
                input_dims=input_dims,
                hidden_dims=input_dims // 2,
                sequence_len=sequence_len,
                num_heads=input_dims // 128,
                batch_size=batch_size,
            )
            learner.optimize()

            sample = torch.randn(1, input_dims)

            # Warm-up continual inference not needed for optimized version:
            # for _ in range(sequence_len - 1):
            #     learner.infer(sample)

            def get_device_fn(*args):
                nonlocal learner
                return next(learner.model.parameters()).device

            def transfer_to_device_fn(
                sample: Union[torch.Tensor, List[Category], List[Image]],
                device: torch.device,
            ):
                if isinstance(sample, torch.Tensor):
                    return sample.to(device=device)

                assert isinstance(sample, Category)
                return Category(
                    prediction=sample.data,
                    confidence=sample.confidence.to(device=device),
                )

            results1 = benchmark(
                model=learner.infer,
                sample=sample,
                num_runs=num_runs,
                get_device_fn=get_device_fn,
                transfer_to_device_fn=transfer_to_device_fn,
                batch_size=batch_size,
                print_fn=print,
            )
            print(yaml.dump({"learner.infer": results1}))


if __name__ == "__main__":
    benchmark_cotransenc()