Documentation for performance evaluation #239

Merged (68 commits, Apr 27, 2022)

Commits
2602b9c
Pose estimation speed documentation
passalis Mar 8, 2022
3b1aaf2
Added pose estimation results
passalis Mar 8, 2022
c0312d1
Formatting fix
passalis Mar 8, 2022
d52ab0a
Update lightweight-open-pose.md
passalis Mar 8, 2022
9be4908
Apply suggestions from code review
passalis Mar 8, 2022
cfaf054
Update lightweight-open-pose.md
passalis Mar 8, 2022
cc81168
Add performance evaluation for EfficientPS
vniclas Mar 8, 2022
fd8124d
Added evaluation metrics
Pavlos-Tosidis Mar 9, 2022
896503c
evaluation documentation
ekakalet Mar 9, 2022
a46f1f4
evaluation metrics added
negarhdr Mar 9, 2022
77684aa
Merge remote-tracking branch 'origin/performance_documentation' into …
negarhdr Mar 9, 2022
332df44
Add activity recognition results
LukasHedegaard Mar 9, 2022
2201b87
performance documentation for mxnet-based detectors
vivinousi Mar 11, 2022
a12d78a
Update human-model-generation.md
charsyme Mar 11, 2022
df3b221
Update human-model-generation.md
charsyme Mar 11, 2022
e266897
Update human-model-generation.md
charsyme Mar 11, 2022
44a3978
Update human-model-generation.md
charsyme Mar 11, 2022
8ac67f5
mobileRL performance metrics
dHonerkamp Mar 11, 2022
a1ce13d
Update semantic-segmentation.md
mtzelepi Mar 16, 2022
b4ffc18
Update semantic-segmentation.md
mtzelepi Mar 16, 2022
564dde4
Add 2d tracking evaluation results
iliiliiliili Mar 16, 2022
cc841de
Add 3d tracking evaluation results
iliiliiliili Mar 16, 2022
2e6184e
Add 3d object detection evaluation results
iliiliiliili Mar 16, 2022
c9a93dd
Merge branch 'develop' into performance_documentation
ad-daniel Mar 21, 2022
3a7d0ed
docs(gem.md): add performance evaluation tables
jelledouwe Mar 22, 2022
a8c349d
docs(eagerx.md): add performance evaluation tables
jelledouwe Mar 22, 2022
d1f55dc
docs(hyperparameter_tuner.md): add performance evaluation tables
jelledouwe Mar 22, 2022
5285253
upload end-to-end planning docs
halil93ibrahim Mar 22, 2022
0a1be32
Merge branch 'develop' into performance_documentation
passalis Mar 22, 2022
45ae920
Add performance evaluation metrics
Zorrander Mar 23, 2022
b603b43
Update docs/reference/single-demonstration-grasping.md
Zorrander Mar 24, 2022
9250cb3
Update docs/reference/single-demonstration-grasping.md
Zorrander Mar 24, 2022
dc3025a
Delete end-to-end-planning.md
halil93ibrahim Mar 25, 2022
c53b201
Update docs/reference/synthetic_facial_image_generator.md
ekakalet Mar 30, 2022
36f99a1
Update docs/reference/synthetic_facial_image_generator.md
ekakalet Mar 30, 2022
1e5dfcf
Update docs/reference/synthetic_facial_image_generator.md
ekakalet Mar 30, 2022
4426d34
Merge branch 'develop' into performance_documentation
ekakalet Mar 30, 2022
f5addba
Merge branch 'develop' into performance_documentation
ad-daniel Apr 1, 2022
6574b86
Update docs/reference/human-model-generation.md
charsyme Apr 1, 2022
7a5751f
Update docs/reference/human-model-generation.md
charsyme Apr 1, 2022
fd67b29
Update docs/reference/human-model-generation.md
charsyme Apr 1, 2022
29eead4
Update docs/reference/human-model-generation.md
charsyme Apr 1, 2022
fea6421
Merge branch 'develop' into performance_documentation
ad-daniel Apr 5, 2022
a73957d
Merge branch 'develop' into performance_documentation
ad-daniel Apr 25, 2022
27cbb5e
Update docs/reference/activity-recognition.md
LukasHedegaard Apr 27, 2022
4465b11
Update docs/reference/activity-recognition.md
LukasHedegaard Apr 27, 2022
169fc8a
Apply suggestions from code review
vniclas Apr 27, 2022
2a76eed
Update docs/reference/activity-recognition.md
LukasHedegaard Apr 27, 2022
f4cf82a
Update docs/reference/activity-recognition.md
LukasHedegaard Apr 27, 2022
83aeade
Update docs/reference/face-detection-2d-retinaface.md
ad-daniel Apr 27, 2022
48e37a2
Update docs/reference/single-demonstration-grasping.md
ad-daniel Apr 27, 2022
018385f
Update docs/reference/face-detection-2d-retinaface.md
ad-daniel Apr 27, 2022
813dca0
Update docs/reference/face-detection-2d-retinaface.md
ad-daniel Apr 27, 2022
4dfbb06
Update docs/reference/voxel-object-detection-3d.md
ad-daniel Apr 27, 2022
fc8fa74
Update docs/reference/voxel-object-detection-3d.md
ad-daniel Apr 27, 2022
032bdc1
Update docs/reference/voxel-object-detection-3d.md
ad-daniel Apr 27, 2022
46c1e9d
Update docs/reference/face-recognition.md
ad-daniel Apr 27, 2022
a6d9712
Update docs/reference/landmark-based-facial-expression-recognition.md
ad-daniel Apr 27, 2022
6443fc3
Update docs/reference/landmark-based-facial-expression-recognition.md
ad-daniel Apr 27, 2022
4daf02d
Update docs/reference/landmark-based-facial-expression-recognition.md
ad-daniel Apr 27, 2022
46b50a0
Update docs/reference/object-tracking-2d-fair-mot.md
ad-daniel Apr 27, 2022
6379098
Apply suggestions from code review
ad-daniel Apr 27, 2022
7cff3dc
Update docs/reference/mobile-manipulation.md
omichel Apr 27, 2022
d095aa2
Update docs/reference/object-tracking-3d-ab3dmot.md
omichel Apr 27, 2022
8114507
Update docs/reference/efficient-ps.md
omichel Apr 27, 2022
4bf30e2
Update docs/reference/skeleton-based-action-recognition.md
omichel Apr 27, 2022
5d4aeef
Update docs/reference/skeleton-based-action-recognition.md
omichel Apr 27, 2022
c0218bd
Update docs/reference/skeleton-based-action-recognition.md
omichel Apr 27, 2022
69 changes: 69 additions & 0 deletions docs/reference/activity-recognition.md
@@ -398,6 +398,75 @@ Inherited from [X3DLearner](/src/opendr/perception/activity_recognition/x3d/x3d_
```



#### Performance Evaluation

TABLE-1: Input shapes, prediction accuracy on Kinetics 400, floating point operations (FLOPs), parameter count, and maximum allocated memory of activity recognition learners at inference.
| Model   | Input shape (T×S²) | Acc. (%) | FLOPs (G) | Params (M) | Mem. (MB) |
| ------- | ------------------ | -------- | --------- | ---------- | --------- |
| X3D-L   | 16×312²            | 69.29    | 19.17     | 6.15       | 240.66    |
| X3D-M   | 16×224²            | 67.24    | 4.97      | 4.97       | 126.29    |
| X3D-S   | 13×160²            | 64.71    | 2.06      | 3.79       | 61.29     |
| X3D-XS  | 4×160²             | 59.37    | 0.64      | 3.79       | 28.79     |
| CoX3D-L | 1×312²             | 71.61    | 1.54      | 6.15       | 184.37    |
| CoX3D-M | 1×224²             | 71.03    | 0.40      | 4.97       | 68.96     |
| CoX3D-S | 1×160²             | 67.33    | 0.21      | 3.79       | 41.99     |


TABLE-2: Speed (evaluations/second) of activity recognition learner inference on various computational devices.
| Model | CPU | TX2 | Xavier | RTX 2080 Ti |
| ------- | ----- | ---- | ------ | ----------- |
| X3D-L | 0.22 | 0.18 | 1.26 | 3.55 |
| X3D-M | 0.75 | 0.69 | 4.50 | 6.94 |
| X3D-S | 2.06 | 0.95 | 9.55 | 7.12 |
| X3D-XS | 6.51 | 1.14 | 12.23 | 7.99 |
| CoX3D-L | 2.00 | 0.30 | 4.69 | 4.62 |
| CoX3D-M | 6.65 | 1.12 | 9.76 | 10.12 |
| CoX3D-S | 11.60 | 1.16 | 9.36 | 9.84 |


TABLE-3: Throughput (evaluations/second) of activity recognition learner inference on various computational devices.
The largest fitting power of two was used as batch size for each device.
| Model | CPU | TX2 | Xavier | RTX 2080 Ti |
| ------- | ----- | ---- | ------ | ----------- |
| X3D-L | 0.22 | 0.21 | 1.73 | 3.55 |
| X3D-M | 0.75 | 1.10 | 6.20 | 11.22 |
| X3D-S | 2.06 | 2.47 | 7.83 | 29.51 |
| X3D-XS | 6.51 | 6.50 | 38.27 | 78.75 |
| CoX3D-L | 2.00 | 0.62 | 10.40 | 14.47 |
| CoX3D-M | 6.65 | 4.32 | 44.07 | 105.64 |
| CoX3D-S | 11.60 | 8.22 | 64.91 | 196.54 |
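
The per-clip speed numbers in TABLE-2 (and, with batching, the throughput in TABLE-3) can be reproduced with a simple timing loop.
The following is a minimal sketch only; the import paths, backbone name, and `infer` input type are assumptions to be checked against the X3DLearner documentation above.

```python
# Timing sketch for single-clip inference speed (cf. TABLE-2).
import time
import numpy as np
from opendr.engine.data import Video
from opendr.perception.activity_recognition import X3DLearner

learner = X3DLearner(backbone="xs", device="cuda")   # hypothetical arguments
# learner.load("path/to/x3d_xs_weights")             # load pretrained weights in practice

# Dummy clip shaped as in TABLE-1 for X3D-XS: C x T x H x W = 3 x 4 x 160 x 160.
clip = Video(np.random.rand(3, 4, 160, 160).astype(np.float32))

for _ in range(5):                                   # warm-up iterations
    learner.infer(clip)

n = 50
start = time.perf_counter()
for _ in range(n):
    learner.infer(clip)
print(f"{n / (time.perf_counter() - start):.2f} evaluations/second")
```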


TABLE-4: Energy consumption (Joules) of activity recognition learner inference on embedded devices.
| Model | TX2 | Xavier |
| ------- | ------ | ------ |
| X3D-L | 187.89 | 23.54 |
| X3D-M | 56.50 | 5.49 |
| X3D-S | 33.58 | 2.00 |
| X3D-XS | 26.15 | 1.45 |
| CoX3D-L | 117.34 | 5.27 |
| CoX3D-M | 24.53 | 1.74 |
| CoX3D-S | 22.79 | 2.07 |


TABLE-5: Human Activity Recognition platform compatibility evaluation.
| Platform | Test results |
| -------------------------------------------- | ------------ |
| x86 - Ubuntu 20.04 (bare installation - CPU) | Pass |
| x86 - Ubuntu 20.04 (bare installation - GPU) | Pass |
| x86 - Ubuntu 20.04 (pip installation) | Pass |
| x86 - Ubuntu 20.04 (CPU docker) | Pass |
| x86 - Ubuntu 20.04 (GPU docker) | Pass |
| NVIDIA Jetson TX2 | Pass\* |
| NVIDIA Jetson Xavier AGX | Pass\* |

\*On NVIDIA Jetson devices, the Kinetics-400 dataset loader (the dataset associated with the available pretrained models) is not supported.
In version 1.0 of the toolkit, importing it raises an error; a patch that avoids the import error has been submitted for the upcoming version.
Model inference works as expected.



#### References
<a name="x3d" href="https://arxiv.org/abs/2004.04730">[1]</a> X3D: Expanding Architectures for Efficient Video Recognition,
[arXiv](https://arxiv.org/abs/2004.04730).
40 changes: 38 additions & 2 deletions docs/reference/eagerx.md
@@ -35,7 +35,7 @@ Documentation is available online: [https://eagerx.readthedocs.io](https://eager
Instead of using low-dimensional angular observations, the environment now produces pixel images of the pendulum.
In order to speed up learning, we use a pre-trained classifier to convert these pixel images to estimated angular observations.
Then, the agent uses these estimated angular observations, as in 'demo_2_pid', to successfully swing up the pendulum.

Example usage:
```bash
cd $OPENDR_HOME/projects/control/eagerx/demos
@@ -48,4 +48,40 @@ Setting `--device cpu` performs training and inference on CPU.
Setting `--name example` sets the name of the environment.
Setting `--eps 200` sets the number of training episodes.
Setting `--eval-eps 10` sets the number of evaluation episodes.
Adding `--render` enables rendering of the environment.

### Performance Evaluation

In this subsection, we quantify the computational overhead introduced by EAGERx's communication protocol.
An EAGERx environment ultimately consists of nodes (e.g. sensors, actuators, classifiers, controllers) that communicate with each other via EAGERx's reactive communication protocol to ensure I/O synchronisation.
We create an experimental setup in which we interconnect a set of identical nodes in series and let every node run at the same simulated rate (1 Hz).
The nodes perform no significant computation in their callbacks and exchange messages that are small in size (ROS message type: std_msgs.msg/UInt64).
Hence, the rate at which this environment can be simulated is mostly determined by the computational overhead of the protocol and the hardware used during the experiment (an 8-core Intel Core i9-10980HK processor).
We record the real-time rate (Hz) at which we are able to simulate the environment for a varying number of interconnected nodes, synchronisation modes (sync vs. async), and concurrency modes (multi-threaded vs. multi-process).
In async mode, every node produces outputs at the set simulated rate (1 Hz) times a provided real-time factor (i.e. real-time rate = real-time factor * simulated rate).
This real-time factor is set experimentally to the highest value at which each node can still keep up with its simulated rate.
In sync mode, nodes are connected reactively, meaning that every node waits for an input from the preceding node before producing an output message to the succeeding node.
This means that no real-time factor needs to be set.
Instead, nodes run as fast as possible while adhering to this rule.
The recorded rate provides an indication of the computational overhead that the communication protocol of EAGERx introduces.
The results are presented in the table below.

| # of nodes | Multi-threaded Sync (Hz) | Multi-threaded Async (Hz) | Multi-process Sync (Hz) | Multi-process Async (Hz) |
|------------|:------------------------:|:-------------------------:|:-----------------------:|:------------------------:|
| 4          | 800                      | 458                       | 700                     | 1800                     |
| 5          | 668                      | 390                       | 596                     | 1772                     |
| 6          | 576                      | 341                       | 501                     | 1770                     |
| 7          | 535                      | 307                       | 450                     | 1691                     |
| 12         | 354                      | 200                       | 279                     | 1290                     |
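
To make the sync-mode rule concrete, the toy model below mimics the measurement idea with a chain of worker threads passing tiny messages; it is a schematic illustration of the experiment only, not EAGERx's actual protocol implementation.

```python
# Toy model of the sync-mode experiment: N nodes in series, each forwarding a
# message only after receiving one from its predecessor. Measures the rate (Hz).
import queue
import threading
import time

def node(inbox: queue.Queue, outbox: queue.Queue) -> None:
    while True:
        msg = inbox.get()      # reactive rule: wait for the preceding node
        outbox.put(msg)        # no significant computation in the callback

def measure_rate(n_nodes: int, n_msgs: int = 2000) -> float:
    queues = [queue.Queue() for _ in range(n_nodes + 1)]
    for i in range(n_nodes):
        threading.Thread(target=node, args=(queues[i], queues[i + 1]),
                         daemon=True).start()
    start = time.perf_counter()
    for k in range(n_msgs):
        queues[0].put(k)
        queues[-1].get()       # a step completes when the last node has fired
    return n_msgs / (time.perf_counter() - start)

for n in (4, 5, 6, 7, 12):
    print(f"{n} nodes: {measure_rate(n):.0f} Hz")
```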

The platform compatibility evaluation is also reported below:

| Platform | Test results |
|----------------------------------------------|:------------:|
| x86 - Ubuntu 20.04 (bare installation - CPU) | Pass |
| x86 - Ubuntu 20.04 (bare installation - GPU) | Pass |
| x86 - Ubuntu 20.04 (pip installation) | Pass |
| x86 - Ubuntu 20.04 (CPU docker) | Pass |
| x86 - Ubuntu 20.04 (GPU docker) | Pass |
| NVIDIA Jetson TX2 | Pass |
40 changes: 39 additions & 1 deletion docs/reference/efficient-ps.md
@@ -17,7 +17,7 @@ Bases: `engine.learners.Learner`

The *EfficientPsLearner* class is a wrapper around the EfficientPS implementation from the original authors' repository, adding the OpenDR interface.

The [EfficientPsLearner](/src/opendr/perception/panoptic_segmentation/efficient_ps/efficient_ps_learner.py) class has the following public methods:
#### `EfficientPsLearner` constructor
```python
EfficientPsLearner(lr, iters, batch_size, optimizer, lr_schedule, momentum, weight_decay, optimizer_config, checkpoint_after_iter, temp_path, device, num_workers, seed, config_file)
@@ -174,3 +174,41 @@ Parameters:
The size of the figure in inches. Only used for the detailed version. Otherwise, the size of the input data is used.
- **detailed**: *bool, default=False*\
If True, the generated figure will be a compilation of the input color image, the semantic segmentation map, a contours plot showing the individual objects, and a combined panoptic segmentation overlay on the color image. Otherwise, only the latter will be shown.
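
As an illustration of the options above, a hypothetical usage sketch follows; the import path, `load` argument, and the exact `visualize` call are assumptions based on this reference, not verified API.

```python
# Hypothetical visualization sketch; verify names against the EfficientPsLearner docs.
import cv2
from opendr.engine.data import Image
from opendr.perception.panoptic_segmentation import EfficientPsLearner

learner = EfficientPsLearner(device="cuda")
learner.load("path/to/checkpoint")            # hypothetical checkpoint path

img = Image(cv2.imread("city_scene.png"))     # hypothetical input image
prediction = learner.infer(img)

# detailed=True: compilation of input image, semantic map, contours plot, and
# panoptic overlay at a custom size; detailed=False shows only the overlay.
EfficientPsLearner.visualize(img, prediction, figure_size=(16, 9), detailed=True)
```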


#### Performance Evaluation

The speed (FPS) is evaluated on the Cityscapes dataset (2048x1024 pixels):

| Dataset | GeForce GTX 980 | GeForce GTX TITAN X | TITAN RTX | Xavier AGX |
|------------|-----------------|---------------------|-----------|------------|
| Cityscapes | 1.3 | 1.1 | 3.2 | 1.7 |

The memory and energy usage are evaluated for different datasets.
An NVIDIA Jetson Xavier AGX was used as the reference platform for the energy measurements.
Note that the exact memory footprint depends on the image resolution and the number of instances in an image.
The reported memory is the maximum value observed during evaluation on the respective validation set.
The energy is measured during the evaluation.

| Dataset                | Memory (MB) | Energy (Joules), total per inference (Xavier AGX) |
|------------------------|-------------|---------------------------------------------------|
| Cityscapes (2048x1024) | 11812       | 39.3                                               |
| KITTI (1280x384)       | 3328        | 15.1                                               |

The performance is evaluated using three different metrics, namely Panoptic Quality (PQ), Segmentation Quality (SQ), and Recognition Quality (RQ).

| Dataset | PQ | SQ | RQ |
|------------|------|------|------|
| Cityscapes | 64.4 | 81.8 | 77.7 |
| KITTI      | 42.6 | 77.2 | 53.1 |

EfficientPS is compatible with the following platforms:

| Platform | Compatibility |
|----------------------------------------------|---------------|
| x86 - Ubuntu 20.04 (bare installation - CPU) | ❌ |
| x86 - Ubuntu 20.04 (bare installation - GPU) | ✔️ |
| x86 - Ubuntu 20.04 (pip installation) | ❌ |
| x86 - Ubuntu 20.04 (CPU docker) | ❌ |
| x86 - Ubuntu 20.04 (GPU docker) | ✔️ |
| NVIDIA Jetson Xavier AGX | ✔️ |
35 changes: 35 additions & 0 deletions docs/reference/face-detection-2d-retinaface.md
@@ -209,6 +209,41 @@ Parameters:
img = draw_bounding_boxes(img.opencv(), bounding_boxes, learner.classes, show=True)
```

#### Performance Evaluation

The inference speed of RetinaFace, in FPS, is summarized in the table below.

| Variant | RTX 2070 | TX2 | AGX |
|---------|----------|-----|-----|
| RetinaFace | 47 | 3 | 8 |
| RetinaFace-MobileNet | 114 | 13 | 18 |

Apart from the inference speed, we also report the memory usage and energy consumption on a reference platform in the table below.
The measurements were made on an NVIDIA Jetson TX2 module.

| Variant              | Memory (MB) | Energy (Joules), total per inference |
|----------------------|-------------|--------------------------------------|
| RetinaFace           | 4443        | 21.83                                |
| RetinaFace-MobileNet | 4262        | 8.73                                 |

Finally, we measure the recall on the WIDER FACE validation subset at 87.83%.
Note that RetinaFace can make use of image pyramids and horizontal flipping to achieve even better recall at the cost of additional computations.
For the MobileNet version, the recall drops to 77.81%.
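
A minimal sketch of such a pyramid loop, under stated assumptions about the learner and box attributes, could look as follows:

```python
# Rough, hypothetical image-pyramid sketch; assumes learner.infer accepts an
# opendr.engine.data.Image and returns boxes with left/top/width/height fields.
import cv2
from opendr.engine.data import Image

def pyramid_infer(learner, bgr_frame, scales=(0.5, 1.0, 1.5)):
    all_boxes = []
    for s in scales:
        resized = cv2.resize(bgr_frame, None, fx=s, fy=s)
        for box in learner.infer(Image(resized)):
            # Map detections back to the original image scale.
            box.left, box.top = box.left / s, box.top / s
            box.width, box.height = box.width / s, box.height / s
            all_boxes.append(box)
    return all_boxes  # in practice, merge overlapping detections (e.g. NMS)
```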

The platform compatibility evaluation is also reported below:

| Platform | Compatibility Evaluation |
| ----------------------------------------------|-------|
| x86 - Ubuntu 20.04 (bare installation - CPU) | :heavy_check_mark: |
| x86 - Ubuntu 20.04 (bare installation - GPU) | :heavy_check_mark: |
| x86 - Ubuntu 20.04 (pip installation) | :heavy_check_mark: |
| x86 - Ubuntu 20.04 (CPU docker) | :heavy_check_mark: |
| x86 - Ubuntu 20.04 (GPU docker) | :heavy_check_mark: |
| NVIDIA Jetson TX2 | :heavy_check_mark: |
| NVIDIA Jetson Xavier AGX | :heavy_check_mark: |
| NVIDIA Jetson Xavier NX | :heavy_check_mark: |

#### References
<a name="retinaface-1" href="https://arxiv.org/abs/1905.00641">[1]</a> RetinaFace: Single-stage Dense Face Localisation in the Wild,
[arXiv](https://arxiv.org/abs/1905.00641).

42 changes: 42 additions & 0 deletions docs/reference/face-recognition.md
@@ -371,3 +371,45 @@ cap.release()
cv2.destroyAllWindows()
```

#### Performance Evaluation

The performance evaluation results of the *FaceRecognitionLearner* are reported in the table below:

| Backbone | CPU i7-9700K (FPS) | RTX 2070 (FPS) | Jetson TX2 (FPS) | Xavier NX (FPS) | Xavier AGX (FPS) |
|-------------------|--------|----------------|------------------|-----------------|------------------|
| MobileFaceNet | 137.83 | 224.26 | 29.84 | 28.85 | 37.17 |
| IR-50 | 25.40 | 176.25 | 17.99 | 17.18 | 19.58 |


Apart from the inference speed, which is reported in FPS, we also report the memory usage and energy consumption on a reference platform in the table below:

| Backbone      | Memory (MB) | Energy (Joules), total per inference |
|---------------|-------------|--------------------------------------|
| MobileFaceNet | 949.75      | 0.41                                 |
| IR-50         | 1315.75     | 1.15                                 |


An NVIDIA Jetson Xavier AGX was used as the reference platform for the energy measurements in these experiments.
We report the average metrics over 100 runs.
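
A minimal sketch of such an averaged measurement, assuming a `learner` and an input `img` prepared as in the example above, is shown below.

```python
# Mean FPS over 100 inference runs; warm-up runs are excluded from the timing.
import time

def average_fps(learner, img, runs=100, warmup=10):
    for _ in range(warmup):
        learner.infer(img)
    start = time.perf_counter()
    for _ in range(runs):
        learner.infer(img)
    return runs / (time.perf_counter() - start)
```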

The accuracy on the Labeled Faces in the Wild (LFW), Celebrities in Frontal-Profile in the Wild in both frontal-to-frontal (CFP-FF) and frontal-to-profile (CFP-FP) setups, AgeDB-30, and VGGFace2 datasets is also reported in the table below:

| Backbone | LFW | CFP-FF | CFP-FP | AgeDB-30 | VGGFace2 |
|-----------------|--------|--------|--------|----------|----------|
| MobileFaceNet | 99.46% | 99.27% | 93.62% | 95.49% | 93.24% |
| IR-50 | 99.84% | 99.67% | 98.11% | 97.73% | 95.3% |


The platform compatibility evaluation is also reported below:

| Platform | Compatibility Evaluation |
| ----------------------------------------------|-------|
| x86 - Ubuntu 20.04 (bare installation - CPU) | :heavy_check_mark: |
| x86 - Ubuntu 20.04 (bare installation - GPU) | :heavy_check_mark: |
| x86 - Ubuntu 20.04 (pip installation) | :heavy_check_mark: |
| x86 - Ubuntu 20.04 (CPU docker) | :heavy_check_mark: |
| x86 - Ubuntu 20.04 (GPU docker) | :heavy_check_mark: |
| NVIDIA Jetson TX2 | :heavy_check_mark: |
| NVIDIA Jetson Xavier AGX | :heavy_check_mark: |
| NVIDIA Jetson Xavier NX | :heavy_check_mark: |

41 changes: 41 additions & 0 deletions docs/reference/gem.md
@@ -269,6 +269,47 @@ bounding_box_list, w_sensor1, _ = learner.infer(m1_img, m2_img)
cv2.imshow('Detections', draw(m1_img.opencv(), bounding_box_list, w_sensor1))
cv2.waitKey(0)
```

#### Performance Evaluation

The performance evaluation results of the *GemLearner* are reported in the table below.
These tests were performed on several platforms, i.e. a laptop CPU, a laptop GPU, and an NVIDIA Jetson TX2.
Results are shown for two different backbones, ResNet-50 and MobileNet-v2.
Inference was performed on a 1280x720 RGB image and a 1280x720 infrared image.

| Method | CPU i7-10870H (FPS) | RTX 3080 Laptop (FPS) | Jetson TX2 (FPS) |
|--------------|:--------------------:|:---------------------:|:----------------:|
| ResNet-50 | 0.87 | 11.83 | 0.83 |
| MobileNet-v2 | 2.44 | 18.70 | 2.19 |


Apart from the inference speed, which is reported in FPS, we also report the memory usage and energy consumption on a reference platform in the table below.
Again, inference was performed on a 1280x720 RGB image and a 1280x720 infrared image.

| Method       | Memory (MB), RTX 3080 Laptop | Energy (Joules), total per inference (Jetson TX2) |
|--------------|:----------------------------:|:-------------------------------------------------:|
| ResNet-50    | 1853.4                       | 28.2                                               |
| MobileNet-v2 | 931.2                        | 8.4                                                |

Below, the performance of GEM in terms of accuracy is presented.
These results were obtained on the L515-Indoor dataset, which can also be downloaded using the GemLearner class from OpenDR.

| Method       | Mean Average Precision | Energy (Joules), total per inference (Jetson TX2) |
|--------------|:----------------------:|:-------------------------------------------------:|
| ResNet-50    | 0.982                  | 28.2                                               |
| MobileNet-v2 | 0.833                  | 8.4                                                |

The platform compatibility evaluation is also reported below:

| Platform | Test results |
|----------------------------------------------|:------------:|
| x86 - Ubuntu 20.04 (bare installation - CPU) | Pass |
| x86 - Ubuntu 20.04 (bare installation - GPU) | Pass |
| x86 - Ubuntu 20.04 (pip installation) | Pass |
| x86 - Ubuntu 20.04 (CPU docker) | Pass |
| x86 - Ubuntu 20.04 (GPU docker) | Pass |
| NVIDIA Jetson TX2 | Pass |

#### References
<a name="detr-paper" href="https://ai.facebook.com/research/publications/end-to-end-object-detection-with-transformers">[1]</a> Carion N., Massa F., Synnaeve G., Usunier N., Kirillov A., Zagoruyko S. (2020) End-to-End Object Detection with Transformers. In: Vedaldi A., Bischof H., Brox T., Frahm JM. (eds) Computer Vision – ECCV 2020. ECCV 2020. Lecture Notes in Computer Science, vol 12346. Springer, Cham. [doi](https://doi.org/10.1007/978-3-030-58452-8_13),
[arXiv](https://arxiv.org/abs/2005.12872).
22 changes: 22 additions & 0 deletions docs/reference/human-model-generation.md
@@ -103,6 +103,28 @@ A demo in the form of a Jupyter Notebook is available
model_3D = model_generator.infer(imgs_rgb=[rgb_img], imgs_msk=[msk_img], extract_pose=False)
```

#### Performance Evaluation

TABLE-1: OpenDR 3D human model generation speed evaluation.
| Method | CPU i7-9700K (ms) | RTX 2070 (ms) |
| ----------------------------------------------- | ----------------- | ------------- |
| Human Model Generation only | 488.2 | 212.3 |
| Human Model Generation + 3D pose approximation | 679.8 | 531.6 |
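
The second row corresponds to calling `infer` with `extract_pose=True`.
Following the demo snippet above, a hedged example is shown below; the returned pair is an assumption to verify against the class documentation.

```python
# Variant of the demo call with 3D pose approximation enabled (cf. TABLE-1,
# second row). Assumption: a (model, pose) pair is returned.
model_3D, pose = model_generator.infer(imgs_rgb=[rgb_img], imgs_msk=[msk_img],
                                       extract_pose=True)
```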



TABLE-2: 3D Human Model Generation platform compatibility evaluation.
| Platform | Test results |
| -------------------------------------------- | ------------ |
| x86 - Ubuntu 20.04 (bare installation - CPU) | Pass |
| x86 - Ubuntu 20.04 (bare installation - GPU) | Pass |
| x86 - Ubuntu 20.04 (pip installation) | Pass |
| x86 - Ubuntu 20.04 (CPU docker) | Pass* |
| x86 - Ubuntu 20.04 (GPU docker) | Pass* |
| NVIDIA Jetson TX2 | Not tested |
| NVIDIA Jetson Xavier NX | Not tested |

\*In the Docker installations, the skeleton approximation of the 3D human models is not available.

#### References
<a name="pifu-paper" href="https://shunsukesaito.github.io/PIFu/">[1]</a>