
Contributing AutoEncoder Changes to Repo #6

Open · wants to merge 147 commits into base: main

Commits (147)
3aa55be
Updating mlp
MatthewSo Nov 12, 2024
ce6024b
Refactoring Config Space in Search Spaces to OOP
MatthewSo Dec 9, 2024
f7b497d
Adding AutoEncoderConfigSpace set hyperparams
MatthewSo Dec 9, 2024
d0aaf97
Updating AutoEncoder config sample arch uniformly
MatthewSo Dec 9, 2024
5a20878
testing new arch
nik-hz Dec 10, 2024
36d4795
added support for iamgenet resnet and tested search
nik-hz Dec 11, 2024
5a7c473
New Search Sample init
MatthewSo Dec 12, 2024
549553a
Updating
MatthewSo Dec 12, 2024
6e05ae1
Updating the new config_spaces reference via path
MatthewSo Dec 12, 2024
64168bd
Adding naive search sample attempt on AutoEncoder config space
MatthewSo Dec 12, 2024
c7e6996
Adding Analog-aware training loop. Added noise modeling from AIHWKit
MatthewSo Dec 13, 2024
bf7a403
Updating Evaluator Inheritance
MatthewSo Dec 14, 2024
b081cba
Initializing the autoencoder config-based arch
MatthewSo Dec 14, 2024
668e0b7
Updating
MatthewSo Dec 14, 2024
cc13580
Adding mirrored autoencoder architecture
MatthewSo Dec 14, 2024
2042f61
Correcting off by 1 error for convblock reference
MatthewSo Dec 14, 2024
03193ba
Updating
MatthewSo Dec 14, 2024
a2f5ab8
Updating
MatthewSo Dec 14, 2024
d96235e
Adding Realtime Evaluation Architecture String dict for string to str…
MatthewSo Dec 14, 2024
dccbc9c
fixing dataloading bug
MatthewSo Dec 14, 2024
f13b778
Adding MNIST AutoEncoder class
MatthewSo Dec 14, 2024
5e5b87c
Adding Dataset class for loading image to image tasks
MatthewSo Dec 14, 2024
6b6d19f
Updating
MatthewSo Dec 14, 2024
ccbb458
Correcting hashing"
MatthewSo Dec 14, 2024
fb124be
Updating
MatthewSo Dec 14, 2024
bdbcf9e
Adding CUDA for Realtime Evaluation Training
MatthewSo Dec 14, 2024
06a77fb
Fixing device location for validation"
MatthewSo Dec 14, 2024
5cb0f3a
Updating analog nas
MatthewSo Dec 14, 2024
2ae5ebc
Adding epoch logging
MatthewSo Dec 14, 2024
1a7c4c0
More logging
MatthewSo Dec 14, 2024
1c48cdf
Updating
MatthewSo Dec 14, 2024
a890dd0
Temp shape logging
MatthewSo Dec 14, 2024
cba2d14
Updating batch size
MatthewSo Dec 14, 2024
655a7fd
Updating batchsize
MatthewSo Dec 14, 2024
a54d19a
remove assignment during drift analog weights update for 1 and 30 days
MatthewSo Dec 15, 2024
32f5e2f
Guarantee Parity of the shape of encoder and decoder components
MatthewSo Dec 15, 2024
8043ed7
Part 2 - updating for parity of input and output
MatthewSo Dec 15, 2024
2ab5dc1
Updating gpu used
MatthewSo Dec 15, 2024
fb311b8
Testing out analog inference on GPU
MatthewSo Dec 15, 2024
9488036
Updating
MatthewSo Dec 15, 2024
e56dddf
Adding Threading for training architectures on several GPUs
MatthewSo Dec 15, 2024
a47d986
Adding arch dict to string map
MatthewSo Dec 15, 2024
a08f5ff
Updating logging
MatthewSo Dec 15, 2024
9cb73ed
Testing with Py Threads
MatthewSo Dec 15, 2024
921aa76
updating model string passing
MatthewSo Dec 15, 2024
08b25df
Updating number of training epochs
MatthewSo Dec 15, 2024
69d586e
Updating
MatthewSo Dec 15, 2024
2ae6d90
Updating population size
MatthewSo Dec 15, 2024
cb2ce99
Renaming test dataloader
MatthewSo Dec 15, 2024
fe201c6
Adding sigmoid
MatthewSo Dec 15, 2024
ceb0110
updating new search sample
MatthewSo Dec 15, 2024
b184e50
Removing sigmoid
MatthewSo Dec 15, 2024
1eefae4
Reverting autoencoder parity change
MatthewSo Dec 15, 2024
7ed2085
Equalization attempt using fc layer
MatthewSo Dec 15, 2024
88fddd5
Fixing reference to input
MatthewSo Dec 15, 2024
c8ebb5e
Removing small embedding dimensions
MatthewSo Dec 15, 2024
a0bad26
Adding update for arch acc printout
MatthewSo Dec 15, 2024
4aae559
Enabled batch consideration
MatthewSo Dec 15, 2024
40b1793
Adding final conv
MatthewSo Dec 15, 2024
415a65e
Removing last layer
MatthewSo Dec 15, 2024
e715dbd
Adding architecture agnostic mutate
MatthewSo Dec 15, 2024
c6e8211
Adding breakpoint
MatthewSo Dec 15, 2024
90675e4
Make the FC layer optional based on architecture
MatthewSo Dec 15, 2024
22a1504
Fixing mirroring
MatthewSo Dec 15, 2024
7e3c4ce
Adding another breakpoint
MatthewSo Dec 15, 2024
568d4cf
Adding breakpoint for new_P"
MatthewSo Dec 15, 2024
f35bd5a
changing uniform sampling strategy
MatthewSo Dec 15, 2024
151f4a6
Updating
MatthewSo Dec 15, 2024
b3fc012
Updating
MatthewSo Dec 15, 2024
7a2166c
Updating
MatthewSo Dec 15, 2024
2a45767
Updating search sample
MatthewSo Dec 15, 2024
f9fb1b6
Changing optimizer
MatthewSo Dec 15, 2024
353bd22
Fixing penultimate conv layer in encoder
MatthewSo Dec 15, 2024
cdfd4e8
Penultimate fix pt 2
MatthewSo Dec 15, 2024
3fa3daa
Pt 2 penultimate fix
MatthewSo Dec 15, 2024
207091d
Adding logging
MatthewSo Dec 15, 2024
f9edca4
fixing ratio
MatthewSo Dec 15, 2024
e9ef6b9
Cleaning up reference names
MatthewSo Dec 15, 2024
470b0c1
Updating
MatthewSo Dec 15, 2024
04c390d
Limiting config space
MatthewSo Dec 15, 2024
567ddb7
Updating batch size
MatthewSo Dec 15, 2024
38bb333
Adding max batches
MatthewSo Dec 16, 2024
5f8b9ab
Adding max batches, pt 2
MatthewSo Dec 16, 2024
7b8122a
Updating bs again
MatthewSo Dec 16, 2024
188bcca
Optimization for estimation
MatthewSo Dec 16, 2024
d2aaa59
Updating
MatthewSo Dec 16, 2024
98e4044
Estimation for RPU noise model
MatthewSo Dec 16, 2024
2375faa
Integrating with tensorboard
MatthewSo Dec 16, 2024
9891762
Adding tensorboard integration
MatthewSo Dec 16, 2024
6c5910b
Adding path to training dir
MatthewSo Dec 16, 2024
c27943a
Fixing batch counting
MatthewSo Dec 16, 2024
3de684f
Change tensoboard reference path
MatthewSo Dec 16, 2024
4d8e54d
Updating evaluator writer
MatthewSo Dec 16, 2024
8193254
Updating max batches per epoch
MatthewSo Dec 16, 2024
72e6363
moving batch_idx
MatthewSo Dec 16, 2024
3c17f2c
pass the avgs of day 1 and day 30 accuracy
MatthewSo Dec 16, 2024
bc9a888
Updating mutate path
MatthewSo Dec 16, 2024
e62c3f9
Adding logging
MatthewSo Dec 16, 2024
d7df49b
Updating mutate function
MatthewSo Dec 16, 2024
b853938
Remove fault import
MatthewSo Dec 16, 2024
ea67d5a
List conversion
MatthewSo Dec 16, 2024
a19819e
Updating new sample
MatthewSo Dec 16, 2024
aa6aa85
Index max batches within epoch only
MatthewSo Dec 16, 2024
96cadad
Updating loss metric
MatthewSo Dec 16, 2024
c981a8d
Add bypass threshold to prevent evaluation on model's whose digital p…
MatthewSo Dec 16, 2024
a1c2752
Updating
MatthewSo Dec 16, 2024
1251332
Mocking response for testing
MatthewSo Dec 16, 2024
c8af5c1
Mocking response for testing
MatthewSo Dec 16, 2024
87544ea
printing sorting
MatthewSo Dec 16, 2024
6bbeedb
updating sorting method
MatthewSo Dec 16, 2024
5822ef2
Removing mocking
MatthewSo Dec 16, 2024
8e02bbe
punish worse values
MatthewSo Dec 16, 2024
243e667
Updating new search sample
MatthewSo Dec 17, 2024
c51ae25
Adding step scheduler for longer trainins
MatthewSo Dec 17, 2024
c4cc15f
Update how patience is calculated
MatthewSo Dec 17, 2024
36fbdab
Updating
MatthewSo Dec 17, 2024
d63b04c
Added CIFAR Autoencoder architectuer
MatthewSo Dec 17, 2024
3afc5e1
Updating possible embedding dims
MatthewSo Dec 17, 2024
a9f3802
Updating for cifar trainng
MatthewSo Dec 17, 2024
a21f759
adding logging
MatthewSo Dec 17, 2024
b9b2346
Updating CIFAR class again
MatthewSo Dec 17, 2024
49464cf
Adding RPU Search space
MatthewSo Dec 18, 2024
4650d8a
Updating
MatthewSo Dec 18, 2024
a86b2dc
Fixing rpu search space sample
MatthewSo Dec 18, 2024
de21062
Updating
MatthewSo Dec 18, 2024
ae2e17f
Removing string component
MatthewSo Dec 18, 2024
ba5cb3c
Converting analog model to eval
MatthewSo Dec 18, 2024
ae511a4
UPdating 1 epoch
MatthewSo Dec 18, 2024
932b6c3
Updating rpu search sample
MatthewSo Dec 18, 2024
08de9dd
Updating EA optimized
MatthewSo Dec 18, 2024
c1a9d3d
Updating tile size
MatthewSo Dec 18, 2024
d746405
UPdating routine to update best arch
MatthewSo Dec 18, 2024
71ce467
Cleaning pyc
MatthewSo Dec 19, 2024
fe16dec
Cleaning pyc
MatthewSo Dec 19, 2024
f48b4b0
Clean pyc
MatthewSo Dec 19, 2024
fb19c49
Pyc cleanup
MatthewSo Dec 19, 2024
8f1b95b
pyc adjust
MatthewSo Dec 19, 2024
13eee22
Removing PYC from git tracking
MatthewSo Dec 19, 2024
1186942
Updating nas search demo
MatthewSo Dec 19, 2024
dae7ba4
Bug fix
MatthewSo Dec 19, 2024
a08e3f4
Fixing bug
MatthewSo Dec 19, 2024
092e408
Cleanup and README
MatthewSo Dec 20, 2024
3129fe7
Adding comments
MatthewSo Dec 20, 2024
a899722
Adding mnist playground
MatthewSo Dec 20, 2024
28f9521
Updating encoder decoder cifar
MatthewSo Dec 20, 2024
5b22539
Clearing notebook
MatthewSo Dec 20, 2024
5dd8f04
Updating search demo
MatthewSo Dec 20, 2024
1 change: 1 addition & 0 deletions .gitignore
@@ -1,3 +1,4 @@
venv
env
results
*.pyc
1,686 changes: 844 additions & 842 deletions AnalogNAS_Tutorial.ipynb

Large diffs are not rendered by default.

158 changes: 158 additions & 0 deletions README.md
@@ -12,6 +12,164 @@
| [**Docs**](https://github.com/IBM/analog-nas/blob/main/starter_notebook.ipynb)
| [**References**](#references)

## Additions for HPML Class
Our project focused on creating an AutoEncoder architecture that can be optimized for Analog Hardware.
We created a search space for AutoEncoder architectures and another for RPU-configurations that can be used to run AutoEncoder architectures.
As part of the project, we also created new evaluators that can be used to evaluate the performance of the AutoEncoder architectures.

We used the existing infrastructure of this repo and made some key contributions to it to extend the functionality for this project.

The majority of the code relevant to this project can be found within the analogainas main folder. Within that main folder, there are the following subfolders:

* **search_spaces**: Contains a subfolder with the AutoEncoder implementation details and the associated configuration space for the AutoEncoder. Also contains the team's new additions to the dataloader subfolder.
* **evaluators**: Contains the new and existing evaluators used to evaluate model performance. The ones added by the team are the RealtimeRpuEvaluator and the RealtimeTrainingEvaluator.
* **search_algorithms**: Contains the search algorithms used to find the optimal architecture, with separate .py files for the various approaches. In particular, it contains EAOptimized.py, the main search algorithm used for the AutoEncoder.

Here is a more specific look at the additions made to the repo:


- RealtimeTrainingEvaluator
  - Architecture-batchable evaluator class that integrates with the existing AnalogNAS optimizer search framework through its query and query\_pop implementations. Allows multiple architectures to be trained simultaneously across several GPUs on multiple threads. Implemented to be model- and dataset-agnostic. The outputs of this class could be used in the future to train estimators for arbitrary models.

- BaseConfigSpace
  - Introduced an object-oriented redesign of the ConfigSpaces, making it easy to add future ConfigSpaces without updating a single catch-all class (see the sketch after this list).

- AutoEncoderConfigSpace
- Configuration Space for the AutoEncoder architecture that was defined in Methodology.

- AutoEncoder
- Config-driven AutoEncoder implementation that accepts parameters as defined by the corresponding ConfigSpace. Used construction-based approach to create the CIFAR and MNIST task-specific configurable AutoEncoders.

- RPUConfigSpace
- Configuration Space for the RPU Architectures.

- Search Entrypoints
  - Introduced several training entrypoints to enable the recreation of various search experiments. These include the AutoEncoder Search Demo, the Cifar AutoEncoder Train Demo, and the RPU Search Demo, which launch searches over different datasets/architectures.

- Jupyter Notebooks
  - Several Jupyter notebooks were created to analyze/retrain discovered architectures and generate experiment results.
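
For illustration, here is a minimal sketch of the object-oriented ConfigSpace pattern described above. Apart from `get_hyperparameters()` and `compute_cs_size()`, which appear in the sample snippet below, the names and structure are illustrative assumptions rather than the repo's exact API:

```python
import random

# Hypothetical sketch of the BaseConfigSpace pattern; names other than
# get_hyperparameters/compute_cs_size are assumptions, not the repo's code.
class BaseConfigSpace:
    def __init__(self, name):
        self.name = name
        self.hyperparameters = []  # list of (name, allowed values) pairs

    def add_hyperparameter(self, name, values):
        self.hyperparameters.append((name, list(values)))

    def get_hyperparameters(self):
        return [name for name, _ in self.hyperparameters]

    def compute_cs_size(self):
        # Total number of distinct configurations in the space.
        size = 1
        for _, values in self.hyperparameters:
            size *= len(values)
        return size

    def sample_arch(self):
        # Uniform sampling over each hyperparameter's allowed values.
        return {name: random.choice(values)
                for name, values in self.hyperparameters}


class AutoEncoderConfigSpace(BaseConfigSpace):
    # New spaces extend the base class instead of patching a catch-all class.
    def __init__(self):
        super().__init__("autoencoder")
        self.add_hyperparameter("num_conv_blocks", [2, 3, 4])    # illustrative values
        self.add_hyperparameter("embedding_dim", [128, 256, 512])
```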

Much like the original implementation, you can run the searches with the following commands:

* python nas_search_demo.py
* python autoencoder_search_demo.py
* python rpu_search_demo.py
* python cifar_autoencoder_train.py

Here is a sample snippet:

```python
import torchvision
import torchvision.transforms as transforms
from torch.utils.data import DataLoader
import torch.nn as nn

from analogainas.search_spaces.autoencoder.cifar_autoencoder import CifarAutoEncoder
from analogainas.search_spaces.autoencoder.autoencoder_config_space import AutoEncoderConfigSpace
from analogainas.evaluators.realtime_training_evaluator import RealtimeTrainingEvaluator
from analogainas.search_algorithms.ea_optimized import EAOptimizer
from analogainas.search_algorithms.worker import Worker
from analogainas.search_spaces.dataloaders.autoencoder_structured_dataset import AutoEncoderStructuredDataset



CS = AutoEncoderConfigSpace()

print(CS.get_hyperparameters())

print(CS.compute_cs_size())

transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465),
                         (0.2023, 0.1994, 0.2010))
])

train_cifar_dataset = AutoEncoderStructuredDataset(
    torchvision.datasets.CIFAR10(root='./data', train=True, transform=transform, download=True)
)

train_dataloader = DataLoader(train_cifar_dataset, batch_size=8, shuffle=True)

test_cifar_dataset = AutoEncoderStructuredDataset(
    torchvision.datasets.CIFAR10(root='./data', train=False, transform=transform, download=True)
)

test_dataloader = DataLoader(test_cifar_dataset, batch_size=64, shuffle=True)

criterion = nn.MSELoss()
evaluator = RealtimeTrainingEvaluator(
    model_factory=CifarAutoEncoder,
    train_dataloader=train_dataloader,
    val_dataloader=test_dataloader,
    test_dataloader=test_dataloader,
    criterion=criterion,
    epochs=13,
    artifact_dir='CifarAutoEncoderTraining',
)

optimizer = EAOptimizer(evaluator, population_size=50, nb_iter=10, batched_evaluation=True)

NB_RUN = 1
worker = Worker(network_factory=CifarAutoEncoder, cs=CS, optimizer=optimizer, runs=NB_RUN)

print(worker.config_space)
worker.search()
```

Sample results:

Virtually all of the models were able to fit the MNIST dataset in the digital prediction space.

<img src="images/sample_convergence_mnist.png" width="600">

However, for the vast majority of models, once even a little RPU drift noise was added, the models lost all predictive power.
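
As a hedged sketch, the one-day and one-month drift states above can be produced with AIHWKit's weight-programming and drift utilities. This assumes `analog_model` was converted with AIHWKit's `convert_to_analog(...)`, `test_dataloader` is as in the snippet above, and `negative_mse_metric` is the helper added in this PR:

```python
# Sketch only: analog_model and test_dataloader are assumed to exist as above.
DAY = 24 * 3600.0  # drift times are given in seconds

analog_model.eval()
analog_model.program_analog_weights()        # apply programming noise once
analog_model.drift_analog_weights(DAY)       # weight state after 1 day of drift
day1_losses = negative_mse_metric(test_dataloader, analog_model)

analog_model.drift_analog_weights(30 * DAY)  # weight state after 30 days
day30_losses = negative_mse_metric(test_dataloader, analog_model)
```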


Results after one day of drift noise on non-optimal model:

<img src="images/non-optimal-one-day-mnist.png" width="600">

Results after one month of drift noise on non-optimal model:

<img src="images/non-optimal-one-month.png" width="600">

After conducting the search, however, we were able to find an architecture that is resilient to the drift noise.

Results after one day of drift noise on optimal model:

<img src="images/mnist-one-day-optimized.png" width="600">


Results after one month of drift noise on optimal model:

<img src="images/mnist-one-month-optimized.png" width="600">

The evolved architecture is clearly preferable in this case.

We repeated the experiment with CIFAR-10.

Here are the results for the non-optimal model:

One day

<img src="images/non-optimal-cifar-day.png" width="600">

One month

<img src="images/non-optimal-cifar-month.png" width="600">

And here are the results for the optimal model:

One day

<img src="images/cifar-optimal-one-day.png" width="600">

One month

<img src="images/cifar-one-month-optimal.png" width="600">

It should be noted that even these "non-optimal" models are still better than the average randomly selected model from the configuration space; many models produce completely illegible reconstructions once noise is added.

The optimal models are able to maintain a high level of clarity even after a month of drift noise.

Lastly, with the optimized model, we conducted an RPU search for the optimal RPU config and achieved the following results:

![cifar_model_performance.png](images%2Fcifar_model_performance.png)

## Features
AnalogaiNAS package offers the following features:

Binary file removed analogainas/__pycache__/__init__.cpython-38.pyc
Binary file not shown.
Binary file removed analogainas/__pycache__/__init__.cpython-39.pyc
Binary file not shown.
Binary file removed analogainas/__pycache__/utils.cpython-38.pyc
Binary file not shown.
65 changes: 65 additions & 0 deletions analogainas/analog_helpers/analog_helpers.py
@@ -0,0 +1,65 @@
# AIHWKIT IMPORTS
from aihwkit.simulator.configs import InferenceRPUConfig
from aihwkit.simulator.configs.utils import WeightClipType
from aihwkit.simulator.presets.utils import PresetIOParameters
from aihwkit.inference.compensation.drift import GlobalDriftCompensation
from aihwkit.optim import AnalogSGD

from aihwkit.inference.noise.pcm import PCMLikeNoiseModel
from aihwkit.simulator.parameters.enums import BoundManagementType


def create_noise_model():
    # Returns a noise model for the inference.
    # Would have preferred to use CustomDriftPCMLikeNoiseModel, but it is not
    # available in the environment currently supported by this repo.
    # g_min, g_max = 0.0, 25.
    # custom_drift_model = dict(g_lst=[g_min, 10., g_max],
    #                           nu_mean_lst=[0.08, 0.05, 0.03],
    #                           nu_std_lst=[0.03, 0.02, 0.01])
    #
    # noise_model = CustomDriftPCMLikeNoiseModel(
    #     custom_drift_model,
    #     prog_noise_scale=0.0,  # turn off to show drift only
    #     read_noise_scale=0.0,  # turn off to show drift only
    #     drift_scale=1.0,
    #     g_converter=SinglePairConductanceConverter(g_min=g_min, g_max=g_max),
    # )
    noise_model = PCMLikeNoiseModel()
    return noise_model

def create_rpu_config(g_max=25,
                      tile_size=256,
                      dac_res=256,
                      adc_res=256,
                      noise_std=5.0):
    # Returns an RPU configuration for inference based on the given parameters.
    # Implementation adapted from AIHWKit.
    rpu_config = InferenceRPUConfig()

    rpu_config.clip.type = WeightClipType.FIXED_VALUE
    rpu_config.clip.fixed_value = 1.0
    rpu_config.modifier.pdrop = 0  # Drop connect.

    rpu_config.modifier.std_dev = noise_std

    rpu_config.modifier.rel_to_actual_wmax = True
    rpu_config.mapping.digital_bias = True
    rpu_config.mapping.weight_scaling_omega = 0.4
    rpu_config.mapping.max_input_size = tile_size
    rpu_config.mapping.max_output_size = 255

    rpu_config.mapping.learn_out_scaling_alpha = True

    rpu_config.forward = PresetIOParameters()
    rpu_config.forward.inp_res = 1 / dac_res  # 8-bit DAC discretization.
    rpu_config.forward.out_res = 1 / adc_res  # 8-bit ADC discretization.
    rpu_config.forward.bound_management = BoundManagementType.NONE

    # Inference noise model.
    rpu_config.noise_model = PCMLikeNoiseModel(g_max=g_max)

    # Drift compensation.
    rpu_config.drift_compensation = GlobalDriftCompensation()
    return rpu_config
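
For context, a minimal usage sketch of this helper. The two-layer model here is a stand-in rather than anything from this PR; `convert_to_analog` is AIHWKit's standard conversion entry point:

```python
import torch.nn as nn
from aihwkit.nn.conversion import convert_to_analog

# Build a config and convert a (stand-in) digital model to its analog twin.
rpu_config = create_rpu_config(g_max=25, tile_size=256)
digital_model = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 784))
analog_model = convert_to_analog(digital_model, rpu_config)
analog_model.eval()  # inference mode before programming/drifting weights
```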
31 changes: 0 additions & 31 deletions analogainas/evaluators/base_evaluator.py
@@ -7,23 +7,6 @@ class Evaluator:
    def __init__(self, model_type=None):
        self.model_type = model_type

    def pre_process(self):
        """
        This is called at the start of the NAS algorithm,
        before any architectures have been queried.
        """
        pass

    def fit(self, x_train, y_train):
        """
        Training the evaluator.

        Args:
            x_train: list of architectures
            y_train: accuracies or ranks
        """
        pass

    def query(self, x_test):
        """
        Get the accuracy/rank prediction for x_test.
@@ -36,20 +19,6 @@ def query(self, x_test):
        """
        pass

    def get_evaluator_stat(self):
        """
        Check whether the evaluator needs retraining.

        Returns:
            A dictionary of metrics.
        """
        reqs = {
            "requires_retraining": False,
            "test_accuracy": None,
            "requires_hyperparameters": False,
            "hyperparams": {}
        }
        return reqs

    def set_hyperparams(self, hyperparams):
        """
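
To illustrate the slimmed-down interface that remains after this change, a toy subclass might look like the following; everything except `Evaluator`, `query`, and `model_type` is an illustrative assumption:

```python
class ConstantEvaluator(Evaluator):
    # Toy evaluator that scores every architecture identically. The real
    # evaluators in this PR (e.g. RealtimeTrainingEvaluator) train each
    # architecture instead of returning a constant.
    def __init__(self, score=0.0):
        super().__init__(model_type="toy")
        self.score = score

    def query(self, x_test):
        # Return one prediction per queried architecture.
        return [self.score for _ in x_test]
```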
21 changes: 21 additions & 0 deletions analogainas/evaluators/evaluation_metrics.py
@@ -0,0 +1,21 @@
import torch
from torch import nn

def negative_mse_metric(dataloader, analog_model, max_batches=5):
    # Metric callback used to evaluate the performance of the model with mean
    # squared error. The metric is negated because the search algorithm ranks
    # architectures by this value (higher is better).
    losses = []
    criterion = nn.MSELoss()
    with torch.no_grad():
        for i, (inputs, targets) in enumerate(dataloader):
            outputs = analog_model(inputs)
            loss = criterion(outputs, targets)
            loss = -1 * loss.item()

            losses.append(loss)

            if i + 1 >= max_batches:  # evaluate at most max_batches batches
                break

    return losses
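
A quick self-contained usage sketch; the identity model and random tensors are placeholders for a converted analog autoencoder and a real dataloader:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Placeholder image-to-image data: targets equal inputs, as for an autoencoder.
images = torch.rand(32, 3, 32, 32)
loader = DataLoader(TensorDataset(images, images), batch_size=8)

model = torch.nn.Identity()  # stand-in for a converted analog model
losses = negative_mse_metric(loader, model, max_batches=2)
print(sum(losses) / len(losses))  # mean negative MSE; closer to 0 is better
```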
4 changes: 0 additions & 4 deletions analogainas/evaluators/mlp.py
@@ -94,12 +94,8 @@ def fit(self, xtrain, ytrain,
        self.mean = np.mean(ytrain)
        self.std = np.std(ytrain)

<<<<<<< HEAD
        # TODO: Add encoding
=======
        scaler = StandardScaler()
        _xtrain = scaler.fit_transform(xtrain)
>>>>>>> public/main

        _xtrain = xtrain
        _ytrain = np.array(ytrain)