
Contributing AutoEncoder Changes to Repo #6

Open · wants to merge 147 commits into base: main

Commits (147)
3aa55be
Updating mlp
MatthewSo Nov 12, 2024
ce6024b
Refactoring Config Space in Search Spaces to OOP
MatthewSo Dec 9, 2024
f7b497d
Adding AutoEncoderConfigSpace set hyperparams
MatthewSo Dec 9, 2024
d0aaf97
Updating AutoEncoder config sample arch uniformly
MatthewSo Dec 9, 2024
5a20878
testing new arch
nik-hz Dec 10, 2024
36d4795
added support for iamgenet resnet and tested search
nik-hz Dec 11, 2024
5a7c473
New Search Sample init
MatthewSo Dec 12, 2024
549553a
Updating
MatthewSo Dec 12, 2024
6e05ae1
Updating the new config_spaces reference via path
MatthewSo Dec 12, 2024
64168bd
Adding naive search sample attempt on AutoEncoder config space
MatthewSo Dec 12, 2024
c7e6996
Adding Analog-aware training loop. Added noise modeling from AIHWKit
MatthewSo Dec 13, 2024
bf7a403
Updating Evaluator Inheritance
MatthewSo Dec 14, 2024
b081cba
Initializing the autoencoder config-based arch
MatthewSo Dec 14, 2024
668e0b7
Updating
MatthewSo Dec 14, 2024
cc13580
Adding mirrored autoencoder architecture
MatthewSo Dec 14, 2024
2042f61
Correcting off by 1 error for convblock reference
MatthewSo Dec 14, 2024
03193ba
Updating
MatthewSo Dec 14, 2024
a2f5ab8
Updating
MatthewSo Dec 14, 2024
d96235e
Adding Realtime Evaluation Architecture String dict for string to str…
MatthewSo Dec 14, 2024
dccbc9c
fixing dataloading bug
MatthewSo Dec 14, 2024
f13b778
Adding MNIST AutoEncoder class
MatthewSo Dec 14, 2024
5e5b87c
Adding Dataset class for loading image to image tasks
MatthewSo Dec 14, 2024
6b6d19f
Updating
MatthewSo Dec 14, 2024
ccbb458
Correcting hashing"
MatthewSo Dec 14, 2024
fb124be
Updating
MatthewSo Dec 14, 2024
bdbcf9e
Adding CUDA for Realtime Evaluation Training
MatthewSo Dec 14, 2024
06a77fb
Fixing device location for validation"
MatthewSo Dec 14, 2024
5cb0f3a
Updating analog nas
MatthewSo Dec 14, 2024
2ae5ebc
Adding epoch logging
MatthewSo Dec 14, 2024
1a7c4c0
More logging
MatthewSo Dec 14, 2024
1c48cdf
Updating
MatthewSo Dec 14, 2024
a890dd0
Temp shape logging
MatthewSo Dec 14, 2024
cba2d14
Updating batch size
MatthewSo Dec 14, 2024
655a7fd
Updating batchsize
MatthewSo Dec 14, 2024
a54d19a
remove assignment during drift analog weights update for 1 and 30 days
MatthewSo Dec 15, 2024
32f5e2f
Guarantee Parity of the shape of encoder and decoder components
MatthewSo Dec 15, 2024
8043ed7
Part 2 - updating for parity of input and output
MatthewSo Dec 15, 2024
2ab5dc1
Updating gpu used
MatthewSo Dec 15, 2024
fb311b8
Testing out analog inference on GPU
MatthewSo Dec 15, 2024
9488036
Updating
MatthewSo Dec 15, 2024
e56dddf
Adding Threading for training architectures on several GPUs
MatthewSo Dec 15, 2024
a47d986
Adding arch dict to string map
MatthewSo Dec 15, 2024
a08f5ff
Updating logging
MatthewSo Dec 15, 2024
9cb73ed
Testing with Py Threads
MatthewSo Dec 15, 2024
921aa76
updating model string passing
MatthewSo Dec 15, 2024
08b25df
Updating number of training epochs
MatthewSo Dec 15, 2024
69d586e
Updating
MatthewSo Dec 15, 2024
2ae6d90
Updating population size
MatthewSo Dec 15, 2024
cb2ce99
Renaming test dataloader
MatthewSo Dec 15, 2024
fe201c6
Adding sigmoid
MatthewSo Dec 15, 2024
ceb0110
updating new search sample
MatthewSo Dec 15, 2024
b184e50
Removing sigmoid
MatthewSo Dec 15, 2024
1eefae4
Reverting autoencoder parity change
MatthewSo Dec 15, 2024
7ed2085
Equalization attempt using fc layer
MatthewSo Dec 15, 2024
88fddd5
Fixing reference to input
MatthewSo Dec 15, 2024
c8ebb5e
Removing small embedding dimensions
MatthewSo Dec 15, 2024
a0bad26
Adding update for arch acc printout
MatthewSo Dec 15, 2024
4aae559
Enabled batch consideration
MatthewSo Dec 15, 2024
40b1793
Adding final conv
MatthewSo Dec 15, 2024
415a65e
Removing last layer
MatthewSo Dec 15, 2024
e715dbd
Adding architecture agnostic mutate
MatthewSo Dec 15, 2024
c6e8211
Adding breakpoint
MatthewSo Dec 15, 2024
90675e4
Make the FC layer optional based on architecture
MatthewSo Dec 15, 2024
22a1504
Fixing mirroring
MatthewSo Dec 15, 2024
7e3c4ce
Adding another breakpoint
MatthewSo Dec 15, 2024
568d4cf
Adding breakpoint for new_P"
MatthewSo Dec 15, 2024
f35bd5a
changing uniform sampling strategy
MatthewSo Dec 15, 2024
151f4a6
Updating
MatthewSo Dec 15, 2024
b3fc012
Updating
MatthewSo Dec 15, 2024
7a2166c
Updating
MatthewSo Dec 15, 2024
2a45767
Updating search sample
MatthewSo Dec 15, 2024
f9fb1b6
Changing optimizer
MatthewSo Dec 15, 2024
353bd22
Fixing penultimate conv layer in encoder
MatthewSo Dec 15, 2024
cdfd4e8
Penultimate fix pt 2
MatthewSo Dec 15, 2024
3fa3daa
Pt 2 penultimate fix
MatthewSo Dec 15, 2024
207091d
Adding logging
MatthewSo Dec 15, 2024
f9edca4
fixing ratio
MatthewSo Dec 15, 2024
e9ef6b9
Cleaning up reference names
MatthewSo Dec 15, 2024
470b0c1
Updating
MatthewSo Dec 15, 2024
04c390d
Limiting config space
MatthewSo Dec 15, 2024
567ddb7
Updating batch size
MatthewSo Dec 15, 2024
38bb333
Adding max batches
MatthewSo Dec 16, 2024
5f8b9ab
Adding max batches, pt 2
MatthewSo Dec 16, 2024
7b8122a
Updating bs again
MatthewSo Dec 16, 2024
188bcca
Optimization for estimation
MatthewSo Dec 16, 2024
d2aaa59
Updating
MatthewSo Dec 16, 2024
98e4044
Estimation for RPU noise model
MatthewSo Dec 16, 2024
2375faa
Integrating with tensorboard
MatthewSo Dec 16, 2024
9891762
Adding tensorboard integration
MatthewSo Dec 16, 2024
6c5910b
Adding path to training dir
MatthewSo Dec 16, 2024
c27943a
Fixing batch counting
MatthewSo Dec 16, 2024
3de684f
Change tensoboard reference path
MatthewSo Dec 16, 2024
4d8e54d
Updating evaluator writer
MatthewSo Dec 16, 2024
8193254
Updating max batches per epoch
MatthewSo Dec 16, 2024
72e6363
moving batch_idx
MatthewSo Dec 16, 2024
3c17f2c
pass the avgs of day 1 and day 30 accuracy
MatthewSo Dec 16, 2024
bc9a888
Updating mutate path
MatthewSo Dec 16, 2024
e62c3f9
Adding logging
MatthewSo Dec 16, 2024
d7df49b
Updating mutate function
MatthewSo Dec 16, 2024
b853938
Remove fault import
MatthewSo Dec 16, 2024
ea67d5a
List conversion
MatthewSo Dec 16, 2024
a19819e
Updating new sample
MatthewSo Dec 16, 2024
aa6aa85
Index max batches within epoch only
MatthewSo Dec 16, 2024
96cadad
Updating loss metric
MatthewSo Dec 16, 2024
c981a8d
Add bypass threshold to prevent evaluation on model's whose digital p…
MatthewSo Dec 16, 2024
a1c2752
Updating
MatthewSo Dec 16, 2024
1251332
Mocking response for testing
MatthewSo Dec 16, 2024
c8af5c1
Mocking response for testing
MatthewSo Dec 16, 2024
87544ea
printing sorting
MatthewSo Dec 16, 2024
6bbeedb
updating sorting method
MatthewSo Dec 16, 2024
5822ef2
Removing mocking
MatthewSo Dec 16, 2024
8e02bbe
punish worse values
MatthewSo Dec 16, 2024
243e667
Updating new search sample
MatthewSo Dec 17, 2024
c51ae25
Adding step scheduler for longer trainins
MatthewSo Dec 17, 2024
c4cc15f
Update how patience is calculated
MatthewSo Dec 17, 2024
36fbdab
Updating
MatthewSo Dec 17, 2024
d63b04c
Added CIFAR Autoencoder architectuer
MatthewSo Dec 17, 2024
3afc5e1
Updating possible embedding dims
MatthewSo Dec 17, 2024
a9f3802
Updating for cifar trainng
MatthewSo Dec 17, 2024
a21f759
adding logging
MatthewSo Dec 17, 2024
b9b2346
Updating CIFAR class again
MatthewSo Dec 17, 2024
49464cf
Adding RPU Search space
MatthewSo Dec 18, 2024
4650d8a
Updating
MatthewSo Dec 18, 2024
a86b2dc
Fixing rpu search space sample
MatthewSo Dec 18, 2024
de21062
Updating
MatthewSo Dec 18, 2024
ae2e17f
Removing string component
MatthewSo Dec 18, 2024
ba5cb3c
Converting analog model to eval
MatthewSo Dec 18, 2024
ae511a4
UPdating 1 epoch
MatthewSo Dec 18, 2024
932b6c3
Updating rpu search sample
MatthewSo Dec 18, 2024
08de9dd
Updating EA optimized
MatthewSo Dec 18, 2024
c1a9d3d
Updating tile size
MatthewSo Dec 18, 2024
d746405
UPdating routine to update best arch
MatthewSo Dec 18, 2024
71ce467
Cleaning pyc
MatthewSo Dec 19, 2024
fe16dec
Cleaning pyc
MatthewSo Dec 19, 2024
f48b4b0
Clean pyc
MatthewSo Dec 19, 2024
fb19c49
Pyc cleanup
MatthewSo Dec 19, 2024
8f1b95b
pyc adjust
MatthewSo Dec 19, 2024
13eee22
Removing PYC from git tracking
MatthewSo Dec 19, 2024
1186942
Updating nas search demo
MatthewSo Dec 19, 2024
dae7ba4
Bug fix
MatthewSo Dec 19, 2024
a08e3f4
Fixing bug
MatthewSo Dec 19, 2024
092e408
Cleanup and README
MatthewSo Dec 20, 2024
3129fe7
Adding comments
MatthewSo Dec 20, 2024
a899722
Adding mnist playground
MatthewSo Dec 20, 2024
28f9521
Updating encoder decoder cifar
MatthewSo Dec 20, 2024
5b22539
Clearing notebook
MatthewSo Dec 20, 2024
5dd8f04
Updating search demo
MatthewSo Dec 20, 2024
1 change: 1 addition & 0 deletions .gitignore
@@ -1,3 +1,4 @@
venv
env
results
*.pyc
1,686 changes: 844 additions & 842 deletions AnalogNAS_Tutorial.ipynb

Large diffs are not rendered by default.

158 changes: 158 additions & 0 deletions README.md
@@ -12,6 +12,164 @@
| [**Docs**](https://github.com/IBM/analog-nas/blob/main/starter_notebook.ipynb)
| [**References**](#references)

## Additions for HPML Class
Our project focused on creating an AutoEncoder architecture that can be optimized for Analog Hardware.
We created a search space for AutoEncoder architectures and another for RPU-configurations that can be used to run AutoEncoder architectures.
As part of the project, we also created new evaluators that can be used to evaluate the performance of the AutoEncoder architectures.

We used the existing infrastructure of this repo and made some key contributions to it to extend the functionality for this project.

The majority of the code relevant to this project can be found within the analogainas main folder. Within that main folder, there are the following subfolders:

* **search_spaces**: Contains a subfolder with the AutoEncoder implementation details and the associated configuration space for the AutoEncoder. Also contains the team's new additions to the dataloader subfolder.
* **evaluators**: Contains the new and existing evaluators used to evaluate model performance. The ones added by the team are the RealtimeRpuEvaluator and the RealtimeTrainingEvaluator.
* **search_algorithms**: Contains the search algorithms used to find the optimal architecture, with separate .py files for the various approaches. In particular, it contains EAOptimized.py, the main search algorithm used for the AutoEncoder.

Here is a more specific look at the additions made to the repo:


- RealtimeTrainingEvaluator
  - Architecture-batchable evaluator class that integrates with the existing AnalogNAS optimizer search framework through its query and query\_pop implementations. Allows multiple architectures to be trained simultaneously across several GPUs on multiple threads. Implemented to be model- and dataset-agnostic. The outputs of this class could be used in the future to train estimators for arbitrary models.

- BaseConfigSpace
  - Introduced an object-oriented redesign of the ConfigSpaces, making it easy to add future ConfigSpaces without updating a single catch-all class (see the sketch after this list).

- AutoEncoderConfigSpace
- Configuration Space for the AutoEncoder architecture that was defined in Methodology.

- AutoEncoder
- Config-driven AutoEncoder implementation that accepts parameters as defined by the corresponding ConfigSpace. Used construction-based approach to create the CIFAR and MNIST task-specific configurable AutoEncoders.

- RPUConfigSpace
- Configuration Space for the RPU Architectures.

- Search Entrypoints
  - Introduced several training entrypoints to enable the recreation of various search experiments. These include the AutoEncoder Search Demo, the Cifar AutoEncoder Train Demo, and the RPU Search Demo, which launch searches over different datasets/architectures.

- Jupyter Notebooks
  - Several Jupyter notebooks were created to analyze/retrain discovered architectures and generate experiment results.
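
For illustration, here is a minimal sketch of the object-oriented ConfigSpace pattern described above. Apart from `get_hyperparameters()` and `compute_cs_size()`, which appear in the sample snippet below, the names and structure are illustrative assumptions rather than the repo's exact API:

```python
import random

# Hypothetical sketch of the BaseConfigSpace pattern; names other than
# get_hyperparameters/compute_cs_size are assumptions, not the repo's code.
class BaseConfigSpace:
    def __init__(self, name):
        self.name = name
        self.hyperparameters = []  # list of (name, allowed values) pairs

    def add_hyperparameter(self, name, values):
        self.hyperparameters.append((name, list(values)))

    def get_hyperparameters(self):
        return [name for name, _ in self.hyperparameters]

    def compute_cs_size(self):
        # Total number of distinct configurations in the space.
        size = 1
        for _, values in self.hyperparameters:
            size *= len(values)
        return size

    def sample_arch(self):
        # Uniform sampling over each hyperparameter's allowed values.
        return {name: random.choice(values)
                for name, values in self.hyperparameters}


class AutoEncoderConfigSpace(BaseConfigSpace):
    # New spaces extend the base class instead of patching a catch-all class.
    def __init__(self):
        super().__init__("autoencoder")
        self.add_hyperparameter("num_conv_blocks", [2, 3, 4])    # illustrative values
        self.add_hyperparameter("embedding_dim", [128, 256, 512])
```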

Much like the original implementation, you can run the searches with the following commands:

* python nas_search_demo.py
* python autoencoder_search_demo.py
* python rpu_search_demo.py
* python cifar_autoencoder_train.py

Here is a sample snippet:

```python
import torchvision
import torchvision.transforms as transforms
from torch.utils.data import DataLoader
import torch.nn as nn

from analogainas.search_spaces.autoencoder.cifar_autoencoder import CifarAutoEncoder
from analogainas.search_spaces.autoencoder.autoencoder_config_space import AutoEncoderConfigSpace
from analogainas.evaluators.realtime_training_evaluator import RealtimeTrainingEvaluator
from analogainas.search_algorithms.ea_optimized import EAOptimizer
from analogainas.search_algorithms.worker import Worker
from analogainas.search_spaces.dataloaders.autoencoder_structured_dataset import AutoEncoderStructuredDataset



CS = AutoEncoderConfigSpace()

print(CS.get_hyperparameters())

print(CS.compute_cs_size())

transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465),
                         (0.2023, 0.1994, 0.2010))
])

train_cifar_dataset = AutoEncoderStructuredDataset(
    torchvision.datasets.CIFAR10(root='./data', train=True, transform=transform, download=True)
)

train_dataloader = DataLoader(train_cifar_dataset, batch_size=8, shuffle=True)

test_cifar_dataset = AutoEncoderStructuredDataset(
    torchvision.datasets.CIFAR10(root='./data', train=False, transform=transform, download=True)
)

test_dataloader = DataLoader(test_cifar_dataset, batch_size=64, shuffle=True)

criterion = nn.MSELoss()
evaluator = RealtimeTrainingEvaluator(
    model_factory=CifarAutoEncoder,
    train_dataloader=train_dataloader,
    val_dataloader=test_dataloader,
    test_dataloader=test_dataloader,
    criterion=criterion,
    epochs=13,
    artifact_dir='CifarAutoEncoderTraining',
)

optimizer = EAOptimizer(evaluator, population_size=50, nb_iter=10, batched_evaluation=True)

NB_RUN = 1
worker = Worker(network_factory=CifarAutoEncoder, cs=CS, optimizer=optimizer, runs=NB_RUN)

print(worker.config_space)
worker.search()
```

Sample results:

Virtually all of the models were able to fit the MNIST dataset in the digital prediction space.

<img src="images/sample_convergence_mnist.png" width="600">

However, for the vast majority of models, once even a little RPU drift noise was added, the models lost all predictive power.
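
As a hedged sketch, the one-day and one-month drift states above can be produced with AIHWKit's weight-programming and drift utilities. This assumes `analog_model` was converted with AIHWKit's `convert_to_analog(...)`, `test_dataloader` is as in the snippet above, and `negative_mse_metric` is the helper added in this PR:

```python
# Sketch only: analog_model and test_dataloader are assumed to exist as above.
DAY = 24 * 3600.0  # drift times are given in seconds

analog_model.eval()
analog_model.program_analog_weights()        # apply programming noise once
analog_model.drift_analog_weights(DAY)       # weight state after 1 day of drift
day1_losses = negative_mse_metric(test_dataloader, analog_model)

analog_model.drift_analog_weights(30 * DAY)  # weight state after 30 days
day30_losses = negative_mse_metric(test_dataloader, analog_model)
```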


Results after one day of drift noise on non-optimal model:

<img src="images/non-optimal-one-day-mnist.png" width="600">

Results after one month of drift noise on non-optimal model:

<img src="images/non-optimal-one-month.png" width="600">

After conducting the search, however, we were able to find an architecture that is resilient to the drift noise.

Results after one day of drift noise on optimal model:

<img src="images/mnist-one-day-optimized.png" width="600">


Results after one month of drift noise on optimal model:

<img src="images/mnist-one-month-optimized.png" width="600">

The evolved architecture is clearly preferable in this case.

We repeated the experiment with CIFAR-10.

Here are the results for the non-optimal model:

One day

<img src="images/non-optimal-cifar-day.png" width="600">

One month

<img src="images/non-optimal-cifar-month.png" width="600">

And here are the results for the optimal model:

One day

<img src="images/cifar-optimal-one-day.png" width="600">

One month

<img src="images/cifar-one-month-optimal.png" width="600">

It should be noted that even these "non-optimal" models are still better than the average randomly selected model from the configuration space; many models produce completely illegible reconstructions once noise is added.

The optimal models are able to maintain a high level of clarity even after a month of drift noise.

Lastly, with the optimized model, we conducted an RPU search for the optimal RPU config and achieved the following results:

![cifar_model_performance.png](images%2Fcifar_model_performance.png)

## Features
AnalogaiNAS package offers the following features:

Binary file removed analogainas/__pycache__/__init__.cpython-38.pyc
Binary file not shown.
Binary file removed analogainas/__pycache__/__init__.cpython-39.pyc
Binary file not shown.
Binary file removed analogainas/__pycache__/utils.cpython-38.pyc
Binary file not shown.
65 changes: 65 additions & 0 deletions analogainas/analog_helpers/analog_helpers.py
@@ -0,0 +1,65 @@
# AIHWKIT IMPORTS
from aihwkit.simulator.configs import InferenceRPUConfig
from aihwkit.simulator.configs.utils import WeightClipType
from aihwkit.simulator.presets.utils import PresetIOParameters
from aihwkit.inference.compensation.drift import GlobalDriftCompensation
from aihwkit.optim import AnalogSGD

from aihwkit.inference.noise.pcm import PCMLikeNoiseModel
from aihwkit.simulator.parameters.enums import BoundManagementType


def create_noise_model():
    # Returns a noise model for the inference.
    # Would have preferred to use CustomDriftPCMLikeNoiseModel, but it is not
    # available in the environment currently supported by this repo.
    # g_min, g_max = 0.0, 25.
    # custom_drift_model = dict(g_lst=[g_min, 10., g_max],
    #                           nu_mean_lst=[0.08, 0.05, 0.03],
    #                           nu_std_lst=[0.03, 0.02, 0.01])
    #
    # noise_model = CustomDriftPCMLikeNoiseModel(
    #     custom_drift_model,
    #     prog_noise_scale=0.0,  # turn off to show drift only
    #     read_noise_scale=0.0,  # turn off to show drift only
    #     drift_scale=1.0,
    #     g_converter=SinglePairConductanceConverter(g_min=g_min, g_max=g_max),
    # )
    noise_model = PCMLikeNoiseModel()
    return noise_model

def create_rpu_config(g_max=25,
                      tile_size=256,
                      dac_res=256,
                      adc_res=256,
                      noise_std=5.0):
    # Returns an RPU configuration for inference based on the given parameters.
    # Implementation adapted from AIHWKit.
    rpu_config = InferenceRPUConfig()

    rpu_config.clip.type = WeightClipType.FIXED_VALUE
    rpu_config.clip.fixed_value = 1.0
    rpu_config.modifier.pdrop = 0  # Drop connect.

    rpu_config.modifier.std_dev = noise_std

    rpu_config.modifier.rel_to_actual_wmax = True
    rpu_config.mapping.digital_bias = True
    rpu_config.mapping.weight_scaling_omega = 0.4
    rpu_config.mapping.max_input_size = tile_size
    rpu_config.mapping.max_output_size = 255

    rpu_config.mapping.learn_out_scaling_alpha = True

    rpu_config.forward = PresetIOParameters()
    rpu_config.forward.inp_res = 1 / dac_res  # 8-bit DAC discretization.
    rpu_config.forward.out_res = 1 / adc_res  # 8-bit ADC discretization.
    rpu_config.forward.bound_management = BoundManagementType.NONE

    # Inference noise model.
    rpu_config.noise_model = PCMLikeNoiseModel(g_max=g_max)

    # Drift compensation.
    rpu_config.drift_compensation = GlobalDriftCompensation()
    return rpu_config
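
For context, a minimal usage sketch of this helper. The two-layer model here is a stand-in rather than anything from this PR; `convert_to_analog` is AIHWKit's standard conversion entry point:

```python
import torch.nn as nn
from aihwkit.nn.conversion import convert_to_analog

# Build a config and convert a (stand-in) digital model to its analog twin.
rpu_config = create_rpu_config(g_max=25, tile_size=256)
digital_model = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 784))
analog_model = convert_to_analog(digital_model, rpu_config)
analog_model.eval()  # inference mode before programming/drifting weights
```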
31 changes: 0 additions & 31 deletions analogainas/evaluators/base_evaluator.py
@@ -7,23 +7,6 @@ class Evaluator:
    def __init__(self, model_type=None):
        self.model_type = model_type

    def pre_process(self):
        """
        This is called at the start of the NAS algorithm,
        before any architectures have been queried.
        """
        pass

    def fit(self, x_train, y_train):
        """
        Training the evaluator.

        Args:
            x_train: list of architectures
            y_train: accuracies or ranks
        """
        pass

    def query(self, x_test):
        """
        Get the accuracy/rank prediction for x_test.
@@ -36,20 +19,6 @@ def query(self, x_test):
        """
        pass

    def get_evaluator_stat(self):
        """
        Check whether the evaluator needs retraining.

        Returns:
            A dictionary of metrics.
        """
        reqs = {
            "requires_retraining": False,
            "test_accuracy": None,
            "requires_hyperparameters": False,
            "hyperparams": {}
        }
        return reqs

    def set_hyperparams(self, hyperparams):
        """
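
To illustrate the slimmed-down interface that remains after this change, a toy subclass might look like the following; everything except `Evaluator`, `query`, and `model_type` is an illustrative assumption:

```python
class ConstantEvaluator(Evaluator):
    # Toy evaluator that scores every architecture identically. The real
    # evaluators in this PR (e.g. RealtimeTrainingEvaluator) train each
    # architecture instead of returning a constant.
    def __init__(self, score=0.0):
        super().__init__(model_type="toy")
        self.score = score

    def query(self, x_test):
        # Return one prediction per queried architecture.
        return [self.score for _ in x_test]
```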
21 changes: 21 additions & 0 deletions analogainas/evaluators/evaluation_metrics.py
@@ -0,0 +1,21 @@
import torch
from torch import nn

def negative_mse_metric(dataloader, analog_model, max_batches=5):
    # Metric callback used to evaluate the performance of the model with mean
    # squared error. The metric is negated because the search algorithm ranks
    # architectures by this value (higher is better).
    losses = []
    criterion = nn.MSELoss()
    with torch.no_grad():
        for i, (inputs, targets) in enumerate(dataloader):
            outputs = analog_model(inputs)
            loss = criterion(outputs, targets)
            loss = -1 * loss.item()

            losses.append(loss)

            if i + 1 >= max_batches:  # evaluate at most max_batches batches
                break

    return losses
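
A quick self-contained usage sketch; the identity model and random tensors are placeholders for a converted analog autoencoder and a real dataloader:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Placeholder image-to-image data: targets equal inputs, as for an autoencoder.
images = torch.rand(32, 3, 32, 32)
loader = DataLoader(TensorDataset(images, images), batch_size=8)

model = torch.nn.Identity()  # stand-in for a converted analog model
losses = negative_mse_metric(loader, model, max_batches=2)
print(sum(losses) / len(losses))  # mean negative MSE; closer to 0 is better
```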
4 changes: 0 additions & 4 deletions analogainas/evaluators/mlp.py
@@ -94,12 +94,8 @@ def fit(self, xtrain, ytrain,
        self.mean = np.mean(ytrain)
        self.std = np.std(ytrain)

<<<<<<< HEAD
        # TODO: Add encoding
=======
        scaler = StandardScaler()
        _xtrain = scaler.fit_transform(xtrain)
>>>>>>> public/main

        _xtrain = xtrain
        _ytrain = np.array(ytrain)