Linear Explanations for Individual Neurons

This is the official repository for our paper ICML 2024 Linear Explanations for Individual Neurons. [Arxiv link].

We propose that neurons are best understood as linear combinations of interpretable concepts, and propose an efficient way to generate such explanations: Linear Explanations (LE). In addition we introduce the simulation evaluation for vision models.

Method Overview

Setup

Install Python (3.11) and Pytorch (>=2.0) with torchvision: https://pytorch.org/get-started/locally/
Install other dependencies with pip install -r requirements.txt
To analyze ResNet-18 trained on Places365, download it with bash download_rn18_places.sh
To use ImageNet models/data, add the path to your folder containing ImageNet validation data to DATASET_ROOTS in data_utils.py

Running

Quickstart

Explain neurons in VGG-16(CIFAR100) using LE(Label):

python explain_neurons.py --target_name vgg16_bn_cifar100 --target_layer classifier[4] --dataset_name cifar100_val

Explain neurons in ResNet-18(Places365) using LE(SigLIP):

python explain_neurons.py --target_name resnet18_places365 --target_layer layer4 --pool_mode avg --dataset_name places365_val --mode siglip

Evaluate our LE(SigLIP) explanations of ResNet-18(Places365) with Simulation(Correlation scoring):

python simulate_correlation.py --target_name resnet18_places365 --target_layer layer4 --pool_mode avg --dataset_name places365_val --explanation_method le_siglip

Evaluate our LE(Label) explanations of ResNet-50(ImageNet) with Simulation(Ablation scoring):

python simulate_ablation.py

For additional commands and instructions for reproducing our experiments, see commands.txt.

Results

Simulation results (correlation scoring)

Target model	Network Dissection	MILAN	CLIP-Dissect	LE (Label)	LE (SigLIP)
ResNet-50 (ImageNet)	0.1242	0.0920	0.1871	0.2924	0.3772
ResNet-18 (Places365)	0.2038	0.1557	0.2208	0.3388	0.4372
VGG-16 (CIFAR-100)	-	-	0.2298	0.4330	0.4970
ViT-B/16 (ImageNet)	-	-	0.1722	0.3243	0.3489
ViT-L/32 (ImageNet)	-	-	0.0549	0.1879	0.2182

Average correlation scores for different explanation methods in seconds to last layer of the different models.

Example results

Sources

GLM-Saga: https://github.com/MadryLab/glm_saga

Note we modify the original implementation on line 116 and 119, line 151 and line 330 and 333 to get the same results regardless of how many neurons we explain at once.

RN18(Places365) model: https://github.com/CSAILVision/NetDissect-Lite

CIFAR models: https://github.com/chenyaofo/pytorch-cifar-models

CLIP models: https://github.com/mlfoundations/open_clip

Cite this work

T. Oikarinen and T.W. Weng, Linear Explanations for Individual Neurons, ICML 2024.

@inproceedings{oikarinen2024linear,
  title={Linear Explanations for Individual Neurons},
  author={Oikarinen, Tuomas and Weng, Tsui-Wei},
  booktitle={International Conference on Machine Learning},
  year={2024}
}

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
data		data
glm_saga		glm_saga
results		results
secondary		secondary
.gitignore		.gitignore
README.md		README.md
activation_importance.ipynb		activation_importance.ipynb
area_chart.ipynb		area_chart.ipynb
commands.txt		commands.txt
data_utils.py		data_utils.py
download_rn18_places.sh		download_rn18_places.sh
explain_neurons.py		explain_neurons.py
explain_neurons_no_gs.py		explain_neurons_no_gs.py
linear_explanation.py		linear_explanation.py
requirements.txt		requirements.txt
run_correlations.ipynb		run_correlations.ipynb
simulate_ablation.py		simulate_ablation.py
simulate_correlation.py		simulate_correlation.py
utils.py		utils.py
visualize_neuron.ipynb		visualize_neuron.ipynb

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Linear Explanations for Individual Neurons

Method Overview

Setup

Running

Quickstart

Results

Simulation results (correlation scoring)

Example results

Sources

Cite this work

About

Releases

Packages

Languages

Trustworthy-ML-Lab/Linear-Explanations

Folders and files

Latest commit

History

Repository files navigation

Linear Explanations for Individual Neurons

Method Overview

Setup

Running

Quickstart

Results

Simulation results (correlation scoring)

Example results

Sources

Cite this work

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages