vpulab
diff --git a/.gitignore b/.gitignore
+1
diff --git a/LICENSE b/LICENSE
+1 -1
diff --git a/README.md b/README.md
+129 -2
diff --git a/docs/assets/attention_maps.svg b/docs/assets/attention_maps.svg
+613
diff --git a/docs/assets/cat.jpg b/docs/assets/cat.jpg
80.5 KB
diff --git a/docs/assets/cat_annotation.png b/docs/assets/cat_annotation.png
2.31 KB
diff --git a/docs/assets/cat_optimized_token.npy b/docs/assets/cat_optimized_token.npy
6.13 KB
diff --git a/docs/assets/diagram-OVAM.svg b/docs/assets/diagram-OVAM.svg
+1
diff --git a/docs/assets/optimized_testing_attention.svg b/docs/assets/optimized_testing_attention.svg
+588
diff --git a/docs/assets/optimized_training_attention.svg b/docs/assets/optimized_training_attention.svg
+671
diff --git a/docs/assets/teaser.svg b/docs/assets/teaser.svg
+1
diff --git a/examples/getting_started.ipynb b/examples/getting_started.ipynb
+623
diff --git a/ovam/__init__.py b/ovam/__init__.py
+5
diff --git a/ovam/base/attention_storage.py b/ovam/base/attention_storage.py
+90
diff --git a/ovam/base/block_hooker.py b/ovam/base/block_hooker.py
+85
diff --git a/ovam/base/daam_block.py b/ovam/base/daam_block.py
+33
@@ -158,3 +158,4 @@ cython_debug/
 #  and can be added to the global gitignore or merged into this file.  For a more nuclear
 #  option (not recommended) you can uncomment the following to ignore the entire idea folder.
 #.idea/
+.DS_Store
@@ -1,6 +1,6 @@
 MIT License
 
-Copyright (c) 2023 Pablo Marcos
+Copyright (c) 2023
 
 Permission is hereby granted, free of charge, to any person obtaining a copy
 of this software and associated documentation files (the "Software"), to deal
 
@@ -1,2 +1,129 @@
-# ovam
-Open-Vocabulary Attention Maps with Token Optimization for Semantic Segmentation in Diffusion Models
+# Open-Vocabulary Attention Maps (OVAM)
+
+**Open-Vocabulary Attention Maps with Token Optimization for Semantic Segmentation in Diffusion Models**
+
+
+[]([![arXiv](https://img.shields.io/badge/arXiv-abcd.efgh-b31b1b.svg)](https://arxiv.org/abs/abcd.efgh))
+
+> Links have been removed for anonymity.
+
+![teaser](docs/assets/teaser.svg)
+
+In [our paper](https://arxig.org), we introduce Open-Vocabulary Attention Maps (OVAM), a training-free extension for text-to-image diffusion models that generates text-attribution maps from open-vocabulary descriptions. We also introduce a token optimization process for creating accurate attention maps, which improves the performance of existing semantic segmentation methods based on diffusion cross-attention maps.
+
+![diagram](docs/assets/diagram-OVAM.svg)
+
+## Installation
+
+Create a new virtual or conda environment (if applicable) and activate it. For example, using `venv`:
+
+```bash
+# Create a Python virtual environment (ensure Python 3.8 or higher)
+python -m venv venv
+source venv/bin/activate
+pip install --upgrade pip wheel
+```
+
+Install PyTorch with a compatible CUDA (or other) backend and [Diffusers 0.20](https://pypi.org/project/diffusers/0.20.2/). We tested the code on Ubuntu with CUDA 11.8 and on macOS with the MPS backend.
+
+```bash
+# Install PyTorch with CUDA 11.8
+pip install torch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 --index-url https://download.pytorch.org/whl/cu118
+```
+
+```bash
+# Or PyTorch with the MPS backend for macOS
+pip install torch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0
+```
+
+Install OVAM and its Python dependencies using the project file:
+
+```bash
+# Install using pyproject.toml
+pip install .
+```
+
+Alternatively, install the dependencies from `requirements.txt` and add OVAM to your `PYTHONPATH`.
+
+## Getting started
+
+The Jupyter notebook [examples/getting_started.ipynb](./examples/getting_started.ipynb) contains a full example of how to use OVAM with Stable Diffusion. This section shows a simplified version of the notebook.
+
+### Setup
+Import the required libraries and load Stable Diffusion:
+
+```python
+import torch
+import matplotlib.pyplot as plt
+from diffusers import StableDiffusionPipeline
+from ovam.stable_diffusion import StableDiffusionHooker
+from ovam.utils import set_seed
+
+pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
+pipe = pipe.to("mps")  # "mps", "cuda", ...
+```
+
+Generate an image with Stable Diffusion and store the attention maps using the OVAM hooker:
+
+```python
+with StableDiffusionHooker(pipe) as hooker:
+    set_seed(123456)
+    out = pipe("monkey with hat walking")
+    image = out.images[0]
+```
+
+### Generate an attention map with open vocabulary
+
+Extract attention maps for the attribution prompt `monkey with hat walking and mouth`:
+
+```python
+ovam_evaluator = hooker.get_ovam_callable(
+    expand_size=(512, 512)
+)  # You can configure OVAM here (aggregation, activations, size, ...)
+
+with torch.no_grad():
+    attention_maps = ovam_evaluator("monkey with hat walking and mouth")
+    attention_maps = attention_maps[0].cpu().numpy() # (8, 512, 512)
+```
+
+Eight attention maps have been generated, one per token: `0:<SoT>, 1:monkey, 2:with, 3:hat, 4:walking, 5:and, 6:mouth, 7:<EoT>`. Plot the attention maps for the words `monkey`, `hat` and `mouth`:
+
+```python
+# Get maps for monkey, hat and mouth
+monkey = attention_maps[1]
+hat = attention_maps[3]
+mouth = attention_maps[6]
+
+# Plot using matplotlib
+fig, (ax0, ax1, ax2, ax3) = plt.subplots(1, 4, figsize=(20, 5))
+ax0.imshow(image)
+ax1.imshow(monkey, alpha=monkey / monkey.max())
+ax2.imshow(hat, alpha=hat / hat.max())
+ax3.imshow(mouth, alpha=mouth / mouth.max())
+plt.show()
+```
+Result (matplotlib code simplified; the full version is in [examples/getting_started.ipynb](./examples/getting_started.ipynb)):
+![result](docs/assets/attention_maps.svg)
+
+### Token optimization
+
+The OVAM library includes code to optimize tokens in order to improve the attention maps. Given an image generated with Stable Diffusion using the text `a photograph of a cat in a park`, we optimized a cat token to obtain a mask of the cat in the image (full example in the notebook).
+
+![Token optimization](docs/assets/optimized_training_attention.svg)
+
+This token can later be used to generate a mask of the cat in other test images. For example, in this image generated with the text `cat perched on the sofa looking out of the window`:
+
+![Token optimization](docs/assets/optimized_testing_attention.svg)
+
+### Different Stable Diffusion versions
+
+The current code has been tested with Stable Diffusion 1.5, 2.0 base and 2.1 in Diffusers 0.20. We provide a module, `ovam/base`, with utility classes for adapting OVAM to other diffusion models.
+
+## Experiments
+
+## Acknowledgements
+
+We want to thank the authors of [DAAM](https://github.com/castorini/daam) for their helpful code. A big thanks also to the open-source community of [HuggingFace](https://huggingface.co/docs/diffusers/index), [PyTorch](https://pytorch.org/), and RunwayML for making [Stable Diffusion 1.5](https://huggingface.co/runwayml/stable-diffusion-v1-5) available. We also acknowledge the work of the teams behind [DatasetDM](https://github.com/showlab/DatasetDM), [DiffuMask](https://github.com/weijiawu/DiffuMask) and [Grounded Diffusion](https://github.com/Lipurple/Grounded-Diffusion), which we used in our experiments.
+
+## Citation
+
+> Pending publication.
@@ -0,0 +1,5 @@
+from .stable_diffusion_sa import StableDiffusionHookerSA as StableDiffusionHooker
+
+__version__ = "0.0.1"
+
+__all__ = ["StableDiffusionHooker", "__version__"]
@@ -0,0 +1,90 @@
+"""Classes in charge of storing the hidden states of a block.
+
+The class OnlineAttentionStorage provides a simple implementation that
+stores all the hidden states in memory. The AttentionStorage
+class is a generic base class that can be used to implement more
+complex storage classes.
+
+"""
+from typing import TYPE_CHECKING, Iterable, Optional, List
+
+if TYPE_CHECKING:
+    import torch
+
+__all__ = ["AttentionStorage", "OnlineAttentionStorage"]
+
+
+class AttentionStorage:
+    """Generic class for storing the hidden states of an upsample/downsample block.
+
+    Attributes
+    ----------
+    name: str
+        The name of the block in the UNet.
+    """
+
+    def __init__(self, name: Optional[str] = None) -> None:
+        self.name = name
+
+    def store(self, hidden_states: "torch.Tensor") -> None:
+        """Stores the hidden states.
+
+        Arguments
+        ---------
+        hidden_states: torch.Tensor
+            The hidden states of a block generated by an image. The
+            hidden states are stored in the order they are passed.
+        """
+        raise NotImplementedError
+
+    def __len__(self) -> int:
+        """Returns the number of images stored"""
+        raise NotImplementedError
+
+    def __getitem__(self, idx: int) -> "torch.Tensor":
+        """Returns the hidden state at the given index."""
+        raise NotImplementedError
+
+    def __iter__(self) -> Iterable["torch.Tensor"]:
+        """Returns an iterator over the stored hidden states."""
+        for i in range(len(self)):
+            yield self[i]
+
+    def clear(self) -> None:
+        """Clears the stored hidden states."""
+        raise NotImplementedError
+
+
+class OnlineAttentionStorage(AttentionStorage):
+    """Class to store the hidden states in memory.
+
+    Attributes
+    ----------
+    name: str
+        The name of the block in the UNet.
+    hidden_states: List[torch.Tensor]
+        The stored hidden states, in the order they were passed.
+    """
+
+    def __init__(self, name: Optional[str] = None):
+        super().__init__(name)
+        self.hidden_states: List["torch.Tensor"] = []
+
+    def store(self, hidden_states: "torch.Tensor") -> None:
+        """Stores the hidden states.
+
+        Arguments
+        ---------
+        hidden_states: torch.Tensor
+            The hidden states of a block generated by an image. The
+            hidden states are stored in the order they are passed.
+        """
+        self.hidden_states.append(hidden_states)
+
+    def __len__(self) -> int:
+        return len(self.hidden_states)
+
+    def __getitem__(self, idx: int) -> "torch.Tensor":
+        return self.hidden_states[idx]
+
+    def clear(self) -> None:
+        """Clears the stored hidden states."""
+        self.hidden_states.clear()
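Review note: the module docstring says `AttentionStorage` can back more complex storage classes. A minimal sketch of one such subclass follows, assuming only the interface shown in this hunk. `BoundedAttentionStorage` and its `max_items` parameter are hypothetical and not part of OVAM, the base class is re-declared as a stand-in so the snippet runs without the package, and plain strings stand in for tensors.

```python
from collections import deque
from typing import Any, Deque, Iterator, Optional


class AttentionStorage:
    """Stand-in that mirrors the interface of ovam.base.attention_storage.AttentionStorage."""

    def __init__(self, name: Optional[str] = None) -> None:
        self.name = name

    def store(self, hidden_states: Any) -> None:
        raise NotImplementedError

    def __len__(self) -> int:
        raise NotImplementedError

    def __getitem__(self, idx: int) -> Any:
        raise NotImplementedError

    def __iter__(self) -> Iterator[Any]:
        # Iteration is built on __len__ and __getitem__, as in the base class above
        for i in range(len(self)):
            yield self[i]

    def clear(self) -> None:
        raise NotImplementedError


class BoundedAttentionStorage(AttentionStorage):
    """Hypothetical storage that keeps only the `max_items` most recent states."""

    def __init__(self, name: Optional[str] = None, max_items: int = 4) -> None:
        super().__init__(name)
        # A deque with maxlen silently drops the oldest entry when full
        self.hidden_states: Deque[Any] = deque(maxlen=max_items)

    def store(self, hidden_states: Any) -> None:
        self.hidden_states.append(hidden_states)

    def __len__(self) -> int:
        return len(self.hidden_states)

    def __getitem__(self, idx: int) -> Any:
        return self.hidden_states[idx]

    def clear(self) -> None:
        self.hidden_states.clear()


# Usage: store five placeholder states, only the last two survive
storage = BoundedAttentionStorage(name="up_blocks.0", max_items=2)
for step in range(5):
    storage.store(f"state-{step}")
print(len(storage), list(storage))
```

A class like this could presumably be plugged in through the `STORAGE_CLASS` attribute of `BlockHooker`, although that wiring is not exercised here.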
@@ -0,0 +1,85 @@
+from typing import TYPE_CHECKING
+
+from .hooker import ObjectHooker, ModuleType
+from .attention_storage import OnlineAttentionStorage
+
+if TYPE_CHECKING:
+    import torch
+    from .daam_block import DAAMBlock
+
+
+class BlockHooker(ObjectHooker["ModuleType"]):
+    """Hooker for the CrossAttention blocks.
+
+    Monkey patches the forward method of the cross attention blocks of the
+    Stable Diffusion UNET.
+
+    Arguments
+    ---------
+    module: CrossAttention
+        Cross-attention module to be hooked.
+    name: str
+        Name of the block in the UNet.
+
+    Attributes
+    ----------
+    module: CrossAttention
+        Cross-attention module hooked.
+    name: str
+        Name of the block in the UNet.
+    hidden_states: AttentionStorage
+        Storage of the hooked hidden states, each of size [ h*w ] x n_heads, where
+        `h*w` is the flattened size of the UNet hidden state through the block
+        (equal to h*w / (2**2*factor)) and n_heads is the number of attention heads
+        of the module.
+
+    Note
+    ----
+        This class is based on the original implementation `daam.trace.UNetCrossAttentionHooker`.
+    """
+
+    # Default class to store the hidden states (in memory)
+    STORAGE_CLASS = OnlineAttentionStorage
+
+    def __init__(self, module: "ModuleType", name: str):
+        super().__init__(module)
+        self.name = name
+        self.hidden_states = self.STORAGE_CLASS(name=name)
+
+    def __repr__(self):
+        return f"{self.__class__.__name__}({self.name})"
+
+    def store_hidden_states(self) -> None:
+        """Stores the hidden states in the parent trace"""
+        raise NotImplementedError
+
+    def clear(self) -> None:
+        """Clear the hidden states"""
+        self.hidden_states.clear()
+
+    def _hook_impl(self) -> None:
+        """Monkey patches the forward method in the cross attention block"""
+        self.monkey_patch("forward", self._hooked_forward)
+
+    def _hooked_forward(
+        hk_self: "BlockHooker",
+        _: "ModuleType",
+        hidden_states: "torch.Tensor",
+    ):
+        """Hooked forward of the cross attention module.
+
+        Stores the hidden states and performs the original attention.
+        """
+        raise NotImplementedError
+
+    def daam_block(self, **kwargs) -> "DAAMBlock":
+        """Builds a DAAMBlock with the current hidden states.
+
+        Arguments
+        ---------
+        **kwargs:
+            Arguments passed to the `DAAMBlock` constructor.
+        """
+
+        raise NotImplementedError
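Review note: `_hook_impl` relies on `ObjectHooker.monkey_patch`, which is not part of this hunk. The sketch below illustrates the general idea of patching `forward` to record its input and then delegating to the original. It is an assumption-laden illustration in plain Python, not OVAM's or DAAM's implementation: `ToyBlock` and `ForwardRecorder` are hypothetical names, and lists of floats stand in for tensors.

```python
from typing import Any, Callable, List, Optional


class ToyBlock:
    """Stand-in for a cross-attention block (not a real diffusers module)."""

    def forward(self, hidden_states: List[float]) -> List[float]:
        return [2 * x for x in hidden_states]


class ForwardRecorder:
    """Hypothetical hooker: patches `forward` on enter and restores it on exit."""

    def __init__(self, module: Any) -> None:
        self.module = module
        self.hidden_states: List[Any] = []
        self._original: Optional[Callable[..., Any]] = None

    def __enter__(self) -> "ForwardRecorder":
        original = self.module.forward  # bound method of the class
        self._original = original

        def hooked_forward(hidden_states: Any, *args: Any, **kwargs: Any) -> Any:
            # Record the input, then run the unmodified forward pass
            self.hidden_states.append(hidden_states)
            return original(hidden_states, *args, **kwargs)

        # The instance attribute shadows the method defined on the class
        self.module.forward = hooked_forward
        return self

    def __exit__(self, *exc_info: Any) -> bool:
        # Deleting the instance attribute exposes the class method again
        del self.module.forward
        return False


# Usage: the call inside the context is recorded, the one after it is not
block = ToyBlock()
with ForwardRecorder(block) as recorder:
    out = block.forward([1.0, 2.0])
after = block.forward([3.0])
print(out, after, recorder.hidden_states)
```

Restoring the method in `__exit__` keeps the patched module usable after the hooker goes out of scope, which matches the `with StableDiffusionHooker(pipe) as hooker:` usage shown in the README.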
@@ -0,0 +1,33 @@
+from typing import TYPE_CHECKING
+
+from torch import nn
+
+if TYPE_CHECKING:
+    from .attention_storage import AttentionStorage
+
+
+class DAAMBlock(nn.Module):
+    """Generic DAAMBlock used to save the hidden states of the cross attention blocks.
+
+    Should be implemented by each of the different architectures.
+    It is used to save the hidden states of the cross attention blocks and to
+    build a callable DAAM function.
+    """
+
+    def __init__(
+        self,
+        hidden_states: "AttentionStorage",
+        name: str,
+    ):
+        super().__init__()
+        self.name = name
+        self.hidden_states = hidden_states
+
+    def forward(self, x):
+        """Compute the attention for a given input x"""
+
+        raise NotImplementedError
+
+    def store_hidden_states(self) -> None:
+        """Stores the hidden states in the parent trace"""
+        raise NotImplementedError