
Releases: kyegomez/LongNet

0.0.3

07 Jul 04:24

flash attention module

0.0.2

06 Jul 14:09

Changelog

  1. Flash Multi-head Attention Integration

    • The attention layer previously used torch.nn.MultiheadAttention; it has been switched to FlashMultiHeadAttention from the flash_attn library, which is expected to make the attention mechanism faster and more memory-efficient (an illustrative sketch follows this changelog).
  2. GPU Support

    • All computations now run on a GPU device specified once at the beginning of the script, so the model and its inputs stay on the same device; this considerably speeds up training and inference (see the device sketch after this changelog).
  3. Use of 16-bit Floating Point Precision

    • The computation dtype was changed from the default 32-bit floating point (torch.float32) to 16-bit floating point (torch.float16), which roughly halves memory usage and speeds up computation on modern GPUs (see the float16 sketch after this changelog).
  4. Added Dropout

    • A dropout layer has been added after the attention operation in the DilatedAttention class. Dropout is a regularization technique that helps prevent overfitting by randomly zeroing a fraction of the units during training.

    The changes were made in the following lines of code:

    # Initialize dropout layer in the constructor
    self.dropout = nn.Dropout(dropout)
    
    # Apply dropout after performing attention in the forward function
    attn_output = self.dropout(attn_output)
  5. Added Unit Tests and Benchmarks

    • Unit tests and benchmarking code have been added to verify the correctness of the DilatedAttention class and to measure its speed (a test and benchmark sketch follows this changelog).
  6. Documentation and Example Updates

    • Updated the documentation and usage examples for the DilatedAttention class to reflect the above changes.
  7. Twitter Thread

    • Created a Twitter thread in the style of Richard Feynman to promote the project and its new updates.
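
For item 1 above, a minimal sketch of the attention swap. The release uses FlashMultiHeadAttention from the flash_attn library; because that library's import path and constructor differ across versions, the sketch instead uses PyTorch's built-in scaled_dot_product_attention (PyTorch 2.0+), which can dispatch to a FlashAttention kernel on supported GPUs. The FlashStyleAttention class and its layer layout are illustrative, not the repository's actual code.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class FlashStyleAttention(nn.Module):
        """Hypothetical stand-in for the flash_attn-backed attention layer."""

        def __init__(self, d_model: int, num_heads: int, dropout: float = 0.0):
            super().__init__()
            assert d_model % num_heads == 0
            self.num_heads = num_heads
            self.head_dim = d_model // num_heads
            self.qkv = nn.Linear(d_model, 3 * d_model)
            self.out_proj = nn.Linear(d_model, d_model)
            self.dropout = dropout

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            b, s, _ = x.shape
            q, k, v = self.qkv(x).chunk(3, dim=-1)
            # SDPA expects (batch, heads, seq, head_dim).
            q, k, v = (t.view(b, s, self.num_heads, self.head_dim).transpose(1, 2)
                       for t in (q, k, v))
            # Dispatches to a FlashAttention kernel on supported GPUs.
            out = F.scaled_dot_product_attention(
                q, k, v, dropout_p=self.dropout if self.training else 0.0)
            out = out.transpose(1, 2).reshape(b, s, -1)
            return self.out_proj(out)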
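
For item 2 above, a minimal sketch of the device convention: choose the GPU once at the top of the script and keep the model and its inputs on it. nn.MultiheadAttention stands in here for the project's attention layer.

    import torch
    import torch.nn as nn

    # Choose the device once; fall back to CPU when no GPU is available.
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    attn = nn.MultiheadAttention(embed_dim=512, num_heads=8, batch_first=True).to(device)
    x = torch.randn(2, 1024, 512, device=device)  # (batch, seq_len, d_model)
    out, _ = attn(x, x, x)                        # every tensor stays on `device`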
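
For item 3 above, a sketch of the half-precision change: cast the module's parameters and the inputs to torch.float16. This is intended for GPU execution; torch.cuda.amp.autocast is a common alternative that keeps numerically sensitive ops in float32.

    import torch
    import torch.nn as nn

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    # Half precision roughly halves memory use and enables faster GPU kernels.
    attn = nn.MultiheadAttention(embed_dim=512, num_heads=8, batch_first=True)
    attn = attn.to(device=device, dtype=torch.float16)
    x = torch.randn(2, 1024, 512, device=device, dtype=torch.float16)
    out, _ = attn(x, x, x)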
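
For item 5 above, a pytest-style sketch of a shape check and a simple timing loop. The import path and the DilatedAttention constructor arguments shown here are assumptions about the class's API and may not match the repository exactly.

    import time
    import torch
    from LongNet import DilatedAttention  # assumed import path

    def test_output_shape():
        # Constructor arguments are illustrative assumptions.
        attn = DilatedAttention(d_model=512, num_heads=8, dilation_rate=2, segment_size=64)
        x = torch.randn(2, 1024, 512)
        out = attn(x)
        assert out.shape == x.shape  # attention should preserve (batch, seq_len, d_model)

    def benchmark(seq_len=8192, iters=10):
        attn = DilatedAttention(d_model=512, num_heads=8, dilation_rate=2, segment_size=64)
        x = torch.randn(1, seq_len, 512)
        with torch.no_grad():
            start = time.perf_counter()
            for _ in range(iters):
                attn(x)
        print(f"{(time.perf_counter() - start) / iters * 1000:.1f} ms per forward pass")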

0.0.1

06 Jul 11:31
workflow