
Releases: kyegomez/LongNet

0.0.3

07 Jul 04:24

flash attention module

0.0.2

06 Jul 14:09

Changelog

  1. Flash Multi-head Attention Integration

    • The attention layer previously used torch.nn.MultiheadAttention; it has been switched to FlashMultiHeadAttention from the flash_attn library, which is expected to make the attention mechanism faster and more memory-efficient (an illustrative sketch follows this changelog).
  2. GPU Support

    • All computations now run on a GPU device specified once at the beginning of the script, so the model and its inputs stay on the same device; this considerably speeds up training and inference (see the device sketch after this changelog).
  3. Use of 16-bit Floating Point Precision

    • The computation dtype was changed from the default 32-bit floating point (torch.float32) to 16-bit floating point (torch.float16), which roughly halves memory usage and speeds up computation on modern GPUs (see the float16 sketch after this changelog).
  4. Added Dropout

    • A dropout layer has been added after the attention operation in the DilatedAttention class. Dropout is a regularization technique that helps prevent overfitting by randomly zeroing a fraction of the units during training.

    The changes were made in the following lines of code:

    # Initialize dropout layer in the constructor
    self.dropout = nn.Dropout(dropout)
    
    # Apply dropout after performing attention in the forward function
    attn_output = self.dropout(attn_output)
  5. Added Unit Tests and Benchmarks

    • Unit tests and benchmarking code have been added to verify the correctness of the DilatedAttention class and to measure its speed (a test and benchmark sketch follows this changelog).
  6. Documentation and Example Updates

    • Updated the documentation and usage examples for the DilatedAttention class to reflect the above changes.
  7. Twitter Thread

    • Created a Twitter thread in the style of Richard Feynman to promote the project and its new updates.
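
For item 1 above, a minimal sketch of the attention swap. The release uses FlashMultiHeadAttention from the flash_attn library; because that library's import path and constructor differ across versions, the sketch instead uses PyTorch's built-in scaled_dot_product_attention (PyTorch 2.0+), which can dispatch to a FlashAttention kernel on supported GPUs. The FlashStyleAttention class and its layer layout are illustrative, not the repository's actual code.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class FlashStyleAttention(nn.Module):
        """Hypothetical stand-in for the flash_attn-backed attention layer."""

        def __init__(self, d_model: int, num_heads: int, dropout: float = 0.0):
            super().__init__()
            assert d_model % num_heads == 0
            self.num_heads = num_heads
            self.head_dim = d_model // num_heads
            self.qkv = nn.Linear(d_model, 3 * d_model)
            self.out_proj = nn.Linear(d_model, d_model)
            self.dropout = dropout

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            b, s, _ = x.shape
            q, k, v = self.qkv(x).chunk(3, dim=-1)
            # SDPA expects (batch, heads, seq, head_dim).
            q, k, v = (t.view(b, s, self.num_heads, self.head_dim).transpose(1, 2)
                       for t in (q, k, v))
            # Dispatches to a FlashAttention kernel on supported GPUs.
            out = F.scaled_dot_product_attention(
                q, k, v, dropout_p=self.dropout if self.training else 0.0)
            out = out.transpose(1, 2).reshape(b, s, -1)
            return self.out_proj(out)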
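
For item 2 above, a minimal sketch of the device convention: choose the GPU once at the top of the script and keep the model and its inputs on it. nn.MultiheadAttention stands in here for the project's attention layer.

    import torch
    import torch.nn as nn

    # Choose the device once; fall back to CPU when no GPU is available.
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    attn = nn.MultiheadAttention(embed_dim=512, num_heads=8, batch_first=True).to(device)
    x = torch.randn(2, 1024, 512, device=device)  # (batch, seq_len, d_model)
    out, _ = attn(x, x, x)                        # every tensor stays on `device`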
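
For item 3 above, a sketch of the half-precision change: cast the module's parameters and the inputs to torch.float16. This is intended for GPU execution; torch.cuda.amp.autocast is a common alternative that keeps numerically sensitive ops in float32.

    import torch
    import torch.nn as nn

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    # Half precision roughly halves memory use and enables faster GPU kernels.
    attn = nn.MultiheadAttention(embed_dim=512, num_heads=8, batch_first=True)
    attn = attn.to(device=device, dtype=torch.float16)
    x = torch.randn(2, 1024, 512, device=device, dtype=torch.float16)
    out, _ = attn(x, x, x)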
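
For item 5 above, a pytest-style sketch of a shape check and a simple timing loop. The import path and the DilatedAttention constructor arguments shown here are assumptions about the class's API and may not match the repository exactly.

    import time
    import torch
    from LongNet import DilatedAttention  # assumed import path

    def test_output_shape():
        # Constructor arguments are illustrative assumptions.
        attn = DilatedAttention(d_model=512, num_heads=8, dilation_rate=2, segment_size=64)
        x = torch.randn(2, 1024, 512)
        out = attn(x)
        assert out.shape == x.shape  # attention should preserve (batch, seq_len, d_model)

    def benchmark(seq_len=8192, iters=10):
        attn = DilatedAttention(d_model=512, num_heads=8, dilation_rate=2, segment_size=64)
        x = torch.randn(1, seq_len, 512)
        with torch.no_grad():
            start = time.perf_counter()
            for _ in range(iters):
                attn(x)
        print(f"{(time.perf_counter() - start) / iters * 1000:.1f} ms per forward pass")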

0.0.1

06 Jul 11:31
workflow