diff --git a/README.md b/README.md
index 07527f8..957f206 100644
--- a/README.md
+++ b/README.md
@@ -12,7 +12,7 @@ This code was fairly quickly thrown together and may contains many, many bugs. F
 
 ## Tokenize datasets
 
-First, we tokenize the data so we never have to worry about the tokenizer again. The tokenization script takes in a JSONL (each row containing the key `"text"` for the document text), and effectively concatenates, tokenizes, and slices into `max_seq_length` chunks.
+*Requires using the **Transformers** PR [here](https://github.com/huggingface/transformers/pull/21955/), based on the fork [here](https://github.com/zphang/transformers/tree/llama_push).* First, we tokenize the data so we never have to worry about the tokenizer again. The tokenization script takes in a JSONL file (each row containing the key `"text"` for the document text), and effectively concatenates, tokenizes, and slices the result into `max_seq_length` chunks. (This is a quick and dirty script that loads the whole dataset into memory.)
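
For reference, the tokenize-and-chunk behavior the added paragraph describes could look roughly like the sketch below. This is a hypothetical illustration, not the repo's actual script: the file name, the flag names (`--jsonl_path`, `--tokenizer_path`, `--save_path`, `--max_seq_length`), and the choice to save as a Hugging Face `datasets` directory are all assumptions. Only the JSONL-with-`"text"`-key input and the concatenate/tokenize/slice logic come from the diff.

```python
# Hypothetical sketch of the tokenize-and-chunk step described in the diff.
# Flag names and the on-disk save format are assumptions, not the repo's CLI.
import argparse
import json

import datasets
import transformers


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("--jsonl_path", required=True)
    parser.add_argument("--tokenizer_path", required=True)
    parser.add_argument("--save_path", required=True)
    parser.add_argument("--max_seq_length", type=int, default=512)
    args = parser.parse_args()

    tokenizer = transformers.AutoTokenizer.from_pretrained(args.tokenizer_path)

    # Load the whole JSONL into memory (mirrors the "quick and dirty" caveat).
    with open(args.jsonl_path) as f:
        rows = [json.loads(line) for line in f]

    # Tokenize every document and concatenate into one long token stream.
    all_tokens = []
    for row in rows:
        all_tokens.extend(tokenizer.encode(row["text"]))

    # Slice the stream into fixed-length chunks, dropping the ragged tail.
    num_chunks = len(all_tokens) // args.max_seq_length
    chunks = [
        all_tokens[i * args.max_seq_length:(i + 1) * args.max_seq_length]
        for i in range(num_chunks)
    ]

    # Save the pre-tokenized chunks so training never touches the tokenizer.
    ds = datasets.Dataset.from_dict({"input_ids": chunks})
    ds.save_to_disk(args.save_path)


if __name__ == "__main__":
    main()
```

Under those assumptions, an invocation might look like `python tokenize_dataset.py --jsonl_path data.jsonl --tokenizer_path <llama_tokenizer_path> --save_path tokenized_dataset --max_seq_length 512`.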