Prompt interrupted before continuation for Unicode UTF-8 emojis #63
Labels: bug, duplicate, enhancement
I have found that when the prompt contains a Unicode emoji character encoded in UTF-8, such as
Unicode Character “👍” (U+1F44D)
the prompt breaks up.
I'm reading a sample prompt from a text file:
Looking at the logs, I can see that the tokenizer in fact breaks at the (U+1F44D) char code:
resulting in a broken input prompt.
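For context, here is a minimal Python sketch of one way a character like this can get broken. It is an assumption about the general failure mode, not a diagnosis of this project's tokenizer. U+1F44D takes four bytes in UTF-8, so anything that handles the prompt one byte (or one arbitrary chunk) at a time can cut the character in the middle. The helper names `utf8_bytes`, `decode_naive` and `decode_chunks` are hypothetical and exist only for this illustration.

```python
import codecs

THUMBS_UP = "\U0001F44D"  # the character from this report, U+1F44D


def utf8_bytes(text: str) -> bytes:
    """Return the UTF-8 encoding of text."""
    return text.encode("utf-8")


def decode_naive(chunks) -> str:
    """Decode every chunk on its own; partial sequences become U+FFFD."""
    return "".join(chunk.decode("utf-8", errors="replace") for chunk in chunks)


def decode_chunks(chunks) -> str:
    """Decode chunks with an incremental decoder that buffers partial sequences."""
    decoder = codecs.getincrementaldecoder("utf-8")()
    pieces = [decoder.decode(chunk) for chunk in chunks]
    pieces.append(decoder.decode(b"", final=True))  # flush; raises if input is truncated
    return "".join(pieces)


encoded = utf8_bytes("ok " + THUMBS_UP)
print(encoded.hex(" "))  # 6f 6b 20 f0 9f 91 8d

# Feed the prompt one byte at a time, as a byte-level reader might.
single_bytes = [encoded[i:i + 1] for i in range(len(encoded))]

print(decode_naive(single_bytes) == "ok " + THUMBS_UP)   # False: the emoji is lost
print(decode_chunks(single_bytes) == "ok " + THUMBS_UP)  # True: the emoji survives
```

If the logs show the prompt stopping or turning into replacement characters exactly at the emoji, comparing the raw bytes read from the text file against `f0 9f 91 8d` is a quick way to check whether the file was read correctly before it reached the tokenizer.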