Allow utf-8 trigger characters #388

Closed
mikavilpas opened this issue Nov 26, 2024 · 1 comment
Labels: feature (New feature or request), sources (Specific source provider or the system as a whole)
Milestone: Sources v2

Comments

@mikavilpas
Contributor

Feature Description

Currently it's not possible to trigger a search with https://github.com/mikavilpas/blink-ripgrep.nvim if the last character of the current word is a non-ASCII character such as ö:

utf8-trigger.mov

I noticed that the implementation seems to use a list of characters that are allowed to trigger a search. Maybe this is the cause of the issue, since I have not defined the ö character as a trigger character:

https://github.com/mikavilpas/blink.cmp/blob/f1b8abe2ca7f8b369c8cd48ddfb991a690d20345/lua/blink/cmp/sources/lib/init.lua?plain=1#L71-L86

However, Unicode contains a very large number of characters 😄 I think listing all of them is a very bad idea, and I doubt it would perform well. Do you think it would be possible to let sources decide with a function whether a character is acceptable?

Lua patterns apparently do not support UTF-8 either, so I think they will not work. Alternatively, ChatGPT gave me vim.lpeg.R("\128\255"), which matches any byte in the range 128-255 (the bytes that make up multi-byte UTF-8 characters) and seems to match many Unicode characters well (there are examples in the tests).
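
To make the function-based idea concrete, here is a minimal sketch in plain Lua of what such a check could look like. The name `ends_with_word_char` is hypothetical and not part of blink.cmp; the sketch simply assumes that any byte in the range 128-255 (the same range that vim.lpeg.R("\128\255") matches) belongs to a word character, and falls back to ASCII letters, digits and underscore otherwise.

```lua
-- Hypothetical helper, not part of blink.cmp: reports whether the text
-- before the cursor ends in a character that should count as part of a word.
local function ends_with_word_char(text)
  if text == "" then
    return false
  end
  -- Look at the last byte only; no UTF-8 decoding is needed for this check.
  local last = text:byte(-1)
  -- In UTF-8, bytes 128-255 only occur inside multi-byte sequences, so a
  -- trailing byte in that range means the text ends in a non-ASCII character.
  if last >= 128 then
    return true
  end
  -- ASCII fallback: letters, digits and underscore.
  return string.char(last):match("[%w_]") ~= nil
end

assert(ends_with_word_char("hello"))
assert(ends_with_word_char("sähkö")) -- ends in ö (bytes 195, 182)
assert(not ends_with_word_char("foo "))
```

This treats every non-ASCII character as acceptable, including non-ASCII punctuation, so a real source might want a stricter rule.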

@mikavilpas added the feature label Nov 26, 2024
@fmoralesc

This happens with the buffer source too. The culprit is the code that collects the candidate words, which uses a regex that only detects sequences of ASCII characters (see #130). I am currently modifying blink locally to use the regex [\p{Latn}][\p{Latn}-]{2,31}[\p{Latn}], which captures any Latin-script word (hyphens included).
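
For comparison, a rough byte-based approximation of this kind of word extraction can be sketched in plain Lua. This is not the regex above: Lua patterns have no `\p{Latn}` class, so the sketch treats every byte in the range 128-255 as a word byte, does not enforce the length limits, and uses a hypothetical function name `candidate_words`.

```lua
-- Hypothetical sketch, not blink.cmp code: collect candidate words from a
-- line, treating ASCII letters and all non-ASCII bytes as word characters.
local function candidate_words(line)
  local words = {}
  -- First character: ASCII letter or non-ASCII byte.
  -- Remaining characters: the same set plus a literal hyphen.
  for word in line:gmatch("[%a\128-\255][%a\128-\255%-]*") do
    words[#words + 1] = word
  end
  return words
end

local words = candidate_words("sähkö-auto foo, 42")
assert(#words == 2)
assert(words[1] == "sähkö-auto")
assert(words[2] == "foo")
```

Because it works on bytes, this also accepts non-Latin scripts and non-ASCII punctuation, which a Unicode-aware regex engine with `\p{Latn}` would reject.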

@Saghen added this to the Sources v2 milestone Nov 28, 2024
@Saghen added the sources label Nov 28, 2024
@Saghen closed this as completed in 51d5f59 Dec 5, 2024