
[Question] New demo type/use case: semantic search (SemanticFinder) #84

Open

do-me opened this issue Apr 12, 2023 · 26 comments
Labels
question Further information is requested

Comments

@do-me
Contributor

do-me commented Apr 12, 2023

Hi @xenova,
first of all thanks for the amazing library - it's awesome to be able to play around with the models without a backend!

I just created SemanticFinder, a semantic search engine in the browser with the help of transformers.js and sentence-transformers/all-MiniLM-L6-v2.
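For readers who want the gist of how such a search works, here is a minimal sketch (not SemanticFinder's actual code; the chunking, function names, and ranking loop are illustrative, and it uses the pooling/normalize options discussed further down this thread):

import { pipeline } from '@xenova/transformers';

// Load the embedding pipeline once.
const extractor = await pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2');

// Embed a string into a single normalized vector (mean pooling over tokens).
async function embed(text) {
  const output = await extractor(text, { pooling: 'mean', normalize: true });
  return Array.from(output.data);
}

// For normalized vectors, the dot product equals the cosine similarity.
function dot(a, b) {
  return a.reduce((sum, v, i) => sum + v * b[i], 0);
}

// Rank text chunks by similarity to a query, entirely in the browser.
const chunks = ['First passage ...', 'Second passage ...'];
const queryVec = await embed('what am I looking for?');
const ranked = [];
for (const chunk of chunks) {
  ranked.push({ chunk, score: dot(queryVec, await embed(chunk)) });
}
ranked.sort((a, b) => b.score - a.score);
console.log(ranked);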

You can find some technical details in the blog post.

I was wondering whether you'd be interested in showcasing semantic search as a new demo type. Technically, it's not a new model, but it is a new use case for an existing model, so I don't know whether it's out of scope.

Anyway, I just wanted to let you know that your work is very much appreciated!

@do-me do-me added the question Further information is requested label Apr 12, 2023
@xenova
Collaborator

xenova commented Apr 12, 2023

This is so cool! I plan to completely rewrite the demo application which, as you can tell, is extremely simple... so this definitely sounds like something I can add!

PS: Do you have a Twitter post I can retweet? I'd love to share it! Edit: Found it!

@xenova
Collaborator

xenova commented Jun 1, 2023

@do-me Just a heads up that I updated the feature-extraction API to support other models (not just sentence-transformers). To use the updated API, you just need to pass { pooling: 'mean', normalize: true } when calling the pipeline. Your demo site seems unaffected (as it is still using the previous version), but if you'd like to add support for other models, you can make the following changes:

For example:

Before:

let extractor = await pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2');
let result = await extractor('This is a simple test.');
console.log(result);
// Tensor {
//     type: 'float32',
//     data: Float32Array [0.09094982594251633, -0.014774246141314507, ...],
//     dims: [1, 384]
// }

After:

let extractor = await pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2');
let result = await extractor('This is a simple test.', { pooling: 'mean', normalize: true });
console.log(result);
// Tensor {
//     type: 'float32',
//     data: Float32Array [0.09094982594251633, -0.014774246141314507, ...],
//     dims: [1, 384]
// }

And if you don't want to do pooling/normalization, you can leave it out. You will then get the embeddings for each token in the sequence.
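(For illustration only, not part of the original comment: calling the same pipeline without those options then returns one embedding per token rather than one per input.)

let extractor = await pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2');
let result = await extractor('This is a simple test.');
console.log(result.dims);
// [1, 8, 384]  -- one 384-dimensional vector per token (the token count shown is illustrative)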

@xenova
Collaborator

xenova commented Jun 1, 2023

Also - we're planning on releasing a semantic search demo next week 🥳 (so, watch this space!)

@do-me
Contributor Author

do-me commented Jun 2, 2023

This is awesome, thanks for pinging me!

I'm very interested in this feature, mainly for speed improvements. Do you have any benchmarks at hand showing how the new pooling approach compares to sequential processing?

Also, I'd be curious to know whether there's a sweet spot for how many elements could/should be passed to the model at once.

And one more detail, though it's probably also model dependent: can you track the progress of a batch/pool that has been passed to the model? E.g. if I pass 1000 elements at once, is there any theoretical way to report the progress so I can update the progress bar in the frontend in the meantime?

@do-me
Contributor Author

do-me commented Jun 21, 2023

FYI:
SemanticFinder just received a great contribution from @VarunNSrivastava, significantly improving the UI with new features. The transformers.js version has also been updated: New Demo

@lizozom

lizozom commented Jul 12, 2023

Hey, joining the semantic-search-on-the-FE party 🥳.

I'm wondering if we can leverage the power of threads in this scenario by setting env.backends.onnx.wasm.numThreads = 4.
I don't see any errors thrown, but also no drastic performance improvements.
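(For reference, a minimal sketch of how that setting is applied, assuming it is set before the first pipeline is constructed:)

import { env, pipeline } from '@xenova/transformers';

// Must be set before any model is loaded; it only has an effect when the page
// is cross-origin isolated (see the next comment about COOP/COEP headers).
env.backends.onnx.wasm.numThreads = 4;

const extractor = await pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2');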

@xenova
Collaborator

xenova commented Jul 12, 2023

@lizozom Hi there! 👋

So, the most likely reason for this is that SharedArrayBuffer is not available because COOP/COEP headers are not set for the hosted files. You can check your network tab when running the model and you should see ort-wasm-simd.wasm loaded instead of ort-wasm-simd-threaded.wasm. For more information, check out this related open issue: #161

To fix this, it depends where you are hosting the website, as these headers must be set by the server. At the moment, GitHub pages does not offer this (https://github.com/orgs/community/discussions/13309), but there are some workarounds (cc @josephrocca). On the other hand, we are actively working to support this feature in Hugging Face spaces (huggingface/huggingface_hub#1525), which should hopefully be ready soon!
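(Aside, not from the original comment: an easy way to verify this from the page itself is the crossOriginIsolated global.)

// Run in the console or at startup:
if (crossOriginIsolated) {
  console.log('Cross-origin isolated: SharedArrayBuffer (and threaded WASM) is available.');
} else {
  console.log('Not cross-origin isolated: COOP/COEP headers are missing, so the single-threaded WASM backend will be used.');
}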

@do-me
Contributor Author

do-me commented Jul 12, 2023

Seems like Netlify offers a little more flexibility. I'm a very happy user of Netlify (I've been hosting my blog there since 2019 without any trouble) and it's pretty easy to link a GitHub repo to it. @lizozom if needed, we might consider switching from GitHub Pages to Netlify.

@lizozom

lizozom commented Jul 12, 2023

Cool!
I'll check and let you know.

@josephrocca
Contributor

The current workaround is to put this file beside your HTML file and then import it with a script tag in your document's <head>. The GitHub Pages engineering lead said a few days ago that they are working on custom headers, but there's no ETA.

I personally wouldn't go with Netlify, since their pricing is a bit too aggressive for my use cases, but it depends on what you're doing. Netlify's free 100 GB could be used up very quickly if you have a few assets like ML models or videos (even with just a few thousand visitors, e.g. due to being shared on Twitter or HN). Cloudflare Pages is much better imo (unlimited bandwidth and requests for free), but again it depends on your use case - Netlify may suffice.

@do-me
Contributor Author

do-me commented Jul 13, 2023

Thanks for the hint! Does Cloudflare Pages offer custom headers?
Unlimited bandwidth indeed sounds great! Will check it out.
Luckily we don't need to host the models, only the static page with the framework (currently everything bundled is ~2 MB), so it's not that bad, but still something to keep in mind.

@josephrocca
Contributor

I haven't actually had to do that with Cloudflare Pages yet, but here are their docs for custom headers: https://developers.cloudflare.com/pages/platform/headers/
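(For reference, based on those docs, a _headers file in the build output directory along these lines should set the COOP/COEP headers; untested here, so double-check against the linked documentation:)

/*
  Cross-Origin-Opener-Policy: same-origin
  Cross-Origin-Embedder-Policy: require-corp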

@lizozom

lizozom commented Jul 16, 2023

I tested this out on a local webpack project, serving files with these headers:

// webpack.config.js (excerpt)
module.exports = {
  // ...
  devServer: {
    headers: {
      'Cross-Origin-Opener-Policy': 'same-origin',
      'Cross-Origin-Embedder-Policy': 'require-corp',
    },
  },
};

And indeed, this causes the threaded version (ort-wasm-simd-threaded.wasm) to be loaded.
I'm not seeing much of a performance difference right away, but I'll tinker with it some more.

@xenova In your opinion, should I expect to see performance improvements when running a large batch of embedding pipelines single- vs. multi-threaded?

@xenova
Collaborator

xenova commented Jul 16, 2023

@lizozom yes, we should be seeing improvements, but I believe there is a bug in ORT which is not correctly allocating work among the threads. There is an ongoing discussion about this here: #161

@lizozom

lizozom commented Jul 17, 2023

Sweet, I'll keep track.
Let me know if I can help there in any way!

@do-me
Contributor Author

do-me commented Sep 20, 2023

@VarunNSrivastava built a really nice Chrome extension for SemanticFinder. You can already install it locally as explained here.

We submitted it for review, so it should be a matter of days (hopefully) or a few weeks in the worst case.

It's working very well for many different types of pages (even PDFs, as long as the URL ends with .pdf!). There is also a settings page, where it's highly recommended to raise the minimum segment length if a page contains a lot of text (say, more than 10 pages' worth). You can also choose a different model if you're working with non-English content.

I spotted the gap in the HF docs about developing a browser extension and was wondering whether we could give a hand in filling it. In the end, our application isn't too complex in terms of "moving" parts, so it might make for a good example. Also, we already learnt about some caveats that might be good to write down.

@xenova
Collaborator

xenova commented Sep 20, 2023

That would be amazing! 🤯 Yes please! You could even strip down the tutorial quite a bit if you want (the simpler, the better).

@do-me
Contributor Author

do-me commented Sep 21, 2023

We're using Vue components in the extension, which might already be slightly too complex for a beginner's tutorial (this would be more of an intermediate/slightly advanced version, I guess). However, I have plans to write yet another extension with similar functionality and really keep it super simple. Will keep you posted, but probably better in a new issue.

I just have one question which is relevant to both the extension and SemanticFinder, and which I couldn't quite work out from the HF docs:

When using text2text-generation (e.g. Xenova/LaMini-Flan-T5-783M) or summarization (e.g. Xenova/distilbart-cnn-6-6), like this:

import { pipeline } from '@xenova/transformers';

var outputElement = document.getElementById("output");

async function allocatePipeline(instruction) {
  let classifier = await pipeline('text2text-generation', 'Xenova/LaMini-Flan-T5-783M');
  let output = await classifier(instruction, {
    max_new_tokens: 100
  });

  outputElement.innerHTML = output[0];
}

allocatePipeline("some test instruction");

or

import { pipeline } from '@xenova/transformers';

var outputElement = document.getElementById("output");

async function allocatePipeline(inText) {
  let generator = await pipeline('summarization', 'Xenova/distilbart-cnn-6-6');
  let out = await generator(inText, {
    max_new_tokens: 100,
  });

  outputElement.innerHTML = out[0].summary_text;
}

allocatePipeline("some test text to summarize");

How can I add a callback so that my HTML component is updated each time a new token is generated? I tried different kinds of callbacks and searched through the API, but I have the impression that I'm missing something quite obvious.

@xenova
Collaborator

xenova commented Sep 21, 2023

The callback functionality is not very well documented (perhaps for good reason), since it's non-standard and, at the time of its creation, didn't have an equivalent mechanism in transformers.

For now, you can replicate what I did here using the callback_function generation parameter:

https://github.com/xenova/transformers.js/blob/c367f9d68b809bbbf81049c808bf6d219d761d23/examples/demo-site/src/worker.js#L191-L194

@xenova
Collaborator

xenova commented Sep 21, 2023

> We're using Vue components in the extension, which might already be slightly too complex for a beginner's tutorial (this would be more of an intermediate/slightly advanced version, I guess). However, I have plans to write yet another extension with similar functionality and really keep it super simple. Will keep you posted, but probably better in a new issue.

PS: please check out this PR, it removes the redundant CustomCache class. Let me know if that helps!

@do-me
Contributor Author

do-me commented Sep 22, 2023

> For now, you can replicate what I did here using the callback_function generation parameter

Thanks a lot, this pointed me in the right direction!
However, I needed to import AutoTokenizer and use it like this:

import { AutoTokenizer } from '@xenova/transformers';

let tokenizer = await AutoTokenizer.from_pretrained(model);

I noticed that without a worker.js you cannot update the DOM for each generated token/beam, as the event loop is blocked, which might be something for the docs. Making the callback async and using await inside it doesn't help. It's probably in the nature of the package architecture that it cannot work differently.

However, for a minimal example, demonstrating e.g. the speed of token generation, you can still log it to the console and watch it live:

    callback_function: function (beams) {
      const decodedText = tokenizer.decode(beams[0].output_token_ids, {
          skip_special_tokens: true});
      console.log(decodedText);
    }

Demo here.


@xenova
Collaborator

xenova commented Sep 22, 2023

Yes, that's correct. The best way I have found around this is to use the Web Worker API and post messages back to the main thread in the callback_function:

https://github.com/xenova/transformers.js/blob/c367f9d68b809bbbf81049c808bf6d219d761d23/examples/demo-site/src/worker.js#L189-L202

and you initialize the worker like:
https://github.com/xenova/transformers.js/blob/c367f9d68b809bbbf81049c808bf6d219d761d23/examples/demo-site/src/main.js#L16-L19
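(Combining the linked snippets with the earlier callback example, a stripped-down version of that pattern might look like this; the message shapes and element id are illustrative, not taken from the demo site:)

// worker.js (sketch)
import { pipeline, AutoTokenizer } from '@xenova/transformers';

const model = 'Xenova/LaMini-Flan-T5-783M';
const tokenizer = await AutoTokenizer.from_pretrained(model);
const generator = await pipeline('text2text-generation', model);

self.addEventListener('message', async (event) => {
  await generator(event.data.text, {
    max_new_tokens: 100,
    callback_function: (beams) => {
      // Decode the partial output and stream it to the main thread.
      const partial = tokenizer.decode(beams[0].output_token_ids, { skip_special_tokens: true });
      self.postMessage({ status: 'update', output: partial });
    },
  });
  self.postMessage({ status: 'done' });
});

// main.js (sketch)
const worker = new Worker(new URL('./worker.js', import.meta.url), { type: 'module' });
worker.addEventListener('message', (event) => {
  if (event.data.status === 'update') {
    // The DOM update happens on the main thread, so it is not blocked by generation.
    document.getElementById('output').innerHTML = event.data.output;
  }
});
worker.postMessage({ text: 'some test text' });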

@Fhrozen

Fhrozen commented Feb 28, 2024

@xenova thank you for your extraordinary work.
@do-me I would like to know how you connected to transformers.js using Vue.
I am currently working on a project with Vue 3, in TS, and keep getting SyntaxError: Unexpected token '<', "<!DOCTYPE "... is not valid JSON when trying to load a config or pipeline.

The code is simple.
In a Vue component:

<script setup lang="ts">
import { env, pipeline, AutoConfig } from '@xenova/transformers'

const repoid = 'Xenova/all-MiniLM-L6-v2' // illustrative repo id
await AutoConfig.from_pretrained(repoid)
</script>

Or in a TS file:

import { env, pipeline, AutoConfig } from '@xenova/transformers'
import { defineStore } from 'pinia'

export const TransformerJs = defineStore('transformers', () => {
  function setupOnnx() {
    // env.localModelPath = '@/assets/models/'
    env.allowRemoteModels = true
    env.allowLocalModels = false
  }
  async function downloadModel(repoid: string, taskid: any) {
    await AutoConfig.from_pretrained(repoid)
  }
  return { env, setupOnnx, downloadModel }
})

Did you change anything directly in transformers.js to support Vue, or nothing special?

@xenova
Collaborator

xenova commented Feb 28, 2024

@Fhrozen As long as you:

  1. Set env.allowLocalModels = false, and
  2. Delete cached files from devtools' Application tab

It should work. This will be fixed in Transformers.js v3, where allowLocalModels will default to false when running in the browser.
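(In practice that boils down to something like this; the repo id is just an example:)

import { env, AutoConfig } from '@xenova/transformers';

// Set this before anything is loaded. Without it, v2 first requests the model
// files from your own origin; a dev server typically answers with index.html,
// which then fails to parse as JSON (the "<!DOCTYPE ..." error above).
env.allowLocalModels = false;

const config = await AutoConfig.from_pretrained('Xenova/all-MiniLM-L6-v2');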

@do-me
Contributor Author

do-me commented Feb 29, 2024

@Fhrozen, I'm pinging @VarunNSrivastava, who created the entire Vue-based browser plugin. Feel free to ask any questions!

@Fhrozen

Fhrozen commented Feb 29, 2024

@xenova, thank you very much for the details. As you mentioned, the issue was caused by allowLocalModels; changing it from true to false fixed it.
@do-me, thank you very much; I will submit any questions. However, I think I will open a different issue dedicated to Vue + Transformers.js.
