Basic semantic search for a tweet archive. Part of the Community Archive ecosystem.
Live demo: https://defenderofbasic.github.io/twitter-semantic-search/
- Put your Twitter archive in archives/YOUR_USERNAME.zip
- Install Chroma for vector embedding and storage. This should just be:
pip install chromadb
- Run the setup script with your username (this starts a Chroma server and generates the local embeddings)
./local-setup-zip.sh YOUR_USERNAME
- Run the frontend
cd frontend/
pnpm install
pnpm dev
Open http://localhost:3000/search-local.html
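Before running the setup script, it can help to confirm that the zip sits where the steps above expect it. The following is a minimal sketch, assuming it is run from the repository root; `expected_zip` and `check_archive` are hypothetical helper names and are not part of this repository.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight helpers (not part of the repository).

# Print the path the setup steps expect for a given username.
expected_zip() {
  printf 'archives/%s.zip\n' "$1"
}

# Succeed (and say so) if the zip exists; otherwise report the
# missing path on stderr and return a non-zero status.
check_archive() {
  local zip
  zip="$(expected_zip "$1")"
  if [ -f "$zip" ]; then
    echo "found $zip"
  else
    echo "missing $zip" >&2
    return 1
  fi
}
```

With these defined, `check_archive YOUR_USERNAME && ./local-setup-zip.sh YOUR_USERNAME` only starts the setup when the archive is in place.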
To run the local embedding and search on an archive JSON downloaded from the Community Archive, instead of a raw zip file, perform the steps of local-setup-zip.sh manually:
- Run the Chroma server
chroma run
- Put the JSON in archives/USERNAME-combined.json
- Run the embedding script
cd generate-embeddings
pnpm install
pnpm local-embed USERNAME
Then run the frontend with pnpm dev in frontend/.
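The embedding step above can be wrapped in a small function, sketched below. It assumes it is run from the repository root and that a Chroma server is already running in another terminal (via `chroma run`). The `run` wrapper, the `DRY_RUN` switch, and the `embed_archive` name are hypothetical additions for illustration, not part of the repository.

```shell
#!/usr/bin/env bash
# Sketch only: wraps the manual embedding step described above.

# Execute a command, or just print it when DRY_RUN=1 (hypothetical switch).
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Check that the combined JSON is in place, then install dependencies
# and run the embedding script for the given username.
embed_archive() {
  local user="$1"
  local json="archives/${user}-combined.json"
  if [ ! -f "$json" ]; then
    echo "expected archive JSON at $json" >&2
    return 1
  fi
  # Use a subshell so the caller's working directory is unchanged.
  (
    cd generate-embeddings &&
    run pnpm install &&
    run pnpm local-embed "$user"
  )
}
```

Calling `DRY_RUN=1 embed_archive USERNAME` prints the commands without executing them, which is a cheap way to check the paths before embedding a large archive.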
To deploy your own hosted version, the general steps are: create and deploy the Cloudflare worker and vector DB (see the instructions in the cloudflare-worker/ directory); generate embeddings (run the script in generate-embeddings/ with your archive JSON in archives/); and finally run the frontend/, replacing the Cloudflare URL with your own and supplying a URL where the archive JSON is hosted.