An easy-to-use release is coming soon, but for now you can follow the instructions below to get started.
💡 Tip: Download and install Python if you can't run it from the terminal.
Set up and run the server:
Alembic is used to manage database versioning through migrations.
Apply all pending migrations:
alembic upgrade head
Roll back the most recent migration:
alembic downgrade -1
Update backend/models.py, then run:
alembic revision --autogenerate -m "added asset table"
**Note:** If you get an error about an already existing table, you may want to drop the table and run 'alembic upgrade head' again.
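For context, the autogenerate command diffs the SQLAlchemy models against the current database schema. Below is a minimal sketch of the kind of model addition that would produce the "added asset table" revision above; the field names and declarative setup are illustrative assumptions, not the project's actual backend/models.py.

```python
# Illustrative sketch only: a new SQLAlchemy model that Alembic's autogenerate
# would detect and turn into a migration. Field names are hypothetical.
from sqlalchemy import Column, Integer, String
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class Asset(Base):
    __tablename__ = "asset"

    id = Column(Integer, primary_key=True)
    name = Column(String, nullable=False)
    path = Column(String)
```

After saving a change like this, 'alembic revision --autogenerate -m "added asset table"' writes the migration script and 'alembic upgrade head' applies it.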
Clone the entire repository
git clone https://github.com/Kwaai-AI-Lab/assistant
- **Create a .env file.** In the root directory of your project, create a new file named .env.
- **Add your environment variables.** Open the .env file in a text editor and add the required environment variables. Each variable should be on a new line in the format KEY='value'.

Example of .env:
PAIOS_ALLOW_ORIGINS='http://localhost:5173,https://0.0.0.0:8443,https://localhost:3000'
PAIOS_DB_ENCRYPTION_KEY=''
CHUNK_SIZE='2000'
CHUNK_OVERLAP='400'
ADD_START_INDEX='True'
EMBEDDER_MODEL='llama3:latest'
SYSTEM_PROMPT='You are a helpful assistant for students learning needs.'
MAX_TOKENS='200'
TEMPERATURE='0.2'
TOP_K='40'
TOP_P='0.9'
PAIOS_SCHEME='https'
PAIOS_HOST='0.0.0.0'
PAIOS_EXPECTED_RP_ID='localhost'
PAIOS_PORT='8443'
PAIOS_URL='https://localhost:8443'
PAI_ASSISTANT_URL='https://localhost:3000'
PAIOS_JWT_SECRET=''
# Eleven Labs
XI_API_URL='https://api.elevenlabs.io'
XI_API_KEY=''
XI_CHUNK_SIZE='1024'
#Ollama
OLLAMA_LOCAL_MODELS_URL='http://0.0.0.0:11434/api/tags'
OLLAMA_MODELS_DESCRIPTION_URL='https://ollama.com/library/'
PAIOS_URL: URL where PAIOS is hosted
PAI_ASSISTANT_URL: URL where the PAIAssistant is hosted
TEMPERATURE: The temperature of the model. Increasing the temperature will make the model answer more creatively. (Default: 0.8)
TOP_K: Reduces the probability of generating nonsense. A higher value (e.g. 100) will give more diverse answers, while a lower value (e.g. 10) will be more conservative.
TOP_P: Works together with top-k. A higher value (e.g., 0.95) will lead to more diverse text, while a lower value (e.g., 0.5) will generate more focused and conservative text. (Default: 0.9)
CHUNK_SIZE: The maximum size of a chunk.
CHUNK_OVERLAP: Target overlap between chunks. Overlapping chunks help to mitigate loss of information when context is divided between chunks.
ADD_START_INDEX: Set add_start_index=True so that the character index at which each split document starts within the original document is preserved as the metadata attribute "start_index".
EMBEDDER_MODEL: See the ❗ note at the end of this section.
XI_CHUNK_SIZE: Size of chunks to read/write at a time.
MAX_TOKENS: Maximum number of tokens to predict when generating text. (Default: 128, -1 = infinite generation, -2 = fill context). Common lengths might be:
- A maximum of 50 tokens for very concise answers
- A maximum of 200 tokens for more substantial responses
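As a concrete illustration, here is a minimal sketch of how these values might be read and passed to Ollama's generate endpoint. It assumes the python-dotenv and requests packages are installed; the variable names come from the .env example above, but the wiring is illustrative, not the project's actual code.

```python
# Illustrative sketch: load the .env values and pass the generation
# parameters to Ollama's /api/generate endpoint.
import os

import requests
from dotenv import load_dotenv  # pip install python-dotenv requests

load_dotenv()  # reads the .env file in the current directory

options = {
    "temperature": float(os.getenv("TEMPERATURE", "0.8")),
    "top_k": int(os.getenv("TOP_K", "40")),
    "top_p": float(os.getenv("TOP_P", "0.9")),
    "num_predict": int(os.getenv("MAX_TOKENS", "128")),  # Ollama's name for max tokens
}

response = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama3.2",  # any model shown by `ollama list`
        "prompt": "Explain chunk overlap in one sentence.",
        "system": os.getenv("SYSTEM_PROMPT", ""),
        "options": options,
        "stream": False,
    },
    timeout=120,
)
print(response.json()["response"])
```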
Follow these steps to get an API key from Eleven Labs:
- **Create an Eleven Labs account.** If you don't already have an account, go to Eleven Labs and sign up for a free or paid plan. You'll need an active account to access the API.
- **Log in to your account.** Once you have an account, log in to the Eleven Labs dashboard using your credentials.
- **Navigate to the API section.** After logging in, go to the API section of the dashboard. You can typically find it in the main navigation, or by clicking on your user name and then on the API Keys option.
- **Generate and copy your API key.** In the API section, you'll see an option to generate or view your API key. Click the button to generate a new API key if one isn't already created. Once generated, your API key will be displayed. Copy it and store it securely, because you will not be able to display it again.
- **Store the API key in your project's .env file.** To securely use the API key in your project, add it to your .env file like this: XI_API_KEY="sk_******"
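Once the key is in .env, a quick way to confirm it works is to call an Eleven Labs endpoint with the xi-api-key header. This is a minimal sketch using requests and python-dotenv; the /v1/voices path and header name follow the public Eleven Labs API, but treat the details as assumptions and check their documentation.

```python
# Sketch: verify the XI_API_KEY from .env against the Eleven Labs API.
# The /v1/voices endpoint and xi-api-key header are assumptions based on the
# public Eleven Labs API docs; adjust if the API has changed.
import os

import requests
from dotenv import load_dotenv

load_dotenv()

resp = requests.get(
    f"{os.getenv('XI_API_URL', 'https://api.elevenlabs.io')}/v1/voices",
    headers={"xi-api-key": os.getenv("XI_API_KEY", "")},
    timeout=30,
)
if resp.ok:
    print("Key accepted; available voices:", len(resp.json().get("voices", [])))
else:
    print("Request failed:", resp.status_code, resp.text)
```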
In order to create an assistant, you will need to download Ollama. It is available for macOS, Windows (preview), and Linux. On Linux you can install it with:
curl -fsSL https://ollama.com/install.sh | sh
After installing Ollama, it will run in the background and the ollama command line tool is available in cmd, PowerShell, or your favorite terminal application. The Ollama API will be served on http://localhost:11434.
Visiting that URL, you should see the message: "Ollama is running"
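A quick sanity check from Python (a sketch using the requests package; the root endpoint returns the "Ollama is running" text, and /api/tags lists the installed models):

```python
# Sketch: confirm the local Ollama server is up before configuring the assistant.
import requests

base = "http://localhost:11434"

print(requests.get(base, timeout=10).text)  # expected: "Ollama is running"
tags = requests.get(f"{base}/api/tags", timeout=10).json()
print("Installed models:", [m["name"] for m in tags.get("models", [])])
```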
Download and run a model
ollama run llama3.2
Pull a model
This command can also be used to update a local model. Only the diff will be pulled.
ollama pull llama3.2
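If you prefer to drive this from code, Ollama also exposes a pull endpoint over HTTP. A small sketch follows; the request shape is based on Ollama's REST API, so treat the field names as an assumption and check the Ollama API docs.

```python
# Sketch: pull (or update) a model through Ollama's HTTP API instead of the CLI.
# Streams status lines such as "pulling manifest" and layer download progress.
import json

import requests

with requests.post(
    "http://localhost:11434/api/pull",
    json={"model": "llama3.2"},  # same model tag you would pass to `ollama pull`
    stream=True,
    timeout=None,
) as resp:
    for line in resp.iter_lines():
        if line:
            print(json.loads(line).get("status"))
```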
List all the installed models
This list contains the available models for you to choose from and set as the Large Language Model (LLM) for your Assistant.
ollama list
Remove a model
Keep in mind that once you delete a model from your computer, it will no longer be available for setting up with your Assistant.
ollama rm llama3.2
Ollama supports a list of models available on ollama.com/library
Here are some example models that can be downloaded:
| Model | Parameters | Size | Download |
|---|---|---|---|
| Llama 3.2 | 3B | 2.0GB | ollama run llama3.2 |
| Llama 3.2 | 1B | 1.3GB | ollama run llama3.2:1b |
| Llama 3.1 | 8B | 4.7GB | ollama run llama3.1 |
| Llama 3.1 | 70B | 40GB | ollama run llama3.1:70b |
| Llama 3.1 | 405B | 231GB | ollama run llama3.1:405b |
| Phi 3 Mini | 3.8B | 2.3GB | ollama run phi3 |
| Phi 3 Medium | 14B | 7.9GB | ollama run phi3:medium |
| Gemma 2 | 2B | 1.6GB | ollama run gemma2:2b |
| Gemma 2 | 9B | 5.5GB | ollama run gemma2 |
| Gemma 2 | 27B | 16GB | ollama run gemma2:27b |
| Mistral | 7B | 4.1GB | ollama run mistral |
| Moondream 2 | 1.4B | 829MB | ollama run moondream |
| Neural Chat | 7B | 4.1GB | ollama run neural-chat |
| Starling | 7B | 4.1GB | ollama run starling-lm |
| Code Llama | 7B | 3.8GB | ollama run codellama |
| Llama 2 Uncensored | 7B | 3.8GB | ollama run llama2-uncensored |
| LLaVA | 7B | 4.5GB | ollama run llava |
| Solar | 10.7B | 6.1GB | ollama run solar |
❗ Note: Currently, the Assistant's embedder model cannot be configured through the Assistant's UI. This means that whatever model you specify in the .env file under the variable name EMBEDDER_MODEL must already be installed on your computer. You can verify the installed models by running the command ollama list
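A small sketch of that check in Python, using the OLLAMA_LOCAL_MODELS_URL and EMBEDDER_MODEL values from the .env example above (illustrative, not part of the project code):

```python
# Sketch: verify that the model named in EMBEDDER_MODEL is already installed,
# using the same tags endpoint configured as OLLAMA_LOCAL_MODELS_URL in .env.
import os

import requests
from dotenv import load_dotenv

load_dotenv()

embedder = os.getenv("EMBEDDER_MODEL", "llama3:latest")
tags_url = os.getenv("OLLAMA_LOCAL_MODELS_URL", "http://localhost:11434/api/tags")

installed = {m["name"] for m in requests.get(tags_url, timeout=10).json().get("models", [])}
if embedder in installed:
    print(f"{embedder} is installed and can be used as EMBEDDER_MODEL")
else:
    print(f"{embedder} is missing; install it with: ollama pull {embedder}")
```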
Build the Docker image:
docker build -t assistant .
Run the container:
docker run -p 8443:8443 assistant
Visit https://localhost:8443/
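If you prefer to check from a script rather than a browser, a minimal sketch follows. Disabling certificate verification assumes the local server uses a self-signed certificate; remove verify=False if you have a trusted certificate configured.

```python
# Sketch: check that the assistant server responds after `docker run`.
import requests

resp = requests.get("https://localhost:8443/", verify=False, timeout=10)
print(resp.status_code)
```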