This guide walks through setting up an end-to-end conversational AI pipeline using NVIDIA NIMs on DigitalOcean GPU Droplets, including client setup instructions.
Ensure you have the following:
- DigitalOcean CLI: install `doctl`
- NGC API Key: generate one from NVIDIA NGC and install `ngc-cli`
- Anthropic API Key: generate one from Anthropic
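Before proceeding, it can help to sanity-check that the CLIs and keys above are actually in place. A minimal sketch; the environment variable names `NGC_API_KEY` and `ANTHROPIC_API_KEY` are assumptions for this check, not names the tools require:

```bash
# Sketch: verify the prerequisite CLIs and API keys are available.
# NGC_API_KEY / ANTHROPIC_API_KEY are assumed variable names, not required ones.
missing=""
for cmd in doctl ngc; do
  command -v "$cmd" >/dev/null 2>&1 || missing="$missing $cmd"
done
unset_keys=""
for var in NGC_API_KEY ANTHROPIC_API_KEY; do
  [ -n "${!var}" ] || unset_keys="$unset_keys $var"
done
echo "missing CLIs:${missing:- none}"
echo "unset keys:${unset_keys:- none}"
```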
- Create a GPU Droplet:
  Use `doctl` to spin up a GPU Droplet, replacing `<region>` and `<ssh-key-fingerprint>` with appropriate values:

  ```bash
  doctl compute droplet create ab-ai-ctk --region <tor1/ams3> --image gpu-h100x1-base --size gpu-h100x1-80gb --ssh-keys <ssh-key-fingerprint>
  ```
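Once the droplet is up, you will need its public IP for the client step later. One way to fetch it, assuming the droplet name `ab-ai-ctk` from the command above, guarded so the sketch degrades gracefully where `doctl` is not installed or authenticated:

```bash
# Sketch: look up the droplet's public IPv4 by name with doctl.
DROPLET_NAME="ab-ai-ctk"
PUBLIC_IP=""
if command -v doctl >/dev/null 2>&1; then
  PUBLIC_IP="$(doctl compute droplet list --format Name,PublicIPv4 --no-header \
    | awk -v n="$DROPLET_NAME" '$1 == n { print $2 }')"
fi
echo "droplet public IP: ${PUBLIC_IP:-<not found>}"
```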
- Run the NIM Services:

  ```bash
  ## initial setup
  ngc config set
  docker login nvcr.io   # at the prompts, enter:
  # Username: $oauthtoken
  # Password: <ngc_api_key>

  # Note: this cache directory is where models are downloaded inside the
  # container. If this volume is not mounted, the container does a fresh
  # download of the model every time it starts.
  mkdir ~/nim-cache
  export NIM_CACHE_PATH=~/nim-cache
  sudo chmod -R 777 $NIM_CACHE_PATH

  ## run the services
  cd server
  # rename .env.example to .env and add the values in the .env file
  mv .env.example .env
  # spin up the NIM services (ASR and TTS)
  docker-compose --env-file .env up
  ```
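After `docker-compose` reports the services as up, a quick way to confirm the ASR and TTS endpoints are listening. Ports 50051/50052 are the ones the client command below targets; `nc` (netcat) is assumed to be installed:

```bash
# Sketch: check that the ASR (50051) and TTS (50052) ports accept TCP
# connections. Run on the droplet itself (localhost).
reachable=""
for port in 50051 50052; do
  if command -v nc >/dev/null 2>&1 && nc -z -w 2 localhost "$port" 2>/dev/null; then
    reachable="$reachable $port"
    echo "port $port: open"
  else
    echo "port $port: not reachable yet"
  fi
done
```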
- Run the Speech2Speech Client:
  Install the dependencies on your client machine:

  ```bash
  pip3.13 install -r requirements.txt
  ```

  Use the following command to transcribe audio from your microphone, replacing `<public-ip>` with the public IP of your GPU Droplet:

  ```bash
  python3.13 src/main.py --asr-server <public-ip>:50051 --tts-server <public-ip>:50052 --language-code en-US --input-device 0 --output-device 1 --stream
  ```
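The `--input-device` and `--output-device` indices vary per machine. If the client's requirements include the `sounddevice` package (an assumption; any PortAudio-based device lister works equally well), you can enumerate devices and their indices with:

```bash
# Sketch: list audio devices to choose --input-device / --output-device
# indices. Assumes the sounddevice Python package is installed.
DEVICES="$(python3 -m sounddevice 2>/dev/null || true)"
if [ -n "$DEVICES" ]; then
  echo "$DEVICES"
else
  echo "sounddevice not installed; try: pip3.13 install sounddevice"
fi
```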