This sample demonstrates how to deploy Ollama with Defang, along with a Next.js frontend that uses the AI SDK for smooth streaming conversations. By default it runs a very small model (llama3.2:1b), which performs reasonably well on a CPU alone, but the compose file includes commented-out lines that you can uncomment to enable GPU support and run a larger model like gemma:7b. If you want to deploy to a GPU-powered instance, you will need to use your own AWS account with Defang BYOC.
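For reference, once those lines are uncommented the GPU reservation typically looks something like the sketch below, which follows Docker Compose's standard device-reservation syntax. The service and image names here are illustrative; check the compose file in this sample for the exact lines.

```yaml
services:
  ollama:
    image: ollama/ollama
    deploy:
      resources:
        reservations:
          devices:
            # Reserve one NVIDIA GPU for the Ollama container,
            # which makes a larger model like gemma:7b practical.
            - driver: nvidia
              count: 1
              capabilities: [gpu]
```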
## Prerequisites

- Download the Defang CLI
- (Optional) If you are using Defang BYOC, make sure you have authenticated with your AWS account
- (Optional for local development) Docker CLI
## Development

To run the application locally, you can use the following command:

```bash
docker compose -f compose.dev.yaml up
```
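For context, the streaming chat flow in a Next.js app built on the AI SDK usually boils down to a route handler along these lines. This is a sketch of the pattern rather than this sample's exact code, and the `ollama` hostname, port, and `OLLAMA_BASE_URL` environment variable are assumptions based on the defaults mentioned above.

```ts
// app/api/chat/route.ts -- a minimal sketch using the Vercel AI SDK
// against Ollama's OpenAI-compatible endpoint.
import { createOpenAI } from '@ai-sdk/openai';
import { streamText } from 'ai';

// Assumed service hostname and port on the compose network; adjust to
// match this sample's compose file.
const ollama = createOpenAI({
  baseURL: process.env.OLLAMA_BASE_URL ?? 'http://ollama:11434/v1',
  apiKey: 'ollama', // Ollama ignores the key, but the client requires one
});

export async function POST(req: Request) {
  const { messages } = await req.json();
  // Stream tokens back to the frontend as they are generated
  const result = streamText({
    model: ollama('llama3.2:1b'),
    messages,
  });
  return result.toDataStreamResponse();
}
```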
## Deployment

> Note: Download the Defang CLI if you haven't already.
### Defang Playground

Deploy your application to the Defang Playground by opening up your terminal and typing `defang up`. Keep in mind that the Playground does not support GPU instances.
### BYOC

If you want to deploy to your own cloud account, you can use Defang BYOC:

- Authenticate with your AWS account, and make sure you have properly set your environment variables, such as `AWS_PROFILE`, `AWS_REGION`, `AWS_ACCESS_KEY_ID`, and `AWS_SECRET_ACCESS_KEY`. An example session is sketched below.
- Run `defang up` in a terminal that has access to your AWS environment variables.
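As a quick illustration, a BYOC deploy session might look like the following; the profile and region values are placeholders, so substitute your own.

```bash
# Placeholder values -- use your own profile and region, or set
# AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY instead of a profile.
export AWS_PROFILE=default
export AWS_REGION=us-west-2

# Deploy to your own AWS account via Defang BYOC
defang up
```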
Title: Ollama

Short Description: Ollama is a tool that lets you easily run large language models.

Tags: AI, LLM, ML, Llama, Mistral, Next.js, AI SDK

Languages: TypeScript