I've found two really cool pipeline-builder tools for working with LangChain called Flowise and Langflow.
I find that designing your project architecture in Flowise/Langflow first and then converting it to LangChain code using your local LLM is the nicest way to learn LangChain.
I'm using Docker for all of these examples, but you can use whatever method you want.
I am also experimenting with querying a regular SQL database and using the results as context for prompts. It seems to work pretty well.
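Here's a rough sketch of what I mean, assuming the Postgres container from further down, a made-up products table and LangChain's SQLDatabase helper (you'll need sqlalchemy and a Postgres driver installed); swap in your own URI and query:

from langchain_community.utilities import SQLDatabase

# connection string matches the pgvector container below (user postgres, password postgres)
db = SQLDatabase.from_uri("postgresql+psycopg2://postgres:postgres@localhost:5432/postgres")

# pull a few rows (the products table is just an example) and paste them into the prompt
rows = db.run("SELECT name, price FROM products LIMIT 20")
prompt = f"Answer using only this data:\n{rows}\n\nQuestion: Which product is the cheapest?"

Then send that prompt to your chat model like any other.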
Install LM Studio, download a model that will work on your GPU, and start a local server.
Install Docker Desktop, then run these commands in a terminal:
docker run -d -p 7860:7860 logspace/langflow
docker run -d -p 8080:8080 localai/localai:v2.9.0-cublas-cuda12 bert-cpp
docker run -d -p 5432:5432 -e POSTGRES_PASSWORD=postgres pgvector/pgvector:pg16
git clone https://github.com/FlowiseAI/Flowise.git
cd Flowise
cp .env.example .env
docker-compose up -d
Create a SerpApi (Google Search API) key if you want to use the search features.
Do your best to work the rest out.
I am using LM Studio to host my own chat model.
If you have 24GB of VRAM, I particularly like the following model for use with LangChain:
TheBloke/deepseek-coder-33B-instruct-GGUF/deepseek-coder-33b-instruct.Q3_K_L.gguf
If your GPU has 12GB of VRAM you can try this model, but I have found the responses aren't always as good when prompting over documents.
TheBloke/Mistral-7B-Instruct-v0.2-GGUF/mistral-7b-instruct-v0.2.Q8_0.gguf
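LM Studio's local server speaks the OpenAI API (default port 1234), so pointing LangChain at it is mostly a base_url change. A minimal sketch, assuming that default port and whichever model you have loaded (LM Studio largely ignores the model name, and the API key is just a placeholder):

from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

# LM Studio serves whatever model you loaded; it doesn't check the API key
llm = ChatOpenAI(
    base_url="http://localhost:1234/v1",
    api_key="lm-studio",
    model="deepseek-coder-33b-instruct",
    temperature=0.2,
)

prompt = ChatPromptTemplate.from_template("Explain what this code does:\n\n{code}")
chain = prompt | llm | StrOutputParser()
print(chain.invoke({"code": "print(sum(range(10)))"}))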
I use LocalAI for embeddings, with one of the following models:
docker run -p 8080:8080 localai/localai:v2.9.0-cublas-cuda12 bert-cpp
docker run -p 8080:8080 localai/localai:v2.9.0-cublas-cuda12 all-minilm-l6-v2
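To call those embeddings from LangChain there is a LocalAIEmbeddings wrapper in the community package. A minimal sketch, assuming the container above on port 8080 and the bert-cpp model (adjust the model name to whichever one you started):

from langchain_community.embeddings import LocalAIEmbeddings

embeddings = LocalAIEmbeddings(
    openai_api_base="http://localhost:8080/v1",
    openai_api_key="not-needed",  # LocalAI doesn't check the key by default
    model="bert-cpp",
)
print(len(embeddings.embed_query("hello world")))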
I am using a Postgres server with the PGVector extension installed as my vector store.
docker run -p 5432:5432 -e POSTGRES_PASSWORD=postgres pgvector/pgvector:pg16
Your user will be postgres; change the password to something of your own. The data will be loaded into the document table in the public schema (public.document).
I find this just as easy to use and understand as Pinecone except it's free. I am sure you could do some load balancing and horizontal scaling if you wanted to match its performance.
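If you want to hit the same Postgres from LangChain rather than from Flowise/Langflow, the community PGVector wrapper works. A minimal sketch, assuming the container above, the LocalAI embeddings from the previous section and a made-up collection name (note that this wrapper creates its own tables rather than writing to public.document):

from langchain_community.vectorstores.pgvector import PGVector
from langchain_community.embeddings import LocalAIEmbeddings

embeddings = LocalAIEmbeddings(
    openai_api_base="http://localhost:8080/v1",
    openai_api_key="not-needed",
    model="bert-cpp",
)

# connection details match the docker run command above; the collection name is just an example
store = PGVector(
    connection_string="postgresql+psycopg2://postgres:postgres@localhost:5432/postgres",
    embedding_function=embeddings,
    collection_name="my_documents",
)

store.add_texts(["LangChain works fine with local models."])
print(store.similarity_search("local models", k=1))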
SerpApi (Google Search API): use this for adding search to your models. It grabs a few pages and loads them as context for your prompt.
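From LangChain the equivalent is one wrapper call. A minimal sketch, assuming you have a SerpApi key and the google-search-results package installed:

from langchain_community.utilities import SerpAPIWrapper

search = SerpAPIWrapper(serpapi_api_key="YOUR_SERPAPI_KEY")  # or set SERPAPI_API_KEY in your environment
results = search.run("latest LangChain release")
print(results)  # paste this into your prompt as extra context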
When your chat model detects a maths question, it attempts to use numpy as its tool to do the calculations and feeds the result back in as context for the prompt.
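A minimal sketch of that idea as a LangChain tool, using numpy behind a deliberately restricted eval (the tool name and expression format are just my own choices, not how Flowise/Langflow wire it up):

import numpy as np
from langchain_core.tools import tool

@tool
def calculator(expression: str) -> str:
    """Evaluate a maths expression such as 'np.sqrt(2) * 10'."""
    # eval is only for illustration; don't feed it untrusted input
    return str(eval(expression, {"np": np, "__builtins__": {}}))

print(calculator.invoke("np.sqrt(2) * 10"))

Hand that tool to an agent built around your chat model and it can call it whenever a question looks like arithmetic.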


