New API version with Todos #77

Merged: 79 commits, Mar 3, 2025
Commits
c8bb2a8
wip
SYoy Feb 6, 2025
bfc8594
begin implementing dispatch
dani2112 Feb 6, 2025
4f79924
restructure message handling
dani2112 Feb 6, 2025
068f7b4
warn if payload contains extra data
dani2112 Feb 6, 2025
e0f22e9
implement search sources
dani2112 Feb 7, 2025
ef73c4b
introduce modifier types
dani2112 Feb 7, 2025
18060a0
typecast the actions
dani2112 Feb 7, 2025
4675587
introduce some props for actions
dani2112 Feb 7, 2025
4fc4ffb
define allowed payload values
dani2112 Feb 10, 2025
eb1145d
fix follow up action recursion
dani2112 Feb 10, 2025
76e48c6
create rag provider 2
dani2112 Feb 10, 2025
3a392e7
minimal hook and component example
dani2112 Feb 10, 2025
90fb14c
current state
dani2112 Feb 10, 2025
b9b87a0
WIP debugging reset active Sources in AdvancedQueryField
SYoy Feb 12, 2025
7f83b3b
Next step, create sample application
SYoy Feb 12, 2025
e05df49
add new conversation
dani2112 Feb 14, 2025
a0fd11c
add trash icon in sources
dani2112 Feb 14, 2025
7e3d325
fix some unused var warnings
dani2112 Feb 17, 2025
b5b39f3
validate for allowed modifiers
dani2112 Feb 17, 2025
788e6cf
fix some more type errors
dani2112 Feb 17, 2025
4751dbd
changed sourcesdisplay layout
dani2112 Feb 17, 2025
2666357
fix layout of chat window
dani2112 Feb 17, 2025
e7557d4
add mock sources
dani2112 Feb 18, 2025
4ab47d0
remove potential naming conflict with key
dani2112 Feb 18, 2025
fb3b922
fix incorrect typing callbacks advancedqueryfield
dani2112 Feb 18, 2025
e9a7869
no type errors
dani2112 Feb 18, 2025
b216340
centralize error handling
dani2112 Feb 18, 2025
a2236fc
rename hooks2 to hooks
dani2112 Feb 19, 2025
26416f2
introduce rag status hook
dani2112 Feb 19, 2025
82541ad
rename ragprovider2
dani2112 Feb 19, 2025
94b8e29
move hooks
dani2112 Feb 19, 2025
92c52f3
fix error handling
dani2112 Feb 19, 2025
33f0aa2
begin refactoring of sync async handling
dani2112 Feb 19, 2025
0e059a9
handle async functions correctly
dani2112 Feb 19, 2025
5b57259
update test app
dani2112 Feb 19, 2025
fc47cc9
delay sources response
dani2112 Feb 19, 2025
ec3983e
rename rag state
dani2112 Feb 19, 2025
0f716cf
export hooks & commented out .test.tsx files
SYoy Feb 19, 2025
fd61ae6
allow undefined and other minor fixes
dani2112 Feb 20, 2025
a89cb0e
add really basic demo story
dani2112 Feb 20, 2025
94a2d53
reenable basic testing
dani2112 Feb 20, 2025
6b0df7e
move types to types file
dani2112 Feb 20, 2025
5621810
fix test type errors
dani2112 Feb 20, 2025
4fc946d
add config prop
dani2112 Feb 20, 2025
d641543
New layout of example app
ramses1998 Feb 20, 2025
b10472b
current state
dani2112 Feb 20, 2025
9c3223f
alter connectors
dani2112 Feb 20, 2025
167ff35
new app tsx
dani2112 Feb 21, 2025
cb3258d
fix faulty hook
dani2112 Feb 21, 2025
484ffec
remove log in hook
dani2112 Feb 21, 2025
72e7576
Merge branch 'new-api' into feature/new-app-layout
ramses1998 Feb 21, 2025
22e2e0a
copied recent app.tsx from lexio/src into current app.tsx of branch
ramses1998 Feb 21, 2025
f6a52b0
style of main content fixed
ramses1998 Feb 21, 2025
a510cc5
Updated App.tsx of example new
ramses1998 Feb 21, 2025
0e923e3
introduce non blocking actions
dani2112 Feb 21, 2025
a6e231e
fix streamchunk type
dani2112 Feb 21, 2025
2244cf9
minor changes
dani2112 Feb 21, 2025
08e5c0f
Changes in package-lock.json in example/langchain/frontend reverted
ramses1998 Feb 24, 2025
bfde091
Added readMe to new example
ramses1998 Feb 24, 2025
c805462
Relative paths imports changed to import from the lexio library
ramses1998 Feb 24, 2025
a4dcd36
New example shifted in new directory examples/rag-ui
ramses1998 Feb 24, 2025
463494f
Merge branch 'new-api' into feature/new-app-layout
ramses1998 Feb 24, 2025
0bb610f
main.py updated: unnecessary endpoints removed
ramses1998 Feb 24, 2025
277f6cd
working types and example
dani2112 Feb 25, 2025
38716ea
MessageWithOptionalId type
dani2112 Feb 25, 2025
69b9eb8
generate sources in retrieve
dani2112 Feb 25, 2025
28f1c19
added mock data
ramses1998 Feb 25, 2025
f90332f
source type casted
ramses1998 Feb 26, 2025
644ac4a
backend from examples/rag-ui removed
ramses1998 Feb 26, 2025
0e62986
Merge pull request #75 from Renumics/feature/new-app-layout
SYoy Feb 26, 2025
1c51907
flatten response
dani2112 Feb 28, 2025
3f9d750
Merge branch 'new-api' of github.com:Renumics/renumics-rag-ui into ne…
dani2112 Feb 28, 2025
e645593
fix message handling
dani2112 Feb 28, 2025
ecbc808
fix promise on select source
dani2112 Feb 28, 2025
bafac77
merge main
dani2112 Feb 28, 2025
5bab1b3
fixed chatwindow merge conflict
SYoy Mar 3, 2025
8efc71b
added docu
SYoy Mar 3, 2025
ec8c5af
refactored hooks
SYoy Mar 3, 2025
e94a6a5
changed RAGProvider to LexioProvider
SYoy Mar 3, 2025
13 changes: 13 additions & 0 deletions TODOS.md
@@ -0,0 +1,13 @@
# Open tasks before release of new API version

- [ ] Style new conversation button in ChatWindow
- [ ] Docs: RagProvider / state management
- [ ] Docs: Hooks
- [ ] Docs: Update to API 2.0 (new features, user-action flow)
- [ ] Python API: Update to API 2.0
- [ ] (Optional) add user action modifiers to current set of operations (slack canvas)
- [ ] activeSources -> [], null, or set of retrievedSources
- [ ] Document .data attribute and how to omit it from backend calls to avoid timeouts
- [ ] Document Hooks (docstrings)
- [ ] Look into type inference (auto complete) for ActionHandlerResponses
- [ ] Refactor docs 'LexioProvider'
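The `.data` TODO above can be sketched in Python. This is a hypothetical helper (the function name and record shape are assumptions, not part of this PR): the idea, consistent with the diff's new `/pdfs/{id}` endpoint, is to return lightweight source metadata up front and let the client fetch full content lazily, so the initial backend response stays small and avoids timeouts.

```python
# Hypothetical sketch for the ".data" TODO: strip bulky fields from source
# records before sending them to the client. Full content would be served
# later by a lookup endpoint such as GET /pdfs/{id}.
def strip_heavy_fields(results):
    """Keep only lightweight metadata (id, doc_path) per source record."""
    light = []
    for r in results:
        light.append({
            "id": r["id"],
            "doc_path": r["doc_path"],
            # full text / file bytes intentionally omitted here
        })
    return light
```

A client holding only `id` and `doc_path` can then request the document body on demand instead of receiving it inline with every chat response.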
252 changes: 95 additions & 157 deletions examples/advanced-local-rag/backend/main.py
@@ -28,9 +28,14 @@
 device = "cuda" if torch.cuda.is_available() else "mps" if torch.backends.mps.is_available() else "cpu"

 # Load model & tokenizer
-model_name = "Qwen/Qwen2.5-7B-Instruct"
+model_name = "Qwen/Qwen2.5-7B-Instruct" if device == "cuda" else "HuggingFaceTB/SmolLM2-360M-Instruct"
 tokenizer = AutoTokenizer.from_pretrained(model_name)
-model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto", load_in_4bit=True).to(device)
+model = AutoModelForCausalLM.from_pretrained(
+    model_name,
+    device_map="auto",
+    load_in_4bit=True if device == "cuda" else False,
+    bnb_4bit_compute_dtype=torch.float16  # Set compute dtype to float16
+).to(device)

 class Message(BaseModel):
     role: str
@@ -74,7 +79,10 @@ def run_generation():
             inputs=inputs,
             streamer=streamer,
             max_new_tokens=1024,
-            do_sample=False  # set to True if you'd like more "interactive" sampling
+            do_sample=False,
+            top_p=None,
+            top_k=None,
+            temperature=None
         )
     finally:
         # Ensure streamer is properly ended even if generation fails
@@ -87,42 +95,90 @@ def run_generation():
# 5) Return the streamer to the caller
return streamer

class RetrieveAndGenerateRequest(BaseModel):
query: str

@app.post("/api/retrieve-and-generate")
async def retrieve_and_generate(request: RetrieveAndGenerateRequest):
@app.get("/pdfs/{id}")
async def get_pdf(id: str):
"""
SSE endpoint that:
1) Retrieves relevant chunks from your DB
2) Yields a JSON with 'sources' first
3) Streams tokens from the model as they are generated (real time)
Endpoint to serve document content by looking up the ID in the database.
Handles both binary (PDF) and text-based (HTML, Markdown) content.
"""
query = request.query
# First look up the document path using the ID from the database
table = db_utils.get_table()
results = table.search().where(f"id = '{id}'", prefilter=True).to_list()

if not results:
raise HTTPException(status_code=404, detail="Document ID not found")

# Get the document path from the first result
doc_path = results[0]['doc_path']

if not os.path.exists(doc_path):
raise HTTPException(status_code=404, detail="File not found on disk")

# Determine content type based on file extension
if doc_path.endswith('.pdf'):
return FileResponse(
doc_path,
media_type='application/pdf',
filename=os.path.basename(doc_path)
)
elif doc_path.endswith('.html'):
return FileResponse(
doc_path,
media_type='text/html',
filename=os.path.basename(doc_path)
)
else: # Markdown or other text files
return FileResponse(
doc_path,
media_type='text/plain',
filename=os.path.basename(doc_path)
)

class ChatRequest(BaseModel):
messages: List[Message]
source_ids: Optional[List[str]] = None

@app.post("/api/chat")
async def chat_endpoint(request: ChatRequest):
"""
Unified SSE endpoint that handles both initial queries and follow-ups:
- messages: list of chat messages (required)
- source_ids: optional list of specific source IDs to use
If source_ids are provided, those specific sources will be used as context
If no source_ids are provided, the system will automatically retrieve relevant sources
based on the latest user query
"""
print("Request received:", request)
try:
# Start timing
start_time = time.time()
messages_list = request.messages
print(f"Chat history length: {len(messages_list)}")
print("Message roles:", [msg.role for msg in messages_list])

context_str = ""
sources = []

# 1) Time the embedding generation
embed_start = time.time()
query_embedding = db_utils.get_model().encode(query)
embed_time = time.time() - embed_start
print(f"Embedding generation took: {embed_time:.2f} seconds")
# Get the latest user message as the query for retrieval
latest_query = next((msg.content for msg in reversed(messages_list) if msg.role == "user"), None)

# 2) Time the database search
search_start = time.time()
table = db_utils.get_table()
results = (
table.search(query=query_embedding, vector_column_name="embedding")
.limit(5)
.to_list()
)

search_time = time.time() - search_start
print(f"Database search took: {search_time:.2f} seconds")

# 3) Time the sources processing
process_start = time.time()
if request.source_ids:
# Use specified sources if provided
print(f"Using provided source IDs: {request.source_ids}")
source_ids_str = "('" + "','".join(request.source_ids) + "')"
results = table.search().where(f"id in {source_ids_str}", prefilter=True).to_list()
else:
# Otherwise perform semantic search based on the latest query
print(f"Performing semantic search for: {latest_query}")
query_embedding = db_utils.get_model().encode(latest_query)
results = (
table.search(query=query_embedding, vector_column_name="embedding")
.limit(5)
.to_list()
)

# Process results into sources and context
sources = [
{
"doc_path": r["doc_path"],
@@ -142,160 +198,42 @@ async def retrieve_and_generate(request: RetrieveAndGenerateRequest):
}
for r in results
]
process_time = time.time() - process_start
print(f"Processing results took: {process_time:.2f} seconds")

# Log total preparation time
total_prep_time = time.time() - start_time
print(f"Total preparation time: {total_prep_time:.2f} seconds")

# 4) Build context

context_str = "\n\n".join([
f"[Document: {r['doc_path']}]\n{r['text']}"
for r in results
])
messages = [Message(role="user", content=query)]

# 5) Create async generator to yield SSE
async def event_generator():
try:
# First yield the sources
yield {"data": json.dumps({"sources": sources})}

# Now create the streamer & generate tokens
streamer = generate_stream(messages, context_str)

# For each partial token, yield SSE data
for token in streamer:
if token: # Only send if token is not empty
try:
data = json.dumps({"content": token, "done": False})
yield {"data": data}
await asyncio.sleep(0) # let the event loop flush data
except Exception as e:
print(f"Error during token streaming: {str(e)}")
continue

# Finally, yield "done"
yield {"data": json.dumps({"content": "", "done": True})}
except Exception as e:
print(f"Error in event generator: {str(e)}")
yield {"data": json.dumps({"error": str(e)})}

# 6) Return SSE
return EventSourceResponse(event_generator())

except Exception as e:
return {"error": str(e)}

class GenerateRequest(BaseModel):
messages: List[Message]
source_ids: Optional[List[str]] = None

@app.post("/api/generate")
async def generate_endpoint(request: GenerateRequest):
"""
SSE endpoint for follow-up requests using:
- messages: list of previous chat messages
- source_ids: optional list of doc references to build context
Streams the model response in real time.
"""
try:
# Log message history length and content
messages_list = request.messages
print(f"Chat history length: {len(messages_list)}")
print("Message roles:", [msg.role for msg in messages_list])

# Log source usage
source_ids_list = request.source_ids
print(f"Using source IDs: {source_ids_list if source_ids_list else 'No sources'}")

# 1) Build context from source IDs (if provided)
context_str = ""
if source_ids_list:
table = db_utils.get_table()
source_ids_str = "('" + "','".join(source_ids_list) + "')"
chunks = table.search().where(f"id in {source_ids_str}", prefilter=True).to_list()

# Log retrieved chunks info
print(f"Retrieved {len(chunks)} chunks from database")
for chunk in chunks:
print(f"Document: {chunk['doc_path']}")

context_str = "\n\n".join([
f"[Document: {chunk['doc_path']}]\n{chunk['text']}"
for chunk in chunks
])
print(f"Total context length: {len(context_str)} characters")

# 2) Build async generator for SSE
async def event_generator():
try:
# Create the streamer
# First yield the sources if we have any
if sources:
yield {"data": json.dumps({"sources": sources})}

# Create the streamer & generate tokens
streamer = generate_stream(messages_list, context_str)

# For each partial token, yield SSE data
for token in streamer:
if token: # Only send if token is not empty
if token:
try:
data = json.dumps({"content": token, "done": False})
yield {"data": data}
await asyncio.sleep(0) # yield control so data can flush
await asyncio.sleep(0)
except Exception as e:
print(f"Error during token streaming: {str(e)}")
continue

# Finally, yield "done"
yield {"data": json.dumps({"content": "", "done": True})}
except Exception as e:
print(f"Error in event generator: {str(e)}")
yield {"data": json.dumps({"error": str(e)})}

# 3) Return SSE
return EventSourceResponse(event_generator())

except Exception as e:
return {"error": str(e)}

@app.get("/pdfs/{id}")
async def get_pdf(id: str):
"""
Endpoint to serve document content by looking up the ID in the database.
Handles both binary (PDF) and text-based (HTML, Markdown) content.
"""
# First look up the document path using the ID from the database
table = db_utils.get_table()
results = table.search().where(f"id = '{id}'", prefilter=True).to_list()

if not results:
raise HTTPException(status_code=404, detail="Document ID not found")

# Get the document path from the first result
doc_path = results[0]['doc_path']

if not os.path.exists(doc_path):
raise HTTPException(status_code=404, detail="File not found on disk")

# Determine content type based on file extension
if doc_path.endswith('.pdf'):
return FileResponse(
doc_path,
media_type='application/pdf',
filename=os.path.basename(doc_path)
)
elif doc_path.endswith('.html'):
return FileResponse(
doc_path,
media_type='text/html',
filename=os.path.basename(doc_path)
)
else: # Markdown or other text files
return FileResponse(
doc_path,
media_type='text/plain',
filename=os.path.basename(doc_path)
)

if __name__ == "__main__":
import uvicorn
uvicorn.run(app, host="0.0.0.0", port=8000)
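Per its docstring, the new unified `/api/chat` endpoint streams SSE events: a `sources` payload first, then `content` chunks, then a final `done` message. A minimal client-side sketch follows; the endpoint URL and event shapes are taken from the diff, while the parsing helper and the `localhost:8000` address are assumptions for illustration.

```python
import json

def parse_sse_data_line(line: str):
    """Parse one 'data: {...}' line from the /api/chat SSE stream.
    Returns the decoded event dict, or None for non-data lines."""
    if not line.startswith("data:"):
        return None
    return json.loads(line[len("data:"):].strip())

# Sketch of consuming the stream (assumes the backend from this PR is
# running on localhost:8000 and the third-party `httpx` package is installed):
# import httpx
# with httpx.stream("POST", "http://localhost:8000/api/chat",
#                   json={"messages": [{"role": "user", "content": "What is RAG?"}]}) as resp:
#     for raw in resp.iter_lines():
#         event = parse_sse_data_line(raw)
#         if event is None:
#             continue
#         if "sources" in event:
#             print("sources:", [s["doc_path"] for s in event["sources"]])
#         elif event.get("done"):
#             break
#         else:
#             print(event["content"], end="")
```

Because the endpoint emits sources before any tokens, a client can render the retrieved documents immediately while the model response streams in.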