"Deep in the human unconscious is a pervasive need for a logical universe that makes sense. But the real universe is always one step beyond logic." - from "The Sayings of Muad'Dib"
My goal with this project was to use OpenAI’s embedding model to analyze text from Frank Herbert's "Dune" book series. More specifically, I wanted to understand how words, characters, and motifs group together in embedding space. To do this, I used OpenAI’s pretrained model together with open-source tools such as LangChain, which wraps the embedding calls and vector-store access behind a simpler interface.
Applications:
- Semantic Search: Search by meaning rather than exact wording, ranking passages by Euclidean distance or cosine similarity between embedding vectors. Readers can then find a passage from a description they recall rather than a page number, which makes large digital documents, PDFs, and Kindle books easier to navigate.
- Visualization: Principal component analysis (PCA) can reduce each embedding vector to two or three dimensions so that words and phrases can be plotted. This helps academics and researchers see relationships between words and concepts in a text.
- Chatbot Training: The vector database can act as an external memory for a pre-trained language model. The model itself is not retrained; instead, you can retrieve the "Dune" passages most relevant to a prompt, pass them to ChatGPT as context, and have it role-play as Paul Atreides, the main character.
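To make the semantic search idea concrete, here is a minimal sketch using only the Python standard library. The passage descriptions and their 3-dimensional vectors are made up for illustration; real embeddings come from the embedding model and have far more dimensions. The helper names (`cosine_similarity`, `euclidean_distance`, `search`) are hypothetical and not part of this project.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def euclidean_distance(a, b):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def search(query_vec, passages, top_k=2):
    """Return the top_k (text, score) pairs, highest cosine similarity first."""
    scored = [(text, cosine_similarity(query_vec, vec)) for text, vec in passages]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy data (assumption): hand-written vectors standing in for real embeddings.
PASSAGES = [
    ("Paul faces the gom jabbar test", [0.9, 0.1, 0.0]),
    ("Spice harvesting on Arrakis", [0.1, 0.9, 0.2]),
    ("A sandworm attacks a spice harvester", [0.2, 0.8, 0.3]),
]

# Stand-in for the embedding of a query such as "how is spice collected?"
query = [0.0, 1.0, 0.1]

results = search(query, PASSAGES)
for text, score in results:
    print(f"{score:.3f}  {text}")
```

In a real setup the vector database performs this ranking; the sketch only shows what "closest by cosine similarity" means.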
This project is built using:
To get a local copy up and running, follow these steps.
Prerequisites:
- Python 3.x
- pip (Python package installer)
Installation:
- Clone the repo
git clone https://github.com/your_username/Dune-Word-Embedding-Analysis.git
- Install required Python packages
pip install -r requirements.txt
- Create a .env file in the project root and add your API keys
OPENAI_API_KEY=your_openai_api_key
ASTRAPY_TOKEN=your_astradb_application_token
ASTRA_DB_API_ENDPOINT=your_astradb_api_endpoint
ASTRA_DB_KEYSPACE=your_astradb_keyspace
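How the project reads this file is not shown here. As a quick sanity check before running anything, a sketch like the following can confirm that all four keys are present. It uses only the standard library; `parse_env_file` and `missing_keys` are hypothetical helpers, not functions from this repository.

```python
import os

# The four keys listed in the .env step above.
REQUIRED_KEYS = (
    "OPENAI_API_KEY",
    "ASTRAPY_TOKEN",
    "ASTRA_DB_API_ENDPOINT",
    "ASTRA_DB_KEYSPACE",
)


def parse_env_file(path):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    with open(path, encoding="utf-8") as handle:
        for raw in handle:
            line = raw.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            # Drop optional surrounding quotes around the value.
            values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values, required=REQUIRED_KEYS):
    """Return the required keys that are absent or empty."""
    return [key for key in required if not values.get(key)]


# Only check when a .env file exists in the current directory.
if os.path.exists(".env"):
    missing = missing_keys(parse_env_file(".env"))
    if missing:
        print("Missing keys in .env:", ", ".join(missing))
    else:
        print("All required keys are set.")
```

Each variable must sit on its own line for a parser like this to pick it up.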
Usage:
- Create Embeddings
create_embeddings('path_to_your_text_file.txt')
- Tokenize Text
tokenize_text('path_to_your_text_file.txt')
- Ask Questions
ask_question("DuneEmbeddings")
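What these functions do internally is not shown here. A common preprocessing step before embedding a long text is to split it into overlapping chunks so that each embedded passage stays short and still carries some surrounding context. The sketch below illustrates that idea only; `chunk_text` and its parameters are hypothetical and may not match how `create_embeddings` works.

```python
def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into word-based chunks; consecutive chunks share `overlap` words."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if overlap < 0 or overlap >= chunk_size:
        raise ValueError("overlap must be between 0 and chunk_size - 1")

    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once the current window has reached the end of the text.
        if start + chunk_size >= len(words):
            break
    return chunks


sample = "a b c d e f g h i j"
print(chunk_text(sample, chunk_size=4, overlap=1))
# prints ['a b c d', 'd e f g', 'g h i j']
```

Larger chunks keep more context per embedding, while smaller chunks make search results more precise; the defaults above are arbitrary.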