
AhnafTalukder/Dune-Word-Embedding-Analysis

Word Embeddings Analysis and Semantic Search Optimization with Dune


"Deep in the human unconscious is a pervasive need for a logical universe that makes sense. But the real universe is always one step beyond logic." - from "The Sayings of Muad'Dib"

Table of Contents
  1. About The Project
  2. Getting Started
  3. Usage

About The Project

My goal with this project was to use OpenAI’s embedding model to analyze text from Frank Herbert's "Dune" book series. More specifically, I wanted to understand how words, characters, and motifs group together in embedding space. To do this, I used OpenAI’s pretrained model together with open-source tools like LangChain, which abstracts away much of the underlying deep-learning logic.

Applications:

  • Semantic Search: Search by semantics (meaning) rather than exact wording, ranking passages by Euclidean distance or cosine similarity between embedding vectors. This lets readers find a passage from a description they recall rather than a page number, which simplifies navigating large digital documents, PDFs, and Kindle books.
  • Visualization: Principal component analysis (PCA) can reduce the dimensionality of each vector to two or three components so that words and phrases can be plotted. This helps academics and researchers see relationships between words and concepts in a text.
  • Chatbot Grounding: The vector database can act as an external memory for pre-trained language models. For instance, you can retrieve relevant passages from "Dune" by embedding similarity, supply them to ChatGPT as context, and have it role-play as Paul Atreides, the main character. This grounds the model's answers in the text without retraining it.
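The first two applications can be sketched with NumPy alone. The example below is a minimal illustration, not the project's actual code: the function names (`top_k`, `pca_project`), the labels, and the 4-dimensional toy vectors are all hypothetical stand-ins for real embeddings, which have far more dimensions.

```python
import numpy as np

def cosine_similarity(query, matrix):
    """Cosine similarity between one query vector and each row of matrix."""
    query = query / np.linalg.norm(query)
    rows = matrix / np.linalg.norm(matrix, axis=1, keepdims=True)
    return rows @ query

def euclidean_distance(query, matrix):
    """Euclidean distance between one query vector and each row of matrix."""
    return np.linalg.norm(matrix - query, axis=1)

def top_k(query, matrix, k=2, metric="cosine"):
    """Return the row indices of the k closest vectors, best match first."""
    if metric == "cosine":
        order = np.argsort(-cosine_similarity(query, matrix))  # higher is closer
    elif metric == "euclidean":
        order = np.argsort(euclidean_distance(query, matrix))  # lower is closer
    else:
        raise ValueError(f"unknown metric: {metric}")
    return order[:k].tolist()

def pca_project(matrix, n_components=2):
    """Project rows onto their first principal components via SVD."""
    centered = matrix - matrix.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T

# Hypothetical toy "embeddings", made up for illustration only.
labels = ["spice", "melange", "sandworm", "Caladan"]
vectors = np.array([
    [0.9, 0.1, 0.0, 0.2],
    [0.8, 0.2, 0.1, 0.3],
    [0.1, 0.9, 0.2, 0.0],
    [0.0, 0.1, 0.9, 0.4],
])
query = np.array([1.0, 0.0, 0.0, 0.2])

nearest = top_k(query, vectors, k=2)   # indices of the two closest rows
points_2d = pca_project(vectors)       # one (x, y) point per label, ready to plot
```

With real data, `vectors` would hold one embedding per word or passage, and `points_2d` could be passed to any plotting library to draw the scatter plot described above.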


Built With

This project is built using:


Getting Started

To get a local copy up and running, follow these steps.

Prerequisites

  • Python 3.x
  • pip (Python package installer)

Installation

  1. Clone the repo
    git clone https://github.com/your_username/Dune-Word-Embedding-Analysis.git
  2. Install required Python packages
    pip install -r requirements.txt
  3. Create a .env file in the project root and add your API keys
     OPENAI_API_KEY=your_openai_api_key
     ASTRAPY_TOKEN=your_astradb_application_token
     ASTRA_DB_API_ENDPOINT=your_astradb_api_endpoint
     ASTRA_DB_KEYSPACE=your_astradb_keyspace
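Before running anything, it can help to confirm that the .env file defines all four variables. The sketch below uses only the standard library and assumes a simple KEY=value format. The helper names (`parse_env_text`, `missing_keys`) are hypothetical, and how the project itself loads the file (for example via a package such as python-dotenv) is not stated here.

```python
REQUIRED_KEYS = (
    "OPENAI_API_KEY",
    "ASTRAPY_TOKEN",
    "ASTRA_DB_API_ENDPOINT",
    "ASTRA_DB_KEYSPACE",
)

def parse_env_text(text):
    """Parse KEY=value lines, skipping blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values

def missing_keys(values, required=REQUIRED_KEYS):
    """Return the required keys that are absent or empty."""
    return [key for key in required if not values.get(key)]

# Sample content with one key deliberately left out.
sample = """
# placeholder values, not real credentials
OPENAI_API_KEY=your_openai_api_key
ASTRAPY_TOKEN=your_astradb_application_token
ASTRA_DB_API_ENDPOINT=your_astradb_api_endpoint
"""

parsed = parse_env_text(sample)
missing = missing_keys(parsed)
```

To check a real file, read it with `open(".env").read()` and pass the text to `parse_env_text`.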
     


Usage

  1. Create Embeddings
    create_embeddings('path_to_your_text_file.txt')
  2. Tokenize Text
    tokenize_text('path_to_your_text_file.txt')
  3. Ask Questions
    ask_question("DuneEmbeddings")

