GitHub - AhnafTalukder/Dune-Word-Embedding-Analysis

Word Embeddings Analysis and Semantic Search Optimization with Dune

"Deep in the human unconscious is a pervasive need for a logical universe that makes sense. But the real universe is always one step beyond logic." - from "The Sayings of Muad'Dib"

Table of Contents

About The Project
- Built With
Getting Started
- Prerequisites
- Installation
Usage

About The Project

My goal with this project was to leverage OpenAI’s embedding model to analyze text from the "Dune" book series by Frank Herbert. More specifically, I wanted to understand how words, characters, and motifs are grouped together. I sought to do this using OpenAI’s pretrained model and open-source tools like LangChain, which abstracts the complex logic of deep learning.

Applications:

Semantic Search: Search by meaning (semantics) rather than exact wording, ranking passages by Euclidean distance or cosine similarity between embedding vectors. This lets readers find a passage from a description they recall rather than a page number, which simplifies navigating large digital documents, PDFs, and Kindle books.
Visualization: Principal component analysis (PCA) can reduce the dimensionality of each vector so that words and phrases can be plotted in two or three dimensions. This helps academics and researchers see relationships between words and concepts in a text.
Chatbot Grounding: The vector database can act as an external memory for pre-trained language models. For instance, you can supply ChatGPT with passages retrieved from the "Dune" embeddings and have it role-play as Paul Atreides, the main character. The model itself is not retrained; the retrieved text is passed to it as context.
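
To make the semantic search idea concrete, here is a minimal sketch of ranking stored passages by cosine similarity, with a Euclidean distance helper for comparison. It is not taken from this repository: the function names, the passage labels, and the three-dimensional toy vectors are all made up for illustration, and real embeddings from OpenAI's model have far more dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def euclidean_distance(a, b):
    """Straight-line distance between two equal-length vectors (0.0 = identical)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def search(query_vec, index, top_k=2):
    """Return the top_k (score, label) pairs, highest cosine similarity first."""
    scored = [(cosine_similarity(query_vec, vec), label) for label, vec in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]

# Hypothetical index: toy vectors standing in for real passage embeddings.
index = [
    ("passage about spice", [0.9, 0.1, 0.0]),
    ("passage about fear", [0.1, 0.9, 0.2]),
    ("passage about sandworms", [0.8, 0.0, 0.3]),
]

# Hypothetical query embedding, closest in direction to the spice passage.
query = [1.0, 0.0, 0.1]
results = search(query, index)
for score, label in results:
    print(f"{score:.3f}  {label}")
```

Cosine similarity ignores vector length and compares direction only, which is why it is a common default for text embeddings; Euclidean distance gives the same ranking when all vectors are normalized to unit length.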

Built With

This project is built using Python, OpenAI's embedding model, LangChain, and Astra DB as the vector database.

Getting Started

To get a local copy up and running, follow these steps.

Prerequisites

Python 3.x
pip (Python package installer)

Installation

Clone the repo

```
git clone https://github.com/AhnafTalukder/Dune-Word-Embedding-Analysis.git
```

Install required Python packages
```
 pip install -r requirements.txt
```

Create a .env file in the project root and add your API keys

```
OPENAI_API_KEY=your_openai_api_key
ASTRAPY_TOKEN=your_astradb_application_token
ASTRA_DB_API_ENDPOINT=your_astradb_api_endpoint
ASTRA_DB_KEYSPACE=your_astradb_keyspace
```
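
Before running anything, it can help to confirm that all four keys are present. The sketch below parses `.env`-style text and reports which required keys are missing or empty. The helper names `parse_env_text` and `missing_keys` are hypothetical and not part of this repository, and the project itself may load the file differently (for example through a dotenv library).

```python
# Keys named in the Installation step of this README.
REQUIRED_KEYS = (
    "OPENAI_API_KEY",
    "ASTRAPY_TOKEN",
    "ASTRA_DB_API_ENDPOINT",
    "ASTRA_DB_KEYSPACE",
)

def parse_env_text(text):
    """Parse KEY=value lines, skipping blank lines, # comments, and lines without '='."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values

def missing_keys(values):
    """Return the required keys that are absent or have an empty value."""
    return [key for key in REQUIRED_KEYS if not values.get(key)]

# Example .env contents with placeholder values; two keys are deliberately left out.
sample = """
# credentials (placeholders)
OPENAI_API_KEY=your_openai_api_key
ASTRAPY_TOKEN=your_astradb_application_token
"""
print(missing_keys(parse_env_text(sample)))
```

To check a real file, read it with `open(".env").read()` and pass the text to `parse_env_text`.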

Usage

Create Embeddings

```
create_embeddings('path_to_your_text_file.txt')
```
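
This README does not describe what `create_embeddings` does internally. A common preparatory step in embedding pipelines is to split a long text into overlapping chunks so that each piece fits the model's input limit; the sketch below shows that step only, under that assumption. The function `chunk_text` and its parameters are hypothetical, and it splits by characters rather than tokens to stay dependency-free.

```python
def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into character windows of chunk_size that overlap by `overlap`."""
    if chunk_size <= 0 or overlap < 0 or overlap >= chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks

# Small demonstration with a short string and tiny windows.
demo_chunks = chunk_text("abcdefghij", chunk_size=4, overlap=2)
print(demo_chunks)
```

The overlap keeps a sentence that straddles a boundary available in both neighbouring chunks, at the cost of embedding some text twice.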

Tokenize Text

```
tokenize_text('path_to_your_text_file.txt')
```
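
How `tokenize_text` splits the text is not documented here. As a rough illustration of tokenization in general, the sketch below breaks text into lowercase word tokens and counts the most frequent ones. This is a simplified stand-in: OpenAI models operate on subword tokens rather than whole words, and the names `tokenize` and `top_terms` are hypothetical.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and return word tokens, keeping apostrophes inside words."""
    return re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())

def top_terms(text, n=3):
    """Return the n most frequent tokens with their counts."""
    return Counter(tokenize(text)).most_common(n)

# Made-up sample sentence, not a quotation from the books.
sample_text = "Muad'Dib rode the worm. The worm!"
print(tokenize(sample_text))
print(top_terms(sample_text, n=2))
```

Keeping the apostrophe lets names such as "Muad'Dib" survive as a single token instead of being split in two.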

Ask Questions
```
ask_question("DuneEmbeddings")
```
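
The internals of `ask_question` are not shown in this README. One common pattern for question answering over a vector database is to retrieve the most similar passages and place them in the prompt sent to the language model. The sketch below covers only the prompt-assembly step under that assumption; `build_prompt`, its parameters, and the placeholder passages are hypothetical.

```python
def build_prompt(question, passages, persona="Paul Atreides"):
    """Assemble a prompt that asks the model to answer from numbered passages."""
    context = "\n".join(f"[{i}] {p}" for i, p in enumerate(passages, start=1))
    return (
        f"You are role-playing as {persona}.\n"
        "Answer using only the numbered context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
    )

# Placeholder passages standing in for results returned by a similarity search.
retrieved = ["first retrieved passage", "second retrieved passage"]
prompt = build_prompt("What do you fear most?", retrieved)
print(prompt)
```

Numbering the passages makes it easier to ask the model to cite which one supported its answer.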
