This is the deprecated repository containing interactive notebooks for exploring the ThirdAI Python library. Starting September 1st, 2024, the ThirdAI Python package is no longer supported. You can access all the functionality of the package, and a lot more, on the new ThirdAI Platform.
[Website] · [Report Issues] · [Careers]
All of ThirdAI's technology is powered by its BOLT library. BOLT is a deep-learning framework that leverages sparsity to enable training and deploying very large-scale deep learning models on any CPU. This demos repo will help you get familiar with our products, NeuralDB and Universal Deep Transformer (UDT), through interactive notebooks.
NeuralDB is an efficient, private, teachable, CPU-only text retrieval engine. You can insert all your PDFs, DOCXs, and CSVs (and even parse URLs) into a NeuralDB and run semantic search and Q&A on them. Read our three-part blog on why you need NeuralDB here. Building on over a decade of research in efficient neural network training, NeuralDB is optimized to run on conventional CPUs, so it works on any standard desktop machine. Because it can be trained and used anywhere, NeuralDB also gives you air-gapped privacy: your data never leaves your local machine.
With the capacity to scale Retrieval Augmented Generation (RAG) over thousands of pages, NeuralDB changes the way you interact with your data.
Here is a quick overview of how NeuralDB works:
from thirdai import neural_db as ndb

db = ndb.NeuralDB()

# `filename` is a placeholder: pass the path of each of your own documents
db.insert(
    sources=[ndb.PDF(filename), ndb.DOCX(filename), ndb.CSV(filename)],
    train=True,
)

# Search on the NeuralDB instance (db), not on the module (ndb)
results = db.search(
    query="what is the termination period of this contract?",
    top_k=2,
)

for result in results:
    print(result.text)
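The overview above reuses a single `filename` placeholder for every document type. As a minimal sketch of how a mixed list of files might be sorted by type before insertion, the helper below maps file suffixes to the wrapper names used in the example (`PDF`, `DOCX`, `CSV`). The helper name `plan_sources` and the suffix table are hypothetical and not part of the ThirdAI package; the code only plans which wrapper to use and does not import `thirdai`.

```python
from pathlib import Path

# Hypothetical lookup table: file suffix -> name of the document wrapper
# used in the overview above (ndb.PDF, ndb.DOCX, ndb.CSV).
WRAPPER_BY_SUFFIX = {".pdf": "PDF", ".docx": "DOCX", ".csv": "CSV"}


def plan_sources(paths):
    """Return (wrapper_name, path) pairs for supported files, skipping the rest."""
    plan = []
    for p in map(Path, paths):
        wrapper = WRAPPER_BY_SUFFIX.get(p.suffix.lower())
        if wrapper is not None:
            plan.append((wrapper, str(p)))
    return plan


# Assumption: with thirdai installed, the plan could be turned into sources with
#   sources = [getattr(ndb, name)(path) for name, path in plan_sources(files)]
#   db.insert(sources=sources, train=True)
```

Keeping the planning step separate from the `thirdai` calls makes it easy to check which files will be skipped before any training starts.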
NeuralDB also provides teaching methods for incorporating human feedback into RAG.
# associate a source with a target
db.associate(source="parties involved", target="made by and between")
# associate text with a result
db.text_to_result("made by and between", 0)
See the neural_db folder for more examples and documentation.
Universal Deep Transformer (UDT) is our consolidated API for performing different ML tasks on a variety of data types. It handles text, numeric, categorical, multi-categorical, graph, and time series data, and generalizes to tasks such as NLP, multi-class classification, multi-label retrieval, and regression. Just like NeuralDB, UDT is optimized for conventional CPUs and runs on any standard desktop machine.
Some applications of UDT include:
- Text Classification
- Named Entity Recognition
- Tabular Data Classification
- Netflix-style Movie Recommendation
- Query Reformulation
- Graph Node Classification
- Sentiment Analysis
- Intent Classification
- Zero Shot Search and Retrieval
- Fraud Detection
- and more!
Here is an example of the UDT API used for multi-label tabular classification:
from thirdai import bolt

model = bolt.UniversalDeepTransformer(
    data_types={
        "title": bolt.types.text(),
        "category": bolt.types.categorical(),
        "number": bolt.types.numerical(range=(0, 100)),
        # ":" separates multiple labels within one cell
        "label": bolt.types.categorical(delimiter=":"),
    },
    target="label",
    n_target_classes=2,
    delimiter="\t",  # column delimiter of the training file
)

# Pass the path to the training file as a string
model.train("filename.csv", epochs=5, learning_rate=0.001, metrics=["precision@1"])

model.predict({"title": "Red shoes", "category": "XL", "number": "12.6"})
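The UDT example above reads a tab-delimited file whose `label` column may hold several labels separated by `:`. As a rough sketch of what such a file might look like, the snippet below writes one with Python's standard `csv` module. The rows are invented for illustration, and two things are assumptions rather than documented behavior: that the file carries a header row naming the columns declared in `data_types`, and that the two target classes are written as `0` and `1`.

```python
import csv

# Column names taken from the data_types declared in the UDT example.
COLUMNS = ["title", "category", "number", "label"]

# Hypothetical rows; "0:1" marks a row that carries both labels.
ROWS = [
    ["Red shoes", "XL", "12.6", "0"],
    ["Blue jacket", "M", "48.0", "1"],
    ["Green scarf", "S", "7.5", "0:1"],
]


def write_training_file(path, rows=ROWS):
    """Write a tab-delimited file with a header row and return its path."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f, delimiter="\t")  # matches delimiter="\t" in the model
        writer.writerow(COLUMNS)
        writer.writerows(rows)
    return path
```

Calling `write_training_file("filename.csv")` produces a file that could then be passed to `model.train`, under the assumptions stated above.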
See the universal_deep_transformer folder for more examples and documentation.
Many notebooks come with an API key that will only work on the dataset in the demo. If you want to try out ThirdAI on your own dataset, simply register for a free license here.
To use your license, do the following before constructing your NeuralDB or UDT models:
from thirdai import licensing
licensing.activate("") # insert your valid license key here
# create NeuralDB or UDT ...
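If you prefer not to paste the key directly into a notebook, one option is to read it from an environment variable first. This is a minimal sketch: the variable name `THIRDAI_KEY` and the helper `load_license_key` are hypothetical and not part of the ThirdAI package. Only `licensing.activate`, shown above, comes from the library.

```python
import os


def load_license_key(env_var="THIRDAI_KEY"):
    """Return the license key stored in the given environment variable.

    The variable name is an assumption for this sketch; use whatever name
    fits your setup.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set {env_var} to your ThirdAI license key first.")
    return key


# With thirdai installed, the key would then be passed on as in the snippet above:
#   from thirdai import licensing
#   licensing.activate(load_license_key())
```

Failing early with a clear error avoids calling `licensing.activate` with an empty string.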
Please refer to LICENSE.txt for more information on usage terms.
ThirdAILabs - @ThirdAILab - [email protected]