
Platform for analyzing and recommending Python packages and Python software stacks not only for AI/ML applications


thoth-station/thoth


Thoth

AI and machine learning have attracted considerable hype in recent years. Machine learning and AI applications now run in production systems, where significant effort is required to ensure they behave correctly and stay fully operational. Thoth is a recommendation system for (not only) AI and machine learning applications that use popular open source machine learning libraries. Thoth stores observations and uses them to predict possible misbehavior in an application, errors in assembling an application, or issues that arise when integrating an application with other components in a deployment.

Why Thoth?

Every developer struggles with choosing the right versions of libraries when developing an application. You might be asking: should I use version X instead of Y? What happens if I choose version Y? What indirect dependencies will be pulled into my application stack during resolution?

Moreover, development is not the only phase in an application's lifecycle. Updates raise their own questions: will an update break my application? Should I update because of features or bug fixes in the newly released dependency? Why should I update at all?

A good practice for making builds as reproducible and reliable as possible is to lock (or pin down) the libraries used by an application to specific, required versions. Thoth pushes this idea further: instead of pinning to the latest versions of libraries where possible (as Pipenv or pip does), it chooses the best libraries in specific versions for your application based on aggregated knowledge. You can think of Thoth as a fine-tuned resolver that comes up with better version pinning and recommendations, based on observations available in Thoth's database for your application running in a specific runtime environment.
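The difference between "pin to the latest version" and "pin to the best known version" can be illustrated with a minimal sketch. This is not Thoth's resolver or its API: the function names, the version strings and the scores below are all hypothetical, and the scores stand in for whatever aggregated observations a real system would hold.

```python
from typing import Dict, List, Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a simple dotted version such as "1.2.0" into a sortable tuple."""
    return tuple(int(part) for part in text.split("."))


def pick_latest(candidates: List[str]) -> str:
    """Resolver that always prefers the newest version."""
    return max(candidates, key=parse_version)


def pick_best(candidates: List[str], scores: Dict[str, float]) -> str:
    """Resolver that prefers the highest observation score.

    Versions without a score are treated as neutral (0.0); ties are
    broken in favour of the newer version.
    """
    return max(candidates, key=lambda v: (scores.get(v, 0.0), parse_version(v)))


# Hypothetical data: the newest release has a negative observation attached.
candidates = ["1.0.0", "1.1.0", "1.2.0"]
scores = {"1.0.0": 0.2, "1.1.0": 0.9, "1.2.0": -0.5}

print("latest:", pick_latest(candidates))
print("best:  ", pick_best(candidates, scores))
```

With these made-up scores the two strategies disagree: the latest-version strategy pins 1.2.0, while the score-based one pins 1.1.0.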

Imagine an application consisting of Flask, gunicorn, TensorFlow and Pandas. As the figure below shows, the dependencies do not form separate subgraphs. As a result, an update or downgrade of any dependency (either direct or indirect) can cause changes across the whole application, which can lead to misbehavior or, in the better case, to the application failing to run.

Interaction of application dependencies.
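A small sketch can make the overlap concrete. The dependency graph below is a simplified, hand-written assumption for illustration only; it is not taken from real package metadata or from Thoth's database, and the helper names are hypothetical. For a changed package, the code lists which direct dependencies of the application are affected.

```python
from collections import deque
from typing import Dict, List, Set

# Simplified, illustrative graph (assumed, not real package metadata).
GRAPH: Dict[str, List[str]] = {
    "app": ["flask", "gunicorn", "tensorflow", "pandas"],
    "flask": ["werkzeug", "jinja2", "click"],
    "gunicorn": ["setuptools"],
    "tensorflow": ["numpy", "six", "setuptools"],
    "pandas": ["numpy", "python-dateutil"],
    "python-dateutil": ["six"],
    "jinja2": ["markupsafe"],
}


def transitive_deps(graph: Dict[str, List[str]], root: str) -> Set[str]:
    """Return every package reachable from root (excluding root itself)."""
    seen: Set[str] = set()
    queue = deque(graph.get(root, []))
    while queue:
        package = queue.popleft()
        if package in seen:
            continue
        seen.add(package)
        queue.extend(graph.get(package, []))
    return seen


def affected_direct_deps(graph: Dict[str, List[str]], app: str, changed: str) -> List[str]:
    """List the direct dependencies of app that are, or depend on, the changed package."""
    return sorted(
        dep for dep in graph[app]
        if dep == changed or changed in transitive_deps(graph, dep)
    )


for package in ("numpy", "six", "setuptools", "markupsafe"):
    print(package, "->", affected_direct_deps(GRAPH, "app", package))
```

In this assumed graph, a change to a single indirect dependency such as numpy or six touches more than one direct dependency, which is why the subgraphs cannot be updated in isolation.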

Zen of Thoth

  1. Bots are our cyborg team members.
  2. Python and machine learning are our first-class citizens.
  3. Stateless architecture; if state is needed, we use Ceph, OpenShift's internal state available through its API, and a graph database for advanced graph-traversal-based queries.
  4. Reuse what has been previously invented.
  5. Errors should never pass silently.
  6. Clean design with clean Pythonic code counts.
  7. Use Ansible for always-ready midnight deployment.
  8. Self-living system with minimal operational overhead.
  9. Never say an immediate no and never say an immediate yes to any new idea.
  10. Be always open.

I want to become a Thoth contributor

If you would like to contribute to the source code, you can check all the components of Thoth. Most of the components are designed to have a command line interface (such as solver, package-extract, ...) for easy development, and when plugged into an OpenShift cluster, they can easily scale based on Thoth's design.

If you would like to deploy Thoth, see the core repository, where the deployment playbooks live along with step-by-step documentation on how to deploy Thoth into your OpenShift cluster (or to your local oc cluster up instance).

Also, check the instructions on how to contribute, run and verify your code in the Developer's guide.

I want to become a Thoth operator

For related operations, such as how to manage a deployment, how to propagate container images from test to stage and prod, how to perform initial provisioning, or how to operate Zuul, please follow the instructions in the thoth-ops repository and the related documents in its docs/ directory.

Game of Gods

Thoth is actually one of the gods living in the thoth-station. You can find other gods (named after figures from Egyptian mythology) that, together with Thoth, create their own universe. In this universe, however, the gods do not fight each other. Instead, they form a peaceful, cooperative ecosystem.

Currently available Gods

  • Thoth - the recommender system, holding knowledge based on which it creates advice
  • Sesheta - bot that is responsible for automated PR merges, gathering information about CI runs on new pull requests or automatically labeling new issues and pull requests
  • Kebechet - bot that is responsible for monitoring repositories, issuing pull-requests on new dependency releases, automatically issuing new releases on PyPI, and more
  • Amun - the execution part of Thoth; it is capable of creating runtime environments based on a specification, creating application stacks and executing the necessary tests (such as application, performance, ...) to gather observations for Thoth's database
  • Nepthys - a bot responsible for automatic documentation updates
  • Thamos - a CLI tool and integration library for communicating with Thoth
  • Isis - an API service exposing feature-based queries and project similarity, implemented on top of project2vec

See the thoth-station organization on GitHub for more information.

Thoth Architecture Overview

For details about Thoth, see Thoth's core repository. The core repository contains an architecture overview of Thoth, with an explanation of the core components and the data flow inside and outside of project Thoth.