Merge pull request #687 from MacOS/main
Adds contributing pages to the docs
Showing 4 changed files with 477 additions and 1 deletion.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,307 @@ | ||
--- | ||
layout: default | ||
title: Code contributions | ||
parent: Contributing | ||
nav_order: 1 | ||
permalink: /contributing/code | ||
--- | ||
# Contributing code | |
One way to contribute to ``llmware`` is by contributing to the code base. | ||
|
||
We briefly describe some of the important modules of ``llmware`` next, so you can more easily navigate the code base. | ||
You may also take a look at our [fast start series from YouTube](https://www.youtube.com/playlist?list=PL1-dn33KwsmD7SB9iSO6vx4ZLRAWea1DB). | ||
|
||
## Core modules | ||
|
||
### Library | ||
<iframe width="560" height="315" src="https://www.youtube.com/embed/2xDefZ4oBOM?si=IAHkxpQkFwnWyYTL" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen></iframe> | ||
The *library* module implements the classes **Library** and **LibraryCatalog**. | ||
The **Library** class implements the *library* concept. | ||
A *library* is a collection of documents, where a document can be a PDF, an image, or an office document. | |
It is responsible for parsing, text chunking, and indexing. | ||
In other words, it does the heavy lifting of adding content. | ||
In the following, we briefly describe the methods for adding documents to the library. | |
|
||
```python | ||
add_file( | ||
self, | ||
file_path): | ||
``` | ||
This method adds one document of any supported type to the library. | ||
|
||
```python | ||
add_files( | ||
self, | ||
input_folder_path=None, | ||
encoding="utf-8", | ||
chunk_size=400, | ||
get_images=True, | |
get_tables=True, | |
smart_chunking=2, | ||
max_chunk_size=600, | ||
table_grid=True, | ||
get_header_text=True, | ||
table_strategy=1, | ||
strip_header=False, | ||
verbose_level=2, | ||
copy_files_to_library=True): | ||
``` | ||
This method adds the documents of one folder to the library. | ||
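
To make the ``chunk_size`` and ``max_chunk_size`` parameters more concrete, here is a minimal pure-Python sketch of one way a target size and a hard cap can interact when splitting text. It is an illustration only and not the parser that ``llmware`` actually uses; the function name ``chunk_text`` and the sentence-boundary heuristic are assumptions.

```python
def chunk_text(text, chunk_size=400, max_chunk_size=600):
    """Split text into chunks of roughly chunk_size characters.

    Hypothetical helper: a chunk is closed at the first sentence end
    found after chunk_size characters, or cut hard at max_chunk_size.
    """
    chunks = []
    start = 0
    while start < len(text):
        hard_end = min(start + max_chunk_size, len(text))
        soft_end = min(start + chunk_size, len(text))
        # Look for a sentence end between the target size and the cap.
        cut = text.find(". ", soft_end, hard_end)
        end = cut + 1 if cut != -1 else hard_end
        chunks.append(text[start:end].strip())
        start = end
    # Drop chunks that contained only whitespace.
    return [c for c in chunks if c]
```

With the defaults, a chunk ends at the first sentence boundary after 400 characters when one exists, and never grows beyond 600 characters.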
|
||
```python | ||
add_website( | ||
self, | ||
url, | ||
get_links=True, | ||
max_links=5): | ||
``` | ||
This method adds a website, and links from the website, to the library. | ||
|
||
```python | ||
add_wiki( | ||
self, | ||
topic_list, | ||
target_results=10): | ||
``` | ||
This method adds Wikipedia articles for the given topics to the library. | |
|
||
```python | ||
add_dialogs( | ||
self, | ||
input_folder=None): | ||
``` | ||
This method adds an AWS dialog transcript to the library. | ||
|
||
```python | ||
add_image( | ||
self, | ||
input_folder=None): | ||
``` | ||
This method adds images to the library. | |
|
||
```python | ||
add_pdf_by_ocr( | ||
self, | ||
input_folder=None): | ||
``` | ||
This method adds scanned PDFs to the library. | ||
|
||
```python | ||
add_pdf( | ||
self, | ||
input_folder=None): | ||
``` | ||
This method adds PDFs to the library. | ||
|
||
```python | ||
add_office( | ||
self, | ||
input_folder=None): | ||
``` | ||
This method adds office documents to the library. | ||
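
The methods above can be combined in a small routing helper. The following sketch assumes only the signatures listed in this section; ``ingest_sources`` and the ``RecordingLibrary`` stand-in are hypothetical names introduced for illustration, so the example runs without a real library object.

```python
def ingest_sources(library, folder=None, url=None, wiki_topics=None):
    """Route each given source to the matching documented add_* method.

    `library` is any object exposing add_files, add_website and add_wiki
    with the signatures listed above. Returns the names of the methods called.
    """
    called = []
    if folder is not None:
        library.add_files(input_folder_path=folder)
        called.append("add_files")
    if url is not None:
        # Keep the crawl small: follow at most two links from the page.
        library.add_website(url, get_links=True, max_links=2)
        called.append("add_website")
    if wiki_topics:
        library.add_wiki(list(wiki_topics), target_results=5)
        called.append("add_wiki")
    return called


class RecordingLibrary:
    """Stand-in for a real library object; it only records the calls."""

    def __init__(self):
        self.calls = []

    def add_files(self, input_folder_path=None, **kwargs):
        self.calls.append(("add_files", input_folder_path))

    def add_website(self, url, get_links=True, max_links=5):
        self.calls.append(("add_website", url, max_links))

    def add_wiki(self, topic_list, target_results=10):
        self.calls.append(("add_wiki", tuple(topic_list), target_results))
```

Passing a real library object instead of the stand-in should work the same way, provided it exposes the methods with the signatures shown above.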
|
||
### Embeddings | ||
<iframe width="560" height="315" src="https://www.youtube.com/embed/xQEk6ohvfV0?si=GAPle5gVdVPkYKWv" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen></iframe> | ||
An *embedding* combines a vector store with an embedding model. | |
It is responsible for applying an embedding model to a library, storing the embeddings in a vector store, and providing access to the embeddings with natural language queries. | ||
We briefly describe the common methods offered by all vector stores below. | ||
|
||
```python | ||
def create_new_embedding( | ||
self, | ||
doc_ids=None, | ||
batch_size=500): | ||
``` | ||
This method creates the embeddings and adds them to the vector store. | ||
|
||
```python | ||
def search_index( | ||
self, | ||
query_embedding_vector, | ||
sample_count=10): | ||
``` | ||
This method searches the vector store given the query vector. | ||
|
||
```python | ||
def delete_index(self): | ||
``` | ||
This method deletes the created vector store index. | ||
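
If you plan to add a new vector store, it may help to see the three methods side by side. The toy class below keeps everything in memory and ranks by cosine similarity. It is a sketch under stated assumptions, not how any ``llmware`` vector store is implemented: the constructor arguments ``blocks`` and ``embed_fn``, the helper ``_cosine``, and the toy embedding ``letter_counts`` are all hypothetical.

```python
import math


class InMemoryVectorStore:
    """Toy vector store exposing the three methods described above.

    Hypothetical: `blocks` maps a doc id to text, and `embed_fn` turns
    text into a list of floats. Neither is part of llmware's API.
    """

    def __init__(self, blocks, embed_fn):
        self.blocks = blocks
        self.embed_fn = embed_fn
        self.index = {}  # doc id -> embedding vector

    def create_new_embedding(self, doc_ids=None, batch_size=500):
        ids = list(self.blocks) if doc_ids is None else list(doc_ids)
        # Embed in batches so a large library is never processed at once.
        for start in range(0, len(ids), batch_size):
            for doc_id in ids[start:start + batch_size]:
                self.index[doc_id] = self.embed_fn(self.blocks[doc_id])
        return len(ids)

    def search_index(self, query_embedding_vector, sample_count=10):
        scored = [
            (doc_id, _cosine(query_embedding_vector, vector))
            for doc_id, vector in self.index.items()
        ]
        # Highest similarity first, truncated to sample_count results.
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:sample_count]

    def delete_index(self):
        self.index.clear()


def _cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def letter_counts(text):
    """Toy embedding: counts of the letters a, b and c."""
    return [float(text.count(ch)) for ch in "abc"]
```

A real vector store would delegate storage and search to a database, but the division of work between the three methods stays the same.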
|
||
|
||
### Prompts | ||
<iframe width="560" height="315" src="https://www.youtube.com/embed/swiu4oBVfbA?si=rKbgO3USADCqICqr" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen></iframe> | ||
A *prompt* is an input to a model. | |
The prompt is used by the model to generate the response. | ||
One important use case is that users want to augment a prompt, or a series of prompts, with additional information. | ||
Next, we describe methods for augmenting a prompt with additional information. | ||
|
||
```python | ||
def add_source_new_query( | ||
self, | ||
library, | ||
query=None, | ||
query_type="semantic", | ||
result_count=10): | ||
``` | ||
This method adds the results of the ``query`` to the prompt. | ||
|
||
```python | ||
def add_source_query_results( | ||
self, | ||
query_results): | ||
``` | ||
This method adds previous results from a query as a source to the prompt. | ||
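
Conceptually, adding query results as a source means placing the retrieved passages in front of the question. The sketch below shows that idea in plain Python. It assumes each query result is a dict with a ``"text"`` key, and ``build_prompt`` and ``max_chars`` are hypothetical names; the actual prompt assembly in ``llmware`` may differ.

```python
def build_prompt(question, query_results, max_chars=2000):
    """Prepend retrieved passages to a question, within a size budget.

    Assumes each query result is a dict with a "text" key; `max_chars`
    is a hypothetical stand-in for the model's context window.
    """
    context_parts = []
    used = 0
    for result in query_results:
        passage = result["text"].strip()
        if used + len(passage) > max_chars:
            break  # stop before the context budget is exceeded
        context_parts.append(passage)
        used += len(passage)
    context = "\n".join(context_parts)
    return f"Context:\n{context}\n\nQuestion: {question}"
```

The size budget is the reason a source is usually filtered by a query first: only the passages that fit are passed to the model.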
|
||
```python | ||
def add_source_library( | ||
self, | ||
library_name): | ||
``` | ||
This method adds an entire library to the prompt. | ||
We recommend that you only use this when the library is sufficiently small. | ||
|
||
```python | ||
def add_source_wikipedia( | ||
self, | ||
topic, | ||
article_count=3, | ||
query=None): | ||
``` | ||
This method adds Wikipedia articles to the prompt based on the provided ``topic``. | |
|
||
```python | ||
def add_source_yahoo_finance( | ||
self, | ||
ticker=None, | ||
key_list=None): | ||
``` | ||
This method adds a Yahoo Finance ticker to the prompt. | |
|
||
```python | ||
def add_source_knowledge_graph( | ||
self, | ||
library, | ||
query): | ||
``` | ||
This method adds the summary output elements from a knowledge graph based on the provided ``query``. | ||
Please note that this method is experimental, i.e. unstable, and is subject to change dramatically in each new version. | ||
|
||
```python | ||
def add_source_website( | ||
self, | ||
url, | ||
query=None): | ||
``` | ||
This method adds the website pointed to by the ``url`` to the prompt. | ||
|
||
```python | ||
def add_source_document( | ||
self, | ||
input_fp, | ||
input_fn, | ||
query=None): | ||
``` | ||
This method adds a document, or documents, of any supported type to the prompt. | ||
If documents are added, then the ``query`` parameter can be used to filter the documents. | ||
|
||
```python | ||
def add_source_last_interaction_step( | ||
self): | ||
``` | ||
This method adds the last interaction to the prompt. | ||
The use case for this is to enable interactive dialog, i.e. chatting. | ||
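
The chat use case can be pictured as carrying the previous exchange into the next prompt. The following sketch illustrates that pattern only; ``chat_turn`` and ``model_fn`` are hypothetical names, and the way ``llmware`` stores and formats the last interaction may differ.

```python
def chat_turn(model_fn, user_message, last_interaction=None):
    """Run one chat turn, feeding the previous exchange back as context.

    Hypothetical helper: `model_fn` is any callable mapping a prompt
    string to a response string.
    """
    context = ""
    if last_interaction is not None:
        prev_prompt, prev_response = last_interaction
        context = f"User: {prev_prompt}\nAssistant: {prev_response}\n"
    response = model_fn(f"{context}User: {user_message}\nAssistant:")
    # The returned pair becomes last_interaction for the next turn.
    return response, (user_message, response)
```

Each call returns the response together with the pair to pass in as ``last_interaction`` on the following turn.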
|
||
### Model Catalog | ||
A *model catalog* is a collection of models. | ||
In the following, we briefly describe the methods for adding new models to the catalog. | ||
|
||
```python | ||
def register_new_hf_generative_model( | ||
self, | ||
hf_model_name=None, | ||
context_window=2048, | ||
prompt_wrapper="<INST>", | ||
display_name=None, | ||
temperature=0.3, | ||
trailing_space="", | ||
link=""): | ||
``` | ||
This method adds a new generative model from Hugging Face. | |
Users can therefore add models from Hugging Face that are not yet supported. | |
|
||
```python | ||
def register_sentence_transformer_model( | ||
self, | ||
model_name, | ||
embedding_dims, | ||
context_window, | ||
display_name=None, | ||
link=""): | ||
``` | ||
This method adds a new sentence transformer. | ||
|
||
```python | ||
def register_gguf_model( | ||
self, | ||
model_name, | ||
gguf_model_repo, | ||
gguf_model_file_name, | ||
prompt_wrapper=None, | ||
eos_token_id=0, | ||
display_name=None, | ||
trailing_space="", | ||
temperature=0.3, | ||
context_window=2048, | ||
instruction_following=True): | ||
``` | ||
This method adds a new GGUF model. | ||
|
||
```python | ||
def register_open_chat_model( | ||
cls, | ||
model_name, | ||
api_base=None, | ||
model_type="chat", | ||
display_name=None, | ||
context_window=4096, | ||
instruction_following=True, | ||
prompt_wrapper="", | ||
temperature=0.5): | ||
``` | ||
This method adds any chat model that is available through a web API, e.g. a chat model that is available locally | ||
via localhost. | ||
|
||
```python | ||
def register_ollama_model( | ||
cls, | ||
model_name, | ||
host="localhost", | ||
port=11434, | ||
model_type="chat", | ||
raw=False, | ||
stream=False, | ||
display_name=None, | ||
context_window=4096, | ||
instruction_following=True, | ||
prompt_wrapper="", | ||
temperature=0.5): | ||
``` | ||
This method adds an Ollama model that is available through a web API. | |
The method is similar to the ``register_open_chat_model`` method above. | ||
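
As a usage illustration, the sketch below registers several Ollama models that share one host and port. It relies only on the keyword arguments listed above; the helper ``register_local_ollama_models``, the ``RecordingCatalog`` stand-in, and the model names used with it are hypothetical, so the example runs without a real catalog.

```python
def register_local_ollama_models(catalog, model_names, host="localhost", port=11434):
    """Register several Ollama models that share one host and port.

    `catalog` is any object exposing register_ollama_model with the
    keyword arguments listed above. Returns the number of models registered.
    """
    names = list(model_names)
    for name in names:
        catalog.register_ollama_model(
            model_name=name,
            host=host,
            port=port,
            model_type="chat",
            temperature=0.5,
        )
    return len(names)


class RecordingCatalog:
    """Stand-in for a real model catalog; it only records registrations."""

    def __init__(self):
        self.registered = {}

    def register_ollama_model(self, model_name, host="localhost", port=11434, **kwargs):
        self.registered[model_name] = f"http://{host}:{port}"
```

The same pattern should apply to ``register_open_chat_model``, with ``api_base`` taking the place of ``host`` and ``port``.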
|
||
### Categories of code contributions | ||
|
||
#### New features or enhancements to existing features | |
Do you want to submit a code contribution that adds a new feature or enhances an existing one? | |
Then the best way to start is by opening a discussion in our [GitHub discussions](https://github.com/llmware-ai/llmware/discussions). | ||
Please do this before you work on it, so you do not put effort into it just to realise after submission that | ||
it will not be merged. | ||
|
||
#### Bugs | ||
If you encounter a bug, you can | ||
|
||
- File an issue about the bug. | ||
- Provide a self-contained minimal example that reproduces the bug, which is extremely important. | ||
- Provide possible solutions for the bug. | ||
- Submit a pull request to fix the bug. | |
|
||
We encourage you to read [How to create a Minimal, Reproducible Example](https://stackoverflow.com/help/minimal-reproducible-example) from the Stack Overflow help center, and the tag description of [self-contained](https://stackoverflow.com/tags/self-contained/info), also from Stack Overflow. |
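
A bug report is easier to reproduce when it states the environment it was observed in. The snippet below is one possible way to collect that information with the Python standard library; the function name ``environment_report`` is hypothetical and not part of ``llmware``.

```python
import platform
import sys
from importlib import metadata


def environment_report(package="llmware"):
    """Collect version details worth pasting into a bug report."""
    try:
        package_version = metadata.version(package)
    except metadata.PackageNotFoundError:
        package_version = "not installed"
    return {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        package: package_version,
    }
```

Printing the returned dictionary at the top of your minimal example gives maintainers the Python version, the platform, and the installed package version in one place.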
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,93 @@ | ||
--- | ||
layout: default | ||
title: Contributing | ||
nav_order: 3 | ||
has_children: true | ||
description: llmware contributions. | ||
permalink: /contributing | ||
--- | ||
# Contributing to llmware | ||
|
||
{: .note} | ||
> The contributions to `llmware` are governed by our [Code of Conduct](https://github.com/llmware-ai/llmware/blob/main/CODE_OF_CONDUCT.md). | ||
{: .warning} | ||
> Have you found a security issue? Then please jump to [Security Vulnerabilities](#security-vulnerabilities). | ||
On this page, we provide information about ``llmware`` contributions. | |
There are **two ways** you can contribute. | |
The first is by making **code contributions**, and the second by making contributions to the **documentation**. | ||
Please look at our [contribution suggestions](#how-can-you-contribute) if you need inspiration, or take a look at [open issues](#open-issues). | ||
|
||
Contributions to `llmware` are welcome from everyone. | ||
Our goal is to make the process simple, transparent, and straightforward. | ||
We are happy to receive suggestions on how the process can be improved. | ||
|
||
## How can you contribute? | ||
|
||
{: .note} | ||
> If you have never contributed before, look for issues with the tag [``good first issue``](https://github.com/llmware-ai/llmware/issues?q=is%3Aopen+is%3Aissue+label%3A%22good+first+issue%22). | |
The most common ways to contribute are to add new features, fix bugs, add tests, or add documentation. | |
You can visit the [issues](https://github.com/llmware-ai/llmware/issues) page of the project and search for tags such as | |
``bug``, ``enhancement``, ``documentation``, or ``test``. | |
|
||
|
||
Here is a non-exhaustive list of contributions you can make. | |
|
||
1. Code refactoring | ||
2. Add new text databases | |
3. Add new vector databases | |
4. Fix bugs | ||
5. Add usage examples (see for example the issues [jupyter notebook - more examples and better support](https://github.com/llmware-ai/llmware/issues/508) and [google colab examples and start up scripts](https://github.com/llmware-ai/llmware/issues/507)) | ||
6. Add experimental features | ||
7. Improve code quality | ||
8. Improve documentation in the docs (what you are reading right now) | ||
9. Improve documentation by adding or updating docstrings in modules, classes, methods, or functions (see for example [Add docstrings](https://github.com/llmware-ai/llmware/issues/219)) | ||
10. Improve test coverage | ||
11. Answer questions in our [Discord channel](https://discord.gg/MhZn5Nc39h), especially in the [technical support forum](https://discord.com/channels/1179245642770559067/1218498778915672194) | ||
12. Post projects in which you use ``llmware`` in our Discord forum [made with llmware](https://discord.com/channels/1179245642770559067/1218567269471486012), ideally with a link to a public GitHub repository | |
|
||
## Open Issues | ||
If you're interested in existing issues, you can | ||
|
||
- Look for issues. If you are new to the project, look for issues with the `good first issue` label. | |
- Provide answers for questions in our [GitHub discussions](https://github.com/llmware-ai/llmware/discussions). | |
- Provide help for bug or enhancement issues. | |
- Ask questions, reproduce the issues, or provide solutions. | |
- Submit a pull request to fix the issue. | |
|
||
|
||
|
||
## Security Vulnerabilities | ||
**If you believe you've found a security vulnerability, then please _do not_ submit an issue ticket or pull request or otherwise publicly disclose the issue.** | ||
Please follow the process at [Reporting a Vulnerability](https://github.com/llmware-ai/llmware/blob/main/Security.md). | |
|
||
|
||
|
||
## GitHub workflow | ||
|
||
We follow the [``fork-and-pull``](https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/proposing-changes-to-your-work-with-pull-requests/creating-a-pull-request-from-a-fork) Git workflow. | ||
|
||
1. [Fork](https://docs.github.com/en/github/getting-started-with-github/fork-a-repo) the repository on GitHub. | ||
2. Clone your fork to your local machine with `git clone git@github.com:<yourname>/llmware.git`. | ||
3. Create a branch with `git checkout -b my-topic-branch`. | ||
4. Run the test suite by navigating to the `tests/` folder and running `./run-tests.py -s` to ensure there are no failures. | |
5. [Commit](https://docs.github.com/en/github/collaborating-with-issues-and-pull-requests/committing-changes-to-a-pull-request-branch-created-from-a-fork) changes to your own branch, then push to GitHub with `git push origin my-topic-branch`. | ||
6. Submit a [pull request](https://docs.github.com/en/github/collaborating-with-issues-and-pull-requests/about-pull-requests) so that we can review your changes. | ||
|
||
Remember to [synchronize your forked repository](https://docs.github.com/en/github/getting-started-with-github/fork-a-repo#keep-your-fork-synced) _before_ submitting proposed changes upstream. If you have an existing local repository, please update it before you start, to minimize the chance of merge conflicts. | ||
|
||
```shell | ||
git remote add upstream git@github.com:llmware-ai/llmware.git | ||
git fetch upstream | ||
git checkout -b my-topic-branch upstream/main | |
``` | ||
|
||
## Community | ||
Questions and discussions are welcome in any shape or form. | ||
Please feel free to join our community on our Discord channel, on which we are active daily. | |
You are also welcome if you just want to post an idea! | ||
|
||
- [Discord Channel](https://discord.gg/MhZn5Nc39h) | ||
- [GitHub discussions](https://github.com/llmware-ai/llmware/discussions) |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,66 @@ | ||
--- | ||
layout: default | ||
title: Documentation contributions | ||
parent: Contributing | ||
nav_order: 2 | ||
permalink: contributing/documentation | ||
--- | ||
# Contributing documentation | ||
One way to contribute to ``llmware`` is by contributing documentation. | ||
|
||
There are **two ways** to contribute to the ``llmware`` documentation. | ||
The first is via **docstrings in the code**, and the second is **the docs**, which is what you are *currently reading*. | ||
In both areas, you can contribute in a lot of ways. | ||
Here is a non-exhaustive list of these ways for docstrings, which also applies to the docs. | |
|
||
1. Add documentation (e.g., adding a docstring to a function) | ||
2. Update documentation (e.g., update a docstring that is not in sync with the code) | ||
3. Simplify documentation (e.g., formulate a docstring more clearly) | ||
4. Enhance documentation (e.g., add more examples to a docstring or fix typos) | ||
|
||
## Docstrings | ||
**Docstrings** document the code within the code, which allows programmers to easily have a look while they are programming. | ||
For an example, have a look at [this docstring](https://github.com/llmware-ai/llmware/blob/c9e12a7a150162986622738e127c37ac70f31cd6/llmware/agents.py#L27-L66) which documents the ``LLMfx`` class. | |
|
||
We follow the docstring style of **numpy**, for which you can find examples [here](https://github.com/numpy/numpydoc/blob/main/doc/example.py) and [here](https://sphinxcontrib-napoleon.readthedocs.io/en/latest/example_numpy.html). | |
Please be sure to follow the conventions and go over your pull request before you submit it. | ||
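
For orientation, here is a short docstring written in the numpy style. The function ``count_words`` is a hypothetical example and not part of ``llmware``; it only shows the section layout (summary line, ``Parameters``, ``Returns``, ``Examples``).

```python
def count_words(text, min_length=1):
    """Count the words in a text.

    Parameters
    ----------
    text : str
        The text to split on whitespace.
    min_length : int, optional
        Words shorter than this are ignored. Default is 1.

    Returns
    -------
    int
        The number of words with at least ``min_length`` characters.

    Examples
    --------
    >>> count_words("a tiny example", min_length=2)
    2
    """
    return sum(1 for word in text.split() if len(word) >= min_length)
```

Keeping the ``Examples`` section runnable makes it possible to check a docstring against the code with a doctest runner.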
|
||
|
||
## Docs | ||
|
||
{: .note} | ||
> All commands are executed from the `docs` sub-directory. | ||
Contributing to this documentation is extremely important as many users will refer to it. | ||
|
||
If you plan to contribute to the docs, we recommend that you install `jekyll` so you can test your changes locally. | |
We also recommend that you install `jekyll` into a ruby environment so it does not interfere with any other installations you might have. | |
|
||
We recommend that you install `rbenv` and `rvm` to manage your ruby installation. | |
`rbenv` is a tool that manages different ruby versions, similar to what `conda` does for `python`. | |
Please install [rbenv](https://github.com/rbenv/rbenv?tab=readme-ov-file#installation) and [rvm](https://github.com/rvm/rvm?tab=readme-ov-file#installing-rvm) following their respective instructions. | |
We recommend that you install a ruby version `>=3.0`. | ||
After having installed an isolated ruby version, you have to install the dependencies to build the docs locally. | ||
The `docs` directory has a `Gemfile` which specifies the dependencies. | ||
You can hence simply navigate to it and use the `bundle install` command. | ||
|
||
```bash | ||
bundle install | ||
``` | ||
|
||
You should now be able to build and serve the documentation locally. | ||
To do this, simply do the following. | |
```bash | ||
bundle exec jekyll server --livereload --verbose | ||
``` | ||
In the browser of your choice, you can then go to `http://127.0.0.1:4000/` and you will be served the documentation, which is rebuilt and reloaded after any change to the `docs`. | |
``jekyll`` will create a ``_site`` directory where it saves the created files. Please **never commit any files from the \_site directory**! | |
|
||
## Open Issues | ||
If you're interested in existing issues, you can | ||
|
||
- Look for issues with the `good first issue` and `documentation` labels as a good place to get started. | |
- Provide answers for questions in our [GitHub discussions](https://github.com/llmware-ai/llmware/discussions). | |
- Provide help for bug or enhancement issues. | |
- Ask questions, reproduce the issues, or provide solutions. | |
- Submit a pull request to fix the issue. |