Merge pull request #687 from MacOS/main

Adds contributing pages to the docs
llmware-ai · May 5, 2024 · 78ecb9a
2 parents a563e08 + 65a4d1d
Showing 4 changed files with 477 additions and 1 deletion.
diff --git a/docs/_includes/footer_custom.html b/docs/_includes/footer_custom.html
@@ -12,7 +12,17 @@
        |
     </li>
     <li class="d-inline-block mr-1 hugging-face-li">
-        <a href="https://huggingface.co/llmware"><span><img src="assets/images/hf-logo.svg" alt="Hugging Face" class="hugging-face-logo"/></span></a>
+        <a href="https://huggingface.co/llmware">
+            <span>
+                {% for static_file in site.static_files %}
+                    {% if static_file.basename == "hf-logo"%}
+                        {% assign hf_logo = static_file %}
+                    {% endif %}
+                {% endfor %}
+
+                <img src="{{ site.url }}{{ hf_logo.path }}" alt="Hugging Face" class="hugging-face-logo"/>
+            </span>
+        </a>
     </li>
     <li class="d-inline-block mr-1">
         <a href="https://www.youtube.com/@llmware"><i class="fa-brands fa-youtube"></i></a>

diff --git a/docs/contributing/code.md b/docs/contributing/code.md
@@ -0,0 +1,307 @@
+---
+layout: default
+title: Code contributions
+parent: Contributing
+nav_order: 1
+permalink: /contributing/code
+---
+# Contributing code
+One way to contribute to ``llmware`` is by contributing to the code base.
+
+Below, we briefly describe some of the important modules of ``llmware`` to help you navigate the code base.
+You may also want to watch our [Fast Start series on YouTube](https://www.youtube.com/playlist?list=PL1-dn33KwsmD7SB9iSO6vx4ZLRAWea1DB).
+
+## Core modules
+
+### Library
+<iframe width="560" height="315" src="https://www.youtube.com/embed/2xDefZ4oBOM?si=IAHkxpQkFwnWyYTL" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen></iframe>
+
+The *library* module implements the classes **Library** and **LibraryCatalog**.
+The **Library** class implements the *library* concept.
+A *library* is a collection of documents, where a document can be a PDF, an image, or an office document.
+The **Library** class is responsible for parsing, text chunking, and indexing; in other words, it does the heavy lifting of adding content.
+Below, we briefly describe the methods for adding documents to the library.
+
+```python
+def add_file(
+    self,
+    file_path):
+```
+This method adds one document of any supported type to the library.
+
+```python
+def add_files(
+    self,
+    input_folder_path=None,
+    encoding="utf-8",
+    chunk_size=400,
+    get_images=True,
+    get_tables=True,
+    smart_chunking=2,
+    max_chunk_size=600,
+    table_grid=True,
+    get_header_text=True,
+    table_strategy=1,
+    strip_header=False,
+    verbose_level=2,
+    copy_files_to_library=True):
+```
+This method adds the supported documents in the folder ``input_folder_path`` to the library.
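To give contributors a feel for what `chunk_size` and `max_chunk_size` control, here is a minimal, self-contained sketch of size-bounded text chunking. It is *not* llmware's parser or its `smart_chunking` strategy: the helper `chunk_text` is hypothetical, and it assumes sizes are counted in whitespace-separated words, that a chunk closes at the first sentence end after reaching `chunk_size`, and that a chunk is force-closed at `max_chunk_size`.

```python
def chunk_text(text: str, chunk_size: int = 400, max_chunk_size: int = 600) -> list[str]:
    """Split text into chunks of roughly chunk_size words (hypothetical helper).

    A chunk is closed at the first sentence end (".", "!", "?") once it holds
    at least chunk_size words, and is force-closed at max_chunk_size words.
    """
    if not 0 < chunk_size <= max_chunk_size:
        raise ValueError("require 0 < chunk_size <= max_chunk_size")
    chunks: list[str] = []
    current: list[str] = []
    for word in text.split():
        current.append(word)
        at_sentence_end = word.endswith((".", "!", "?"))
        if (len(current) >= chunk_size and at_sentence_end) or len(current) >= max_chunk_size:
            chunks.append(" ".join(current))
            current = []
    if current:  # keep the trailing, possibly short, chunk
        chunks.append(" ".join(current))
    return chunks
```

Preferring sentence boundaries keeps chunks semantically coherent, while the hard `max_chunk_size` cap bounds the size of any chunk that later has to fit into an embedding model.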
+
+```python
+def add_website(
+    self,
+    url,
+    get_links=True,
+    max_links=5):
+```
+This method adds a website to the library and, if ``get_links`` is ``True``, up to ``max_links`` of the pages it links to.
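The `get_links` and `max_links` options imply collecting links from the fetched page. As an illustration only (not llmware's implementation), the following standard-library sketch collects up to `max_links` unique absolute `http(s)` links from an HTML string. `LinkCollector` and `extract_links` are hypothetical names, and fetching the page is deliberately left out so the example runs offline.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect the href targets of <a> tags, resolved against a base URL."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:  # skip <a> tags without an href value
            self.links.append(urljoin(self.base_url, href))


def extract_links(html: str, base_url: str, max_links: int = 5) -> list[str]:
    """Return up to max_links unique http(s) links, in document order."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    unique: list[str] = []
    for link in parser.links:
        if len(unique) >= max_links:
            break
        if urlparse(link).scheme in ("http", "https") and link not in unique:
            unique.append(link)
    return unique
```

Relative links such as `b.html` are resolved to absolute URLs, while non-web schemes such as `mailto:` are dropped.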
+
+```python
+def add_wiki(
+    self,
+    topic_list,
+    target_results=10):
+```
+This method adds Wikipedia articles on the topics in ``topic_list`` to the library.
+
+```python
+def add_dialogs(
+    self,
+    input_folder=None):
+```
+This method adds AWS dialog transcripts from ``input_folder`` to the library.
+
+```python
+def add_image(
+    self,
+    input_folder=None):
+```
+This method adds images to the library.
+
+```python
+def add_pdf_by_ocr(
+    self,
+    input_folder=None):
+```
+This method adds scanned PDFs to the library.
+
+```python
+def add_pdf(
+    self,
+    input_folder=None):
+```
+This method adds PDFs to the library.
+
+```python
+def add_office(
+    self,
+    input_folder=None):
+```
+This method adds office documents to the library.
+
+### Embeddings
+<iframe width="560" height="315" src="https://www.youtube.com/embed/xQEk6ohvfV0?si=GAPle5gVdVPkYKWv" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen></iframe>
+
+An *embedding* combines a vector store with an embedding model.
+It is responsible for applying the embedding model to a library, storing the embeddings in the vector store, and making them searchable with natural language queries.
+Below, we briefly describe the methods that all vector stores have in common.
+
+```python
+def create_new_embedding(
+    self,
+    doc_ids=None,
+    batch_size=500):
+```
+This method creates the embeddings and adds them to the vector store.
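The `batch_size` parameter suggests that text chunks are embedded in batches rather than one at a time. A minimal sketch of that batching pattern follows. `batched` and `embed_in_batches` are hypothetical helpers, and `embed_fn` stands in for an embedding model; none of this is llmware code.

```python
from typing import Callable, Iterator, Sequence


def batched(items: Sequence[str], batch_size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def embed_in_batches(
    chunks: Sequence[str],
    embed_fn: Callable[[Sequence[str]], list[list[float]]],
    batch_size: int = 500,
) -> list[list[float]]:
    """Embed all chunks, calling embed_fn once per batch, preserving order."""
    vectors: list[list[float]] = []
    for batch in batched(chunks, batch_size):
        vectors.extend(embed_fn(batch))
    return vectors
```

Batching amortizes per-call overhead, such as model invocation or network round trips, while bounding memory use per call.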
+
+```python
+def search_index(
+    self,
+    query_embedding_vector,
+    sample_count=10):
+```
+This method searches the vector store for the ``sample_count`` entries most similar to ``query_embedding_vector``.
+
+```python
+def delete_index(self):
+```
+This method deletes the created vector store index.
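To make the idea behind `search_index` concrete, here is a brute-force sketch that ranks stored vectors by cosine similarity to a query vector and returns the top `sample_count` hits. It is an assumption-laden toy: the index is a plain dict passed in explicitly, the function only mirrors the parameter names above, and real vector stores typically use approximate nearest-neighbour indexes rather than scanning every vector.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search_index(
    index: dict[str, Sequence[float]],
    query_embedding_vector: Sequence[float],
    sample_count: int = 10,
) -> list[tuple[str, float]]:
    """Return the sample_count (id, score) pairs most similar to the query."""
    scored = [
        (doc_id, cosine_similarity(vector, query_embedding_vector))
        for doc_id, vector in index.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)  # best match first
    return scored[:sample_count]
```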
+
+
+### Prompts
+<iframe width="560" height="315" src="https://www.youtube.com/embed/swiu4oBVfbA?si=rKbgO3USADCqICqr" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen></iframe>
+
+A *prompt* is the input to a model, which uses it to generate a response.
+A common use case is augmenting a prompt, or a series of prompts, with additional information.
+Next, we describe the methods for adding such information, called *sources*, to a prompt.
+
+```python
+def add_source_new_query(
+    self,
+    library,
+    query=None,
+    query_type="semantic",
+    result_count=10):
+```
+This method adds the results of the ``query`` to the prompt.
+
+```python
+def add_source_query_results(
+    self,
+    query_results):
+```
+This method adds previous results from a query as a source to the prompt.
+
+```python
+def add_source_library(
+    self,
+    library_name):
+```
+This method adds an entire library to the prompt.
+We recommend using this only when the library is small enough to fit into the model's context window.
+
+```python
+def add_source_wikipedia(
+    self,
+    topic,
+    article_count=3,
+    query=None):
+```
+This method adds Wikipedia articles on the provided ``topic`` to the prompt.
+
+```python
+def add_source_yahoo_finance(
+    self,
+    ticker=None,
+    key_list=None):
+```
+This method adds Yahoo Finance data for the given ``ticker`` to the prompt.
+
+```python
+def add_source_knowledge_graph(
+    self,
+    library,
+    query):
+```
+This method adds the summary output elements from a knowledge graph based on the provided ``query``.
+Note that this method is experimental and may change significantly between versions.
+
+```python
+def add_source_website(
+    self,
+    url,
+    query=None):
+```
+This method adds the website pointed to by the ``url`` to the prompt.
+
+```python
+def add_source_document(
+    self,
+    input_fp,
+    input_fn,
+    query=None):
+```
+This method adds a document, or documents, of any supported type to the prompt.
+The optional ``query`` parameter can be used to filter the content that is added.
+
+```python
+def add_source_last_interaction_step(
+    self):
+```
+This method adds the last interaction to the prompt.
+Its main use case is interactive dialog, i.e., chat.
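Conceptually, each `add_source_*` method contributes text that ends up in the model input next to the user's question, and the last interaction can be fed back in to support chat. The sketch below illustrates that idea with a hypothetical `PromptBuilder` class. The layout of the final prompt (numbered sources, then the most recent turn, then the question) is an assumption for illustration, not the format llmware uses.

```python
from dataclasses import dataclass, field


@dataclass
class PromptBuilder:
    """Hypothetical helper that assembles sources, chat history, and a question."""

    sources: list[str] = field(default_factory=list)
    history: list[tuple[str, str]] = field(default_factory=list)

    def add_source(self, text: str) -> None:
        self.sources.append(text.strip())

    def add_last_interaction(self, question: str, answer: str) -> None:
        self.history.append((question, answer))

    def build(self, question: str) -> str:
        parts = [f"Source {i}: {text}" for i, text in enumerate(self.sources, start=1)]
        for past_q, past_a in self.history[-1:]:  # include only the most recent turn
            parts.append(f"Previous question: {past_q}\nPrevious answer: {past_a}")
        parts.append(f"Question: {question}")
        return "\n\n".join(parts)
```

Keeping only the most recent turn limits how much of the context window the history consumes, which matters for the same reason that adding a whole library is only advisable for small libraries.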
+
+### Model Catalog
+A *model catalog* is a collection of models.
+In the following, we briefly describe the methods for adding new models to the catalog.
+
+```python
+def register_new_hf_generative_model(
+    self,
+    hf_model_name=None,
+    context_window=2048,
+    prompt_wrapper="<INST>",
+    display_name=None,
+    temperature=0.3,
+    trailing_space="",
+    link=""):
+```
+This method registers a generative model from Hugging Face.
+This lets users work with Hugging Face models that are not yet supported out of the box.
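A model catalog can be pictured as a lookup table of *model cards* keyed by model name. The toy registry below illustrates this with a few fields that appear in the signatures in this section (`context_window`, `prompt_wrapper`, `temperature`, `display_name`). `ModelCard` and `ToyModelCatalog` are hypothetical classes and do not reproduce llmware's internals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelCard:
    """Hypothetical, immutable description of a registered model."""

    model_name: str
    context_window: int = 2048
    prompt_wrapper: str = "<INST>"
    temperature: float = 0.3
    display_name: str = ""


class ToyModelCatalog:
    """Hypothetical in-memory catalog mapping model names to model cards."""

    def __init__(self) -> None:
        self._cards: dict[str, ModelCard] = {}

    def register(self, card: ModelCard) -> ModelCard:
        if card.model_name in self._cards:
            raise ValueError(f"{card.model_name!r} is already registered")
        self._cards[card.model_name] = card
        return card

    def lookup(self, model_name: str) -> ModelCard:
        try:
            return self._cards[model_name]
        except KeyError:
            raise KeyError(f"unknown model: {model_name!r}") from None
```

Freezing the cards prevents a registered model's settings from being changed accidentally after registration.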
+
+```python
+def register_sentence_transformer_model(
+    self,
+    model_name,
+    embedding_dims,
+    context_window,
+    display_name=None,
+    link=""):
+```
+This method registers a new sentence transformer embedding model.
+
+```python
+def register_gguf_model(
+    self,
+    model_name,
+    gguf_model_repo,
+    gguf_model_file_name,
+    prompt_wrapper=None,
+    eos_token_id=0,
+    display_name=None,
+    trailing_space="",
+    temperature=0.3,
+    context_window=2048,
+    instruction_following=True):
+```
+This method adds a new GGUF model.
+
+```python
+@classmethod
+def register_open_chat_model(
+    cls,
+    model_name,
+    api_base=None,
+    model_type="chat",
+    display_name=None,
+    context_window=4096,
+    instruction_following=True,
+    prompt_wrapper="",
+    temperature=0.5):
+```
+This method registers any chat model that is available through a web API at ``api_base``, e.g., a chat model served locally via localhost.
+
+```python
+@classmethod
+def register_ollama_model(
+    cls,
+    model_name,
+    host="localhost",
+    port=11434,
+    model_type="chat",
+    raw=False,
+    stream=False,
+    display_name=None,
+    context_window=4096,
+    instruction_following=True,
+    prompt_wrapper="",
+    temperature=0.5):
+```
+This method registers an Ollama model that is available through the Ollama web API, by default at ``localhost:11434``.
+The method is similar to the ``register_open_chat_model`` method above.
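Because an Ollama model is served over HTTP (by default `host="localhost"`, `port=11434`), it can help to see what such a request looks like. The sketch below uses the standard library to build a request for Ollama's `/api/generate` REST endpoint but does not send it. The helper name is hypothetical, and we do not claim it matches the exact request llmware sends. Sending it would require a running Ollama server and a call such as `urllib.request.urlopen(request)`.

```python
import json
import urllib.request


def build_ollama_generate_request(
    model_name: str,
    prompt: str,
    host: str = "localhost",
    port: int = 11434,
    raw: bool = False,
    stream: bool = False,
    temperature: float = 0.5,
) -> urllib.request.Request:
    """Build (but do not send) a POST request for Ollama's /api/generate endpoint."""
    payload = {
        "model": model_name,
        "prompt": prompt,
        "raw": raw,        # True: Ollama applies no prompt formatting/template
        "stream": stream,  # False: ask for a single JSON response
        "options": {"temperature": temperature},
    }
    return urllib.request.Request(
        url=f"http://{host}:{port}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```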
+
+## Categories of code contributions
+
+### New features or enhancements to existing features
+If you want to contribute code that adds a new feature or enhances an existing one, the best way to start is by opening a discussion in our [GitHub discussions](https://github.com/llmware-ai/llmware/discussions).
+Please do this before you start working on it, so that you do not invest effort only to find out after submission that it will not be merged.
+
+### Bugs
+If you encounter a bug, you can
+
+- File an issue about the bug.
+- Provide a self-contained, minimal example that reproduces the bug, which is extremely important.
+- Suggest possible solutions for the bug.
+- Submit a pull request to fix the bug.
+
+We encourage you to read [How to create a Minimal, Reproducible Example](https://stackoverflow.com/help/minimal-reproducible-example) from the Stack Overflow help center, and the description of the [self-contained](https://stackoverflow.com/tags/self-contained/info) tag, also from Stack Overflow.
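A minimal reproducible example is easier to act on when it also states the environment it ran in. The following sketch uses only the standard library to collect the Python version, the platform, and the installed `llmware` version for pasting into an issue. `environment_report` is a hypothetical helper, and the actual reproduction steps go where the comment indicates.

```python
import platform
import sys
from importlib import metadata


def environment_report(package: str = "llmware") -> dict[str, str]:
    """Collect version information worth including in a bug report."""
    try:
        package_version = metadata.version(package)
    except metadata.PackageNotFoundError:
        package_version = "not installed"
    return {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        package: package_version,
    }


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
    # Below this line: the smallest code that still triggers the bug,
    # plus the full traceback or the unexpected output you observed.
```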
diff --git a/docs/contributing/contributing.md b/docs/contributing/contributing.md
@@ -0,0 +1,93 @@
+---
+layout: default
+title: Contributing
+nav_order: 3
+has_children: true
+description: llmware contributions.
+permalink: /contributing
+---
+# Contributing to llmware
+
+{: .note}
+> The contributions to `llmware` are governed by our [Code of Conduct](https://github.com/llmware-ai/llmware/blob/main/CODE_OF_CONDUCT.md).
+
+{: .warning}
+> Have you found a security issue? Then please jump to [Security Vulnerabilities](#security-vulnerabilities).
+
+On this page, we provide information about ``llmware`` contributions.
+There are **two ways** you can contribute.
+The first is by making **code contributions**, and the second is by contributing to the **documentation**.
+Please look at our [contribution suggestions](#how-can-you-contribute) if you need inspiration, or take a look at [open issues](#open-issues).
+
+Contributions to `llmware` are welcome from everyone.
+Our goal is to make the process simple, transparent, and straightforward.
+We are happy to receive suggestions on how the process can be improved.
+
+## How can you contribute?
+
+{: .note}
+> If you have never contributed before, look for issues with the tag [``good first issue``](https://github.com/llmware-ai/llmware/issues?q=is%3Aopen+is%3Aissue+label%3A%22good+first+issue%22).
+
+The most common ways to contribute are adding new features, fixing bugs, adding tests, or adding documentation.
+You can visit the [issues](https://github.com/llmware-ai/llmware/issues) site of the project and search for tags such as
+``bug``, ``enhancement``, ``documentation``, or ``test``.
+
+
+Here is a non-exhaustive list of contributions you can make.
+
+1. Code refactoring
+2. Add new text databases
+3. Add new vector databases
+4. Fix bugs
+5. Add usage examples (see for example the issues [jupyter notebook - more examples and better support](https://github.com/llmware-ai/llmware/issues/508) and [google colab examples and start up scripts](https://github.com/llmware-ai/llmware/issues/507))
+6. Add experimental features
+7. Improve code quality
+8. Improve documentation in the docs (what you are reading right now)
+9. Improve documentation by adding or updating docstrings in modules, classes, methods, or functions (see for example [Add docstrings](https://github.com/llmware-ai/llmware/issues/219))
+10. Improve test coverage
+11. Answer questions in our [Discord channel](https://discord.gg/MhZn5Nc39h), especially in the [technical support forum](https://discord.com/channels/1179245642770559067/1218498778915672194)
+12. Post projects in which you use ``llmware`` in our Discord forum [made with llmware](https://discord.com/channels/1179245642770559067/1218567269471486012), ideally with a link to a public GitHub repository
+
+## Open Issues
+If you're interested in existing issues, you can
+
+- Look for issues; if you are new to the project, start with issues labeled `good first issue`.
+- Provide answers to questions in our [GitHub discussions](https://github.com/llmware-ai/llmware/discussions).
+- Provide help for bug or enhancement issues.
+  - Ask questions, reproduce the issues, or provide solutions.
+  - Submit a pull request to fix the issue.
+
+
+
+## Security Vulnerabilities
+**If you believe you've found a security vulnerability, then please _do not_ submit an issue ticket or pull request or otherwise publicly disclose the issue.**
+Please follow the process described in [Reporting a Vulnerability](https://github.com/llmware-ai/llmware/blob/main/Security.md).
+
+
+
+## GitHub workflow
+
+We follow the [``fork-and-pull``](https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/proposing-changes-to-your-work-with-pull-requests/creating-a-pull-request-from-a-fork) Git workflow.
+
+1.  [Fork](https://docs.github.com/en/github/getting-started-with-github/fork-a-repo) the repository on GitHub.
+2. Clone your fork to your local machine with `git clone git@github.com:<yourname>/llmware.git`.
+3. Create a branch with `git checkout -b my-topic-branch`.
+4. Run the test suite by navigating to the `tests/` folder and running `./run-tests.py -s` to ensure there are no failures.
+5. [Commit](https://docs.github.com/en/github/collaborating-with-issues-and-pull-requests/committing-changes-to-a-pull-request-branch-created-from-a-fork) changes to your own branch, then push to GitHub with `git push origin my-topic-branch`.
+6. Submit a [pull request](https://docs.github.com/en/github/collaborating-with-issues-and-pull-requests/about-pull-requests) so that we can review your changes.
+
+Remember to [synchronize your forked repository](https://docs.github.com/en/github/getting-started-with-github/fork-a-repo#keep-your-fork-synced) _before_ submitting proposed changes upstream. If you have an existing local repository, please update it before you start, to minimize the chance of merge conflicts.
+
+```shell
+git remote add upstream git@github.com:llmware-ai/llmware.git
+git fetch upstream
+git checkout upstream/main -b my-topic-branch
+```
+
+## Community
+Questions and discussions are welcome in any shape or form.
+Please feel free to join our community on our Discord channel, where we are active daily.
+You are also welcome if you just want to post an idea!
+
+- [Discord Channel](https://discord.gg/MhZn5Nc39h)
+- [GitHub discussions](https://github.com/llmware-ai/llmware/discussions)
diff --git a/docs/contributing/documentation.md b/docs/contributing/documentation.md
@@ -0,0 +1,66 @@
+---
+layout: default
+title: Documentation contributions
+parent: Contributing
+nav_order: 2
+permalink: /contributing/documentation
+---
+# Contributing documentation
+One way to contribute to ``llmware`` is by contributing documentation.
+
+There are **two ways** to contribute to the ``llmware`` documentation.
+The first is via **docstrings in the code**, and the second is **the docs**, which is what you are *currently reading*.
+In both areas, you can contribute in a lot of ways.
+Here is a non-exhaustive list of ways to contribute to the docstrings, which also apply to the docs.
+
+1. Add documentation (e.g., adding a docstring to a function)
+2. Update documentation (e.g., update a docstring that is not in sync with the code)
+3. Simplify documentation (e.g., formulate a docstring more clearly)
+4. Enhance documentation (e.g., add more examples to a docstring or fix typos)
+
+## Docstrings
+**Docstrings** document the code within the code itself, so programmers can consult them while they are programming.
+For an example, have a look at [this docstring](https://github.com/llmware-ai/llmware/blob/c9e12a7a150162986622738e127c37ac70f31cd6/llmware/agents.py#L27-L66), which documents the ``LLMfx`` class.
+
+We follow the **numpy** docstring style; you can find examples in the [numpydoc repository](https://github.com/numpy/numpydoc/blob/main/doc/example.py) and in the [Napoleon documentation](https://sphinxcontrib-napoleon.readthedocs.io/en/latest/example_numpy.html).
+Please follow these conventions and review your pull request before you submit it.
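For orientation, here is a short function documented in the numpy style, with `Parameters`, `Returns`, `Raises`, and `Examples` sections. The function itself is a made-up example and is not part of ``llmware``.

```python
def normalize_scores(scores, precision=3):
    """Scale a list of non-negative scores so that they sum to one.

    Parameters
    ----------
    scores : list of float
        Non-negative scores, for example similarity scores of query results.
    precision : int, optional
        Number of decimal places to round each result to. Default is 3.

    Returns
    -------
    list of float
        The scores divided by their sum, rounded to `precision` places.

    Raises
    ------
    ValueError
        If `scores` is empty, contains a negative value, or sums to zero.

    Examples
    --------
    >>> normalize_scores([1, 1, 2])
    [0.25, 0.25, 0.5]
    """
    if not scores or any(s < 0 for s in scores):
        raise ValueError("scores must be non-empty and non-negative")
    total = sum(scores)
    if total == 0:
        raise ValueError("scores must not sum to zero")
    return [round(s / total, precision) for s in scores]
```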
+
+
+## Docs
+
+{: .note}
+> All commands are executed from the `docs` sub-directory.
+
+Contributing to this documentation is extremely important as many users will refer to it.
+
+If you plan to contribute to the docs, we recommend that you install `jekyll` locally so you can test your changes.
+We also recommend installing `jekyll` into an isolated Ruby environment so that it does not interfere with any other installations you might have.
+
+We recommend that you use either `rbenv` or `rvm` to manage your Ruby installation; the two tools conflict with each other, so do not install both.
+`rbenv` is a tool that manages different Ruby versions, similar to what `conda` does for Python.
+Please follow the [rbenv installation instructions](https://github.com/rbenv/rbenv?tab=readme-ov-file#installation) or the [rvm installation instructions](https://github.com/rvm/rvm?tab=readme-ov-file#installing-rvm).
+We recommend that you install Ruby version `>=3.0`.
+After installing an isolated Ruby version, install the dependencies needed to build the docs locally.
+The `docs` directory contains a `Gemfile`, which specifies the dependencies.
+You can therefore simply navigate to it and run the `bundle install` command.
+
+```bash
+bundle install
+```
+
+You should now be able to build and serve the documentation locally.
+To do this, run the following command.
+```bash
+bundle exec jekyll serve --livereload --verbose
+```
+In the browser of your choice, you can then go to `http://127.0.0.1:4000/` to view the documentation, which is rebuilt and reloaded after any change to the `docs`.
+``jekyll`` saves the generated files in a ``_site`` directory; please **never commit any files from the \_site directory**!
+
+## Open Issues
+If you're interested in existing issues, you can
+
+- Look for issues with the `good first issue` and `documentation` labels as a good place to get started.
+- Provide answers to questions in our [GitHub discussions](https://github.com/llmware-ai/llmware/discussions).
+- Provide help for bug or enhancement issues.
+  - Ask questions, reproduce the issues, or provide solutions.
+  - Submit a pull request to fix the issue.