Skip to content

37Rb/discourse-chatbot-kb-tools

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

65 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Discourse Chatbot Knowledge Base Tools

Knowledge base management tools for Discourse Chatbot

Discourse Chatbot is an AI chatbot plugin for Discourse that can be used to make a support bot on your forum. Visit Meta and Github to learn more about it.

Discourse Chatbot uses RAG to provide the AI model with domain-specific knowledge of your forum. While it's possible to let the bot access every public post on your forum, it may be benficial to limit access to a specific, non-public "knowledge base" category where you curate only high-quality posts that you want the bot to see.

Discourse Chatbot semantic-searches the forum for individual posts instead of topics which makes it challenging for the bot to find complete questions and answers. Often the question/problem is in one post while the answer/solution is in a later post. There may even be multiple back and forth posts leading up the the answer. The topic title and tags may also be useful to the bot but aren't included in this search because it's limited to posts.

Chatbot Knowledge Base Tools help you import knowledge from your forum & website into the knowledge base category in a format that's well-suited for the bot. For example, you can:

  • Import entire topics into the knowledge base as a single post so that the bot can see the entire conversation with all relevent context.
  • Import pages from your website into the knowledge base so the bot can use them in addition to your forum content.

Usage

A JavaScript object called ChatbotKnowledgeBase provides access to the tools. Use the JavaScript console in your browser while at your Discourse forum to create an instance of ChatbotKnowledgeBase and call its functions. You need to be logged in.

Example Javascript Console

Create the ChatbotKnowledgeBase object

Create a new ChatbotKnowledgeBase object and assign it to a variable.

x = new ChatbotKnowledgeBase()

By default, it will look for a category named Chatbot to use as the knowledge base category. If your knowledge base categiory has a different name, you can provide it when you create the object.

x = new ChatbotKnowledgeBase("My Knowledge Base Category")

Import a Topic

Use importTopic() to import a topic into the knowledge base.

importTopic(topicId, options)
  • topicId - The source topic ID
  • options - An object to specify optional behavior. Available options are
    • update: KB topic ID - Update an existing KB topic instead of creating a new one
    • include: Array of post numbers - Only include the given post numbers
    • exclude: Array of post numbers - Exclude the given post numbers

Examples

Import topic 26970 as a new KB topic.

x.importTopic(26970)

Import topic 26989 by updating KB topic 27102. Do this when a topic that's already been imported has changed.

x.importTopic(26989, { update: 27102 })

Only include posts 1, 2, and 4 in the KB topic.

x.importTopic(26989, { include: [1,2,4] })

Exclude posts 3, 5, and 6 from the KB topic.

x.importTopic(26989, { exclude: [3,5,6] })

Import a Category

Use importCategory() to import (or update) every topic in a category.

importCategory(categoryName, options)
  • categoryName: The name of the category to import from
  • options - An object to specify optional behavior. Available options are
    • limit: Integer - Only import the limit latest topics in the category

Use await with importCategory() so that you can see when it's done.

Examples

Import (or update) every topic in the How To category.

await x.importCategory("How To")

Import (or update) the 25 latest topics in the How To category.

await x.importCategory("How To", { limit: 25 })

Import a Web Page

Use importWebPage() to import a web page into the knowledge base.

importWebPage(url, options)
  • url - The URL if the web page
  • options - An object to specify optional behavior. Available options are
    • update: KB topic ID - Update an existing KB topic instead of creating a new one
    • removeTags: Array of HTML tags - Don't import content in these tags
    • removeIds: Array of HTML IDs - Don't import content in in elements with these IDs
    • dryRun: Boolean - Just print the markdown without importing anything

importWebPage() uses Turndown to convert HTML pages to markdown before importing them into the knowledge base.

Web pages have a lot of extra stuff you probably don't want to import. Use the removeTags and removeIds options to exclude content you don't want in the knowledge base.

Tags in the defaultImportWebPageRemoveTags property are exluded by default. You can modify this or create a new array for the removeTags option.

x.defaultImportWebPageRemoveTags = ['meta', 'style', 'link', 'script', 'noscript', 'applet', 'area', 'object', 'nav', 'base', 'embed', 'object', 'param', 'header', 'hgroup', 'footer']

Examples

Import https://suretyhome.com/why-surety/ as a new KB topic.

x.importWebPage("https://suretyhome.com/why-surety/")

Import https://suretyhome.com/why-surety/ by updating KB topic 28500. Do this when a web page that's already been imported has changed.

x.importWebPage("https://suretyhome.com/why-surety/", { update: 28500 })

Don't include any content in the div with ID = "mini-cart".

x.importWebPage("https://suretyhome.com/why-surety/", { removeIds: ["mini-cart"] })

Show the markdown instead of importing it into the knowledge base.

x.importWebPage("https://suretyhome.com/why-surety/", { dryRun: true })

Update All Imports

Use updateAllImports() to update all existing imports that have changed.

updateAllImports(types = ['topic', 'page'])
  • types: Array of import types - The types to update, defaults to all

Use await with updateAllImports() so that you can see when it's done.

Examples

Update all imports. (of all types)

await x.updateAllImports()

Update all topic imports.

await x.updateAllImports(['topic'])

Installation

Install Theme Component

Install as a theme component from the Git repository.

https://github.com/37Rb/discourse-chatbot-kb-tools.git

Security Setup

ImportWebPage needs to access resources on other domains. You need to configure CSP and CORS to allow it.

Configure CSP by in Discourse admin content security policy script src setting (found under Security). Add https://unpkg.com/turndown/dist/turndown.js as a script source so that Turndown can be loaded.

Configure CORS by adding an Access-Control-Allow-Origin HTTP header to the website you will import from allowing your Discourse forum origin (https:// and domain name) to access content on that website. Alternatively, you can disable CORS checks in your browser when importing web pages.

JavaScript Console

Why JavaScript Console?

KB tools are used via the JavaScript console because it's the easiest way to run code as a logged in user without having to pass around API keys or develop a GUI. If they get heavily used it may make sense in the future to build them into the Discourse GUI.

Hide Network Errors

Sometimes the Discourse API responds with an error status code even during normal usage. For example

  • If you try to import a topic that's already been imported without using the update option then the API will respond with 422 Unprocessable Content.
  • During bulk imports the API will sometimes respond with 429 Too Many Requests, asking the client to slow down.

These are normal and the tools handle them fine but the Javascript console might show them as big red errors which is annoying. You can configure it to hide those.

Discourse Chatbot Embedding Tools

Semantic search is great but trying to optimize or troubleshoot it can be difficult because it happens behind the scenes. Embedding Tools is a command line program that helps you see what's happening behind the curtain. You can easily

  • Run a semantic search and see which posts show up at the top, along with their similary scores
  • Calulate the similarity score between a search query and post embedding
  • Get an embedding for a search query from OpenAI

Usage

It's a python script that lives in the "external" folder of the Git repository.

% cd external
% ./embeddings.py -h
usage: embeddings.py [-h] {embedding,similarity,search} ...

Discourse Chatbot embedding tools.

positional arguments:
  {embedding,similarity,search}
                        Available commands
    embedding           Show the embedding for a query
    similarity          Show the similarity between a post, file, or string and a query
    search              Show search results for a query

options:
  -h, --help            show this help message and exit

Run Semantic Search

Show the top search results of a semantic search for the query across all the post embeddings.

% ./embeddings.py search -h
usage: embeddings.py search [-h] [-l LIMIT] query

positional arguments:
  query                 The query

options:
  -h, --help            show this help message and exit
  -l LIMIT, --limit LIMIT
                        Show this many search results

Examples

Semantic search for the query, "Mount a PG9303".

% ./embeddings.py search "Mount a PG9303"

Semantic search for the query, "Mount a PG9303" and only show the top 3 results.

% ./embeddings.py search "Mount a PG9303"  --limit 3

Calculate Similarity Score

Calculate the similarity score between a query and a specific post embedding, a string, or the contents of a file. The similarity score is 1 minus the cosine distance between the query embedding and the post embedding.

% ./embeddings.py similarity -h
usage: embeddings.py similarity [-h] [-e EMBEDDING] [-p POST] [-t TOPIC] [-n NUMBER] [-f FILE] [-s STRING] query

positional arguments:
  query                 The query

options:
  -h, --help            show this help message and exit
  -e EMBEDDING, --embedding EMBEDDING
                        Compare to the embedding with ID
  -p POST, --post POST  Compare to the post with ID
  -t TOPIC, --topic TOPIC
                        Compare to a post in the topic with ID
  -n NUMBER, --number NUMBER
                        The post number in a topic (used with --topic)
  -f FILE, --file FILE  Compare to the contents of a file
  -s STRING, --string STRING
                        Compare the query to a given string

One of --embedding, --post, or --topic are required to find the post embedding that the query will be compared to.

Examples

Show the similarity score between the first post in topic 28476 and the query, "Mount a PG9303".

 % ./embeddings.py similarity "Mount a PG9303" --topic 28476

Show the similarity score between the contents of the file testing.md and the query, "The cat ran away".

% ./embeddings.py similarity -f testing.md "The cat ran away"

Show the similarity score between the given string, "The dog ran across the street" and the query, "The dog ran across the road".

./embeddings.py similarity -s "The dog ran across the street" "The dog ran across the road"

Get Embedding

Get the embedding vector for a query from OpenAI.

% ./embeddings.py embedding -h
usage: embeddings.py embedding [-h] query

positional arguments:
  query       The query

options:
  -h, --help  show this help message and exit

Examples

% ./embeddings.py embedding "Mount a PG9303"

Export Embeddings From Discourse

Run the Data Explorer query to export embeddings from Discourse as a CSV file. Useful if you want to automate semantic search testing.

% ./embeddings.py export -h
usage: embeddings.py export [-h] -d DOMAIN -q QUERY -o OUTPUT

options:
  -h, --help            show this help message and exit
  -d DOMAIN, --domain DOMAIN
                        Domain name of the Discourse site
  -q QUERY, --query QUERY
                        Query ID
  -o OUTPUT, --output OUTPUT
                        Output file name

Example

% ./embeddings.py export -d support.suretyhome.com -q 6 -o embeddings-export.csv

Installation

Clone the Git repository.

% git clone https://github.com/37Rb/discourse-chatbot-kb-tools.git

The script requires Python3 to be installed. Then install these pip packages.

% pip install requests
% pip install scipy
% pip install openai
% pip install termcolor

Set your OPENAI_API_KEY environment variable to an API key you get from your OpenAI account.

% export OPENAI_API_KEY=XXXXXXXXXXXXXXXXXXXXXXX

By default, embeddings are created using the text-embedding-ada-002 model. Optionally, you can choose a different model by setting the EMBEDDINGS_MODEL environment variable.

% export EMBEDDINGS_MODEL=text-embedding-3-small

This last step is only required if you want to run the search command or calculate the similarity of a query and a post on your forum. Export your Chatbot embeddings to a CSV file using the Data Explorer plugin. Create the following query.

SELECT e.id, e.post_id AS post, p.topic_id AS topic, p.post_number,
       t.title as topic_title, e.embedding
FROM chatbot_post_embeddings e LEFT JOIN
     posts p ON e.post_id = p.id JOIN
     topics t ON p.topic_id = t.id
WHERE p.deleted_at IS NULL

Run it and then download the results as CSV. Once downloaded, set your EMBEDDINGS_FILE environment variable as the path to that CSV file.

% export EMBEDDINGS_FILE=~/Downloads/chatbot-embeddings-blah-blah-blah.csv

If you want to export as CSV using using this tool instead of the Discourse UI, set these environment variables.

Set your DISCOURSE_API_KEY environment variable to an API key you get from Discourse.

% export DISCOURSE_API_KEY=XXXXXXXXXXXXXXXXXXXXXXX

If your DISCOURSE_API_KEY is associated with a user other than system, then set your DISCOURSE_API_USER environment variable to that user. Defaults to system.

% export DISCOURSE_API_USER=XXXXXXX

About

Knowledge Base Management Tools for Discourse Chatbot

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages