Local Models

Note

This section is about small, specialized, non-LLM transformer AI models. ⏩ See here for local LLMs

By starting the bpm-ai-inference extension container in addition to the main connector container, you can use free, 100% local AI models instead of API-based services.

These models are tiny compared to the average LLM but are specialized for a specific task (classification, for example).

This means their accuracy will not always match that of a big LLM, but it sometimes comes surprisingly close.

Here are some general notes and limitations:

  • Average model size is 1-2 GB
  • Models are loaded on demand
  • For the best experience, 16+ GB of RAM and at least 4 CPU cores should be available (check the Docker engine resource configuration!)
  • Most models work best with English. We try to provide multilingual alternatives, but mileage may vary
  • The models are usually less flexible and behave somewhat differently than LLMs; the connectors try to mask this as well as possible. See the details for specific connectors below

Start the inference container manually:

```sh
docker compose --profile inference up -d
```

... or select the appropriate option in the wizard setup script.
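For context, here is a minimal, hypothetical sketch of how such an optional service can be gated behind a Compose profile. The service and image names are assumptions; the authoritative definition is the project's own docker-compose.yml:

```yaml
# Hypothetical excerpt — service and image names are assumptions,
# not copied from the project's actual docker-compose.yml.
services:
  bpm-ai-inference:
    image: bpm-ai-inference:latest   # assumed image name, check your compose file
    profiles: ["inference"]          # started only with: docker compose --profile inference up -d
```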

Decide Connector

Select Text Classifier or Image Classifier as LLM / Model.

Select a fitting model based on the desired speed/accuracy tradeoff and language.

You can also use any model from the HuggingFace Hub that supports the zero-shot-classification or text-classification task (or zero-shot-image-classification or image-classification for Image Classifier).
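To get a feel for what these task types do outside of the connector, here is a minimal plain-transformers sketch of zero-shot text classification. This is not the connector's internal code; the model name is just one common example from the Hub, and the input sentence and labels are made up:

```python
from transformers import pipeline

# Any Hub model tagged with the zero-shot-classification task works here.
classifier = pipeline("zero-shot-classification",
                      model="facebook/bart-large-mnli")

result = classifier(
    "I was charged twice for my last order.",
    candidate_labels=["billing", "shipping", "product quality"],  # the "possible values"
)
print(result["labels"][0])  # highest-scoring label, e.g. "billing"
```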

In the case of Image Classifier, you must provide a single input variable containing a path/URL to an image file or a single-page PDF (for multi-page PDFs, only the first page is used). This is because this kind of model can only accept a single image, and there is no meaningful way to combine multiple results into one; use multiple connectors/activities instead.
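The image variant works analogously; a sketch using a common CLIP model from the Hub (again, not the connector's internals, and the file name is hypothetical):

```python
from transformers import pipeline

classifier = pipeline("zero-shot-image-classification",
                      model="openai/clip-vit-base-patch32")

# Exactly one image per call, matching the single-input limitation above.
result = classifier("invoice.png",  # hypothetical local file
                    candidate_labels=["invoice", "receipt", "letter"])
print(result[0]["label"])  # highest-scoring label
```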

Usage Differences from LLMs

  • A list of possible values is always required for zero-shot models (which all pre-configured models are).
    • If your model has fixed output labels, possible values must be left empty!
  • The decision task is best left empty or provided as a single, fully formed question
  • The model does not provide reasoning for its decision; the corresponding result field is null

Extract Connector

Select Text Extraction Model as LLM / Model.

Select a fitting model based on the desired speed/accuracy tradeoff and language.

You can also use any model from the HuggingFace Hub that supports the question-answering task.
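This task type is extractive question answering: the model selects a span from the input text that answers a question, which is why fully formed questions work best as field descriptions (see below). A minimal plain-transformers sketch, with a common example model from the Hub and made-up input:

```python
from transformers import pipeline

qa = pipeline("question-answering",
              model="deepset/roberta-base-squad2")

result = qa(
    question="What is the invoice number?",  # a fully formed question works best
    context="Invoice No. 2023-0815, issued to ACME Corp on 2023-05-01.",
)
print(result["answer"])  # a span copied from the context, e.g. "2023-0815"
```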

Usage Differences from LLMs

  • Extraction field descriptions are best provided as fully formed questions
  • Extraction Mode Multiple Entities is experimental and may not yield good results in all cases
  • Fields are extracted one by one, so unlike an LLM, the model lacks the context of already extracted fields, which may lead to wrong or duplicate extractions. To mitigate this, you can include template variables (e.g. {alreadyExtractedFieldName}) in your descriptions to reference already extracted fields (use dot notation for nested objects), as sketched after this list. All in all, the extraction schema needs more tuning and engineering for complex cases than it would with an LLM.
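For illustration, a hypothetical extraction schema using this templating. The field names are made up; only the {variable} syntax and the dot notation for nested objects come from the description above:

```yaml
# Hypothetical field descriptions — fully formed questions work best.
firstname: "What is the customer's first name?"
lastname: "What is the last name of {firstname}?"   # references the already extracted field
# For nested objects, reference them with dot notation, e.g. {address.city}
```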

Translate Connector

Select Neural Machine Translation as LLM / Model.

Select a fitting model based on the desired speed/accuracy tradeoff and language (currently, Opus-MT is the only available local model).

Usage Differences from LLMs

  • Each language pair and direction uses a dedicated model, so if you expect many combinations, this may be inefficient (see the sketch after this list)
  • We currently support the following languages: DANISH, DUTCH, ENGLISH, FINNISH, FRENCH, GERMAN, ITALIAN, NORWEGIAN, POLISH, PORTUGUESE, SPANISH, SWEDISH, UKRAINIAN
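As a plain-transformers illustration of the one-model-per-direction point: Opus-MT models on the Hub follow the Helsinki-NLP/opus-mt-{src}-{tgt} naming scheme, with a separate model per pair and direction. This is not the connector's internal code, and the example sentences are made up:

```python
from transformers import pipeline

# One dedicated model per language pair and direction:
en_de = pipeline("translation", model="Helsinki-NLP/opus-mt-en-de")
de_en = pipeline("translation", model="Helsinki-NLP/opus-mt-de-en")

print(en_de("The invoice is overdue.")[0]["translation_text"])
print(de_en("Die Rechnung ist überfällig.")[0]["translation_text"])
```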