Personal conjured assistant template for fellow onmyōji 🔖
A self-contained, learning, fully offline virtual assistant.
- Start by running `init.sh` after you create your own `config.yml` file.
- Run `git submodule update --init --recursive` to pull down beanstalkd, then `cd beanstalkd` and `make`.
- Run `start.sh` once `ecosystem.config.js` is created.
Requires pm2, nvm, and rvm under a dedicated user account.
```
┌────────────────────────────────────────┐
│               PM2 Daemon               │
└─────┬──────────────────────────┬───────┘
      │                          │
      ▼                          ▼
 ┌─────────┐                  ┌─────┐
 │ core.rb ├──────────────────┤     │
 └────┬────┘                  │     │
      │                       │     │
      ▼                       │     │
    ┌───┐                     │  B  │
    │   │     ┌───────────┐   │  e  │
    │   ├────►│ module 00 ├───┤  a  │
    │   │     └───────────┘   │  n  │
    │ M │                     │  s  │
    │ o │     ┌───────────┐   │  t  │
    │ d ├────►│ module 01 ├───┤  a  │
    │ u │     └───────────┘   │  l  │
    │ l │                     │  k  │
    │ e │     ┌───────────┐   │  d  │
    │ s ├────►│ module 02 ├───┤     │
    │   │     └───────────┘   │     │
    │   │                     │     │
    │   │     ┌───────────┐   │     │
    │   ├────►│ module NN ├───┤     │
    └───┘     └───────────┘   └─────┘
```
Events from external resources (chat clients, databases, filesystems) are processed by the appropriate module or queued into beanstalkd as raw lines of Ruby code. Each module is responsible for routing its events; an event can be sent to another module or to `core.rb`, which spawns a new thread and runs `eval()` on the message body.
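For illustration, here is a minimal sketch of what that consume-and-eval loop might look like, assuming the `beaneater` gem and a beanstalkd instance on its default port; the tube name `events` is a placeholder, not necessarily what `core.rb` actually uses:

```ruby
require 'beaneater'

# Hypothetical consumer: reserve a job, eval its body in a new thread.
# Assumes beanstalkd on localhost:11300; the tube name is made up.
beanstalk = Beaneater.new('localhost:11300')
tube      = beanstalk.tubes['events']

loop do
  job = tube.reserve              # block until a message arrives
  Thread.new(job.body) do |code|
    eval(code)                    # execute the raw line of Ruby
  end
  job.delete                      # acknowledge; eval continues in its thread
end
```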
Every directory under `modules` with a valid `wrapper.sh` file will automatically be detected by `core.rb` and sent to PM2 for startup and persistence.
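A rough sketch of what that discovery pass could look like (the glob pattern and the `pm2` invocation are assumptions, not the actual `core.rb` code):

```ruby
# Hypothetical discovery pass: every modules/<name>/wrapper.sh gets
# registered with PM2 under its directory name.
Dir.glob('modules/*/wrapper.sh').each do |wrapper|
  name = File.basename(File.dirname(wrapper))
  system('pm2', 'start', wrapper, '--name', name, '--interpreter', 'bash')
end
```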
- Model: Llama 3.1 8B Instruct
- Quantization: Q5_K_M
  - `llama_model_quantize_internal: model size = 30633.02 MB`
  - `llama_model_quantize_internal: quant size = 5459.93 MB`
- Context size: 1200
"Context Size" = defines the maximum sequence length the model can process during inference or training. The context size determines how much text the model can "see" at once when generating predictions or understanding the input.
Q4_K_S, Q4_K_M, Q4_K_L
In 4-bit quantization, each parameter now requires only 0.5 bytes. For a 70 billion parameter model, the memory footprint becomes:
Memory for model weights:
70B params × 0.5 bytes/param = 35 GB of VRAM
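The same arithmetic as a quick Ruby sanity check (weights only; the KV cache and activations add overhead on top):

```ruby
# Weight-only memory estimate: params (in billions) * bytes per param = GB.
def weight_gb(params_billions, bytes_per_param)
  params_billions * bytes_per_param
end

puts weight_gb(70, 0.5)  # => 35.0 GB, the 70B example above
puts weight_gb(8,  0.5)  # => 4.0 GB for this project's 8B model
```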
**Coming Soon**
The Large Language Model (LLM) used in this project is currently Llama 3.1. (Meta has not published a detailed data mix for 3.1; the breakdown below is the one reported for the original LLaMA.)
- 67.0% CommonCrawl
- 15.0% C4
- 4.5% GitHub
- 4.5% Wikipedia
- 4.5% Books
- 2.5% ArXiv
- 2.0% StackExchange
***VERY MUCH A WORK IN PROGRESS***