
veesto

VLM edge inference on iOS, for monitoring & surveillance

Setup

Preferred setup: run the bootstrap script

Clone the repository, then run the bootstrap script:

git clone git@github.com:massimopiazza/veesto.git
cd veesto
./scripts/bootstrap-model-assets.sh

If you already have the repository cloned, re-run ./scripts/bootstrap-model-assets.sh after each git pull. The script is safe to re-run: it only downloads missing files and verifies the expected layout for a local build.
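For example, to refresh an existing checkout:

git pull
./scripts/bootstrap-model-assets.sh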

The script does all of the data hydration needed for a local build (a quick manual spot-check is sketched after this list):

  1. It downloads the missing SmolVLM ONNX files from Hugging Face into veesto/SmolVLM-onnx/.
  2. It downloads the Qwen3-VL 2B Instruct 8-bit bundle from Google Drive into veesto/Qwen3VL-CoreML/.
  3. It downloads the LiquidAI GGUF bundles into veesto/LiquidAI-GGUF/.
  4. It clones and builds the bundled llama.cpp + libmtmd iOS runtime into Vendor/llama.cpp/build-apple/llama.xcframework.
  5. It verifies that all required model assets are present in the correct folders.
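If you want to spot-check the result yourself, here is a minimal sketch that probes one representative file per bundle. The paths are taken from the layouts described in this README; the script's own verification step remains the authoritative check:

for f in \
  veesto/SmolVLM-onnx/vision_encoder.onnx \
  veesto/Qwen3VL-CoreML/qwen3vl_2b_instruct_tokenizer.json \
  veesto/LiquidAI-GGUF/LFM2-VL-450M-Q8_0.gguf \
  Vendor/llama.cpp/build-apple/llama.xcframework/Info.plist; do
  [ -e "$f" ] || echo "missing: $f"
done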

Manual configuration

If you do not want to use the bootstrap script, you can perform each step manually.

1. Download the SmolVLM ONNX files from Hugging Face

The ONNX model files are not included in this repository due to their size. Download the following three files from Hugging Face:

  1. decoder_model_merged.onnx
  2. embed_tokens.onnx
  3. vision_encoder.onnx

Place all three files in: veesto/SmolVLM-onnx/
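The README does not pin a source repository for these files. As a hedged sketch only, following the Hugging Face CLI pattern used for the LiquidAI bundles below, the download could look like this (the repo id is a hypothetical placeholder, not confirmed by this README):

# NOTE: the repo id below is a hypothetical placeholder; substitute the
# SmolVLM ONNX export you actually use.
hf download HuggingFaceTB/SmolVLM-Instruct \
  --local-dir veesto/SmolVLM-onnx \
  --include "*decoder_model_merged.onnx" \
  --include "*embed_tokens.onnx" \
  --include "*vision_encoder.onnx"
# Files keep their repo-relative paths; if they land in a subfolder
# (e.g. onnx/), move them up into veesto/SmolVLM-onnx/.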

2. Download the Qwen3-VL 2B Instruct 8-bit bundle from Google Drive

Download the Qwen3-VL 2B Instruct 8-bit assets from the Google Drive share and place them under veesto/Qwen3VL-CoreML/ with this exact layout:

veesto/Qwen3VL-CoreML/
  qwen3vl_2b_instruct_decoder_model_merged.int8.onnx
  qwen3vl_2b_instruct_decoder_model_merged.int8.onnx.data
  qwen3vl_2b_instruct_embed_tokens.onnx
  qwen3vl_2b_instruct_tokenizer.json
  qwen3vl_shared_vision_448_pal16.mlpackage/
    Manifest.json
    Data/com.apple.CoreML/model.mlmodel
    Data/com.apple.CoreML/weights/weight.bin
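One quick way to compare a download against this tree:

find veesto/Qwen3VL-CoreML -type f | sort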

3. Download the LiquidAI GGUF bundles from Hugging Face

The LiquidAI models are stored outside Git. The bootstrap script fetches them with the Hugging Face CLI; to download them manually, run:

hf download LiquidAI/LFM2-VL-450M-GGUF \
  --local-dir veesto/LiquidAI-GGUF \
  --include "LFM2-VL-450M-Q8_0.gguf" \
  --include "mmproj-LFM2-VL-450M-Q8_0.gguf"

hf download LiquidAI/LFM2.5-VL-1.6B-GGUF \
  --local-dir veesto/LiquidAI-GGUF \
  --include "LFM2.5-VL-1.6B-Q8_0.gguf" \
  --include "mmproj-LFM2.5-VL-1.6b-Q8_0.gguf"

This produces the following layout:

veesto/LiquidAI-GGUF/
  LFM2-VL-450M-Q8_0.gguf
  mmproj-LFM2-VL-450M-Q8_0.gguf
  LFM2.5-VL-1.6B-Q8_0.gguf
  mmproj-LFM2.5-VL-1.6b-Q8_0.gguf

4. Build the bundled llama.cpp runtime

The LiquidAI GGUF family uses a locally built llama.cpp runtime with libmtmd enabled for multimodal image support. The bootstrap script clones the pinned upstream commit and builds:

Vendor/llama.cpp/build-apple/llama.xcframework

If you want to build it manually, run:

./scripts/build-llama-runtime.sh
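To confirm the build produced a usable framework, you can inspect its manifest (every xcframework ships an Info.plist at its root enumerating the built slices; plutil is part of macOS):

plutil -p Vendor/llama.cpp/build-apple/llama.xcframework/Info.plist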
