Releases: mudler/LocalAI
v2.20.0
TL;DR
- 🌍 Explorer & Community: Explore global community pools at explorer.localai.io
- 👀 Demo instance available: Test out LocalAI at demo.localai.io
- 🤗 Integration: Hugging Face Local apps now include LocalAI
- 🐛 Bug Fixes: Diffusers and hipblas issues resolved
- 🎨 New Feature: FLUX-1 image generation support
- 🏎️ Strict Mode: Stay compliant with OpenAI’s latest API changes
- 💪 Multiple P2P Clusters: Run multiple clusters within the same network
- 🧪 Deprecation Notice: `gpt4all.cpp` and `petals` backends deprecated
🌍 Explorer and Global Community Pools
Now you can share your LocalAI instance with the global community or explore available instances by visiting explorer.localai.io. This decentralized network powers our demo instance, creating a truly collaborative AI experience.
How It Works
Using the Explorer, you can easily share or connect to clusters. For detailed instructions on creating new clusters or connecting to existing ones, check out our documentation.
👀 Demo Instance Now Available
Curious about what LocalAI can do? Dive right in with our live demo at demo.localai.io! Thanks to our generous sponsors, this instance is publicly available and configured via peer-to-peer (P2P) networks. If you'd like to connect, follow the instructions here.
🤗 Hugging Face Integration
I am excited to announce that LocalAI is now integrated within Hugging Face’s local apps! This means you can select LocalAI directly within Hugging Face to build and deploy models with the power and flexibility of our platform. Experience seamless integration with a single click!
This integration was made possible through this PR.
🎨 FLUX-1 Image Generation Support
FLUX-1 lands in LocalAI! With this update, LocalAI can now generate stunning images using FLUX-1, even in federated mode. Whether you're experimenting with new designs or creating production-quality visuals, FLUX-1 has you covered.
Try it out at demo.localai.io and see what LocalAI + FLUX-1 can do!
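If you prefer the API, a request can look like the sketch below. Note that the model name flux.1-dev is only a placeholder; use whatever FLUX-1 entry you installed from the gallery:
curl http://localhost:8080/v1/images/generations -H "Content-Type: application/json" -d '{
  "prompt": "A cute baby sea otter", "model": "flux.1-dev"
}'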
🐛 Diffusers and hipblas Fixes
Great news for AMD users! If you’ve encountered issues with the Diffusers backend or hipblas, those bugs have been resolved. We’ve transitioned to `uv` for managing Python dependencies, ensuring a smoother experience. For more details, check out Issue #1592.
🏎️ Strict Mode for API Compliance
To stay up to date with OpenAI’s latest changes, LocalAI now also supports Strict Mode ( https://openai.com/index/introducing-structured-outputs-in-the-api/ ). This new feature ensures compatibility with the most recent API updates, enforcing stricter JSON outputs using BNF grammar rules.
To activate it, simply set `strict: true` in your API calls, even if it’s disabled in your configuration.
Key Notes:
- Setting `strict: true` enables grammar enforcement, even if it is disabled in your config.
- If `format_type` is set to `json_schema`, BNF grammars will be automatically generated from the schema (see the example request after this list).
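As an illustration, a strict request could look like the following. This is only a sketch: the payload mirrors the OpenAI structured-outputs shape plus the strict flag described above, and the model name is a placeholder, so double-check the exact field names against the LocalAI documentation:
curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
  "model": "your-model",
  "strict": true,
  "response_format": {
    "type": "json_schema",
    "json_schema": {
      "name": "person",
      "schema": {
        "type": "object",
        "properties": { "name": { "type": "string" }, "age": { "type": "integer" } },
        "required": ["name", "age"]
      }
    }
  },
  "messages": [ { "role": "user", "content": "Extract: Alice is 30 years old." } ]
}'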
🛑 Disable Gallery
Need to streamline your setup? You can now disable the gallery endpoint using `LOCALAI_DISABLE_GALLERY_ENDPOINT`. For more options, check out the full list of commands with `--help`.
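For example, assuming a binary install and that the variable accepts a boolean value, disabling the gallery endpoints for a single run could look like this:
LOCALAI_DISABLE_GALLERY_ENDPOINT=true local-ai run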
🌞 P2P and Federation Enhancements
Several enhancements have been made to improve your experience with P2P and federated clusters:
- Load Balancing by Default: This feature is now enabled by default (disable it with `LOCALAI_RANDOM_WORKER` if needed).
- Target Specific Workers: Directly target workers in federated mode using `LOCALAI_TARGET_WORKER` (see the example after this list).
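Both variables can be set when starting the federated instance. This is only a sketch: the worker name format is an assumption, so check the federation documentation for the exact values expected:
# fall back to random worker selection instead of load balancing
LOCALAI_RANDOM_WORKER=true local-ai run --p2p --federated
# pin federated requests to a specific worker (name format is an assumption)
LOCALAI_TARGET_WORKER=<worker-name> local-ai run --p2p --federated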
💪 Run Multiple P2P Clusters in the Same Network
You can now run multiple clusters within the same network by specifying a network ID via CLI. This allows you to logically separate clusters while using the same shared token. Just set `LOCALAI_P2P_NETWORK_ID` to a UUID that matches across instances.
Please note, while this offers segmentation, it’s not fully secure—anyone with the network token can view available services within the network.
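For instance, instances started as below (the UUID is illustrative, and the shared token is assumed to be configured as described in the Swarm instructions) will join the same logical cluster, while instances with a different network ID will not:
LOCALAI_P2P_NETWORK_ID=0b1c8f1e-1234-4e6f-9a0b-1c2d3e4f5a6b local-ai run --p2p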
🧪 Deprecation Notice: `gpt4all.cpp` and `petals` Backends
As we continue to evolve, we are officially deprecating the `gpt4all.cpp` and `petals` backends. The newer `llama.cpp` offers a superior set of features and better performance, making it the preferred choice moving forward.
From this release onward, `gpt4all` models in `ggml` format are no longer compatible. Additionally, the `petals` backend has been deprecated as well. LocalAI’s new P2P capabilities now offer a comprehensive replacement for these features.
What's Changed
Breaking Changes 🛠
Bug fixes 🐛
- fix(ui): do not show duplicate entries if not installed by gallery by @mudler in #3107
- fix: be consistent in downloading files, check for scanner errors by @mudler in #3108
- fix: ensure correct version of torch is always installed based on BUI… by @cryptk in #2890
- fix(python): move accelerate and GPU-specific libs to build-type by @mudler in #3194
- fix(apple): disable BUILD_TYPE metal on fallback by @mudler in #3199
- fix(vall-e-x): pin hipblas deps by @mudler in #3201
- fix(diffusers): use nightly rocm for hipblas builds by @mudler in #3202
- fix(explorer): reset counter when network is active by @mudler in #3213
- fix(p2p): allocate tunnels only when needed by @mudler in #3259
- fix(gallery): be consistent and disable UI routes as well by @mudler in #3262
- fix(parler-tts): bump and require after build type deps by @mudler in #3272
- fix: add llvm to extra images by @mudler in #3321
- fix(p2p): re-use p2p host when running federated mode by @mudler in #3341
- fix(ci): pin to llvmlite 0.43 by @mudler in #3342
- fix(p2p): avoid starting the node twice by @mudler in #3349
- fix(chat): re-generated uuid, created, and text on each request by @mudler in #3359
Exciting New Features 🎉
- feat(guesser): add gemma2 by @sozercan in #3118
- feat(venv): shared env by @mudler in #3195
- feat(openai): add `json_schema` format type and strict mode by @mudler in #3193
- feat(p2p): allow to run multiple clusters in the same p2p network by @mudler in #3128
- feat(p2p): add network explorer and community pools by @mudler in #3125
- feat(explorer): relax token deletion with error threshold by @mudler in #3211
- feat(diffusers): support flux models by @mudler in #3129
- feat(explorer): make possible to run sync in a separate process by @mudler in #3224
- feat(federated): allow to pickup a specific worker, improve loadbalancing by @mudler in #3243
- feat: Initial Version of vscode DevContainer by @dave-gray101 in #3217
- feat(explorer): visual improvements by @mudler in #3247
- feat(gallery): lazy load images by @mudler in #3246
- chore(explorer): add join instructions by @mudler in #3255
- chore: allow to disable gallery endpoints, improve p2p connection handling by @mudler in #3256
- chore(ux): add animated header with anime.js in p2p sections by @mudler in #3271
- chore(p2p): make commands easier to copy-paste by @mudler in #3273
- chore(ux): allow to create and drag dots in the animation by @mudler in #3287
- feat(federation): do not allocate local services for load balancing by @mudler in #3337
- feat(p2p): allow to set intervals b...
v2.19.4
What's Changed
🧠 Models
- chore(model-gallery): ⬆️ update checksum by @localai-bot in #3040
- chore(model-gallery): ⬆️ update checksum by @localai-bot in #3043
- models(gallery): add magnum-32b-v1 by @mudler in #3044
- models(gallery): add lumimaid-v0.2-70b-i1 by @mudler in #3045
- models(gallery): add sekhmet_aleph-l3.1-8b-v0.1-i1 by @mudler in #3046
- models(gallery): add l3.1-8b-llamoutcast-i1 by @mudler in #3047
- models(gallery): add l3.1-8b-celeste-v1.5 by @mudler in #3080
- models(gallery): add llama-guard-3-8b by @mudler in #3082
- models(gallery): add meta-llama-3-instruct-8.9b-brainstorm-5x-form-11 by @mudler in #3083
- models(gallery): add sunfall-simpo by @mudler in #3088
- models(gallery): add genius-llama3.1-i1 by @mudler in #3089
- models(gallery): add seeker-9b by @mudler in #3090
- models(gallery): add llama3.1-chinese-chat by @mudler in #3091
- models(gallery): add gemmasutra-pro-27b-v1 by @mudler in #3092
- models(gallery): add leetwizard by @mudler in #3093
- models(gallery): add tarnished-9b-i1 by @mudler in #3096
- models(gallery): add meta-llama-3-instruct-12.2b-brainstorm-20x-form-8 by @mudler in #3097
- models(gallery): add loki-base-i1 by @mudler in #3098
- models(gallery): add tifa by @mudler in #3099
👒 Dependencies
- chore(deps): Bump langchain from 0.2.10 to 0.2.11 in /examples/langchain/langchainpy-localai-example by @dependabot in #3053
- chore(deps): Bump openai from 1.37.0 to 1.37.1 in /examples/langchain/langchainpy-localai-example by @dependabot in #3051
- chore(deps): Bump setuptools from 70.3.0 to 72.1.0 in /backend/python/autogptq by @dependabot in #3048
- chore(deps): Bump setuptools from 70.3.0 to 72.1.0 in /backend/python/vllm by @dependabot in #3061
- chore(deps): Bump chromadb from 0.5.4 to 0.5.5 in /examples/langchain-chroma by @dependabot in #3060
- chore(deps): Bump setuptools from 70.3.0 to 72.1.0 in /backend/python/parler-tts by @dependabot in #3062
- chore(deps): Bump setuptools from 70.3.0 to 72.1.0 in /backend/python/rerankers by @dependabot in #3067
- chore(deps): Bump setuptools from 69.5.1 to 72.1.0 in /backend/python/transformers-musicgen by @dependabot in #3066
- chore(deps): Bump setuptools from 70.3.0 to 72.1.0 in /backend/python/coqui by @dependabot in #3068
- chore(deps): Bump setuptools from 70.3.0 to 72.1.0 in /backend/python/vall-e-x by @dependabot in #3069
- chore(deps): Bump setuptools from 70.3.0 to 72.1.0 in /backend/python/petals by @dependabot in #3070
- chore(deps): Bump setuptools from 69.5.1 to 72.1.0 in /backend/python/transformers by @dependabot in #3071
- chore(deps): Bump streamlit from 1.36.0 to 1.37.0 in /examples/streamlit-bot by @dependabot in #3072
Other Changes
- docs: ⬆️ update docs version mudler/LocalAI by @localai-bot in #3039
- fix: install.sh bash specific equality check by @dave-gray101 in #3038
- chore: ⬆️ Update ggerganov/llama.cpp by @localai-bot in #3075
- Revert "chore(deps): Bump setuptools from 69.5.1 to 72.1.0 in /backend/python/transformers-musicgen" by @mudler in #3077
- Revert "chore(deps): Bump setuptools from 69.5.1 to 72.1.0 in /backend/python/transformers" by @mudler in #3078
- Revert "chore(deps): Bump setuptools from 70.3.0 to 72.1.0 in /backend/python/vllm" by @mudler in #3079
- fix(llama-cpp): do not compress with UPX by @mudler in #3084
- fix(ci): update openvoice checkpoints URLs by @mudler in #3085
- chore: ⬆️ Update ggerganov/llama.cpp by @localai-bot in #3086
- chore: ⬆️ Update ggerganov/llama.cpp by @localai-bot in #3102
Full Changelog: v2.19.3...v2.19.4
v2.19.3
What's Changed
Bug fixes 🐛
- fix(gallery): do not attempt to delete duplicate files by @mudler in #3031
- fix(gallery): do clear out errors once displayed by @mudler in #3033
Exciting New Features 🎉
🧠 Models
- models(gallery): add llama3.1-claude by @mudler in #3005
- models(gallery): add darkidol llama3.1 by @mudler in #3008
- models(gallery): add gemmoy by @mudler in #3009
- chore: add function calling template for llama 3.1 models by @mudler in #3010
- chore: models(gallery): ⬆️ update checksum by @localai-bot in #3013
- models(gallery): add mistral-nemo by @mudler in #3019
- models(gallery): add llama3.1-8b-fireplace2 by @mudler in #3018
- models(gallery): add lumimaid-v0.2-12b by @mudler in #3020
- models(gallery): add darkidol-llama-3.1-8b-instruct-1.1-uncensored-iq… by @mudler in #3021
- models(gallery): add meta-llama-3.1-8b-instruct-abliterated by @mudler in #3022
- models(gallery): add llama-3.1-70b-japanese-instruct-2407 by @mudler in #3023
- models(gallery): add llama-3.1-8b-instruct-fei-v1-uncensored by @mudler in #3024
- models(gallery): add openbuddy-llama3.1-8b-v22.1-131k by @mudler in #3025
- models(gallery): add lumimaid-8b by @mudler in #3026
- models(gallery): add llama3 with enforced functioncall with grammars by @mudler in #3027
- chore(model-gallery): ⬆️ update checksum by @localai-bot in #3036
👒 Dependencies
- chore: ⬆️ Update ggerganov/llama.cpp by @localai-bot in #3003
- chore: ⬆️ Update ggerganov/llama.cpp by @localai-bot in #3012
- chore: ⬆️ Update ggerganov/llama.cpp by @localai-bot in #3016
- chore: ⬆️ Update ggerganov/llama.cpp by @localai-bot in #3030
- chore: ⬆️ Update ggerganov/whisper.cpp by @localai-bot in #3029
- chore: ⬆️ Update ggerganov/llama.cpp by @localai-bot in #3034
Other Changes
- docs: ⬆️ update docs version mudler/LocalAI by @localai-bot in #3002
- refactor: break down json grammar parser in different files by @mudler in #3004
- fix: PR title tag for checksum checker script workflow by @dave-gray101 in #3014
Full Changelog: v2.19.2...v2.19.3
v2.19.2
This release is a patch release to fix well known issues from 2.19.x
What's Changed
Bug fixes 🐛
- fix: pin setuptools 69.5.1 by @fakezeta in #2949
- fix(cuda): downgrade to 12.0 to increase compatibility range by @mudler in #2994
- fix(llama.cpp): do not set anymore lora_base by @mudler in #2999
Exciting New Features 🎉
- ci(Makefile): reduce binary size by compressing by @mudler in #2947
- feat(p2p): warn the user to start with --p2p by @mudler in #2993
🧠 Models
- models(gallery): add tulu 8b and 70b by @mudler in #2931
- models(gallery): add suzume-orpo by @mudler in #2932
- models(gallery): add archangel_sft_pythia2-8b by @mudler in #2933
- models(gallery): add celestev1.2 by @mudler in #2937
- models(gallery): add calme-2.3-phi3-4b by @mudler in #2939
- models(gallery): add calme-2.8-qwen2-7b by @mudler in #2940
- models(gallery): add StellarDong-72b by @mudler in #2941
- models(gallery): add calme-2.4-llama3-70b by @mudler in #2942
- models(gallery): add llama3.1 70b and 8b by @mudler in #3000
📖 Documentation and examples
- docs: add federation by @mudler in #2929
- docs: ⬆️ update docs version mudler/LocalAI by @localai-bot in #2935
👒 Dependencies
- chore: ⬆️ Update ggerganov/llama.cpp by @localai-bot in #2936
- chore: ⬆️ Update ggerganov/llama.cpp by @localai-bot in #2943
- chore(deps): Bump grpcio from 1.64.1 to 1.65.1 in /backend/python/openvoice by @dependabot in #2956
- chore(deps): Bump grpcio from 1.65.0 to 1.65.1 in /backend/python/sentencetransformers by @dependabot in #2955
- chore(deps): Bump grpcio from 1.65.0 to 1.65.1 in /backend/python/bark by @dependabot in #2951
- chore(deps): Bump docs/themes/hugo-theme-relearn from `1b2e139` to `7aec99b` by @dependabot in #2952
- chore(deps): Bump langchain from 0.2.8 to 0.2.10 in /examples/langchain/langchainpy-localai-example by @dependabot in #2959
- chore(deps): Bump numpy from 1.26.4 to 2.0.1 in /examples/langchain/langchainpy-localai-example by @dependabot in #2958
- chore(deps): Bump sqlalchemy from 2.0.30 to 2.0.31 in /examples/langchain/langchainpy-localai-example by @dependabot in #2957
- chore(deps): Bump grpcio from 1.65.0 to 1.65.1 in /backend/python/vllm by @dependabot in #2964
- chore(deps): Bump llama-index from 0.10.55 to 0.10.56 in /examples/chainlit by @dependabot in #2966
- chore(deps): Bump grpcio from 1.65.0 to 1.65.1 in /backend/python/common/template by @dependabot in #2963
- chore(deps): Bump weaviate-client from 4.6.5 to 4.6.7 in /examples/chainlit by @dependabot in #2965
- chore(deps): Bump grpcio from 1.65.0 to 1.65.1 in /backend/python/transformers by @dependabot in #2970
- chore(deps): Bump openai from 1.35.13 to 1.37.0 in /examples/functions by @dependabot in #2973
- chore(deps): Bump grpcio from 1.65.0 to 1.65.1 in /backend/python/diffusers by @dependabot in #2969
- chore(deps): Bump grpcio from 1.65.0 to 1.65.1 in /backend/python/exllama2 by @dependabot in #2971
- chore(deps): Bump grpcio from 1.65.0 to 1.65.1 in /backend/python/rerankers by @dependabot in #2974
- chore(deps): Bump grpcio from 1.65.0 to 1.65.1 in /backend/python/coqui by @dependabot in #2980
- chore(deps): Bump grpcio from 1.65.0 to 1.65.1 in /backend/python/parler-tts by @dependabot in #2982
- chore(deps): Bump grpcio from 1.65.0 to 1.65.1 in /backend/python/vall-e-x by @dependabot in #2981
- chore(deps): Bump grpcio from 1.65.0 to 1.65.1 in /backend/python/transformers-musicgen by @dependabot in #2990
- chore(deps): Bump grpcio from 1.65.0 to 1.65.1 in /backend/python/autogptq by @dependabot in #2984
- chore(deps): Bump llama-index from 0.10.55 to 0.10.56 in /examples/langchain-chroma by @dependabot in #2986
- chore(deps): Bump grpcio from 1.65.0 to 1.65.1 in /backend/python/mamba by @dependabot in #2989
- chore: ⬆️ Update ggerganov/llama.cpp by @localai-bot in #2992
- chore(deps): Bump langchain-community from 0.2.7 to 0.2.9 in /examples/langchain/langchainpy-localai-example by @dependabot in #2960
- chore(deps): Bump openai from 1.35.13 to 1.37.0 in /examples/langchain/langchainpy-localai-example by @dependabot in #2961
- chore(deps): Bump langchain from 0.2.8 to 0.2.10 in /examples/functions by @dependabot in #2975
- chore(deps): Bump openai from 1.35.13 to 1.37.0 in /examples/langchain-chroma by @dependabot in #2988
- chore(deps): Bump langchain from 0.2.8 to 0.2.10 in /examples/langchain-chroma by @dependabot in #2987
- chore: ⬆️ Update ggerganov/llama.cpp by @localai-bot in #2995
Other Changes
Full Changelog: v2.19.1...v2.19.2
v2.19.1
LocalAI 2.19.1 is out! 📣
TLDR; Summary spotlight
- 🖧 Federated Instances via P2P: LocalAI now supports federated instances with P2P, offering both load-balanced and non-load-balanced options.
- 🎛️ P2P Dashboard: A new dashboard to guide and assist in setting up P2P instances with auto-discovery using shared tokens.
- 🔊 TTS Integration: Text-to-Speech (TTS) is now included in the binary releases.
- 🛠️ Enhanced Installer: The installer script now supports setting up federated instances.
- 📥 Model Pulling: Models can now be pulled directly via URL.
- 🖼️ WebUI Enhancements: Visual improvements and cleanups to the WebUI and model lists.
- 🧠 llama-cpp Backend: The llama-cpp (grpc) backend now supports embedding ( https://localai.io/features/embeddings/#llamacpp-embeddings )
- ⚙️ Tool Support: Small enhancements to tools with disabled grammars.
🖧 LocalAI Federation and AI swarms
LocalAI is revolutionizing the future of distributed AI workloads by making it simpler and more accessible. No more complex setups, Docker or Kubernetes configurations – LocalAI allows you to create your own AI cluster with minimal friction. By auto-discovering and sharing work or weights of the LLM model across your existing devices, LocalAI aims to scale both horizontally and vertically with ease.
How does it work?
Starting LocalAI with `--p2p` generates a shared token for connecting multiple instances, and that's all you need to create AI clusters, eliminating the need for intricate network setups. Simply navigate to the "Swarm" section in the WebUI and follow the on-screen instructions.
For fully shared instances, start LocalAI with `--p2p --federated` and follow the Swarm section's guidance. This feature, while still experimental, offers a tech-preview-quality experience.
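For quick reference, the two modes map to these invocations (a minimal sketch; token exchange and worker setup follow the on-screen Swarm instructions):
local-ai run --p2p              # generate a shared token and enable P2P auto-discovery
local-ai run --p2p --federated  # additionally share the instance with the federation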
Federated LocalAI
Launch multiple LocalAI instances and cluster them together to share requests across the cluster. The "Swarm" tab in the WebUI provides one-liner instructions on connecting various LocalAI instances using a shared token. Instances will auto-discover each other, even across different networks.
Check out a demonstration video: Watch now
LocalAI P2P Workers
Distribute weights across nodes by starting multiple LocalAI workers, currently available only on the llama.cpp backend, with plans to expand to other backends soon.
Check out a demonstration video: Watch now
What's Changed
Bug fixes 🐛
- fix: make sure the GNUMake jobserver is passed to cmake for the llama.cpp build by @cryptk in #2697
- Using exec when starting a backend instead of spawning a new process by @a17t in #2720
- fix(cuda): downgrade default version from 12.5 to 12.4 by @mudler in #2707
- fix: Lora loading by @vaaale in #2893
- fix: short-circuit when nodes aren't detected by @mudler in #2909
- fix: do not list txt files as potential models by @mudler in #2910
🖧 P2P area
- feat(p2p): Federation and AI swarms by @mudler in #2723
- feat(p2p): allow to disable DHT and use only LAN by @mudler in #2751
Exciting New Features 🎉
- Allows to remove a backend from the list by @mauromorales in #2721
- ci(Makefile): adds tts in binary releases by @mudler in #2695
- feat: HF `/scan` endpoint by @dave-gray101 in #2566
- feat(model-list): be consistent, skip known files from listing by @mudler in #2760
- feat(models): pull models from urls by @mudler in #2750
- feat(webui): show also models without a config in the welcome page by @mudler in #2772
- feat(install.sh): support federated install by @mudler in #2752
- feat(llama.cpp): support embeddings endpoints by @mudler in #2871
- feat(functions): parse broken JSON when we parse the raw results, use dynamic rules for grammar keys by @mudler in #2912
- feat(federation): add load balanced option by @mudler in #2915
🧠 Models
- models(gallery): ⬆️ update checksum by @localai-bot in #2701
- models(gallery): add l3-8b-everything-cot by @mudler in #2705
- models(gallery): add hercules-5.0-qwen2-7b by @mudler in #2708
- models(gallery): add llama3-8b-darkidol-2.2-uncensored-1048k-iq-imatrix by @mudler in #2710
- models(gallery): add llama-3-llamilitary by @mudler in #2711
- models(gallery): add tess-v2.5-gemma-2-27b-alpha by @mudler in #2712
- models(gallery): add arcee-agent by @mudler in #2713
- models(gallery): add gemma2-daybreak by @mudler in #2714
- models(gallery): add L3-Stheno-Maid-Blackroot-Grand-HORROR-16B-GGUF by @mudler in #2715
- models(gallery): add qwen2-7b-instruct-v0.8 by @mudler in #2717
- models(gallery): add internlm2_5-7b-chat-1m by @mudler in #2719
- models(gallery): add gemma-2-9b-it-sppo-iter3 by @mudler in #2722
- models(gallery): add llama-3_8b_unaligned_alpha by @mudler in #2727
- models(gallery): add l3-8b-lunaris-v1 by @mudler in #2729
- models(gallery): add llama-3_8b_unaligned_alpha_rp_soup-i1 by @mudler in #2734
- models(gallery): add hathor_respawn-l3-8b-v0.8 by @mudler in #2738
- models(gallery): add llama3-8b-instruct-replete-adapted by @mudler in #2739
- models(gallery): add llama-3-perky-pat-instruct-8b by @mudler in #2740
- models(gallery): add l3-uncen-merger-omelette-rp-v0.2-8b by @mudler in #2741
- models(gallery): add nymph_8b-i1 by @mudler in #2742
- models(gallery): add smegmma-9b-v1 by @mudler in #2743
- models(gallery): add hathor_tahsin-l3-8b-v0.85 by @mudler in #2762
- models(gallery): add replete-coder-instruct-8b-merged by @mudler in #2782
- models(gallery): add arliai-llama-3-8b-formax-v1.0 by @mudler in #2783
- models(gallery): add smegmma-deluxe-9b-v1 by @mudler in #2784
- models(gallery): add l3-ms-astoria-8b by @mudler in #2785
- models(gallery): add halomaidrp-v1.33-15b-l3-i1 by @mudler in #2786
- models(gallery): add llama-3-patronus-lynx-70b-instruct by @mudler in #2788
- models(gallery): add llamax3 by @mudler in #2849
- models(gallery): add arliai-llama-3-8b-dolfin-v0.5 by @mudler in #2852
- models(gallery): add tiger-gemma-9b-v1-i1 by @mudler in #2853
- feat: models(gallery): add deepseek-v2-lite by @mudler in #2658
- models(gallery): ⬆️ update checksum by @localai-bot in #2860
- models(gallery): add phi-3.1-mini-4k-instruct by @mudler in #2863
- models(gallery): ⬆️ update checksum by @localai-bot in #2887
- models(gallery): add ezo model series (llama3, gemma) by @mudler in #2891
- models(gallery): add l3-8b-niitama-v1 by @mudler in #2895
- models(gallery): add mathstral-7b-v0.1-imat by @mudler in #2901
- models(gallery): add MythicalMaid/EtherealMaid 15b by @mudler in #2902
- models(gallery): add flammenai/Mahou-1.3d-mistral-7B by @mudler in #2903
- models(gallery): add big-tiger-gemma-27b-v1 by @mudler in #2918
- models(gallery): add phillama-3.8b-v0.1 by @mudler in #2920
- models(gallery): add qwen2-wukong-7b by @mudler in #2921
- models(gallery): add einstein-v4-7b by @mudler in #2922
- models(gallery): add gemma-2b-translation-v0.150 by @mudler in #2923
- models(gallery)...
v2.19.0
LocalAI 2.19.0 is out! 📣
TLDR; Summary spotlight
- 🖧 Federated Instances via P2P: LocalAI now supports federated instances with P2P, offering both load-balanced and non-load-balanced options.
- 🎛️ P2P Dashboard: A new dashboard to guide and assist in setting up P2P instances with auto-discovery using shared tokens.
- 🔊 TTS Integration: Text-to-Speech (TTS) is now included in the binary releases.
- 🛠️ Enhanced Installer: The installer script now supports setting up federated instances.
- 📥 Model Pulling: Models can now be pulled directly via URL.
- 🖼️ WebUI Enhancements: Visual improvements and cleanups to the WebUI and model lists.
- 🧠 llama-cpp Backend: The llama-cpp (grpc) backend now supports embedding ( https://localai.io/features/embeddings/#llamacpp-embeddings )
- ⚙️ Tool Support: Small enhancements to tools with disabled grammars.
🖧 LocalAI Federation and AI swarms
LocalAI is revolutionizing the future of distributed AI workloads by making it simpler and more accessible. No more complex setups, Docker or Kubernetes configurations – LocalAI allows you to create your own AI cluster with minimal friction. By auto-discovering and sharing work or weights of the LLM model across your existing devices, LocalAI aims to scale both horizontally and vertically with ease.
How does it work?
Starting LocalAI with `--p2p` generates a shared token for connecting multiple instances, and that's all you need to create AI clusters, eliminating the need for intricate network setups. Simply navigate to the "Swarm" section in the WebUI and follow the on-screen instructions.
For fully shared instances, start LocalAI with `--p2p --federated` and follow the Swarm section's guidance. This feature, while still experimental, offers a tech-preview-quality experience.
Federated LocalAI
Launch multiple LocalAI instances and cluster them together to share requests across the cluster. The "Swarm" tab in the WebUI provides one-liner instructions on connecting various LocalAI instances using a shared token. Instances will auto-discover each other, even across different networks.
Check out a demonstration video: Watch now
LocalAI P2P Workers
Distribute weights across nodes by starting multiple LocalAI workers, currently available only on the llama.cpp backend, with plans to expand to other backends soon.
Check out a demonstration video: Watch now
What's Changed
Bug fixes 🐛
- fix: make sure the GNUMake jobserver is passed to cmake for the llama.cpp build by @cryptk in #2697
- Using exec when starting a backend instead of spawning a new process by @a17t in #2720
- fix(cuda): downgrade default version from 12.5 to 12.4 by @mudler in #2707
- fix: Lora loading by @vaaale in #2893
- fix: short-circuit when nodes aren't detected by @mudler in #2909
- fix: do not list txt files as potential models by @mudler in #2910
🖧 P2P area
- feat(p2p): Federation and AI swarms by @mudler in #2723
- feat(p2p): allow to disable DHT and use only LAN by @mudler in #2751
Exciting New Features 🎉
- Allows to remove a backend from the list by @mauromorales in #2721
- ci(Makefile): adds tts in binary releases by @mudler in #2695
- feat: HF `/scan` endpoint by @dave-gray101 in #2566
- feat(model-list): be consistent, skip known files from listing by @mudler in #2760
- feat(models): pull models from urls by @mudler in #2750
- feat(webui): show also models without a config in the welcome page by @mudler in #2772
- feat(install.sh): support federated install by @mudler in #2752
- feat(llama.cpp): support embeddings endpoints by @mudler in #2871
- feat(functions): parse broken JSON when we parse the raw results, use dynamic rules for grammar keys by @mudler in #2912
- feat(federation): add load balanced option by @mudler in #2915
🧠 Models
- models(gallery): ⬆️ update checksum by @localai-bot in #2701
- models(gallery): add l3-8b-everything-cot by @mudler in #2705
- models(gallery): add hercules-5.0-qwen2-7b by @mudler in #2708
- models(gallery): add llama3-8b-darkidol-2.2-uncensored-1048k-iq-imatrix by @mudler in #2710
- models(gallery): add llama-3-llamilitary by @mudler in #2711
- models(gallery): add tess-v2.5-gemma-2-27b-alpha by @mudler in #2712
- models(gallery): add arcee-agent by @mudler in #2713
- models(gallery): add gemma2-daybreak by @mudler in #2714
- models(gallery): add L3-Stheno-Maid-Blackroot-Grand-HORROR-16B-GGUF by @mudler in #2715
- models(gallery): add qwen2-7b-instruct-v0.8 by @mudler in #2717
- models(gallery): add internlm2_5-7b-chat-1m by @mudler in #2719
- models(gallery): add gemma-2-9b-it-sppo-iter3 by @mudler in #2722
- models(gallery): add llama-3_8b_unaligned_alpha by @mudler in #2727
- models(gallery): add l3-8b-lunaris-v1 by @mudler in #2729
- models(gallery): add llama-3_8b_unaligned_alpha_rp_soup-i1 by @mudler in #2734
- models(gallery): add hathor_respawn-l3-8b-v0.8 by @mudler in #2738
- models(gallery): add llama3-8b-instruct-replete-adapted by @mudler in #2739
- models(gallery): add llama-3-perky-pat-instruct-8b by @mudler in #2740
- models(gallery): add l3-uncen-merger-omelette-rp-v0.2-8b by @mudler in #2741
- models(gallery): add nymph_8b-i1 by @mudler in #2742
- models(gallery): add smegmma-9b-v1 by @mudler in #2743
- models(gallery): add hathor_tahsin-l3-8b-v0.85 by @mudler in #2762
- models(gallery): add replete-coder-instruct-8b-merged by @mudler in #2782
- models(gallery): add arliai-llama-3-8b-formax-v1.0 by @mudler in #2783
- models(gallery): add smegmma-deluxe-9b-v1 by @mudler in #2784
- models(gallery): add l3-ms-astoria-8b by @mudler in #2785
- models(gallery): add halomaidrp-v1.33-15b-l3-i1 by @mudler in #2786
- models(gallery): add llama-3-patronus-lynx-70b-instruct by @mudler in #2788
- models(gallery): add llamax3 by @mudler in #2849
- models(gallery): add arliai-llama-3-8b-dolfin-v0.5 by @mudler in #2852
- models(gallery): add tiger-gemma-9b-v1-i1 by @mudler in #2853
- feat: models(gallery): add deepseek-v2-lite by @mudler in #2658
- models(gallery): ⬆️ update checksum by @localai-bot in #2860
- models(gallery): add phi-3.1-mini-4k-instruct by @mudler in #2863
- models(gallery): ⬆️ update checksum by @localai-bot in #2887
- models(gallery): add ezo model series (llama3, gemma) by @mudler in #2891
- models(gallery): add l3-8b-niitama-v1 by @mudler in #2895
- models(gallery): add mathstral-7b-v0.1-imat by @mudler in #2901
- models(gallery): add MythicalMaid/EtherealMaid 15b by @mudler in #2902
- models(gallery): add flammenai/Mahou-1.3d-mistral-7B by @mudler in #2903
- models(gallery): add big-tiger-gemma-27b-v1 by @mudler in #2918
- models(gallery): add phillama-3.8b-v0.1 by @mudler in #2920
- models(gallery): add qwen2-wukong-7b by @mudler in #2921
- models(gallery): add einstein-v4-7b by @mudler in #2922
- models(gallery): add gemma-2b-translation-v0.150 by @mudler in #2923
- models(gallery)...
v2.18.1
What's Changed
Bug fixes 🐛
- fix(talk): identify the model by ID instead of name by @mudler in #2685
- fix(initializer): do select backends that exist by @mudler in #2694
Exciting New Features 🎉
🧠 Models
- models(gallery): add new-dawn-llama by @mudler in #2672
- models(gallery): ⬆️ update checksum by @localai-bot in #2678
- models(gallery): add l3-aethora-15b-v2 by @mudler in #2679
- models(gallery): add bungo-l3-8b-iq-imatrix by @mudler in #2682
- models(gallery): add llama3-8b-darkidol-2.1-uncensored-1048k-iq-imatrix by @mudler in #2686
- models(gallery): add llm-compiler by @mudler in #2684
- models(gallery): add llama3-turbcat-instruct-8b by @mudler in #2687
- models(gallery): ⬆️ update checksum by @localai-bot in #2690
👒 Dependencies
- ⬆️ Update ggerganov/llama.cpp by @localai-bot in #2677
- ⬆️ Update docs version mudler/LocalAI by @localai-bot in #2676
- ⬆️ Update ggerganov/llama.cpp by @localai-bot in #2683
- ⬆️ Update ggerganov/llama.cpp by @localai-bot in #2689
- ⬆️ Update ggerganov/llama.cpp by @localai-bot in #2696
Full Changelog: v2.18.0...v2.18.1
v2.18.0
⭐ Highlights
Here’s a quick overview of what’s new in 2.18.0:
- 🐳 Support for models in OCI registry (includes ollama)
- 🌋 Support for llama.cpp with vulkan (container images only for now)
- 🗣️ The transcription endpoint can now also translate with `translate`
- ⚙️ Adds `repeat_last_n` and `properties_order` as model configurations
- ⬆️ CUDA 12.5 Upgrade: we are now tracking the latest CUDA version (12.5).
- 💎 Gemma 2 model support!
🐋 Support for OCI Images and Ollama Models
You can now specify models using the `oci://` and `ollama://` prefixes in your YAML config files. Here’s an example for Ollama models:
parameters:
  model: ollama://...
Start the Ollama model directly with:
local-ai run ollama://gemma:2b
Or download only the model by using:
local-ai models install ollama://gemma:2b
For standard OCI images, use the `oci://` prefix. To build a compatible container image, use `docker`, for example. Your Dockerfile should look like this:
FROM scratch
COPY ./my_gguf_file.gguf /
You can actually use this to also store other model types (for instance, safetensors files for Stable Diffusion) and YAML config files!
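A standard OCI reference in the model YAML could then look like the following. This is a hypothetical sketch: the registry, repository, and tag are placeholders, not real images:
name: my-gguf-model
parameters:
  model: oci://quay.io/example/my_gguf_file:latest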
🌋 Vulkan Support for Llama.cpp
We’ve introduced Vulkan support for Llama.cpp! Check out our new image tags `latest-vulkan-ffmpeg-core` and `v2.18.0-vulkan-ffmpeg-core`.
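Pulling and running the container could look like this. Note this is only a sketch: GPU/device passthrough flags and volume mounts depend on your driver setup and are omitted here:
docker run -ti -p 8080:8080 localai/localai:latest-vulkan-ffmpeg-core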
🗣️ Transcription and Translation
Our transcription endpoint now supports translation! Simply add `translate: true` to your transcription requests to translate the transcription to English.
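As a rough sketch (the model name is a placeholder, and passing translate as a multipart form field is an assumption; check the transcription docs for the exact parameter):
curl http://localhost:8080/v1/audio/transcriptions -F file="@/path/to/audio.wav" -F model="whisper-1" -F translate="true"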
⚙️ Enhanced Model Configuration
We’ve added new configuration options `repeat_last_n` and `properties_order` to give you more control. Here’s how you can set them up in your model YAML file:
# Force JSON to return properties in the specified order
function:
  grammar:
    properties_order: "name,arguments"
And for setting `repeat_last_n` (specific to Llama.cpp):
parameters:
  repeat_last_n: 64
💎 Gemma 2!
Google has just dropped Gemma 2 models (blog post here). You can already install and run Gemma 2 models in LocalAI with:
local-ai run gemma-2-27b-it
local-ai run gemma-2-9b-it
What's Changed
Bug fixes 🐛
- fix(install.sh): correctly handle systemd service installation by @mudler in #2627
- fix(worker): use dynaload for single binaries by @mudler in #2620
- fix(install.sh): fix version typo by @mudler in #2645
- fix(install.sh): move ARCH detection so it works also for mac by @mudler in #2646
- fix(cli): remove duplicate alias by @mudler in #2654
Exciting New Features 🎉
- feat: Upgrade to CUDA 12.5 by @reneleonhardt in #2601
- feat(oci): support OCI images and Ollama models by @mudler in #2628
- feat(whisper): add translate option by @mudler in #2649
- feat(vulkan): add vulkan support to the llama.cpp backend by @mudler in #2648
- feat(ui): allow to select between all the available models in the chat by @mudler in #2657
- feat(build): only build llama.cpp relevant targets by @mudler in #2659
- feat(options): add `repeat_last_n` by @mudler in #2660
- feat(grammar): expose properties_order by @mudler in #2662
🧠 Models
- models(gallery): add l3-umbral-mind-rp-v1.0-8b-iq-imatrix by @mudler in #2608
- models(gallery): ⬆️ update checksum by @localai-bot in #2607
- models(gallery): add llama-3-sec-chat by @mudler in #2611
- models(gallery): add llama-3-cursedstock-v1.8-8b-iq-imatrix by @mudler in #2612
- models(gallery): add llama3-8b-darkidol-1.1-iq-imatrix by @mudler in #2613
- models(gallery): add magnum-72b-v1 by @mudler in #2614
- models(gallery): add qwen2-1.5b-ita by @mudler in #2615
- models(gallery): add hermes-2-theta-llama-3-70b by @mudler in #2626
- models(gallery): ⬆️ update checksum by @localai-bot in #2630
- models(gallery): add dark-idol-1.2 by @mudler in #2663
- models(gallery): add einstein v7 qwen2 by @mudler in #2664
- models(gallery): add arcee-spark by @mudler in #2665
- models(gallery): add gemma2-9b-it and gemma2-27b-it by @mudler in #2670
📖 Documentation and examples
- docs: update to include installer and update advanced YAML options by @mudler in #2631
- feat(swagger): update swagger by @localai-bot in #2651
- feat(swagger): update swagger by @localai-bot in #2666
- telegram-bot example: Update LocalAI version (fixes #2638) by @greygoo in #2640
👒 Dependencies
- ⬆️ Update docs version mudler/LocalAI by @localai-bot in #2605
- ⬆️ Update ggerganov/llama.cpp by @localai-bot in #2606
- ⬆️ Update ggerganov/llama.cpp by @localai-bot in #2617
- ⬆️ Update ggerganov/llama.cpp by @localai-bot in #2629
- ⬆️ Update ggerganov/llama.cpp by @localai-bot in #2632
- deps(llama.cpp): bump to latest, update build variables by @mudler in #2669
- ⬆️ Update ggerganov/llama.cpp by @localai-bot in #2652
- ⬆️ Update ggerganov/llama.cpp by @localai-bot in #2671
Other Changes
- ci: bump parallel jobs by @mudler in #2633
- chore: fix go.mod module by @sozercan in #2635
- rf: centralize base64 image handling and secscan cleanup by @dave-gray101 in #2595
- refactor: gallery inconsistencies by @mudler in #2647
New Contributors
Full Changelog: v2.17.1...v2.18.0
v2.17.1
Highlights
This is a patch release to address issues with the Linux single binary releases. It also adds support for Stable Diffusion 3!
Stable Diffusion 3
You can use Stable Diffusion 3 by installing the model from the gallery (`stable-diffusion-3-medium`) or by placing this YAML file in the model folder:
backend: diffusers
diffusers:
  cuda: true
  enable_parameters: negative_prompt,num_inference_steps
  pipeline_type: StableDiffusion3Pipeline
f16: false
name: sd3
parameters:
  model: v2ray/stable-diffusion-3-medium-diffusers
step: 25
You can then try generating an image:
curl http://localhost:9091/v1/images/generations -H "Content-Type: application/json" -d '{
  "prompt": "A cute baby sea otter", "model": "sd3"
}'
Example result:
What's Changed
Bug fixes 🐛
Exciting New Features 🎉
- feat(sd-3): add stablediffusion 3 support by @mudler in #2591
- feat(talk): display an informative box, better colors by @mudler in #2600
📖 Documentation and examples
- ⬆️ Update docs version mudler/LocalAI by @localai-bot in #2593
👒 Dependencies
- ⬆️ Update ggerganov/llama.cpp by @localai-bot in #2594
Other Changes
- ⬆️ Update ggerganov/llama.cpp by @localai-bot in #2603
Full Changelog: v2.17.0...v2.17.1
v2.17.0
Ahoj! This new release of LocalAI comes with tons of updates and enhancements behind the scenes!
🌟 Highlights TLDR;
- Automatic identification of GGUF models
- New WebUI page to talk with an LLM!
- https://models.localai.io is live! 🚀
- Better arm64 and Apple silicon support
- More models to the gallery!
- New quickstart installer script
- Enhancements to mixed grammar support
- Major improvements to transformers
- Linux single binary now supports rocm, nvidia, and intel
🤖 Automatic model identification for llama.cpp-based models
Just drop your GGUF files into the model folders, and let LocalAI handle the configurations. YAML files are now reserved for those who love to tinker with advanced setups.
🔊 Talk to your LLM!
Introduced a new page that allows direct interaction with the LLM using audio transcription and TTS capabilities. This feature is so much fun: now you can talk with any LLM in just a couple of clicks.
🍏 Apple single-binary
Experience enhanced support for the Apple ecosystem with a comprehensive single-binary that packs all necessary libraries, ensuring LocalAI runs smoothly on MacOS and ARM64 architectures.
ARM64
Expanded our support for ARM64 with new Docker images and single binary options, ensuring better compatibility and performance on ARM-based systems.
Note: currently we support only arm core images, for instance: `localai/localai:master-ffmpeg-core`, `localai/localai:latest-ffmpeg-core`, `localai/localai:v2.17.0-ffmpeg-core`.
🐞 Bug Fixes and small enhancements
We’ve ironed out several issues, including image endpoint response types and other minor problems, boosting the stability and reliability of our applications. It is now also possible to enable CSRF when starting LocalAI, thanks to @dave-gray101.
🌐 Models and Galleries
Enhanced the model gallery with new additions like Mirai Nova, Mahou, and several updates to existing models ensuring better performance and accuracy.
Now you can check new models also in https://models.localai.io, without running LocalAI!
Installation and Setup
A new install.sh script is now available for quick and hassle-free installations, streamlining the setup process for new users.
curl https://localai.io/install.sh | sh
Installation can be configured with Environment variables, for example:
curl https://localai.io/install.sh | VAR=value sh
List of the Environment Variables:
- DOCKER_INSTALL: Set to "true" to enable the installation of Docker images.
- USE_AIO: Set to "true" to use the all-in-one LocalAI Docker image.
- API_KEY: Specify an API key for accessing LocalAI, if required.
- CORE_IMAGES: Set to "true" to download core LocalAI images.
- PORT: Specifies the port on which LocalAI will run (default is 8080).
- THREADS: Number of processor threads the application should use. Defaults to the number of logical cores minus one.
- VERSION: Specifies the version of LocalAI to install. Defaults to the latest available version.
- MODELS_PATH: Directory path where LocalAI models are stored (default is /usr/share/local-ai/models).
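As a concrete example, an install that enables Docker images and listens on a non-default port could be invoked like this (values are illustrative):
curl https://localai.io/install.sh | DOCKER_INSTALL=true PORT=9090 sh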
We are looking into improving the installer, and as this is a first iteration any feedback is welcome! Open up an issue if something doesn't work for you!
Enhancements to mixed grammar support
Mixed grammar support continues receiving improvements behind the scenes.
🐍 Transformers backend enhancements
- Temperature = 0 correctly handled as greedy search
- Handles custom words as stop words
- Implement KV cache
- Phi 3 no longer requires the `trust_remote_code: true` flag
Shout-out to @fakezeta for these enhancements!
Install models with the CLI
Now the CLI can install models directly from the gallery. For instance:
local-ai run <model_name_in_gallery>
This command ensures the model is installed in the model folder at startup.
🐧 Linux single binary now supports rocm, nvidia, and intel
Single binaries for Linux now contain Intel, AMD GPU, and NVIDIA support. Note that you need to install the dependencies separately in the system to leverage these features. In upcoming releases, this requirement will be handled by the installer script.
📣 Let's Make Some Noise!
A gigantic THANK YOU to everyone who’s contributed—your feedback, bug squashing, and feature suggestions are what make LocalAI shine. To all our heroes out there supporting other users and sharing their expertise, you’re the real MVPs!
Remember, LocalAI thrives on community support—not big corporate bucks. If you love what we're building, show some love! A shoutout on social (@LocalAI_OSS and @mudler_it on twitter/X), joining our sponsors, or simply starring us on GitHub makes all the difference.
Also, if you haven't yet joined our Discord, come on over! Here's the link: https://discord.gg/uJAeKSAGDy
Thanks a ton, and.. enjoy this release!
What's Changed
Bug fixes 🐛
- fix: gpu fetch device info by @sozercan in #2403
- fix(watcher): do not emit fatal errors by @mudler in #2410
- fix: install pytorch from proper index for hipblas builds by @cryptk in #2413
- fix: pin version of setuptools for intel builds to work around #2406 by @cryptk in #2414
- bugfix: CUDA acceleration not working by @fakezeta in #2475
- fix: `pkg/downloader` should respect basePath for `file://` urls by @dave-gray101 in #2481
- fix: chat webui response parsing by @sozercan in #2515
- fix(stream): do not break channel consumption by @mudler in #2517
- fix(Makefile): enable STATIC on dist by @mudler in #2569
Exciting New Features 🎉
- feat(images): do not install python deps in the core image by @mudler in #2425
- feat(hipblas): extend default hipblas GPU_TARGETS by @mudler in #2426
- feat(build): add arm64 core containers by @mudler in #2421
- feat(functions): allow parallel calls with mixed/no grammars by @mudler in #2432
- feat(image): support `response_type` in the OpenAI API request by @prajwalnayak7 in #2347
- feat(swagger): update swagger by @localai-bot in #2436
- feat(functions): better free string matching, allow to expect strings after JSON by @mudler in #2445
- build(Makefile): add back single target to build native llama-cpp by @mudler in #2448
- feat(functions): allow `response_regex` to be a list by @mudler in #2447
- TTS API improvements by @blob42 in #2308
- feat(transformers): various enhancements to the transformers backend by @fakezeta in #2468
- feat(webui): enhance card visibility by @mudler in #2473
- feat(default): use number of physical cores as default by @mudler in #2483
- feat: fiber CSRF by @dave-gray101 in #2482
- feat(amdgpu): try to build in single binary by @mudler in #2485
- feat: `OpaqueErrors` to hide error information by @dave-gray101 in #2486
- build(intel): bundle intel variants in single-binary by @mudler in #2494
- feat(install): add install.sh for quick installs by @mudler in #2489
- feat(llama.cpp): guess model defaults from file by @mudler in #2522
- feat(ui): add page to talk with voice, transcription, and tts by @mudler in #2520
- feat(arm64): enable single-binary builds by @mudler in #2490
- feat(util): add util command to print GGUF informations by @mudler in #2528
- feat(defaults): add defaults for Command-R models by @mudler in #2529
- feat(detection): detect by template in gguf file, add qwen2, phi, mistral and chatml by @mudler in #2536
- feat(gallery): show available models in website, allow `local-ai models install` to install from galleries by @mudler in #2555
- feat(gallery): uniform download from CLI by @mudler in #2559
- feat(guesser): identify gemma models by @mudler in #2561
- feat(binary): support extracted bundled libs on darwin by @mudler in #2563
- feat(darwin): embed grpc libs by @mudler in #2567
- feat(build): bundle libs for arm64 and x86 linux binaries by @mudler in #2572
- feat(libpath): refactor and expose functions for external library paths by @mudler in #2578
🧠 Models
- models(gallery): add Mirai Nova by @mudler in https://github.com/mudler/LocalAI/pu...