Enterprise AI includes XIM (Xeon Inference Microservice) and a scalable cloud-native framework, which is part of OPEA (Open Platform for Enterprise AI).
Xeon Inference Microservice (XIM) is a scalable, stateless container service exposing standard RESTful APIs. It uses Intel accelerators to optimize the inference engine and customized models for AIGC workloads.
| Layer name | Description |
|---|---|
| Accelerators | A XIM can be optimized with Intel accelerators such as AMX, VNNI, and AVX-512 |
| Optimized Engine | Intel provides several engines for different purposes, such as oneAPI, xFT, and IPEX |
| Models | A model can be converted to the xFT format with different quantization schemes, such as BF16, INT8, and FP4 |
| Microservices | Container services with a stateless design to support scalable orchestration |
| API | LangChain/LlamaIndex and existing vendors like OpenAI provide industry-standard RESTful APIs to expose the service |
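The API layer above follows the OpenAI-style RESTful convention. As a minimal client sketch (the `/v1/chat/completions` path follows the OpenAI API convention; the base URL, port, and model name below are illustrative assumptions, not values documented here):

```python
import json
import urllib.request

def build_chat_request(model: str, prompt: str) -> dict:
    """Build an OpenAI-compatible chat-completion payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

def post_chat(base_url: str, payload: dict, timeout: float = 10.0) -> dict:
    """POST the payload to an OpenAI-compatible endpoint and return the JSON reply."""
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)

payload = build_chat_request("vicuna-7b-v1.3", "What is AMX?")
# post_chat("http://localhost:8000", payload)  # requires a running XIM endpoint
```

Because the service is stateless, any replica behind a load balancer can answer this request.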
Please refer here for more details.
For more business pipelines, please refer to OPEA's GenAIExamples.
| Name | Description | Registry |
|---|---|---|
| ASR (whisper) | Automatic Speech Recognition | registry.cn-hangzhou.aliyuncs.com/kenplusplus/whisper-server |
| ASR + Diarize (whisperx) | Speech Recognition + Speaker Recognition | registry.cn-hangzhou.aliyuncs.com/kenplusplus/whisperx-server |
| ASR (fast-whisper) | Accelerated ASR | registry.cn-hangzhou.aliyuncs.com/kenplusplus/faster-whisper-server |
| FastChat | AMX-optimized, IPEX-based LLM serving | registry.cn-hangzhou.aliyuncs.com/kenplusplus/fastchat-server |
| TTS (OpenVoice) | Text to Speech | registry.cn-hangzhou.aliyuncs.com/kenplusplus/openvoice-server |
| TTS (OpenTTS) | Text to Speech | registry.cn-hangzhou.aliyuncs.com/kenplusplus/opentts-server |
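Since each microservice is a stateless HTTP container, an orchestrator can probe a replica before routing traffic to it. A minimal liveness-check sketch (the host and port below are placeholders; the actual listening ports of these images are not documented here):

```python
import urllib.error
import urllib.request

def is_up(base_url: str, timeout: float = 2.0) -> bool:
    """Return True if the service answers an HTTP request at base_url."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as resp:
            return resp.status < 500
    except (urllib.error.URLError, OSError):
        return False

# Example: probe a locally running whisper-server replica (placeholder port).
# is_up("http://localhost:8080")
```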
The following models are used:
| Name | Size | Micro Services | Description |
|---|---|---|---|
| THUDM/chatglm2-6b | 12G | FastChat | LLM model |
| Trelis/Llama-2-7b-chat-hf-shared-bf16 | 25G | FastChat | LLM model using BF16 for AMX |
| lmsys/vicuna-7b-v1.3 | 13.5G | FastChat | LLM model using INT8 for VNNI |
| Systran/faster-whisper-tiny | 75M | faster-whisper | Speech Recognition model |
| pyannote/speaker-diarization-3.1 | 14M | whisperx-server | Speaker diarization |
| pyannote/segmentation-3.0 | 5.8M | whisperx-server | Speech segmentation |
| jonatasgrosman/wav2vec2-large-xlsr-53-chinese-zh-cn | 2.4G | whisperx-server | Chinese speech-to-vector model |
| pyannote/wespeaker-voxceleb-resnet34-LM | 51M | whisperx-server | Speaker embedding extraction |
| silero-vad | 17M | openvoice-server | Voice Activity Detector |
| whisper(small) | 244M | whisper-server | OpenAI whisper model |
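The table lists BF16 and INT8 model variants. As an illustration of what symmetric per-tensor INT8 quantization (the scheme commonly paired with VNNI) does to a weight tensor, here is a minimal numpy sketch; it is conceptual, not the actual xFT conversion code:

```python
import numpy as np

def quantize_int8(w: np.ndarray) -> tuple[np.ndarray, float]:
    """Symmetric per-tensor INT8 quantization: map [-max|w|, max|w|] onto [-127, 127]."""
    scale = float(np.abs(w).max()) / 127.0
    q = np.round(w / scale).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximate float tensor from the INT8 values."""
    return q.astype(np.float32) * scale

w = np.array([0.05, -1.27, 0.64], dtype=np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)  # close to w, within one quantization step
```

INT8 halves the memory footprint again relative to BF16, at the cost of a bounded per-weight rounding error of at most `scale / 2`.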
TBD



