[Docs] Add speech recognition with whisper use case #1971

Merged · 12 commits · Apr 4, 2025
2 changes: 1 addition & 1 deletion site/docs/getting-started/introduction.mdx
@@ -15,7 +15,7 @@ This library is friendly to PC and laptop execution, and optimized for resource

## Key Features and Benefits

- **📦 Pre-built Generative AI Pipelines:** Ready-to-use pipelines for text generation (LLMs), image generation (Diffuser-based), speech processing (Whisper), and visual language models (VLMs). See all [supported use cases](/docs/category/use-cases).
- **📦 Pre-built Generative AI Pipelines:** Ready-to-use pipelines for text generation (LLMs), image generation (Diffuser-based), speech recognition (Whisper), and visual language models (VLMs). See all [supported use cases](/docs/category/use-cases).
- **👣 Minimal Footprint:** Smaller binary size and reduced memory footprint compared to other frameworks.
- **🚀 Performance Optimization:** Hardware-specific optimizations for CPU, GPU, and NPU devices.
- **👨‍💻 Programming Language Support:** Comprehensive APIs in both Python and C++.
21 changes: 15 additions & 6 deletions site/docs/guides/streaming.mdx
@@ -7,7 +7,7 @@ sidebar_position: 3
For more interactive UIs during generation, you can stream output tokens.

:::info
Streaming is supported for both `LLMPipeline` and `VLMPipeline`.
Streaming is supported for `LLMPipeline`, `VLMPipeline` and `WhisperPipeline`.
:::
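Since the diff above extends streaming to `WhisperPipeline`, here is a minimal illustrative sketch of what Whisper streaming could look like in Python. It assumes `WhisperPipeline.generate` accepts the same callable `streamer` argument as `LLMPipeline` (receiving decoded text pieces), and uses placeholder model and audio paths.

```python
import librosa
import openvino_genai as ov_genai

model_path = "whisper-base-openvino"  # placeholder: folder with a converted Whisper model
raw_speech = librosa.load("sample.wav", sr=16000)[0].tolist()  # 16 kHz mono samples

pipe = ov_genai.WhisperPipeline(model_path, "CPU")

# Print each decoded piece of text as soon as it becomes available.
def streamer(text: str) -> ov_genai.StreamingStatus:
    print(text, end="", flush=True)
    return ov_genai.StreamingStatus.RUNNING

pipe.generate(raw_speech, max_new_tokens=100, streamer=streamer)
```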

## Streaming Function
@@ -18,6 +18,7 @@ In this example, a function outputs words to the console immediately upon genera
<TabItemPython>
```python showLineNumbers
import openvino_genai as ov_genai

pipe = ov_genai.LLMPipeline(model_path, "CPU")

# highlight-start
@@ -86,6 +87,7 @@ You can also create your custom streamer for more sophisticated processing:
<TabItemPython>
```python showLineNumbers
import openvino_genai as ov_genai

pipe = ov_genai.LLMPipeline(model_path, "CPU")

# highlight-start
@@ -95,8 +97,8 @@ You can also create your custom streamer for more sophisticated processing:
        super().__init__()
        # Initialization logic.

    def write(self, token_id) -> bool:
        # Custom decoding/tokens processing logic.
    def write(self, token: int | list[int]) -> ov_genai.StreamingStatus:
        # Custom processing logic for new decoded token(s).

        # The returned flag indicates whether generation should be stopped.
        return ov_genai.StreamingStatus.RUNNING
@@ -130,8 +132,15 @@ You can also create your custom streamer for more sophisticated processing:
// Create custom streamer class
class CustomStreamer: public ov::genai::StreamerBase {
public:
    bool write(int64_t token) {
        // Custom decoding/tokens processing logic.
    ov::genai::StreamingStatus write(int64_t token) {
        // Custom processing logic for new decoded token.

        // The returned flag indicates whether generation should be stopped.
        return ov::genai::StreamingStatus::RUNNING;
    };

    ov::genai::StreamingStatus write(const std::vector<int64_t>& tokens) {
        // Custom processing logic for new vector of decoded tokens.

        // The returned flag indicates whether generation should be stopped.
        return ov::genai::StreamingStatus::RUNNING;
@@ -168,5 +177,5 @@ You can also create your custom streamer for more sophisticated processing:
</LanguageTabs>

:::info
For a fully implemented iterable `CustomStreamer`, refer to the [multinomial_causal_lm](https://github.com/openvinotoolkit/openvino.genai/blob/releases/2025/0/samples/python/text_generation/multinomial_causal_lm.py) sample.
For a fully implemented iterable `CustomStreamer`, refer to the [multinomial_causal_lm](https://github.com/openvinotoolkit/openvino.genai/blob/master/samples/python/text_generation/multinomial_causal_lm.py) sample.
:::
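To make the updated `write` contract concrete, here is a minimal end-to-end sketch of a custom streamer in Python. The buffering logic is illustrative only and `model_path` is a placeholder; for the real reference implementation see the multinomial_causal_lm sample linked above.

```python
import openvino_genai as ov_genai

class CustomStreamer(ov_genai.StreamerBase):
    def __init__(self):
        super().__init__()
        self.buffer = []  # collect token ids for custom post-processing

    def write(self, token: int | list[int]) -> ov_genai.StreamingStatus:
        # token may be a single id or a list of ids, depending on the pipeline
        self.buffer.extend(token if isinstance(token, list) else [token])
        return ov_genai.StreamingStatus.RUNNING  # keep generating

    def end(self):
        # Called once generation finishes; flush any buffered output here.
        print(f"\nReceived {len(self.buffer)} tokens.")

model_path = "llm-model-openvino"  # placeholder: folder with a converted model
pipe = ov_genai.LLMPipeline(model_path, "CPU")
pipe.generate("What is OpenVINO?", max_new_tokens=100, streamer=CustomStreamer())
```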
2 changes: 1 addition & 1 deletion site/docs/supported-models/index.mdx
@@ -65,7 +65,7 @@ pip install timm einops
```
:::

## Speech Processing Models (Whisper-based)
## Speech Recognition Models (Whisper-based)

<WhisperModelsTable />

@@ -1,12 +1,5 @@
#### Basic Generation Configuration

1. Get the model default config with `get_generation_config()`
2. Modify parameters
3. Apply the updated config using one of the following methods:
- Use `set_generation_config(config)`
- Pass config directly to `generate()` (e.g. `generate(prompt, config)`)
- Specify options as inputs in the `generate()` method (e.g. `generate(prompt, max_new_tokens=100)`)

{/* Python and C++ code examples */}
{props.children}

@@ -21,6 +14,6 @@
- `top_p`: Selects from the smallest set of tokens whose cumulative probability exceeds p. Helps balance diversity and quality.
- `repetition_penalty`: Reduces the likelihood of repeating tokens. Values above 1.0 discourage repetition.

For the full list of generation parameters, refer to the [API reference](https://docs.openvino.ai/2025/api/genai_api/_autosummary/openvino_genai.GenerationConfig.html#openvino-genai-generationconfig).
For the full list of generation parameters, refer to the [Generation Config API](https://docs.openvino.ai/2025/api/genai_api/_autosummary/openvino_genai.GenerationConfig.html#openvino-genai-generationconfig).

:::
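As a brief illustration of these parameters, here is a sketch of passing them inline to `generate()` in Python. It assumes `do_sample=True` is required for sampling-based options such as `top_p` to take effect; `model_path` is a placeholder.

```python
import openvino_genai as ov_genai

model_path = "llm-model-openvino"  # placeholder: folder with a converted model
pipe = ov_genai.LLMPipeline(model_path, "CPU")

result = pipe.generate(
    "Write a short poem about spring.",
    max_new_tokens=100,
    do_sample=True,          # enable sampling so top_p has an effect
    top_p=0.9,
    repetition_penalty=1.1,
)
print(result)
```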
17 changes: 17 additions & 0 deletions site/docs/use-cases/_shared/_beam_search_generation.mdx
@@ -0,0 +1,17 @@
#### Optimizing Generation with Grouped Beam Search

Beam search helps explore multiple possible text completions simultaneously, often leading to higher quality outputs.

{/* Python and C++ code examples */}
{props.children}

:::info Understanding Beam Search Generation Parameters

- `max_new_tokens`: The maximum number of tokens to generate, excluding the number of tokens in the prompt. `max_new_tokens` has priority over `max_length`.
- `num_beams`: The number of beams for beam search. 1 disables beam search.
- `num_beam_groups`: The number of groups to divide `num_beams` into in order to ensure diversity among different groups of beams.
- `diversity_penalty`: This value is subtracted from a beam's score if it generates the same token as any beam from another group at a particular time.

For the full list of generation parameters, refer to the [Generation Config API](https://docs.openvino.ai/2025/api/genai_api/_autosummary/openvino_genai.GenerationConfig.html#openvino-genai-generationconfig).

:::
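For reference, a sketch of how these beam search parameters might be combined in Python, mirroring the text generation pipeline (`model_path` is a placeholder):

```python
import openvino_genai as ov_genai

model_path = "llm-model-openvino"  # placeholder: folder with a converted model
pipe = ov_genai.LLMPipeline(model_path, "CPU")

config = pipe.get_generation_config()
config.max_new_tokens = 100
config.num_beams = 8             # total number of beams
config.num_beam_groups = 4       # must evenly divide num_beams
config.diversity_penalty = 1.0   # push beam groups to produce different tokens

print(pipe.generate("Explain what beam search does.", config))
```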
@@ -0,0 +1,8 @@
#### Generation Configuration Workflow

1. Get the model default config with `get_generation_config()`
2. Modify parameters
3. Apply the updated config using one of the following methods:
- Use `set_generation_config(config)`
- Pass config directly to `generate()` (e.g. `generate(prompt, config)`)
- Specify options as inputs in the `generate()` method (e.g. `generate(prompt, max_new_tokens=100)`)
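A minimal Python sketch of this workflow and the three ways of applying the config (`model_path` is a placeholder):

```python
import openvino_genai as ov_genai

model_path = "llm-model-openvino"  # placeholder: folder with a converted model
pipe = ov_genai.LLMPipeline(model_path, "CPU")

# 1. Get the model default config
config = pipe.get_generation_config()

# 2. Modify parameters
config.max_new_tokens = 100

# 3. Apply it for all subsequent calls...
pipe.set_generation_config(config)
# ...or pass it to a single generate() call...
print(pipe.generate("What is OpenVINO?", config))
# ...or specify options directly as inputs
print(pipe.generate("What is OpenVINO?", max_new_tokens=100))
```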
@@ -1,3 +1,5 @@
import GenerationConfigurationWorkflow from '@site/docs/use-cases/_shared/_generation_configuration_workflow.mdx';

## Additional Usage Options

:::tip
@@ -6,6 +8,10 @@ Check out [Python](https://github.com/openvinotoolkit/openvino.genai/tree/master

### Use Different Generation Parameters

<GenerationConfigurationWorkflow />

#### Image Generation Configuration

You can adjust several parameters to control the image generation process, including dimensions and the number of inference steps:

<LanguageTabs>
@@ -65,7 +71,7 @@ You can adjust several parameters to control the image generation process, inclu
- `guidance_scale`: Balances prompt adherence vs. creativity. Higher values follow prompt more strictly, lower values allow more creative freedom.
- `rng_seed`: Controls randomness for reproducible results. Same seed produces identical images across runs.

For the full list of generation parameters, refer to the [API reference](https://docs.openvino.ai/2025/api/genai_api/_autosummary/openvino_genai.ImageGenerationConfig.html).
For the full list of generation parameters, refer to the [Image Generation Config API](https://docs.openvino.ai/2025/api/genai_api/_autosummary/openvino_genai.ImageGenerationConfig.html).

:::
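For reference, a sketch of these options with `Text2ImagePipeline` in Python, assuming the tensor-to-image conversion used in the image generation samples (`model_path` is a placeholder):

```python
import openvino_genai as ov_genai
from PIL import Image

model_path = "image-model-openvino"  # placeholder: folder with a converted diffusion model
pipe = ov_genai.Text2ImagePipeline(model_path, "CPU")

image_tensor = pipe.generate(
    "A watercolor painting of a mountain lake at sunrise",
    width=512,
    height=512,
    num_inference_steps=20,
    guidance_scale=7.5,  # higher values follow the prompt more strictly
    rng_seed=42,         # fixed seed for reproducible results
)

# The result is a tensor in NHWC layout; take the first image and save it.
Image.fromarray(image_tensor.data[0]).save("result.png")
```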

@@ -1,5 +1,6 @@
import BasicGenerationConfiguration from '@site/docs/use-cases/_shared/_basic_generation_configuration.mdx';
import ChatScenario from '@site/docs/use-cases/_shared/_chat_scenario.mdx';
import GenerationConfigurationWorkflow from '@site/docs/use-cases/_shared/_generation_configuration_workflow.mdx';
import Streaming from '@site/docs/use-cases/_shared/_streaming.mdx';

## Additional Usage Options
@@ -12,11 +13,14 @@ Check out [Python](https://github.com/openvinotoolkit/openvino.genai/tree/master

Similar to [text generation](/docs/use-cases/text-generation/#use-different-generation-parameters), VLM pipelines support various generation parameters to control the text output.

<GenerationConfigurationWorkflow />

<BasicGenerationConfiguration>
<LanguageTabs>
<TabItemPython>
```python
import openvino_genai as ov_genai

pipe = ov_genai.VLMPipeline(model_path, "CPU")

# Get default configuration
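A rough Python sketch of the full VLM flow, assuming the image is passed as a `uint8` `ov.Tensor` via the `image` keyword as in the VLM samples (`model_path` and `cat.png` are placeholders):

```python
import numpy as np
import openvino as ov
import openvino_genai as ov_genai
from PIL import Image

model_path = "vlm-model-openvino"  # placeholder: folder with a converted VLM
pipe = ov_genai.VLMPipeline(model_path, "CPU")

# Load the image as a uint8 tensor in NHWC layout
image = Image.open("cat.png").convert("RGB")
image_data = np.array(image.getdata(), dtype=np.uint8).reshape(1, image.size[1], image.size[0], 3)
image_tensor = ov.Tensor(image_data)

# Get, modify, and apply the generation config
config = pipe.get_generation_config()
config.max_new_tokens = 100
pipe.set_generation_config(config)

print(pipe.generate("Describe this image.", image=image_tensor))
```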
5 changes: 0 additions & 5 deletions site/docs/use-cases/speech-processing.md

This file was deleted.

@@ -0,0 +1,19 @@
import CodeBlock from '@theme/CodeBlock';

<CodeBlock language="cpp" showLineNumbers>
{`#include "openvino/genai/whisper_pipeline.hpp"
#include "audio_utils.hpp"
#include <iostream>

int main(int argc, char* argv[]) {
    std::filesystem::path models_path = argv[1];
    std::string wav_file_path = argv[2];

    // Read normalized 16 kHz audio samples from the WAV file
    ov::genai::RawSpeechInput raw_speech = utils::audio::read_wav(wav_file_path);

    // Construct the pipeline from the folder with the converted model
    ov::genai::WhisperPipeline pipe(models_path, "${props.device || 'CPU'}");
    auto result = pipe.generate(raw_speech, ov::genai::max_new_tokens(100));
    std::cout << result << std::endl;
}
`}
</CodeBlock>
@@ -0,0 +1,17 @@
import CodeBlock from '@theme/CodeBlock';

<CodeBlock language="python" showLineNumbers>
{`import openvino_genai as ov_genai
import librosa

def read_wav(filepath):
    # librosa loads the audio and resamples it to the 16 kHz mono input expected by WhisperPipeline
    raw_speech, samplerate = librosa.load(filepath, sr=16000)
    return raw_speech.tolist()

raw_speech = read_wav('sample.wav')

pipe = ov_genai.WhisperPipeline(model_path, "${props.device || 'CPU'}")
result = pipe.generate(raw_speech, max_new_tokens=100)
print(result)
`}
</CodeBlock>
@@ -0,0 +1,41 @@
import CodeExampleCPP from './_code_example_cpp.mdx';
import CodeExamplePython from './_code_example_python.mdx';

## Run Model Using OpenVINO GenAI

OpenVINO GenAI provides the [`WhisperPipeline`](https://docs.openvino.ai/2025/api/genai_api/_autosummary/openvino_genai.WhisperPipeline.html) API for inference of Whisper speech recognition models.
You can construct it directly from the folder containing the converted model.
It automatically loads the model, tokenizer, detokenizer, and the default generation configuration.

:::info
`WhisperPipeline` expects normalized audio in WAV format with a 16 kHz sampling rate as input.
:::

<LanguageTabs>
<TabItemPython>
<Tabs groupId="device">
<TabItem label="CPU" value="cpu">
<CodeExamplePython device="CPU" />
</TabItem>
<TabItem label="GPU" value="gpu">
<CodeExamplePython device="GPU" />
</TabItem>
</Tabs>
</TabItemPython>
<TabItemCpp>
<Tabs groupId="device">
<TabItem label="CPU" value="cpu">
<CodeExampleCPP device="CPU" />
</TabItem>
<TabItem label="GPU" value="gpu">
<CodeExampleCPP device="GPU" />
</TabItem>
</Tabs>
</TabItemCpp>
</LanguageTabs>

:::tip

Switch between CPU and GPU devices without any other code changes.

:::
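Beyond the basic call shown above, here is an illustrative sketch of additional Whisper generation options, assuming `WhisperGenerationConfig` exposes `language`, `task`, and `return_timestamps` as in the Whisper samples (`model_path` and `sample.wav` are placeholders):

```python
import librosa
import openvino_genai as ov_genai

model_path = "whisper-base-openvino"  # placeholder: folder with a converted Whisper model
raw_speech = librosa.load("sample.wav", sr=16000)[0].tolist()  # 16 kHz mono samples

pipe = ov_genai.WhisperPipeline(model_path, "CPU")

config = pipe.get_generation_config()
config.max_new_tokens = 100
config.language = "<|en|>"       # source language token
config.task = "transcribe"       # or "translate"
config.return_timestamps = True  # segment-level timestamps

result = pipe.generate(raw_speech, config)
for chunk in result.chunks:
    print(f"[{chunk.start_ts:.2f}s - {chunk.end_ts:.2f}s] {chunk.text}")
```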