@@ -34,7 +34,7 @@ TensorRT LLM classifies APIs into two categories:
 All API schemas are:
 - Stored as YAML files in the codebase
 - Protected by unit tests in `tests/unittest/api_stability/`
-- Automatically validated to ensure consistency
+- Automatically validated to ensure consistency

 ## API Change Principles

@@ -44,22 +44,22 @@ All API schemas are:

 Argument names should describe what the argument represents, not how it is used internally.

-✅ **Good**: `max_new_tokens` (clear meaning)
+✅ **Good**: `max_new_tokens` (clear meaning)
 ❌ **Bad**: `num` (ambiguous)

 **Reflect Argument Type and Granularity**

 - For **boolean** knobs, prefix with verbs like `enable_` and so on.
   Examples: `enable_cache`, `enable_flash_attention`

-- For **numerical threshold** knobs, suffix with `_limit`, `_size`, `_count`, `_len_` or `_ratio`
+- For **numerical threshold** knobs, suffix with `_limit`, `_size`, `_count`, `_len_` or `_ratio`
   Examples: `max_seq_len`, `prefill_batch_size`

 **Avoid Redundant Prefixes**

 Example (in `MoeConfig`):

-✅ **Good**: `backend`
+✅ **Good**: `backend`
 ❌ **Bad**: `moe_backend` (redundant since it's already in `MoeConfig`)

 **Use Specific Names for Narrow Scenarios**
@@ -68,7 +68,7 @@ When adding knobs for specific use cases, make the name convey the restriction c

 Example (argument to the LLM class):

-✅ **Good**: `rope_scaling_factor` → clearly indicates it's for RoPE
+✅ **Good**: `rope_scaling_factor` → clearly indicates it's for RoPE
 ❌ **Bad**: `scaling_factor` → too generic and prone to misuse

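The naming rules in this part of the guide can be illustrated with a minimal sketch. The `ExampleArgs` class and every default value below are hypothetical and exist only for illustration; only the field names are taken from the examples in the guide, and this is not the real `LlmArgs` definition.

```python
from dataclasses import dataclass


@dataclass
class ExampleArgs:
    """Hypothetical argument container used only to illustrate naming."""

    # Describes what the value represents, not how it is used internally.
    max_new_tokens: int = 128
    # Boolean knob: verb prefix such as `enable_`.
    enable_cache: bool = True
    # Numerical threshold knobs: suffixes such as `_len` and `_size`.
    max_seq_len: int = 4096
    prefill_batch_size: int = 8
    # Narrow scenario: the name states that the factor applies to RoPE only.
    rope_scaling_factor: float = 1.0


args = ExampleArgs(max_new_tokens=64)
```

Reading a call site such as `ExampleArgs(max_new_tokens=64)` should be enough to tell what each knob controls without consulting the implementation.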
 ### 2. Hierarchical Configuration
@@ -77,13 +77,13 @@ Organize complex or hierarchical arguments into **dedicated configuration datacl

 **Guidelines**

-- Use the `XxxConfig` suffix consistently
+- Use the `XxxConfig` suffix consistently
   Examples: `ModelConfig`, `ParallelConfig`, `MoeConfig`
-
-- **Reflect conceptual hierarchy**
+
+- **Reflect conceptual hierarchy**
   The dataclass name should represent a coherent functional unit, not an arbitrary grouping
-
-- **Avoid over-nesting**
+
+- **Avoid over-nesting**
   Use only one level of configuration hierarchy whenever possible (e.g., `LlmArgs → ParallelConfig`) to balance readability and modularity

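The guidelines above can be sketched with standard-library dataclasses. This is an assumption-laden illustration rather than the project's actual definitions: `ExampleLlmArgs`, the `tensor_parallel_size` field, and all defaults are hypothetical, while `ParallelConfig`, `MoeConfig`, and `backend` are names taken from the guide.

```python
from dataclasses import dataclass, field


@dataclass
class ParallelConfig:
    """One coherent functional unit: parallelism settings."""

    tensor_parallel_size: int = 1  # hypothetical field for illustration


@dataclass
class MoeConfig:
    """One coherent functional unit: mixture-of-experts settings."""

    backend: str = "default"  # `backend`, not `moe_backend`; default is hypothetical


@dataclass
class ExampleLlmArgs:
    """Hypothetical top-level arguments with one level of nesting."""

    parallel_config: ParallelConfig = field(default_factory=ParallelConfig)
    moe_config: MoeConfig = field(default_factory=MoeConfig)


args = ExampleLlmArgs(parallel_config=ParallelConfig(tensor_parallel_size=2))
```

Each nested object is reached in a single hop (`args.parallel_config.tensor_parallel_size`), which keeps the hierarchy at one level as the last guideline recommends.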
 ### 3. Prefer `LlmArgs` Over Environment Variables
@@ -154,15 +154,15 @@ garbage_collection_gen0_threshold: int = Field(

 Add the field to the appropriate schema file:

-- **Non-committed arguments**: `tests/unittest/api_stability/references/llm_args.yaml`
+- **Non-committed arguments**: `tests/unittest/api_stability/references/llm.yaml`
   ```yaml
   garbage_collection_gen0_threshold:
     type: int
     default: 20000
     status: beta  # Must match the status in code
   ```

-- **Committed arguments**: `tests/unittest/api_stability/references_committed/llm_args.yaml`
+- **Committed arguments**: `tests/unittest/api_stability/references_committed/llm.yaml`
   ```yaml
   garbage_collection_gen0_threshold:
     type: int
@@ -196,16 +196,16 @@ For non-committed APIs, use the `@set_api_status` decorator:
 ```python
 @set_api_status("beta")
 def generate_with_streaming(
-    self,
-    prompts: List[str],
+    self,
+    prompts: List[str],
     **kwargs
 ) -> Iterator[GenerationOutput]:
     """Generate text with streaming output.
-
+
     Args:
         prompts: Input prompts for generation
         **kwargs: Additional generation parameters
-
+
     Returns:
         Iterator of generation outputs
     """