Fix typos (#5766)

omahs · web-flow · commit 61d878cd69ec · 2025-05-12T15:54:23.000+02:00
diff --git a/distribution/lambda/README.md b/distribution/lambda/README.md
@@ -43,11 +43,11 @@ pipenv install
 ### Example stacks
 
 Provided demonstration setups:
-- HDFS example data: index the the [HDFS
+- HDFS example data: index the [HDFS
   dataset](https://quickwit-datasets-public.s3.amazonaws.com/hdfs-logs-multitenants-10000.json)
   by triggering the Quickwit lambda manually.
 - Mock Data generator: start a mock data generator lambda that pushes mock JSON
-  data every X minutes to S3. Those file trigger the Quickwit indexer lambda
+  data every X minutes to S3. Those files trigger the Quickwit indexer lambda
   automatically.
 
 ### Deploy and run
@@ -56,7 +56,7 @@ The Makefile is a useful entrypoint to show how the Lambda deployment can used.
 
 Configure your shell and AWS account:
 ```bash
-# replace with you AWS account ID and preferred region
+# replace with your AWS account ID and preferred region
 export CDK_ACCOUNT=123456789
 export CDK_REGION=us-east-1
 make bootstrap
@@ -131,7 +131,7 @@ SEARCHER_API_KEY=my-at-least-20-char-long-key make deploy-mock-data
 > deployment, the key should be fetched from something like [AWS Secrets
 > Manager](https://docs.aws.amazon.com/cdk/v2/guide/get_secrets_manager_value.html).
 
-Note that the response is always gzipped compressed, regardless the
+Note that the response is always gzipped compressed, regardless of the
 `Accept-Encoding` request header:
 
 ```bash
diff --git a/docs/configuration/index-config.md b/docs/configuration/index-config.md
@@ -225,7 +225,7 @@ The timezone name format specifier (`%Z`) is not supported currently.
 - `unix_timestamp`: parse float and integer numbers to Unix timestamps. Floating-point values are converted to timestamps expressed in seconds. Integer values are converted to Unix timestamps whose precision, determined in `seconds`, `milliseconds`, `microseconds`, or `nanoseconds`, is inferred from the number of input digits. Internally, datetimes are converted to UTC (if the time zone is specified) and stored as *i64* integers. As a result, Quickwit only supports timestamp values ranging from `Apr 13, 1972 23:59:55` to `Mar 16, 2242 12:56:31`.
 
 :::warning
-Converting timestamps from float to integer values may occurs with a loss of precision.
+Converting timestamps from float to integer values may occur with a loss of precision.
 :::
 
 When a `datetime` field is stored as a fast field, the `fast_precision` parameter indicates the precision used to truncate the values before encoding, which improves compression (truncation here means zeroing). The `fast_precision` parameter can take the following values: `seconds`, `milliseconds`, `microseconds`, or `nanoseconds`. It only affects what is stored in fast fields when a `datetime` field is marked as "fast". Finally, operations on `datetime` fast fields, e.g. via aggregations, need to be done at the nanosecond level.
@@ -365,7 +365,7 @@ fast:
 | `description` | Optional description for the field. | `None` |
 | `stored`    | Whether value is stored in the document store | `true` |
 | `indexed`   | Whether value is indexed | `true` |
-| `fast`     | Whether value is stored in a fast field. The default behaviour for text in the JSON is to store the text unchanged. An normalizer can be configured via `normalizer: lowercase`. ([See normalizers](#description-of-available-normalizers)) for a list of available normalizers. | `false` |
+| `fast`     | Whether value is stored in a fast field. The default behaviour for text in the JSON is to store the text unchanged. A normalizer can be configured via `normalizer: lowercase`. ([See normalizers](#description-of-available-normalizers)) for a list of available normalizers. | `false` |
 | `tokenizer` | **Only affects strings in the json object**. Name of the `Tokenizer`, choices between `raw`, `default`, `en_stem` and `chinese_compatible` | `raw` |
 | `record`    | **Only affects strings in the json object**. Describes the amount of information indexed, choices between `basic`, `freq` and `position` | `basic` |
 | `expand_dots`    | If true, json keys containing a `.` should be expanded. For instance, if `expand_dots` is set to true, `{"k8s.node.id": "node-2"}` will be indexed as if it was `{"k8s": {"node": {"id": "node2"}}}`. The benefit is that escaping the `.` will not be required at query time. In other words, `k8s.node.id:node2` will match the document. This does not impact the way the document is stored.  | `true` |
@@ -418,7 +418,7 @@ field_mappings:
 #### concatenate
 
 Quickwit supports mapping the content of multiple fields to a single one. This can be more efficient at query time than
-searching through dozens of `default_search_fields`. It also allow querying inside a json field without knowing the path
+searching through dozens of `default_search_fields`. It also allows querying inside a json field without knowing the path
 to the field being searched.
 
 ```yaml
@@ -438,7 +438,7 @@ At query time, concatenate fields don't support range queries.
 Only the following types are supported inside a concatenate field: text, bool, i64, u64, f64, json. Other types are rejected
 at index creation, or silently discarded during indexation if they are found inside a json field.
 Adding an object field to a concatenate field doesn't automatically add its subfields (yet).
-<!-- typing is made so it wouldn't be too hard do add, as well as things like params_* matching all fields which starts name with params_ , but the feature isn't implemented yet -->
+<!-- typing is made so it wouldn't be too hard to add, as well as things like params_* matching all fields which starts name with params_ , but the feature isn't implemented yet -->
 It isn't possible to add subfields from a json field to a concatenate field. For instance if `attributes` is a json field, it's not possible to add only `attributes.color` to a concatenate field.
 
 For json fields and dynamic fields, the path is not indexed, only values are. For instance, given the following document:
@@ -507,7 +507,7 @@ When the `dynamic_mapping` is set as indexed (default), fields mapped through
 dynamic mode can be searched by targeting the path needed to access them from
 the root of the JSON object.
 
-For instance, in a entirely schemaless settings, a minimal index configuration could be:
+For instance, in an entirely schemaless settings, a minimal index configuration could be:
 
 ```yaml
 version: 0.7
diff --git a/docs/internals/backward-compatibility.md b/docs/internals/backward-compatibility.md
@@ -79,7 +79,7 @@ pub(crate) enum VersionedXXXXXX {
 - run the backward compatibility tests (see below)
 - for older versions, check the diff between the `xxx.expected.modified.json` files created and the matching `xxx.expected.json` files. 
 If the changes are acceptable, replace the content of the `xxx.expected.json` files and commit them.
-- check the the `yyyy.json` that was created for the new version and commit it along with the `yyyy.expected.json` file (identical).
+- check the `yyyy.json` that was created for the new version and commit it along with the `yyyy.expected.json` file (identical).
 - possibly update the generation of the default XXXX instance used for regression. It is in the function `TestableForRegression::sample_for_regression`.
 
 
@@ -117,7 +117,7 @@ the CI will catch it.
 
 #### Adding a new test case.
 
-If the serialization format changes, an new version should be created and the unit test will
+If the serialization format changes, a new version should be created and the unit test will
 automatically add a new unit test generated from the sample tested objects.
 Concretely, it will just write two files `XXXX.json` and `XXXX.expected.json` for each model.
 
diff --git a/docs/reference/aggregation.md b/docs/reference/aggregation.md
@@ -5,7 +5,7 @@ sidebar_position: 30
 
 An aggregation summarizes your data as statistics on buckets or metrics.
 
-Aggregations can provide answer to questions like:
+Aggregations can provide answers to questions like:
 
 - What is the average price of all sold articles?
 - How many errors with status code 500 do we have per day?
@@ -182,7 +182,7 @@ The returned format is currently fixed at `RFC3339`.
 
 By default buckets are returned between the min and max value of the documents, including empty buckets. Setting `min_doc_count > 0` will filter empty buckets.
 
-The value range of the buckets can bet extended via [`extended_bounds`](#extended_bounds) or limit the range via [`hard_bounds`](#hard_bounds).
+The value range of the buckets can be extended via [`extended_bounds`](#extended_bounds) or limit the range via [`hard_bounds`](#hard_bounds).
 
 #### Example
 
@@ -222,7 +222,7 @@ The interval to chunk your data range. Each bucket spans a value range of [0..in
 Intervals implicitly defines an absolute grid of buckets `[interval * k, interval * (k + 1))`.
 Offset makes it possible to shift this grid into `[offset + interval * k, offset + interval (k + 1))`. Offset has to be in the range [0, interval).
 
-As an example, if there are two documents with value 8 and 12 and interval 10.0, they would fall into the buckets with the key 0 and 10. With offset 5 and interval 10, they would both fall into the bucket with they key 5 and the range [5..15)
+As an example, if there are two documents with value 8 and 12 and interval 10.0, they would fall into the buckets with the key 0 and 10. With offset 5 and interval 10, they would both fall into the bucket with the key 5 and the range [5..15)
 
 ```json
 {
@@ -381,7 +381,7 @@ Offset makes it possible to shift this grid into `[offset + interval * k, offset
 
 This is especially useful when using `fixed_interval`, to shift the first bucket e.g. at the start of the year.
 
-The `offset` parameter is has the same syntax as the `fixed_interval` parameter, but also allows for negative values.
+The `offset` parameter has the same syntax as the `fixed_interval` parameter, but also allows for negative values.
 
 ###### **min_doc_count**
 
@@ -508,7 +508,7 @@ term-count.
 
 Even with a larger `shard_size` value, doc_count values for a terms aggregation may be
 approximate. As a result, any sub-aggregations on the terms aggregation may also be approximate.
-`sum_other_doc_count` is the number of documents that didn’t make it into the the top size
+`sum_other_doc_count` is the number of documents that didn’t make it into the top size
 terms. If this is greater than 0, the terms agg had to throw away some
 buckets, either because they didn’t fit into `size` on the root node or they didn’t fit into
 `shard_size` on the leaf node.
@@ -907,7 +907,7 @@ The default value is 2.
 
 ### Sum
 
-A single-value metric aggregation that that sums up numeric values that are that are extracted from the aggregated documents.
+A single-value metric aggregation that sums up numeric values that are that are extracted from the aggregated documents.
 Supported field types are `u64`, `f64`, `i64`, and `datetime`.
 
 **Request**
diff --git a/quickwit/quickwit-actors/src/actor_context.rs b/quickwit/quickwit-actors/src/actor_context.rs
@@ -225,7 +225,7 @@ impl<A: Actor> ActorContext<A> {
     /// If the reply is important, chances are the `.ask(...)` method is
     /// more indicated.
     ///
-    /// Droppping the receiver channel will not cancel the
+    /// Dropping the receiver channel will not cancel the
     /// processing of the message. It is a very common usage.
     /// In fact most actors are expected to send message in a
     /// fire-and-forget fashion.
diff --git a/quickwit/quickwit-control-plane/src/indexing_scheduler/scheduling/mod.rs b/quickwit/quickwit-control-plane/src/indexing_scheduler/scheduling/mod.rs
@@ -474,7 +474,7 @@ fn assign_shards(
     let mut shard_to_indexer: HashMap<ShardId, String> =
         HashMap::with_capacity(missing_shards.len());
 
-    // In a first pass we first assign as many shards on the their hosting nodes as possible.
+    // In a first pass we first assign as many shards on their hosting nodes as possible.
     let mut remaining_missing_shards: Vec<ShardId> = Vec::new();
     for shard_id in missing_shards {
         // As a heuristic, we pick the first node hosting the shard that is available.
diff --git a/quickwit/quickwit-indexing/src/actors/indexing_pipeline.rs b/quickwit/quickwit-indexing/src/actors/indexing_pipeline.rs
@@ -945,7 +945,7 @@ mod tests {
             .process_pending_and_observe()
             .await;
         assert_eq!(obs.generation, 1);
-        // Let's shutdown the indexer, this will trigger the the indexing pipeline failure and the
+        // Let's shutdown the indexer, this will trigger the indexing pipeline failure and the
         // restart.
         let indexer = universe.get::<Indexer>().into_iter().next().unwrap();
         let _ = indexer.ask(Command::Quit).await;
diff --git a/quickwit/quickwit-ingest/src/ingest_v2/router.rs b/quickwit/quickwit-ingest/src/ingest_v2/router.rs
@@ -155,7 +155,7 @@ impl IngestRouter {
     }
 
     /// Inspects the shard table for each subrequest and returns the appropriate
-    /// [`GetOrCreateOpenShardsRequest`] request if open shards do not exist for all the them.
+    /// [`GetOrCreateOpenShardsRequest`] request if open shards do not exist for all of them.
     async fn make_get_or_create_open_shard_request(
         &self,
         workbench: &mut IngestWorkbench,
diff --git a/quickwit/quickwit-ingest/src/lib.rs b/quickwit/quickwit-ingest/src/lib.rs
@@ -83,7 +83,7 @@ pub async fn init_ingest_api(
     Ok(ingest_api_service)
 }
 
-/// Returns the instance of the single IngestApiService via a copy of it's Mailbox.
+/// Returns the instance of the single IngestApiService via a copy of its Mailbox.
 pub async fn get_ingest_api_service(
     queues_dir_path: &Path,
 ) -> anyhow::Result<Mailbox<IngestApiService>> {
diff --git a/quickwit/quickwit-integration-tests/src/test_utils/cluster_sandbox.rs b/quickwit/quickwit-integration-tests/src/test_utils/cluster_sandbox.rs
@@ -165,7 +165,7 @@ impl ClusterSandboxBuilder {
     }
 }
 
-/// Intermediate state where the ports of all the the test cluster nodes have
+/// Intermediate state where the ports of all the test cluster nodes have
 /// been reserved and the configurations have been generated.
 pub struct ResolvedClusterConfig {
     temp_dir: TempDir,
diff --git a/quickwit/quickwit-integration-tests/src/tests/update_tests/restart_indexer_tests.rs b/quickwit/quickwit-integration-tests/src/tests/update_tests/restart_indexer_tests.rs
@@ -123,7 +123,7 @@ async fn test_update_doc_mapping_restart_indexing_pipeline() {
 
     // we ingest again, this might end up with the new or old doc mapping depending on how quickly
     // the pipeline gets killed and restarted (in practice as this cluster is very lightly loaded,
-    // it will almost always kill the pipeline before these documents are commited)
+    // it will almost always kill the pipeline before these documents are committed)
     sandbox
         .rest_client(QuickwitService::Indexer)
         .ingest(
diff --git a/quickwit/quickwit-metastore/README.md b/quickwit/quickwit-metastore/README.md
@@ -7,7 +7,7 @@ locally to test the postgres metastore implementation.
 
 `docker-compose up postgres`
 
-It's data is saved in the tmp directory, and
+Its data is saved in the tmp directory, and
 is not necessarily cleaned up between two runs.
 
 You can execute `make rm-postgres` to remove the
diff --git a/quickwit/quickwit-metastore/src/tests/list_splits.rs b/quickwit/quickwit-metastore/src/tests/list_splits.rs
@@ -1661,8 +1661,8 @@ pub async fn test_metastore_list_splits_from_all_indexes<
                 &split.split_metadata.split_id,
             )
         })
-        // when running this test against a clean database, this line isn't neeeded. In practice,
-        // any test that leaves any split behind breaks this tes tif we remove this filter
+        // when running this test against a clean database, this line isn't needed. In practice,
+        // any test that leaves any split behind breaks this test if we remove this filter
         .filter(|(index_uid, _split_id)| {
             [index_uid_1.clone(), index_uid_2.clone()].contains(index_uid)
         })
diff --git a/quickwit/quickwit-proto/protos/quickwit/search.proto b/quickwit/quickwit-proto/protos/quickwit/search.proto
@@ -128,7 +128,7 @@ message ListFieldsRequest {
   optional int64 start_timestamp = 3;
   optional int64 end_timestamp = 4;
 
-  // Control if the the request will fail if split_ids contains a split that does not exist.
+  // Control if the request will fail if split_ids contains a split that does not exist.
   // optional bool fail_on_missing_index = 6;
 }
 
diff --git a/quickwit/quickwit-search/src/cluster_client.rs b/quickwit/quickwit-search/src/cluster_client.rs
@@ -138,7 +138,7 @@ impl ClusterClient {
         mut client: SearchServiceClient,
     ) -> UnboundedReceiverStream<crate::Result<LeafSearchStreamResponse>> {
         // We need a dedicated channel to send results with retry. First we send only the successful
-        // responses and and ignore errors. If there are some errors, we make one retry and
+        // responses and ignore errors. If there are some errors, we make one retry and
         // in this case we send all results.
         let (result_sender, result_receiver) = unbounded_channel();
         let client_pool = self.search_job_placer.clone();
diff --git a/quickwit/quickwit-search/src/leaf.rs b/quickwit/quickwit-search/src/leaf.rs
@@ -1070,7 +1070,7 @@ impl CanSplitDoBetter {
         match self {
             CanSplitDoBetter::SplitIdHigher(_) => {
                 // In this case there is no sort order, we order by split id.
-                // If the the first split has enough documents, we can convert the other queries to
+                // If the first split has enough documents, we can convert the other queries to
                 // count only queries
                 for (_split, request) in split_with_req.iter_mut().skip(min_required_splits) {
                     disable_search_request_hits(request);
diff --git a/quickwit/quickwit-serve/src/elasticsearch_api/rest_handler.rs b/quickwit/quickwit-serve/src/elasticsearch_api/rest_handler.rs
@@ -151,7 +151,7 @@ pub fn es_compat_stats_handler(
 }
 
 /// Check if the parameter is a known query parameter to reject
-fn is_unsuppported_qp(param: &str) -> bool {
+fn is_unsupported_qp(param: &str) -> bool {
     ["wait_for_status", "timeout", "level"].contains(&param)
 }
 
@@ -180,7 +180,7 @@ async fn es_compat_cluster_health(
     query_params: HashMap<String, String>,
     cluster: Cluster,
 ) -> impl warp::Reply {
-    if let Some(invalid_param) = query_params.keys().find(|key| is_unsuppported_qp(key)) {
+    if let Some(invalid_param) = query_params.keys().find(|key| is_unsupported_qp(key)) {
         let error_body = warp::reply::json(&json!({
             "error": "Unsupported parameter.",
             "param": invalid_param
@@ -881,7 +881,7 @@ async fn es_compat_index_multi_search(
                         append_shard_doc,
                         _source_excludes,
                         _source_includes,
-                        true, //< allow_partial_results. Set to to true to match ES's behavior.
+                        true, //< allow_partial_results. Set to true to match ES's behavior.
                     )?;
                 search_response_rest.took = elapsed.as_millis() as u32;
                 Ok::<_, ElasticsearchError>(search_response_rest)
@@ -1037,7 +1037,7 @@ fn convert_to_es_search_response(
         },
         aggregations,
         scroll_id: resp.scroll_id,
-        // There is not concept of shards here, but use this to convey split search failures.
+        // There is no concept of shards here, but use this to convey split search failures.
         shards: ShardStatistics {
             total: num_total_splits,
             successful: num_successful_splits,
diff --git a/quickwit/quickwit-serve/src/metrics_api.rs b/quickwit/quickwit-serve/src/metrics_api.rs
@@ -22,7 +22,7 @@ use warp::reply::with_status;
 /// other bits of information attached.
 ///
 /// If a crate plans to encompass different schemas, handlers, etc...
-/// Then it should have it's own specific API group.
+/// Then it should have its own specific API group.
 pub struct MetricsApi;
 
 #[utoipa::path(
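
Note on the `docs/reference/aggregation.md` hunk at `@@ -222,7 +222,7 @@`: the corrected sentence describes how `offset` shifts the histogram bucket grid. Below is a minimal sketch of that arithmetic, assuming the bucket key is `offset + interval * floor((value - offset) / interval)`, which follows from the grid `[offset + interval * k, offset + interval * (k + 1))` quoted in the docs. The function `bucket_key` is a hypothetical helper written for illustration and is not taken from Quickwit's code.

```rust
/// Hypothetical helper (not Quickwit's implementation): returns the key of the
/// histogram bucket that `value` falls into for the grid
/// `[offset + interval * k, offset + interval * (k + 1))`.
fn bucket_key(value: f64, interval: f64, offset: f64) -> f64 {
    // Index k of the bucket on the shifted grid.
    let k = ((value - offset) / interval).floor();
    offset + interval * k
}

fn main() {
    // The two documents from the docs example: values 8 and 12, interval 10.
    for value in [8.0_f64, 12.0] {
        let without_offset = bucket_key(value, 10.0, 0.0);
        let with_offset = bucket_key(value, 10.0, 5.0);
        println!("value={value} key(offset=0)={without_offset} key(offset=5)={with_offset}");
    }
}
```

With no offset the two values land in different buckets (keys 0 and 10); with offset 5 both land in the bucket with key 5 and range [5..15), matching the corrected sentence.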

`171`	`171`	`temp_dir: TempDir,`