[Doc] Files() supports NAS (StarRocks#54030)
Signed-off-by: 絵空事スピリット <[email protected]>
EsoragotoSpirit authored Dec 17, 2024
1 parent e78d2f6 commit 82ef268
Showing 6 changed files with 172 additions and 10 deletions.
@@ -37,7 +37,7 @@ This document outlines the features of various data loading and unloading method
</tr>
<tr>
<td>INSERT from FILES</td>
<td rowspan="2">HDFS, S3, OSS, Azure, GCS</td>
<td rowspan="2">HDFS, S3, OSS, Azure, GCS, NFS(NAS) [5]</td>
<td>Yes (v3.3+)</td>
<td>To be supported</td>
<td>Yes (v3.1+)</td>
@@ -104,6 +104,8 @@ This document outlines the features of various data loading and unloading method

[4]\: Currently, only INSERT from FILES is supported for loading with PIPE.

[5]\: You need to mount the same NAS device as NFS under the same directory on each BE or CN node to access the files in NFS via the `file://` protocol.

:::

#### JSON CDC formats
@@ -159,7 +161,7 @@ This document outlines the features of various data loading and unloading method
<tr>
<td>INSERT INTO FILES</td>
<td>N/A</td>
<td>HDFS, S3, OSS, Azure, GCS</td>
<td>HDFS, S3, OSS, Azure, GCS, NFS(NAS) [3]</td>
<td>Yes (v3.3+)</td>
<td>To be supported</td>
<td>Yes (v3.2+)</td>
@@ -208,6 +210,8 @@ This document outlines the features of various data loading and unloading method

[2]\: Currently, unloading data using PIPE is not supported.

[3]\: You need to mount the same NAS device as NFS under the same directory on each BE or CN node to access the files in NFS via the `file://` protocol.

:::

## File format-related parameters
58 changes: 56 additions & 2 deletions docs/en/sql-reference/sql-functions/table-functions/files.md
@@ -19,6 +19,7 @@ Currently, the FILES() function supports the following data sources and file for
- Google Cloud Storage
- Other S3-compatible storage system
- Microsoft Azure Blob Storage
- NFS(NAS)
- **File formats:**
- Parquet
- ORC
@@ -90,6 +91,19 @@ The URI used to access the files. You can specify a path or a file.
-- Example: "path" = "wasbs://[email protected]/path/file.parquet"
```

- To access NFS(NAS):

```SQL
"path" = "file:///<absolute_path>"
-- Example: "path" = "file:///home/ubuntu/parquetfile/file.parquet"
```

:::note

To access the files in NFS via the `file://` protocol, you need to mount the same NAS device as NFS under the same directory on each BE or CN node.

:::

#### data_format

The format of the data file. Valid values: `parquet`, `orc`, and `csv`.
@@ -448,6 +462,15 @@ SELECT * FROM FILES(
2 rows in set (22.335 sec)
```
Query the data from the Parquet files in NFS(NAS):
```SQL
SELECT * FROM FILES(
'path' = 'file:///home/ubuntu/parquetfile/*.parquet',
'format' = 'parquet'
);
```
#### Example 2: Insert the data rows from a file
Insert the data rows from the Parquet file **parquet/insert_wiki_edit_append.parquet** within the AWS S3 bucket `inserttest` into the table `insert_wiki_edit`:
@@ -465,6 +488,18 @@ Query OK, 2 rows affected (23.03 sec)
{'label':'insert_d8d4b2ee-ac5c-11ed-a2cf-4e1110a8f63b', 'status':'VISIBLE', 'txnId':'2440'}
```
Insert the data rows from the CSV files in NFS(NAS) into the table `insert_wiki_edit`:
```SQL
INSERT INTO insert_wiki_edit
SELECT * FROM FILES(
'path' = 'file:///home/ubuntu/csvfile/*.csv',
'format' = 'csv',
'csv.column_separator' = ',',
'csv.row_delimiter' = '\n'
);
```
#### Example 3: CTAS with data rows from a file
Create a table named `ctas_wiki_edit` and insert the data rows from the Parquet file **parquet/insert_wiki_edit_append.parquet** within the AWS S3 bucket `inserttest` into the table:
@@ -657,8 +692,7 @@ DESC FILES(
Unload all data rows in `sales_records` as multiple Parquet files under the path **/unload/partitioned/** in the HDFS cluster. These files are stored in different subpaths distinguished by the values in the column `sales_time`.
```SQL
INSERT INTO
FILES(
INSERT INTO FILES(
"path" = "hdfs://xxx.xx.xxx.xx:9000/unload/partitioned/",
"format" = "parquet",
"hadoop.security.authentication" = "simple",
@@ -669,3 +703,23 @@
)
SELECT * FROM sales_records;
```
Unload the query results into CSV and Parquet files in NFS(NAS):
```SQL
-- CSV
INSERT INTO FILES(
'path' = 'file:///home/ubuntu/csvfile/',
'format' = 'csv',
'csv.column_separator' = ',',
'csv.row_delimiter' = '\n'
)
SELECT * FROM sales_records;
-- Parquet
INSERT INTO FILES(
'path' = 'file:///home/ubuntu/parquetfile/',
'format' = 'parquet'
)
SELECT * FROM sales_records;
```
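To verify the unload, you can read the files back with FILES() from the same mount. The following is a minimal sketch that assumes the Parquet files above were written to `/home/ubuntu/parquetfile/`, mounted at the same path on every BE or CN node:
```SQL
-- Read the unloaded Parquet files back from the NFS mount as a quick sanity check.
SELECT * FROM FILES(
    'path' = 'file:///home/ubuntu/parquetfile/*.parquet',
    'format' = 'parquet'
)
LIMIT 10;
```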
26 changes: 25 additions & 1 deletion docs/en/unloading/unload_using_insert_into_files.md
@@ -12,7 +12,7 @@ Compared to other data export methods supported by StarRocks, unloading data wit

> **NOTE**
>
> Please note that unloading data with INSERT INTO FILES does not support exporting data into local file systems.
> Please note that unloading data with INSERT INTO FILES does not support exporting data directly into local file systems. However, you can export data to local files via NFS. See [Unload to local files using NFS](#unload-to-local-files-using-nfs).
## Preparation

@@ -201,6 +201,30 @@ FILES(
SELECT * FROM sales_records;
```

### Unload to local files using NFS

To access the files in NFS via the `file://` protocol, you need to mount the same NAS device as NFS under the same directory on each BE or CN node.

Example:

```SQL
-- Unload data into CSV files.
INSERT INTO FILES(
'path' = 'file:///home/ubuntu/csvfile/',
'format' = 'csv',
'csv.column_separator' = ',',
'csv.row_delimiter' = '\n'
)
SELECT * FROM sales_records;

-- Unload data into Parquet files.
INSERT INTO FILES(
'path' = 'file:///home/ubuntu/parquetfile/',
'format' = 'parquet'
)
SELECT * FROM sales_records;
```
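If you want to confirm the schema of the files you just unloaded, DESC FILES() can read it back from the same mount. This is a sketch under the same assumption that the Parquet files landed in `/home/ubuntu/parquetfile/` on every BE or CN node:

```SQL
-- Inspect the schema StarRocks infers from the unloaded Parquet files on the NFS mount.
DESC FILES(
    'path' = 'file:///home/ubuntu/parquetfile/*.parquet',
    'format' = 'parquet'
);
```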

## See also

- For more instructions on the usage of INSERT, see [SQL reference - INSERT](../sql-reference/sql-statements/loading_unloading/INSERT.md).
@@ -37,7 +37,7 @@ sidebar_label: "能力边界"
</tr>
<tr>
<td>INSERT from FILES</td>
<td rowspan="2">HDFS, S3, OSS, Azure, GCS</td>
<td rowspan="2">HDFS, S3, OSS, Azure, GCS, NFS(NAS) [5]</td>
<td>Yes (v3.3+)</td>
<td>To be supported</td>
<td>Yes (v3.1+)</td>
@@ -104,6 +104,8 @@ sidebar_label: "能力边界"

[4]\: Currently, only INSERT from FILES is supported for loading with PIPE.

[5]\: You need to mount the same NAS device as NFS under the same directory on each BE or CN node to access the files in NFS via the `file://` protocol.

:::

#### JSON CDC formats
@@ -159,7 +161,7 @@ sidebar_label: "能力边界"
<tr>
<td>INSERT INTO FILES</td>
<td>N/A</td>
<td>HDFS, S3, OSS, Azure, GCS</td>
<td>HDFS, S3, OSS, Azure, GCS, NFS(NAS) [3]</td>
<td>Yes (v3.3+)</td>
<td>To be supported</td>
<td>Yes (v3.2+)</td>
@@ -208,6 +210,8 @@ sidebar_label: "能力边界"

[2]\: Currently, unloading data using PIPE is not supported.

[3]\: You need to mount the same NAS device as NFS under the same directory on each BE or CN node to access the files in NFS via the `file://` protocol.

:::

## File format-related parameters
58 changes: 56 additions & 2 deletions docs/zh/sql-reference/sql-functions/table-functions/files.md
@@ -18,6 +18,7 @@ displayed_sidebar: docs
- AWS S3
- Google Cloud Storage
- Microsoft Azure Blob Storage
- NFS(NAS)
- **File formats:**
- Parquet
- ORC
@@ -89,6 +90,19 @@ FILES( data_location , data_format [, schema_detect ] [, StorageCredentialParams
-- Example: "path" = "wasbs://[email protected]/path/file.parquet"
```

- To access NFS(NAS), you need to specify this parameter as:

```SQL
"path" = "file:///<absolute_path>"
-- Example: "path" = "file:///home/ubuntu/parquetfile/file.parquet"
```

:::note

To access the files in NFS via the `file://` protocol, you need to mount the same NAS device as NFS under the same directory on each BE or CN node.

:::

#### data_format

The format of the data file. Valid values: `parquet`, `orc`, and `csv`.
@@ -447,6 +461,15 @@ SELECT * FROM FILES(
2 rows in set (22.335 sec)
```
Query the data from the Parquet files in NFS(NAS):
```SQL
SELECT * FROM FILES(
'path' = 'file:///home/ubuntu/parquetfile/*.parquet',
'format' = 'parquet'
);
```
#### Example 2: Insert the data rows from a file
Insert the data rows from the Parquet file **parquet/insert_wiki_edit_append.parquet** within the AWS S3 bucket `inserttest` into the table `insert_wiki_edit`:
@@ -464,6 +487,18 @@ Query OK, 2 rows affected (23.03 sec)
{'label':'insert_d8d4b2ee-ac5c-11ed-a2cf-4e1110a8f63b', 'status':'VISIBLE', 'txnId':'2440'}
```
Insert the data rows from the CSV files in NFS(NAS) into the table `insert_wiki_edit`:
```SQL
INSERT INTO insert_wiki_edit
SELECT * FROM FILES(
'path' = 'file:///home/ubuntu/csvfile/*.csv',
'format' = 'csv',
'csv.column_separator' = ',',
'csv.row_delimiter' = '\n'
);
```
#### Example 3: CTAS with data rows from a file
Create a table named `ctas_wiki_edit` and insert the data rows from the Parquet file **parquet/insert_wiki_edit_append.parquet** within the AWS S3 bucket `inserttest` into the table:
@@ -656,8 +691,7 @@ DESC FILES(
Unload all data rows in `sales_records` as multiple Parquet files under the path **/unload/partitioned/** in the HDFS cluster. These files are stored in different subpaths distinguished by the values in the column `sales_time`.
```SQL
INSERT INTO
FILES(
INSERT INTO FILES(
"path" = "hdfs://xxx.xx.xxx.xx:9000/unload/partitioned/",
"format" = "parquet",
"hadoop.security.authentication" = "simple",
@@ -668,3 +702,23 @@
)
SELECT * FROM sales_records;
```
Unload the query results into CSV or Parquet files in NFS(NAS):
```SQL
-- CSV
INSERT INTO FILES(
'path' = 'file:///home/ubuntu/csvfile/',
'format' = 'csv',
'csv.column_separator' = ',',
'csv.row_delimiter' = '\n'
)
SELECT * FROM sales_records;
-- Parquet
INSERT INTO FILES(
'path' = 'file:///home/ubuntu/parquetfile/',
'format' = 'parquet'
)
SELECT * FROM sales_records;
```
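To verify the unload, you can read the files back with FILES() from the same mount. This is a minimal sketch that assumes the Parquet files above were written to `/home/ubuntu/parquetfile/`, mounted at the same path on every BE or CN node:
```SQL
-- Read the unloaded Parquet files back from the NFS mount as a quick sanity check.
SELECT * FROM FILES(
    'path' = 'file:///home/ubuntu/parquetfile/*.parquet',
    'format' = 'parquet'
)
LIMIT 10;
```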
24 changes: 23 additions & 1 deletion docs/zh/unloading/unload_using_insert_into_files.md
@@ -12,7 +12,7 @@ displayed_sidebar: docs

> **NOTE**
>
> Unloading data with INSERT INTO FILES does not support exporting data into local file systems.
> Unloading data with INSERT INTO FILES does not support exporting data directly into local file systems. However, you can export data to local files via NFS. See [Unload to local files using NFS](#unload-to-local-files-using-nfs).
## Preparation

@@ -199,6 +199,28 @@ FILES(
SELECT * FROM sales_records;
```

### Unload to local files using NFS

To access the files in NFS via the `file://` protocol, you need to mount the same NAS device as NFS under the same directory on each BE or CN node.

```SQL
-- Unload data into CSV files.
INSERT INTO FILES(
'path' = 'file:///home/ubuntu/csvfile/',
'format' = 'csv',
'csv.column_separator' = ',',
'csv.row_delimiter' = '\n'
)
SELECT * FROM sales_records;

-- Unload data into Parquet files.
INSERT INTO FILES(
'path' = 'file:///home/ubuntu/parquetfile/',
'format' = 'parquet'
)
SELECT * FROM sales_records;
```
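If you want to confirm the schema of the files you just unloaded, DESC FILES() can read it back from the same mount. This is a sketch under the same assumption that the Parquet files landed in `/home/ubuntu/parquetfile/` on every BE or CN node:

```SQL
-- Inspect the schema StarRocks infers from the unloaded Parquet files on the NFS mount.
DESC FILES(
    'path' = 'file:///home/ubuntu/parquetfile/*.parquet',
    'format' = 'parquet'
);
```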

## See also

- For more instructions on the usage of INSERT, see [SQL reference - INSERT](../sql-reference/sql-statements/loading_unloading/INSERT.md).
