[Doc] Files() supports NAS (StarRocks#54030)
Signed-off-by: 絵空事スピリット <[email protected]>
1 parent e78d2f6 · commit 82ef268 · 6 changed files with 172 additions and 10 deletions.
@@ -19,6 +19,7 @@ Currently, the FILES() function supports the following data sources and file formats:
  - Google Cloud Storage
  - Other S3-compatible storage system
  - Microsoft Azure Blob Storage
  - NFS(NAS)
- **File formats:**
  - Parquet
  - ORC
@@ -90,6 +91,19 @@ The URI used to access the files. You can specify a path or a file.
  -- Example: "path" = "wasbs://[email protected]/path/file.parquet"
  ```

- To access NFS(NAS):

  ```SQL
  "path" = "file:///<absolute_path>"
  -- Example: "path" = "file:///home/ubuntu/parquetfile/file.parquet"
  ```

:::note

To access the files in NFS via the `file://` protocol, you need to mount the same NAS device as NFS under the same directory on each BE or CN node.

:::
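
After the mount is in place, a quick way to confirm that StarRocks can reach the path is to inspect the inferred schema with `DESC FILES()`; a minimal sketch, assuming an illustrative Parquet file at the mounted location:

```SQL
-- Check that the mounted NFS path is readable by listing the inferred schema.
DESC FILES(
    "path" = "file:///home/ubuntu/parquetfile/file.parquet",
    "format" = "parquet"
);
```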

#### data_format

The format of the data file. Valid values: `parquet`, `orc`, and `csv`.
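
For instance, a minimal sketch (the local path is illustrative) of how `data_format` is passed as the `format` property, together with the CSV-specific options used in the examples later in this topic:

```SQL
-- Read a CSV file, stating the format explicitly along with its separators.
SELECT * FROM FILES(
    "path" = "file:///home/ubuntu/csvfile/file.csv",
    "format" = "csv",
    "csv.column_separator" = ",",
    "csv.row_delimiter" = "\n"
);
```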

@@ -448,6 +462,15 @@ SELECT * FROM FILES(
2 rows in set (22.335 sec)
```

Query the data from the Parquet files in NFS(NAS):

```SQL
SELECT * FROM FILES(
    'path' = 'file:///home/ubuntu/parquetfile/*.parquet',
    'format' = 'parquet'
);
```

#### Example 2: Insert the data rows from a file

Insert the data rows from the Parquet file **parquet/insert_wiki_edit_append.parquet** within the AWS S3 bucket `inserttest` into the table `insert_wiki_edit`:

@@ -465,6 +488,18 @@ Query OK, 2 rows affected (23.03 sec)
{'label':'insert_d8d4b2ee-ac5c-11ed-a2cf-4e1110a8f63b', 'status':'VISIBLE', 'txnId':'2440'}
```

Insert the data rows from the CSV files in NFS(NAS) into the table `insert_wiki_edit`:

```SQL
INSERT INTO insert_wiki_edit
SELECT * FROM FILES(
    'path' = 'file:///home/ubuntu/csvfile/*.csv',
    'format' = 'csv',
    'csv.column_separator' = ',',
    'csv.row_delimiter' = '\n'
);
```

#### Example 3: CTAS with data rows from a file

Create a table named `ctas_wiki_edit` and insert the data rows from the Parquet file **parquet/insert_wiki_edit_append.parquet** within the AWS S3 bucket `inserttest` into the table:

@@ -657,8 +692,7 @@ DESC FILES(
Unload all data rows in `sales_records` as multiple Parquet files under the path **/unload/partitioned/** in the HDFS cluster. These files are stored in different subpaths distinguished by the values in the column `sales_time`.

```SQL
-INSERT INTO
-FILES(
+INSERT INTO FILES(
    "path" = "hdfs://xxx.xx.xxx.xx:9000/unload/partitioned/",
    "format" = "parquet",
    "hadoop.security.authentication" = "simple",

@@ -669,3 +703,23 @@ FILES(
)
SELECT * FROM sales_records;
```

Unload the query results into CSV and Parquet files in NFS(NAS):

```SQL
-- CSV
INSERT INTO FILES(
    'path' = 'file:///home/ubuntu/csvfile/',
    'format' = 'csv',
    'csv.column_separator' = ',',
    'csv.row_delimiter' = '\n'
)
SELECT * FROM sales_records;

-- Parquet
INSERT INTO FILES(
    'path' = 'file:///home/ubuntu/parquetfile/',
    'format' = 'parquet'
)
SELECT * FROM sales_records;
```

The corresponding changes in the Chinese-language version of the document:

@@ -18,6 +18,7 @@ displayed_sidebar: docs
  - AWS S3
  - Google Cloud Storage
  - Microsoft Azure Blob Storage
  - NFS(NAS)
- **File formats:**
  - Parquet
  - ORC

@@ -89,6 +90,19 @@ FILES( data_location , data_format [, schema_detect ] [, StorageCredentialParams
  -- Example: "path" = "wasbs://[email protected]/path/file.parquet"
  ```

- To access NFS(NAS), specify this parameter as:

  ```SQL
  "path" = "file:///<absolute_path>"
  -- Example: "path" = "file:///home/ubuntu/parquetfile/file.parquet"
  ```

:::note

To access the files in NFS via the `file://` protocol, you need to mount the same NAS device as NFS under the same directory on each BE or CN node.

:::

#### data_format

The format of the data file. Valid values: `parquet`, `orc`, and `csv`.

@@ -447,6 +461,15 @@ SELECT * FROM FILES(
2 rows in set (22.335 sec)
```

Query the Parquet files in NFS(NAS):

```SQL
SELECT * FROM FILES(
    'path' = 'file:///home/ubuntu/parquetfile/*.parquet',
    'format' = 'parquet'
);
```

#### Example 2: Insert the data from a file

Insert the data from the Parquet file **parquet/insert_wiki_edit_append.parquet** in the AWS S3 bucket `inserttest` into the table `insert_wiki_edit`:

@@ -464,6 +487,18 @@ Query OK, 2 rows affected (23.03 sec)
{'label':'insert_d8d4b2ee-ac5c-11ed-a2cf-4e1110a8f63b', 'status':'VISIBLE', 'txnId':'2440'}
```

Insert the data from the CSV files in NFS(NAS) into the table `insert_wiki_edit`:

```SQL
INSERT INTO insert_wiki_edit
SELECT * FROM FILES(
    'path' = 'file:///home/ubuntu/csvfile/*.csv',
    'format' = 'csv',
    'csv.column_separator' = ',',
    'csv.row_delimiter' = '\n'
);
```

#### Example 3: Create a table with the data from a file (CTAS)

Create the table `ctas_wiki_edit` based on the data from the Parquet file **parquet/insert_wiki_edit_append.parquet** in the AWS S3 bucket `inserttest`:

@@ -656,8 +691,7 @@ DESC FILES(
Unload all data rows in `sales_records` as multiple Parquet files under the path **/unload/partitioned/** in the HDFS cluster. The files are stored in different subpaths that are distinguished by the values in the column `sales_time`.

```SQL
-INSERT INTO
-FILES(
+INSERT INTO FILES(
    "path" = "hdfs://xxx.xx.xxx.xx:9000/unload/partitioned/",
    "format" = "parquet",
    "hadoop.security.authentication" = "simple",

@@ -668,3 +702,23 @@ FILES(
)
SELECT * FROM sales_records;
```

Unload the query results into CSV or Parquet files in NFS(NAS):

```SQL
-- CSV
INSERT INTO FILES(
    'path' = 'file:///home/ubuntu/csvfile/',
    'format' = 'csv',
    'csv.column_separator' = ',',
    'csv.row_delimiter' = '\n'
)
SELECT * FROM sales_records;

-- Parquet
INSERT INTO FILES(
    'path' = 'file:///home/ubuntu/parquetfile/',
    'format' = 'parquet'
)
SELECT * FROM sales_records;
```