packages/vmind/src/gpt/dataProcess/prompts.ts
+32 -56
@@ -195,78 +195,51 @@ Response:
 export const getQueryDatasetPrompt = (
   showThoughts: boolean
-) => `You are an expert in data analysis. Here is a raw dataset named dataSource. User will tell you his command and column information of DataSource. You need to generate a standard SQL query to select useful fields from dataSource according to the template following the Steps and Description. Return the JSON object only.
-# Note
-1. You are running on a simple SQL engine, so the advanced features, such as RANK() OVER, TOP, JOIN, UNION, etc., are not supported. Please follow the SQL template and Description strictly.
-2. Don't guess the specific data content in your SQL. Don't use conditional statement.
-3. If you think the fields in dataSource cannot meet user requirements, do not further generate new fields. Just ignore user's command and use these fields.
+) => `You are an expert in data analysis. Here is a raw dataset named dataSource. User will tell you his command and column information of dataSource. Your task is to generate SimQuery and fieldInfo according to SimQuery Instruction. Response one JSON object only.

+# SimQuery Instruction
+- SimQuery is a simplified SQL-like language. Supported keywords in SimQuery: ["SELECT", "FROM", "WHERE", "GROUP BY", "HAVING", "ORDER BY", "LIMIT"].
+- A SimQuery query looks like this: "SELECT columnA, SUM(columnB) as sum_b FROM dataSource WHERE columnA = value1 GROUP BY columnA HAVING sum_b>0 ORDER BY sum_b LIMIT 10".
+- Columns in SELECT can only be original columns or aggregated columns. Supported aggregation methods in SimQuery: ["MAX()", "MIN()", "SUM()", "COUNT()", "AVG()"].
+- The "WHERE" and "HAVING" in SimQuery can only use original columns or aggregated columns in dataSource. Supported Operators in SimQuery:[ ">", ">=", "<", "<=", "=", "!=", "in", "not in", "is null", "is not null", "between", "not between", "like", "not like"]. Don't use non-existent columns.
+- Don't use unsupported keywords such as CASE WHEN...ELSE...END or PERCENTILE_CONT. Don't use unsupported aggregation methods on columns. Don't use unsupported operators. Unsupported keywords, methods and operators will cause system crash. If current keywords and methods can't meet your needs, just simple select the column without any process.
+- Make your SimQuery as simple as possible.

-# SQL template:
-SELECT xxx FROM xxx (WHERE xxx) GROUP BY xxx (HAVING xxx) (ORDER BY xxx) (LIMIT xxx).
-
+You need to follow the steps below.

 # Steps
-1. Just use user's command to select useful fields directly. Ignore other parts of user's command.
-2. Select useful dimension fields from dataSource. Use the original dimension field without any process.
-3. Aggregate the measure fields. Supported aggregation function: MAX(), MIN(), SUM(), COUNT(), AVG(). Note: don't aggregate measures using functions that are not supported such as PERCENTILE_CONT(). Don't use conditional statement.
-4. Group the data using dimension fields and fill it in GROUP BY.
-5. You can also use WHERE, HAVING, ORDER BY, LIMIT in your SQL if necessary.
-
-
-# Description
-1. The part in brackets is optional. xxx in the SQL template can only be original columns or aggregated columns. Select Data only from one table. Don't use unsupported features such as RANK(), TOP, UNION, etc.
-2. Make your SQL as simple as possible. Strictly follow the SQL template to generate SQL. Don't use JOIN, UNION, subquery or other feature that is not in the SQL template. Don't process fields in ways other than supported aggregation functions.
-3. Please don't change or translate the field names in your SQL statement.
-4. Don't ignore GROUP BY in your SQL.
+1. Extract the part related to the data from the user's instruction. Ignore other parts that is not related to the data.
+2. Select useful dimension and measure columns from dataSource. You can only use columns in Column Information and do not assume non-existent columns. If the existing columns can't meet user's command, just select the most related columns in Column Information.
+3. Use the original dimension columns without any process. Aggregate the measure columns using aggregation methods supported in SimQuery. Don't use unsupported methods. If current keywords and methods can't meet your needs, just simple select the column without any process.
+4. Group the data using dimension columns.
+5. You can also use WHERE, HAVING, ORDER BY, LIMIT in your SimQuery if necessary. Use the supported operators to finish the WHERE and HAVING of SimQuery. You can only use binary expression such as columnA = value1, sum_b > 0. You can only use dimension values appearing in the domain of dimension columns in your expression.

+Let's think step by step.

-Response in JSON format without any additional words. Your JSON object must contain sql and fieldInfo.
+Response one JSON object without any additional words. Your JSON object must contain SimQuery and fieldInfo.

-Make your SQL as simple as possible.
-
-Response in the following JSON format:
+Response in the following format:
 \`\`\`
 {
-sql: string; //your sql statement. Note that it's a string in a JSON object so it must be in one line without any \\n.
-fieldInfo: {
-fieldName: string; //name of the field.
-description?: string; //description of the field. If it is an aggregated field, please describe how it is generated in detail.
-}[]; //array of the information about the fields in your sql. Describing its aggregation method and other information of the fields.
"sql": "SELECT key, SUM(value) AS performance FROM dataSource GROUP BY key",
-"fieldInfo": [
-{
-"fieldName": "key",
-"description": "The identifier of the person."
-},
-{
-"fieldName": "performance",
-"description": "An aggregated field representing the performance of the person in different aspects. It is generated by aggregating the 'value' field."
-}
-]
-}
-\`\`\`
-----------------------------------
-
User's Command: Show me the change of the GDP rankings of each country.
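Review note: the new prompt states that unsupported keywords, methods and operators will crash the system, so it may be worth checking the model's SimQuery before it is executed. The following is only a minimal sketch of such a guard, not code from this PR. `checkSimQuery`, `SimQueryCheck` and the constant names are hypothetical, and the list of unsupported keywords is an assumption assembled from the constructs named in the old and new prompt text.

```typescript
// Hypothetical guard, not part of this PR: flags constructs that the prompt
// declares unsupported before a SimQuery string reaches the query engine.

// Aggregation methods listed as supported in the SimQuery Instruction.
const SUPPORTED_AGGREGATIONS = new Set<string>(["MAX", "MIN", "SUM", "COUNT", "AVG"]);

// Assumption: a small deny-list built from keywords the prompts mention.
const UNSUPPORTED_KEYWORDS = ["CASE", "JOIN", "UNION", "OVER"];

// Words that may legitimately precede "(" without being a function call,
// plus the deny-listed keywords so they are not reported twice.
const NOT_FUNCTIONS = new Set<string>(["IN", ...UNSUPPORTED_KEYWORDS]);

interface SimQueryCheck {
  ok: boolean;
  problems: string[];
}

function checkSimQuery(query: string): SimQueryCheck {
  const problems: string[] = [];

  // Blank out quoted literals so a value such as 'union' is not read as a keyword.
  const stripped = query.replace(/'[^']*'|"[^"]*"/g, "''");

  // The prompt only ever selects from the single table named dataSource.
  if (!/^\s*SELECT\b[\s\S]*\bFROM\s+dataSource\b/i.test(stripped)) {
    problems.push("query must have the form SELECT ... FROM dataSource");
  }

  for (const keyword of UNSUPPORTED_KEYWORDS) {
    if (new RegExp(`\\b${keyword}\\b`, "i").test(stripped)) {
      problems.push(`unsupported keyword: ${keyword}`);
    }
  }

  // Treat any identifier directly followed by "(" as a function call.
  const callPattern = /\b([A-Za-z_][A-Za-z0-9_]*)\s*\(/g;
  let match: RegExpExecArray | null;
  while ((match = callPattern.exec(stripped)) !== null) {
    const name = match[1].toUpperCase();
    if (!SUPPORTED_AGGREGATIONS.has(name) && !NOT_FUNCTIONS.has(name)) {
      problems.push(`unsupported function: ${name}`);
    }
  }

  return { ok: problems.length === 0, problems };
}

// Usage: the example query from the prompt passes, PERCENTILE_CONT does not.
const accepted = checkSimQuery(
  "SELECT columnA, SUM(columnB) as sum_b FROM dataSource WHERE columnA = value1 " +
    "GROUP BY columnA HAVING sum_b>0 ORDER BY sum_b LIMIT 10"
);
const rejected = checkSimQuery("SELECT country, PERCENTILE_CONT(gdp) FROM dataSource");
console.log(accepted.ok, rejected.problems);
```

A regex scan like this is deliberately shallow: it does not validate operators or column names, so it would complement the prompt constraints rather than replace them.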