[Query API] Lineage Query (#1080)
* feat(lineage-query): impl of lineage query

* doc(lineage-query): basic message documentation and deprecation
EagleoutIce authored Oct 13, 2024
1 parent d377ecf commit 22b5993
Showing 9 changed files with 273 additions and 46 deletions.
7 changes: 6 additions & 1 deletion src/documentation/data/server/doc-data-server-messages.ts
@@ -503,10 +503,15 @@ See [above](#message-request-file-analysis) for the general structure of the res
end
deactivate Server
`,
- shortDescription: 'Obtain the lineage of a given slicing criterion.',
+ shortDescription: '([DEPRECATED](${FlowrWikiBaseRef}/Query%20API)) Obtain the lineage of a given slicing criterion.',
text: async(shell: RShell) => {
return `
${block({
type: 'WARNING',
content: `We deprecated the lineage request in favor of the \`lineage\` [Query](${FlowrWikiBaseRef}/Query%20API).`
})}
In order to retrieve the lineage of an object, you have to send a file analysis request first. The \`filetoken\` you assign is of use here as you can re-use it to repeatedly retrieve the lineage of the same file.
Besides that, you will need to add a [criterion](${FlowrWikiBaseRef}/Terminology#slicing-criterion) that specifies the object whose lineage you're interested in.
7 changes: 7 additions & 0 deletions src/documentation/doc-util/doc-query.ts
@@ -147,6 +147,13 @@ export function asciiSummaryOfQueryResult(formatter: OutputFormatter, totalInMs:
}))`);
}
continue;
} else if(query === 'lineage') {
const out = queryResults as QueryResults<'lineage'>['lineage'];
result.push(`Query: ${bold(query, formatter)} (${printAsMs(out['.meta'].timing, 0)})`);
for(const [criteria, lineage] of Object.entries(out.lineages)) {
result.push(` ╰ ${criteria}: {${summarizeIdsIfTooLong([...lineage])}}`);
}
continue;
}

result.push(`Query: ${bold(query, formatter)}`);
36 changes: 36 additions & 0 deletions src/documentation/print-query-wiki.ts
@@ -20,6 +20,7 @@ import { executeIdMapQuery } from '../queries/catalog/id-map-query/id-map-query-
import { executeNormalizedAstQuery } from '../queries/catalog/normalized-ast-query/normalized-ast-query-executor';
import { executeDataflowClusterQuery } from '../queries/catalog/cluster-query/cluster-query-executor';
import { executeStaticSliceClusterQuery } from '../queries/catalog/static-slice-query/static-slice-query-executor';
import { executeLineageQuery } from '../queries/catalog/lineage-query/lineage-query-executor';


registerQueryDocumentation('call-context', {
@@ -132,6 +133,41 @@ ${
}
});

registerQueryDocumentation('lineage', {
name: 'Lineage Query',
type: 'active',
shortDescription: 'Returns the lineage of a criterion.',
functionName: executeLineageQuery.name,
functionFile: '../queries/catalog/lineage-query/lineage-query-executor.ts',
buildExplanation: async(shell: RShell) => {
const exampleCode = 'x <- 1\nx';

return `
This query calculates the _lineage_ of a given slicing criterion. The lineage traces back all parts that the
respective variable stems from, given the reads, definitions, and returns in the dataflow graph.
To understand this, let's start with a simple example query, to get the lineage of the second use of \`x\` in the following code:
${codeBlock('r', exampleCode)}
For this, we use the criterion \`2@x\` (which is the first use of \`x\` in the second line).
${
await showQuery(shell, exampleCode, [{
type: 'lineage',
criterion: '2@x'
}], { showCode: false })
}
In this simple scenario, the _lineage_ is equivalent to the slice (and in fact the complete code).
In general, the lineage is smaller than the slice and makes no executability guarantees.
It is just a quick way, neither complete nor sound, to get information on where the variable originates from.
This query replaces the old [\`request-lineage\`](${FlowrWikiBaseRef}/Interface#message-request-lineage) message.
`;
}
});

registerQueryDocumentation('dataflow-cluster', {
name: 'Dataflow Cluster Query',
type: 'active',
25 changes: 25 additions & 0 deletions src/queries/catalog/lineage-query/lineage-query-executor.ts
@@ -0,0 +1,25 @@
import type { BasicQueryData } from '../../query';
import type {
LineageQuery,
LineageQueryResult
} from './lineage-query-format';
import { log } from '../../../util/log';
import { getLineage } from '../../../cli/repl/commands/repl-lineage';

export function executeLineageQuery({ graph, ast }: BasicQueryData, queries: readonly LineageQuery[]): LineageQueryResult {
const start = Date.now();
const result: LineageQueryResult['lineages'] = {};
for(const { criterion } of queries) {
if(result[criterion]) {
log.warn('Duplicate criterion in lineage query:', criterion);
}
result[criterion] = getLineage(criterion, graph, ast.idMap);
}

return {
'.meta': {
timing: Date.now() - start
},
lineages: result
};
}
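The executor above delegates the actual work to `getLineage`, whose body is not part of this diff. As a rough illustration of the idea described in the wiki text (following reads, definitions, and returns backwards from the criterion), here is a minimal sketch on a toy graph. The names `ToyGraph`, `ToyEdge`, `EdgeKind`, and `toyLineage` are hypothetical, the edge-kind set is an assumption, and flowR's real `DataflowGraph` and `getLineage` may differ.

```typescript
// Hypothetical toy model; NOT flowR's DataflowGraph or getLineage.
type NodeId = number;
type EdgeKind = 'reads' | 'defined-by' | 'returns' | 'calls';

interface ToyEdge { readonly target: NodeId; readonly kind: EdgeKind }
type ToyGraph = ReadonlyMap<NodeId, readonly ToyEdge[]>;

// Edge kinds the traversal follows (assumed from the description: reads, definitions, returns).
const followed: ReadonlySet<EdgeKind> = new Set<EdgeKind>(['reads', 'defined-by', 'returns']);

// Collects the start node and every node reachable from it via followed edges.
function toyLineage(start: NodeId, graph: ToyGraph): Set<NodeId> {
    const visited = new Set<NodeId>([start]);
    const worklist: NodeId[] = [start];
    while(worklist.length > 0) {
        const current = worklist.pop() as NodeId;
        for(const { target, kind } of graph.get(current) ?? []) {
            if(followed.has(kind) && !visited.has(target)) {
                visited.add(target);
                worklist.push(target);
            }
        }
    }
    return visited;
}

// Toy encoding of `x <- 1; x`: node 3 is the use of x in line 2,
// node 0 its definition, node 1 the constant. Node 7 is unrelated.
const graph: ToyGraph = new Map<NodeId, readonly ToyEdge[]>([
    [3, [{ target: 0, kind: 'reads' }]],
    [0, [{ target: 1, kind: 'defined-by' }]],
    [1, []],
    [7, [{ target: 3, kind: 'calls' }]]
]);
const lineage = toyLineage(3, graph);
```

In this sketch the traversal only walks outgoing edges of the followed kinds, so node 7 (which merely points at the start node via a `calls` edge) never enters the result. That is one way to read the remark that the lineage is a quick approximation rather than a complete slice.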
16 changes: 16 additions & 0 deletions src/queries/catalog/lineage-query/lineage-query-format.ts
@@ -0,0 +1,16 @@
import type { BaseQueryFormat, BaseQueryResult } from '../../base-query-format';
import type { SingleSlicingCriterion } from '../../../slicing/criterion/parse';
import type { NodeId } from '../../../r-bridge/lang-4.x/ast/model/processing/node-id';

/**
* Calculates the lineage of the given criterion.
*/
export interface LineageQuery extends BaseQueryFormat {
readonly type: 'lineage';
readonly criterion: SingleSlicingCriterion;
}

export interface LineageQueryResult extends BaseQueryResult {
/** Maps each criterion to the lineage found for it; duplicate criteria are ignored. */
readonly lineages: Record<SingleSlicingCriterion, Set<NodeId>>;
}
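To make the query and result shapes concrete, the following sketch mirrors the two interfaces above with simplified stand-in types and a canned resolver. `fakeLineage` and `runLineageQueries` are hypothetical helpers, the criterion is typed as a plain `string` instead of `SingleSlicingCriterion`, and the `'.meta'` field is assumed from how the executor builds its return value.

```typescript
// Simplified stand-ins for the interfaces in lineage-query-format.ts (assumptions, not the real types).
type NodeId = number;
interface LineageQuery { readonly type: 'lineage'; readonly criterion: string }
interface LineageQueryResult {
    readonly '.meta': { readonly timing: number };
    readonly lineages: Record<string, Set<NodeId>>;
}

// Hypothetical resolver standing in for getLineage; it returns canned node ids.
function fakeLineage(criterion: string): Set<NodeId> {
    return criterion === '2@x' ? new Set<NodeId>([0, 1, 3]) : new Set<NodeId>();
}

// Same overall shape as the executor: one map entry per distinct criterion.
function runLineageQueries(queries: readonly LineageQuery[]): LineageQueryResult {
    const start = Date.now();
    const lineages: Record<string, Set<NodeId>> = {};
    for(const { criterion } of queries) {
        lineages[criterion] = fakeLineage(criterion);
    }
    return { '.meta': { timing: Date.now() - start }, lineages };
}

const result = runLineageQueries([
    { type: 'lineage', criterion: '2@x' },
    { type: 'lineage', criterion: '2@x' },
    { type: 'lineage', criterion: '1@x' }
]);
const keys = Object.keys(result.lineages);
```

Because the result is keyed by criterion, repeating a criterion (as the 'Multiple Queries' test in this commit does) yields a single entry rather than one per query.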
8 changes: 7 additions & 1 deletion src/queries/query-schema.ts
@@ -38,14 +38,20 @@ export const StaticSliceQuerySchema = Joi.object({
noMagicComments: Joi.boolean().optional().description('Should the magic comments (force-including lines within the slice) be ignored?')
}).description('Slice query used to slice the dataflow graph');

export const LineageQuerySchema = Joi.object({
type: Joi.string().valid('lineage').required().description('The type of the query.'),
criterion: Joi.string().required().description('The slicing criterion of the node to get the lineage of.')
}).description('Lineage query used to find the lineage of a node in the dataflow graph');


export const SupportedQueriesSchema = Joi.alternatives(
CallContextQuerySchema,
DataflowQuerySchema,
IdMapQuerySchema,
NormalizedAstQuerySchema,
DataflowClusterQuerySchema,
- StaticSliceQuerySchema
+ StaticSliceQuerySchema,
+ LineageQuerySchema
).description('Supported queries');

export const CompoundQuerySchema = Joi.object({
7 changes: 5 additions & 2 deletions src/queries/query.ts
@@ -18,8 +18,10 @@ import type { DataflowClusterQuery } from './catalog/cluster-query/cluster-query
import { executeDataflowClusterQuery } from './catalog/cluster-query/cluster-query-executor';
import type { StaticSliceQuery } from './catalog/static-slice-query/static-slice-query-format';
import { executeStaticSliceClusterQuery } from './catalog/static-slice-query/static-slice-query-executor';
import type { LineageQuery } from './catalog/lineage-query/lineage-query-format';
import { executeLineageQuery } from './catalog/lineage-query/lineage-query-executor';

- export type Query = CallContextQuery | DataflowQuery | NormalizedAstQuery | IdMapQuery | DataflowClusterQuery | StaticSliceQuery;
+ export type Query = CallContextQuery | DataflowQuery | NormalizedAstQuery | IdMapQuery | DataflowClusterQuery | StaticSliceQuery | LineageQuery;

export type QueryArgumentsWithType<QueryType extends BaseQueryFormat['type']> = Query & { type: QueryType };

@@ -41,7 +43,8 @@ export const SupportedQueries = {
'id-map': executeIdMapQuery,
'normalized-ast': executeNormalizedAstQuery,
'dataflow-cluster': executeDataflowClusterQuery,
- 'static-slice': executeStaticSliceClusterQuery
+ 'static-slice': executeStaticSliceClusterQuery,
+ 'lineage': executeLineageQuery
} as const satisfies SupportedQueries;

export type SupportedQueryTypes = keyof typeof SupportedQueries;
24 changes: 24 additions & 0 deletions test/functionality/dataflow/query/lineage-query-tests.ts
@@ -0,0 +1,24 @@
import { assertQuery } from '../../_helper/query';
import { label } from '../../_helper/label';
import { withShell } from '../../_helper/shell';
import type {
LineageQuery,
LineageQueryResult
} from '../../../../src/queries/catalog/lineage-query/lineage-query-format';
import { getLineage } from '../../../../src/cli/repl/commands/repl-lineage';

describe('Lineage Query', withShell(shell => {
function testQuery(name: string, code: string, query: readonly LineageQuery[]) {
assertQuery(label(name), shell, code, query, ({ dataflow }) => ({
'lineage': {
lineages: query.reduce((acc, { criterion }) => {
acc[criterion] = getLineage(criterion, dataflow.graph);
return acc;
}, {} as LineageQueryResult['lineages'])
}
}));
}

testQuery('Single Expression', 'x + 1', [{ type: 'lineage', criterion: '1@x' }]);
testQuery('Multiple Queries', 'x + 1', [{ type: 'lineage', criterion: '1@x' }, { type: 'lineage', criterion: '1@x' }, { type: 'lineage', criterion: '1@x' }]);
}));

2 comments on commit 22b5993

@github-actions

"artificial" Benchmark Suite

| Benchmark suite | Current: 22b5993 | Previous: 87e0d90 | Ratio |
|-|-|-|-|
| Retrieve AST from R code | 242.64114190909092 ms (101.22604012672635) | 236.1199211818182 ms (98.66983024776289) | 1.03 |
| Normalize R AST | 17.49391509090909 ms (31.589782917523895) | 18.017819681818185 ms (30.826317957537874) | 0.97 |
| Produce dataflow information | 39.29354286363637 ms (85.54310797296287) | 39.13999777272727 ms (83.84025914001471) | 1.00 |
| Total per-file | 814.0051005454545 ms (1465.8419149899091) | 812.0954968181819 ms (1454.9698832785984) | 1.00 |
| Static slicing | 2.048432091656461 ms (1.1401675279364303) | 2.1591346071288307 ms (1.3606098316777646) | 0.95 |
| Reconstruct code | 0.23730528557602315 ms (0.18858444915660885) | 0.22943671546843153 ms (0.1742124327312039) | 1.03 |
| Total per-slice | 2.300791116017012 ms (1.2177999463718405) | 2.4043281804202783 ms (1.4361374673050025) | 0.96 |
| failed to reconstruct/re-parse | 0 # | 0 # | 1 |
| times hit threshold | 0 # | 0 # | 1 |
| reduction (characters) | 0.7869360165281424 # | 0.7869360165281424 # | 1 |
| reduction (normalized tokens) | 0.7639690077689504 # | 0.7639690077689504 # | 1 |
| memory (df-graph) | 95.46617542613636 KiB (244.77619956879823) | 95.46617542613636 KiB (244.77619956879823) | 1 |

This comment was automatically generated by workflow using github-action-benchmark.

@github-actions

"social-science" Benchmark Suite

| Benchmark suite | Current: 22b5993 | Previous: 87e0d90 | Ratio |
|-|-|-|-|
| Retrieve AST from R code | 241.23622498 ms (46.07994097014288) | 242.00403574 ms (45.320075472333585) | 1.00 |
| Normalize R AST | 18.595274059999998 ms (14.07642764847493) | 20.69905216 ms (15.186291643827508) | 0.90 |
| Produce dataflow information | 74.84718168 ms (88.1798105405472) | 76.57805664 ms (89.58088354130591) | 0.98 |
| Total per-file | 7771.5371134 ms (29395.65842744732) | 7709.17616424 ms (28829.19832835734) | 1.01 |
| Static slicing | 16.085559379520824 ms (44.843038892015315) | 16.011826354372698 ms (44.07740390571893) | 1.00 |
| Reconstruct code | 0.25443330397531116 ms (0.15103622777253903) | 0.24862159317179558 ms (0.1500116592324732) | 1.02 |
| Total per-slice | 16.347825067105454 ms (44.87252993299477) | 16.268530062071488 ms (44.104014536012066) | 1.00 |
| failed to reconstruct/re-parse | 0 # | 0 # | 1 |
| times hit threshold | 0 # | 0 # | 1 |
| reduction (characters) | 0.8712997340230448 # | 0.8712997340230448 # | 1 |
| reduction (normalized tokens) | 0.8102441553774778 # | 0.8102441553774778 # | 1 |
| memory (df-graph) | 99.8990234375 KiB (113.72812769327498) | 99.8990234375 KiB (113.72812769327498) | 1 |

This comment was automatically generated by workflow using github-action-benchmark.
