[pointer analysis] added support for named list arguments (#1151)
* feat: added processor for list function call

* feat: named arguments are defined as indices for target of list assignment

* feat: reads edge is added to list element when accessing via '$'

* test(list-access): added slicing tests

* test(list-access): added dataflow tests

* test-fix: fix list-access test suites

* feat(list-access): all parameter indices are concatenated

* feat(list-access): pass indices of replacement to markAsAssignment

* feat(list-access): all existing definitions are read by access

* feat-fix(list-access): correct nodeId is used for ContainerIndex

* feat(list-access): merge definition properties with same name

This way, only the correct definition is stored; any overwritten definition is no longer kept. Currently, this only works for definitions that happen in each branch.

* feat(list-access): add isSingleIndex property to container indices

* test(list-access): extended slicing tests

* feat(list-access): replacement functions are not marked as maybe anymore

The merge-indices logic was moved into a separate method.
When indices are defined, the function is not marked as maybe.
This enables overwriting the previous indices.

* feat(list-access): add whole list reference only to list call

Before, an access to a list object would reference the whole object; now that happens only
when the list function is called. Single assignments are referenced directly by their
access operator. This allows us to skip irrelevant list accesses.

* test-fix(list-access): list call is always in slice

To ensure an executable slice, the list call always has to be in the slice.

* feat(list-access): overwrite definition of indices if list is redefined

When a list is redefined, the former definition is replaced; storing the old indices would
therefore keep the previous definition in the slice.

* refactor(access): extracted number and index based access to methods

* test(list-access): add tests for nested list access

Nested list access comes with new complications. For each list in the root index, the names
must be resolved recursively to reference the correct index.

* feat(nested-list): add subindices to index if index is another list

This enables nested definition/access/assignment.

* feat(nested-list): add reads edges to accessed indices and their indices

This is done recursively to include all indices that had an impact on the result.

* feat(nested-list): recursively resolve nested access

* refactor(list-defs): renamed isSingleIndex to isContainer

Also inverted the semantics.

* feat(list-access): declared empty list is included in slice

* refactor(list-access): move utility methods to separate file

This way, they can be accessed by the write and read operations

* feat(nested-list): add support for nested assignment

* test(list-access): add supported capabilities ids to tests

* refactor(pointer-analysis): basic ts cleanup

* feat(pointer-analysis): support configuration

* doc(pointer-analysis): wiki update

* test(pointer-analysis): some new conditional tests

* lint-fix: handle linter errors

---------

Co-authored-by: Florian Sihler <[email protected]>
Slartibartfass2 and EagleoutIce authored Jan 5, 2025
1 parent 40e1558 commit 4b73726
Showing 17 changed files with 1,100 additions and 122 deletions.
14 changes: 11 additions & 3 deletions src/config.ts
@@ -44,7 +44,13 @@ export interface FlowrConfigOptions extends MergeableRecord {
/**
* How to resolve variables and their values
*/
readonly variables: VariableResolve
readonly variables: VariableResolve,
/**
* Whether to track pointers in the dataflow graph,
* if not, the graph will be over-approximated wrt.
* containers and accesses
*/
readonly pointerTracking: boolean
}

}
@@ -61,7 +67,8 @@ export const defaultConfigOptions: FlowrConfigOptions = {
}
},
solver: {
variables: VariableResolve.Alias
variables: VariableResolve.Alias,
pointerTracking: true
}
};

@@ -77,7 +84,8 @@ export const flowrConfigFileSchema = Joi.object({
}).optional().description('Semantics regarding the handlings of the environment.')
}).description('Configure language semantics and how flowR handles them.'),
solver: Joi.object({
variables: Joi.string().valid(...Object.values(VariableResolve)).description('How to resolve variables and their values.')
variables: Joi.string().valid(...Object.values(VariableResolve)).description('How to resolve variables and their values.'),
pointerTracking: Joi.boolean().description('Whether to track pointers in the dataflow graph, if not, the graph will be over-approximated wrt. containers and accesses.')
}).description('How to resolve constants, constraints, cells, ...')
}).description('The configuration file format for flowR.');
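
As a rough illustration of how the new `pointerTracking` option might be overridden, the following sketch overlays a partial user configuration on defaults. `SolverOptions`, `defaultSolver`, and `withSolverOverrides` are hypothetical names introduced here and are not part of flowR; the `'alias'` string is a placeholder for `VariableResolve.Alias`. Only the two keys and the default `pointerTracking: true` are taken from the diff.

```typescript
// Hypothetical, simplified stand-in for the `solver` section of the flowR config.
interface SolverOptions {
    // How to resolve variables; the concrete string value is a placeholder.
    readonly variables: string;
    // Whether pointers (container indices) are tracked in the dataflow graph.
    readonly pointerTracking: boolean;
}

// Mirrors the default from the diff: pointer tracking is enabled.
const defaultSolver: SolverOptions = {
    variables: 'alias',
    pointerTracking: true
};

// Overlay a partial user configuration on the defaults.
// This is an illustrative merge, not flowR's actual config loading.
function withSolverOverrides(overrides: Partial<SolverOptions>): SolverOptions {
    return { ...defaultSolver, ...overrides };
}

// Opt out of pointer tracking; containers and accesses are then over-approximated.
const overApproximating = withSolverOverrides({ pointerTracking: false });
console.log(overApproximating);
```

According to the schema in the diff, the option sits under the `solver` key of the configuration file, next to `variables`.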

2 changes: 2 additions & 0 deletions src/dataflow/environments/built-in.ts
@@ -29,6 +29,7 @@ import { processApply } from '../internal/process/functions/call/built-in/built-
import { registerBuiltInDefinitions } from './built-in-config';
import { DefaultBuiltinConfig } from './default-builtin-config';
import type { LinkTo } from '../../queries/catalog/call-context-query/call-context-query-format';
import { processList } from '../internal/process/functions/call/built-in/built-in-list';



@@ -147,6 +148,7 @@ export const BuiltInProcessorMapper = {
'builtin:repeat-loop': processRepeatLoop,
'builtin:while-loop': processWhileLoop,
'builtin:replacement': processReplacementFunction,
'builtin:list': processList,
} as const satisfies Record<`builtin:${string}`, BuiltInIdentifierProcessorWithConfig<never>>;

export type BuiltInMappingName = keyof typeof BuiltInProcessorMapper;
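
The mapper above routes a processor key such as `'builtin:list'` to a processing function. A minimal sketch of that dispatch-table idea follows. Everything in it is a simplified assumption: the `Processor` signature, `processListSketch`, `builtInEntries`, and `dispatch` are hypothetical, and flowR's real processors operate on AST nodes and dataflow information rather than strings.

```typescript
// Hypothetical processor signature; the real processors take much richer arguments.
type Processor = (name: string, args: readonly string[]) => string;

const processDefaultSketch: Processor = (name, args) =>
    `${name}: reads ${args.length} argument(s)`;

// Named arguments (`key = value`) become indices; positional ones are ignored here.
const processListSketch: Processor = (name, args) => {
    const names = args
        .filter(a => a.indexOf('=') >= 0)
        .map(a => a.slice(0, a.indexOf('=')).trim());
    return `${name}: defines indices [${names.join(', ')}]`;
};

// Keys follow the `builtin:${string}` shape used by the mapper in the diff.
const processorMap: Record<`builtin:${string}`, Processor> = {
    'builtin:default': processDefaultSketch,
    'builtin:list':    processListSketch
};

// Simplified config entries: function names mapped to a processor key.
interface BuiltInEntry {
    readonly names:     string[];
    readonly processor: `builtin:${string}`;
}
const builtInEntries: BuiltInEntry[] = [
    { names: ['c', 't'], processor: 'builtin:default' },
    { names: ['list'],   processor: 'builtin:list' }
];

// Look up the entry for a function name and run its processor, if any.
function dispatch(name: string, args: readonly string[]): string | undefined {
    for(const entry of builtInEntries) {
        if(entry.names.indexOf(name) >= 0) {
            const processor: Processor | undefined = processorMap[entry.processor];
            return processor !== undefined ? processor(name, args) : undefined;
        }
    }
    return undefined;
}

console.log(dispatch('list', ['a = 1', 'b = 2', '3']));
```

Giving `list` its own processor key, instead of leaving it among the generic function names, is what lets a call like `list(a = 1, b = 2)` record its named arguments as indices.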
5 changes: 3 additions & 2 deletions src/dataflow/environments/default-builtin-config.ts
@@ -17,7 +17,7 @@ export const DefaultBuiltinConfig: BuiltInDefinitions = [
{
type: 'function',
names: [
'~', '+', '-', '*', '/', '^', '!', '?', '**', '==', '!=', '>', '<', '>=', '<=', '%%', '%/%', '%*%', '%in%', ':', 'list',
'~', '+', '-', '*', '/', '^', '!', '?', '**', '==', '!=', '>', '<', '>=', '<=', '%%', '%/%', '%*%', '%in%', ':',
'rep', 'seq', 'seq_len', 'seq_along', 'seq.int', 'gsub', 'which', 'class', 'dimnames', 'min', 'max',
'intersect', 'subset', 'match', 'sqrt', 'abs', 'round', 'floor', 'ceiling', 'signif', 'trunc', 'log', 'log10', 'log2', 'sum', 'mean',
'unique', 'paste', 'paste0', 'read.csv', 'stop', 'is.null', 'numeric', 'as.character', 'as.integer', 'as.logical', 'as.numeric', 'as.matrix',
@@ -32,7 +32,7 @@ export const DefaultBuiltinConfig: BuiltInDefinitions = [
{
type: 'function',
names: [
'c', 't'
'c', 't', 'aperm' /* vector construction, concatenation, transpose function, permutation generation */
],
processor: 'builtin:default',
config: { readAllArguments: true },
@@ -117,6 +117,7 @@ export const DefaultBuiltinConfig: BuiltInDefinitions = [
{ type: 'function', names: ['repeat'], processor: 'builtin:repeat-loop', config: {}, assumePrimitive: true },
{ type: 'function', names: ['while'], processor: 'builtin:while-loop', config: {}, assumePrimitive: true },
{ type: 'function', names: ['do.call'], processor: 'builtin:apply', config: { indexOfFunction: 0, unquoteFunction: true }, assumePrimitive: true },
{ type: 'function', names: ['list'], processor: 'builtin:list', config: {}, assumePrimitive: true },
{
type: 'function',
names: [
95 changes: 94 additions & 1 deletion src/dataflow/environments/define.ts
@@ -3,10 +3,21 @@ import { BuiltInEnvironment } from './environment';
import type { IEnvironment, REnvironmentInformation } from './environment';

import { cloneEnvironmentInformation } from './clone';
import type { IdentifierDefinition } from './identifier';
import type { IdentifierDefinition, InGraphIdentifierDefinition } from './identifier';
import type { ContainerIndex, ContainerIndices } from '../graph/vertex';
import { isParentContainerIndex } from '../graph/vertex';


function defInEnv(newEnvironments: IEnvironment, name: string, definition: IdentifierDefinition) {
const existing = newEnvironments.memory.get(name);

// When there are defined indices, merge the definitions
const inGraphDefinition = definition as InGraphIdentifierDefinition;
if(existing !== undefined && inGraphDefinition.indicesCollection !== undefined && inGraphDefinition.controlDependencies === undefined) {
newEnvironments.memory.set(name, mergeDefinitions(existing, inGraphDefinition));
return;
}

// check if it is maybe or not
if(existing === undefined || definition.controlDependencies === undefined) {
newEnvironments.memory.set(name, [definition]);
@@ -15,6 +26,88 @@ function defInEnv(newEnvironments: IEnvironment, name: string, definition: Ident
}
}

function mergeDefinitions(existing: IdentifierDefinition[], definition: InGraphIdentifierDefinition): InGraphIdentifierDefinition[] {
// When new definition is not a single index, e.g., a list redefinition, then reset existing definition
if(definition.indicesCollection?.some(indices => indices.isContainer)) {
return [definition];
}

const existingDefs = existing.map((def) => def as InGraphIdentifierDefinition).filter((def) => def !== undefined);
const overwriteIndices = definition.indicesCollection?.flatMap(indices => indices.indices) ?? [];
// Compare existing and new definitions,
// add new definitions and remove existing definitions that are overwritten by new definition
const newExistingDefs: InGraphIdentifierDefinition[] = [];
for(const overwriteIndex of overwriteIndices) {
for(const existingDef of existingDefs) {
if(existingDef.indicesCollection === undefined) {
continue;
}

const newIndicesCollection = overwriteContainerIndices(existingDef.indicesCollection, overwriteIndex);

// if indices are now empty list, don't keep empty definition
if(newIndicesCollection.length > 0) {
newExistingDefs.push({
...existingDef,
indicesCollection: newIndicesCollection,
});
}
}
}
// store changed existing definitions and add new one
return [...newExistingDefs, definition];
}

function overwriteContainerIndices(
existingIndices: ContainerIndices[],
overwriteIndex: ContainerIndex
): ContainerIndices[] {
const newIndicesCollection: ContainerIndices[] = [];

for(const indices of existingIndices) {
let newIndices: ContainerIndex[];
// When overwrite index is container itself, then only overwrite sub-index
if(isParentContainerIndex(overwriteIndex)) {
newIndices = [];
for(const index of indices.indices) {
if(index.lexeme === overwriteIndex.lexeme && isParentContainerIndex(index)) {
const overwriteSubIndices = overwriteIndex.subIndices.flatMap(a => a.indices);

let newSubIndices: ContainerIndices[] = index.subIndices;
for(const overwriteSubIndex of overwriteSubIndices) {
newSubIndices = overwriteContainerIndices(newSubIndices, overwriteSubIndex);
}

if(newSubIndices.length > 0) {
newIndices.push({
...index,
subIndices: newSubIndices,
});
}
}
if(index.lexeme !== overwriteIndex.lexeme || !isParentContainerIndex(index)) {
newIndices.push(index);
}
}
} else if(indices.isContainer) {
// If the indices do not describe a single index, e.g., they belong to a list, take the whole definition
newIndices = indices.indices;
} else {
// Filter existing indices with the same name
newIndices = indices.indices.filter(def => def.lexeme !== overwriteIndex.lexeme);
}

if(indices.isContainer || newIndices.length > 0) {
newIndicesCollection.push({
...indices,
indices: newIndices,
});
}
}

return newIndicesCollection;
}
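
To make the overwrite behaviour easier to follow, here is a flattened sketch of the merge logic that ignores nested sub-indices. The types `LeafIndex`, `Indices`, and `Definition` and the functions `overwriteFlat` and `mergeFlat` are simplified stand-ins written for this illustration, with numeric node ids as an assumption. They follow the structure of `mergeDefinitions` and `overwriteContainerIndices` above but are not the flowR implementation.

```typescript
// Simplified, flat stand-ins for the types in the diff (no nested sub-indices).
interface LeafIndex { readonly lexeme: string; readonly nodeId: number }
interface Indices { readonly indices: LeafIndex[]; readonly isContainer: boolean }
interface Definition { readonly definedAt: number; readonly indicesCollection?: Indices[] }

// Drop single-index entries that the new index overwrites; keep whole-container entries.
function overwriteFlat(existing: Indices[], overwrite: LeafIndex): Indices[] {
    const result: Indices[] = [];
    for(const entry of existing) {
        const kept = entry.isContainer
            ? entry.indices
            : entry.indices.filter(i => i.lexeme !== overwrite.lexeme);
        if(entry.isContainer || kept.length > 0) {
            result.push({ ...entry, indices: kept });
        }
    }
    return result;
}

function mergeFlat(existing: Definition[], incoming: Definition): Definition[] {
    // A whole-container definition (e.g., a fresh `list(...)` call) resets everything.
    if(incoming.indicesCollection?.some(i => i.isContainer)) {
        return [incoming];
    }
    const overwriteIndices: LeafIndex[] = [];
    for(const entry of incoming.indicesCollection ?? []) {
        overwriteIndices.push(...entry.indices);
    }
    const kept: Definition[] = [];
    for(const overwrite of overwriteIndices) {
        for(const def of existing) {
            if(def.indicesCollection === undefined) {
                continue;
            }
            const remaining = overwriteFlat(def.indicesCollection, overwrite);
            // A definition whose indices were all overwritten is dropped.
            if(remaining.length > 0) {
                kept.push({ ...def, indicesCollection: remaining });
            }
        }
    }
    return [...kept, incoming];
}

// person <- list(name = 'John')   (node ids are made up for the example)
const listDef: Definition = { definedAt: 1, indicesCollection: [{ isContainer: true, indices: [{ lexeme: 'name', nodeId: 10 }] }] };
// person$name <- 'Jane'
const firstAssign: Definition = { definedAt: 2, indicesCollection: [{ isContainer: false, indices: [{ lexeme: 'name', nodeId: 20 }] }] };
// person$name <- 'Jim'
const secondAssign: Definition = { definedAt: 3, indicesCollection: [{ isContainer: false, indices: [{ lexeme: 'name', nodeId: 30 }] }] };
// person <- list()
const redefinition: Definition = { definedAt: 4, indicesCollection: [{ isContainer: true, indices: [] }] };

const afterFirst = mergeFlat([listDef], firstAssign);
const afterSecond = mergeFlat(afterFirst, secondAssign);
const afterRedefinition = mergeFlat(afterSecond, redefinition);
console.log(afterSecond.map(d => d.definedAt), afterRedefinition.map(d => d.definedAt));
```

In this sketch the second `person$name` assignment displaces the first one while the original `list` call is kept, which matches the commit message's point that the list call always stays in the slice. Redefining the list discards all earlier index definitions.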

/**
* Insert the given `definition` --- defined within the given scope --- into the passed along `environments` will take care of propagation.
* Does not modify the passed along `environments` in-place! It returns the new reference.
8 changes: 7 additions & 1 deletion src/dataflow/environments/identifier.ts
@@ -1,6 +1,7 @@
import type { BuiltInIdentifierConstant, BuiltInIdentifierDefinition } from './built-in';
import type { NodeId } from '../../r-bridge/lang-4.x/ast/model/processing/node-id';
import type { ControlDependency } from '../info';
import type { ContainerIndicesCollection } from '../graph/vertex';

export type Identifier = string & { __brand?: 'identifier' }

@@ -97,14 +98,19 @@ export interface IdentifierReference {
*
* @see {@link IdentifierReference}
*/
interface InGraphIdentifierDefinition extends IdentifierReference {
export interface InGraphIdentifierDefinition extends IdentifierReference {
readonly type: InGraphReferenceType
/**
* The assignment node which ultimately defined this identifier
* (the arrow operator for e.g. `x <- 3`, or `assign` call in `assign("x", 3)`)
*/
readonly definedAt: NodeId

readonly value?: NodeId[]
/**
* this attribute links a definition to indices (pointer links) it may be affected by or related to
*/
indicesCollection?: ContainerIndicesCollection
}

/**
66 changes: 65 additions & 1 deletion src/dataflow/graph/vertex.ts
@@ -13,6 +13,66 @@ export enum VertexType {
FunctionDefinition = 'function-definition'
}

/**
* A single index of a container, which is not a container itself.
*
* This can be e.g. a string, number or boolean index.
*/
export interface ContainerLeafIndex {
/**
* Distinctive lexeme of the index, e.g., 'name' for `list(name = 'John')`
*/
readonly lexeme: string,

/**
* NodeId of index in graph.
*/
readonly nodeId: NodeId,
}

/**
* A single index of a container, which is a container itself.
*
* This can be, e.g., a list, vector, or data frame.
*
* @see {@link ContainerLeafIndex} - for a single index of a container which is not a container itself
* @see {@link isParentContainerIndex} - to check if an index is a parent container index
*/
export interface ContainerParentIndex extends ContainerLeafIndex {
/**
* Sub-indices of index.
*/
readonly subIndices: ContainerIndices[],
}

export function isParentContainerIndex(index: ContainerIndex): index is ContainerParentIndex {
return 'subIndices' in index;
}

/**
* A single index of a container.
*/
export type ContainerIndex = ContainerLeafIndex | ContainerParentIndex;

/**
* List of indices of a single statement like `list(a=3, b=2)`
*/
export interface ContainerIndices {
readonly indices: ContainerIndex[],
/**
* Differentiate between single and multiple indices.
*
* For `list(name = 'John')` `isContainer` would be true, because a list may define more than one index.
* `isContainer` is false for e.g. single index assignments like `person$name <- 'John'`.
*/
readonly isContainer: boolean,
}

/**
* Collection of Indices of several statements.
*/
export type ContainerIndicesCollection = ContainerIndices[] | undefined
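
The index types above form a small tree: a parent index carries `subIndices`, a leaf does not. The sketch below shows how a nested list might be encoded and how an access path such as `person$address$city` could be resolved recursively. The shortened type names, the `resolvePath` helper, and the node ids are assumptions made for this example, not flowR's implementation; only the type shapes and the `'subIndices' in index` guard mirror the diff.

```typescript
// Local copies of the shapes from the diff, with numeric node ids for simplicity.
interface LeafIndex { readonly lexeme: string; readonly nodeId: number }
interface ParentIndex extends LeafIndex { readonly subIndices: Indices[] }
type Index = LeafIndex | ParentIndex;
interface Indices { readonly indices: Index[]; readonly isContainer: boolean }

// Same idea as `isParentContainerIndex`: a parent is recognised by its `subIndices`.
function isParent(index: Index): index is ParentIndex {
    return 'subIndices' in index;
}

// Hypothetical helper: follow a path of names through nested indices.
// Returns every index that the full path refers to.
function resolvePath(collection: Indices[], path: string[]): Index[] {
    if(path.length === 0) {
        return [];
    }
    const [head, ...rest] = path;
    const found: Index[] = [];
    for(const entry of collection) {
        for(const index of entry.indices) {
            if(index.lexeme !== head) {
                continue;
            }
            if(rest.length === 0) {
                found.push(index);
            } else if(isParent(index)) {
                // Descend into the nested container for the remaining names.
                found.push(...resolvePath(index.subIndices, rest));
            }
        }
    }
    return found;
}

// person <- list(name = 'John', address = list(city = 'Ulm'))
const person: Indices[] = [{
    isContainer: true,
    indices:     [
        { lexeme: 'name', nodeId: 1 },
        { lexeme: 'address', nodeId: 5, subIndices: [{
            isContainer: true,
            indices:     [{ lexeme: 'city', nodeId: 3 }]
        }] }
    ]
}];

// Resolves the access `person$address$city`.
const cityIndices = resolvePath(person, ['address', 'city']);
console.log(cityIndices.map(i => i.nodeId));
```

A path that runs through a leaf, such as `['name', 'first']`, resolves to nothing in this sketch because a leaf has no sub-indices to descend into.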

/**
* Arguments required to construct a vertex in the {@link DataflowGraph|dataflow graph}.
*
@@ -34,11 +94,15 @@ interface DataflowGraphVertexBase extends MergeableRecord {
/**
* The environment in which the vertex is set.
*/
environment?: REnvironmentInformation | undefined
environment?: REnvironmentInformation
/**
* @see {@link ControlDependency} - the collection of control dependencies which have an influence on whether the vertex is executed.
*/
controlDependencies: ControlDependency[] | undefined
/**
* this attribute links a vertex to indices (pointer links) it may be affected by or related to
*/
indicesCollection?: ContainerIndicesCollection
}

/**