-
Notifications
You must be signed in to change notification settings - Fork 942
[2/4] Offline pipeline evaluation and tests #8908
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: wuandy/RealPpl_1
Are you sure you want to change the base?
Conversation
|
Vertex AI Mock Responses Check
|
*/ | ||
public _userDataWriter: AbstractUserDataWriter, | ||
readonly stages: Stage[], | ||
readonly converter: unknown = {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not critical, but we've been removing converter from the pipeline constructors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
deleted.
import { JsonProtoSerializer } from '../remote/serializer'; | ||
|
||
/** | ||
* Base-class implementation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should update this comment before the release
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated, but need some further polishment still.
* @param stages | ||
* @param converter | ||
*/ | ||
newPipeline( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can keep this pattern in RealtimePipeline, but it's not required. It was useful in Pipeline / LitePipeline. Could be replaced with a direct call to new RealtimePipeline() for simpler code
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
new Sort( | ||
this.readUserData( | ||
'sort', | ||
this.readUserData('sort', optionsOrOrderings.orderings) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This might be a redundant call to readUserData
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
AggregateFunction, | ||
ListOfExprs, | ||
isNan, | ||
isError |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm assuming these linter errors are false positives caused by the way the code was split. Either way, I'm sure we will have some formatting and linter fixes when merging to main.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Deleted those for now.
} else if (functionExpr.name === 'timestamp_sub') { | ||
return new CoreTimestampSub(functionExpr); | ||
} | ||
} else if (expr.exprType === 'AggregateFunction') { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this still work since AggregateFunction no longer extends Expr?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's remove them for now since we are not going to support anytime soon.
return EvaluateResult.newError(); | ||
} | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This pattern seems to be repeated often. I wonder if we can refactor this into some shared code to reduce code size
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let result = {};
if (evaluateParams(expr.params, [['BOOLEAN']], result) {
// not successful
return result.errorResult; // either newError or newNull
}
// use evaluated params
result.evaluated[0]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like to keep the current pattern, because:
- That is how the pattern backend uses, so it would be easier to compare when we want to figure out semantics difference if any.
- It might be more verbose, but I found it easy to read by being explicit.
} | ||
|
||
if (foundNullAtLeastOnce) { | ||
return EvaluateResult.newNull(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like you will not hit this case, the loop would have already returned
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
foundNull is the bookkeeper for the inner loop and foundNullAtLeastOnce is keeping track if we have seen any null value.
Consider [1,2,3].arrayContainsAll([null, 1, 2]), this would hit this if block and returns here.
This logic is pretty confusing because of the nested loop, and i do wonder if there is a better way to write this.
Once again, the structure is copied from backend impl, if we decide to change it we should convince backend to adopt the new style. So the new way needs to be significantly better than the current way.
} | ||
case 'NULL': { | ||
foundNull = true; | ||
foundNullAtLeastOnce = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These foundNull(.*) seem to be redundant
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see below.
stringValue: evaluated.value?.stringValue | ||
?.split('') | ||
.reverse() | ||
.join('') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should mark this as an implementation to evaluate efficiency, if you have not already. On the surface, this appears like it would be slow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Implementation changed here.
switch (evaluated.type) { | ||
case 'NULL': | ||
return EvaluateResult.newNull(); | ||
case 'STRING': { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reverse and some other functions support Bytes in addition to string. Are bytes encoded as string somewhere upstream of this call, or will we need to add support for bytes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure why it was missing, maybe it was not there in the backend when i did this.
Added the support, and fixed unicode support.
6710057
to
0c570ab
Compare
@@ -242,41 +288,322 @@ export function toPipeline(query: Query, db: Firestore): Pipeline { | |||
} | |||
} | |||
|
|||
return pipeline; | |||
return pipeline.stages; | |||
} | |||
|
|||
function whereConditionsFromCursor( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It probably got lost in the merge. But the implementation in feat/pipelines is more performant and easier to read. Consider using that implementation instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
hasOrder = true; | ||
// add exists to force sparse semantics | ||
// Is this really needed? | ||
// newStages.push(new Where(new And(stage.orders.map(order => order.expr.exists())))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does sparse semantics need to be enforced? If not, should we remove this commented out code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Deleted.
'Empty document paths are not allowed in DocumentsSource' | ||
); | ||
} | ||
if (stage.docPaths) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
stage.docPaths
will always be truthy since we referenced stage.docPaths.length
above
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
stage: DocumentsSource, | ||
input: PipelineInputOutput[] | ||
): PipelineInputOutput[] { | ||
if (stage.docPaths.length === 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (stage.docPaths && stage.docPaths.length === 0) {
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider this if ther is ever a condition where docPaths is unset
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
{ serializer: pipeline.serializer }, | ||
d2 as MutableDocument | ||
) | ||
); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be nice if we could memoize leftValue and rightValue. May be something for a future PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
* @param value The value of the constant. | ||
*/ | ||
constructor(private value: unknown) { | ||
constructor( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think at a minimum we want @hideconstructor
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
export function currentContext(): FunctionExpr { | ||
return new FunctionExpr('current_context', []); | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should delete this expression. It will not be exposed in the SDKs. This may have been added back in a merge.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Deleted.
* // Check if the document has a field named "phoneNumber" | ||
* exists("phoneNumber"); | ||
* // Check if the result of a calculation is NaN | ||
* isNaN("value"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It appears as if the comments for this and the overload above were reverted in merge. The comments should apply to exists
not isNaN
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
* | ||
* ```typescript | ||
* // Check if the result of a calculation is NaN | ||
* isNaN(field("value").divide(0)); | ||
* ``` | ||
* | ||
* @param value The expression to check. | ||
* @return A new {@code Expr} representing the 'isNaN' check. | ||
* @return A new {@code Expr} representing the 'isNull' check. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
comment needs fix
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
* // Check if the result of a calculation is NaN | ||
* isNaN("value"); | ||
* // Check if the result of a calculation is null. | ||
* isNull("value"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
comment needs fix
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
return new CorePipeline(pipeline.serializer, newStages); | ||
} | ||
|
||
export type QueryOrPipeline = Query | CorePipeline; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because QueryOrPipeline is a union type that is used throughout the implementation, to support porting to other platforms, we should change this to a common interface between the two types.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not think that is necessary?
C++(std::variant) and Kotlin(sealed class) has their own support for union types.
return queryEquals(left as Query, right as Query); | ||
} | ||
|
||
export type TargetOrPipeline = Target | CorePipeline; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question to be covered with the team. But what allows us to skip creating of an equivalent "Pipeline Target" class
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Target and Query are almost identical, only difference is query know about limit to last while target does not.
With pipeline, we will no longer need this separation because we do not want to support features like limit to last only in the SDK.
stage.orders.some( | ||
order => | ||
order.expr instanceof Field && | ||
order.expr.fieldName() === DOCUMENT_KEY_NAME |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does NAME guarantee stable sort on a collection group?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I guess it will. This is the full path not just the ID.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes.
6f0e635
to
58cfd1d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the review!
import { JsonProtoSerializer } from '../remote/serializer'; | ||
|
||
/** | ||
* Base-class implementation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated, but need some further polishment still.
*/ | ||
public _userDataWriter: AbstractUserDataWriter, | ||
readonly stages: Stage[], | ||
readonly converter: unknown = {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
deleted.
* @param stages | ||
* @param converter | ||
*/ | ||
newPipeline( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
new Sort( | ||
this.readUserData( | ||
'sort', | ||
this.readUserData('sort', optionsOrOrderings.orderings) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
AggregateFunction, | ||
ListOfExprs, | ||
isNan, | ||
isError |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Deleted those for now.
* @param value The value of the constant. | ||
*/ | ||
constructor(private value: unknown) { | ||
constructor( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
export function currentContext(): FunctionExpr { | ||
return new FunctionExpr('current_context', []); | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Deleted.
* // Check if the document has a field named "phoneNumber" | ||
* exists("phoneNumber"); | ||
* // Check if the result of a calculation is NaN | ||
* isNaN("value"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
* | ||
* ```typescript | ||
* // Check if the result of a calculation is NaN | ||
* isNaN(field("value").divide(0)); | ||
* ``` | ||
* | ||
* @param value The expression to check. | ||
* @return A new {@code Expr} representing the 'isNaN' check. | ||
* @return A new {@code Expr} representing the 'isNull' check. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
* // Check if the result of a calculation is NaN | ||
* isNaN("value"); | ||
* // Check if the result of a calculation is null. | ||
* isNull("value"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
// For stages augmenting outputs | ||
else if (stage instanceof AddFields || stage instanceof Select) { | ||
if (stage instanceof AddFields) { | ||
newStages.push(new AddFields(addSystemFields(stage.fields))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it make more sense to validate and reject customer from overwriting system fields?
Also, what about other stages that can overwrite system fields? UNNEST, REPLACE, REMOVE_FIELDS
No description provided.