Skip to content

Commit

Permalink
[#24701] YSQL: Always fetch wholerow attribute in non-fast path UPDATEs
Browse files Browse the repository at this point in the history
Summary:
PG in commit 86dc90056dfdbd9d1b891718d2e5614e3e432f35 reworked the UPDATE node's execution to get only the updated and junk columns from its child node. It (the UPDATE node) then fetches the existing "old" tuple and merges it with the output of the child node to construct the new tuple.

This led to performance degradation in Yugabyte because:
- Before this change, the UPDATE node sent only write operations that could be buffered (with some restrictions) and batch executed.
- With this change, the UPDATE node sent a read followed by a write operation per row, triggering 2 RPCs per row.

YB commit cf176f4 / D31673 addressed the above problem by sending the old tuple as "wholerow" junk attribute to the UPDATE node. With this, the read operation inside the UPDATE node to fetch the old tuple was no longer required. While this approach was correct, it made the following mistakes:

- Justification: Citing a `failed to fetch tuple being updated`, it claimed that performing read inside the UPDATE node has a correctness issue that materializes only in Yugabyte (see D31673's summary for details). That claim was incorrect. This issue could have been resolved by using the correct `in_txn_limit`. (Skipping the details here as that approach is not being taken anyway).

- Over-optimization: YB does not need the old tuple in some cases. So, it tried to fetch the wholerow attribute only when necessary. While doing so, it missed (at least) one case: when a subset of foreign key columns are updated. Fetching the old row is necessary to check for foreign key violations in this case correctly. Consider:

```
CREATE TABLE pk(a INT, b INT, PRIMARY KEY (a, b));
CREATE TABLE fk(a INT, b INT, FOREIGN KEY(a, b) REFERENCES pk);
INSERT INTO pk VALUES (1, 1);
INSERT INTO fk VALUES (1, 1);
UPDATE fk set a = a + 1; -- should fail due to foreign key violation
```
Without the old tuple, during the UPDATE operation, the only tuple present in the relation becomes  (a = 2, b = null). Consequently, the foreign key violation is not thrown.

There can potentially be more cases where not fetching the wholerow can lead to incorrect results. Moreover, in most cases, the old tuple is needed anyway. This revision removes this optimization and fetches the wholerow attribute in all non-fast path UPDATEs.
Jira: DB-13777

Test Plan:
   ./yb_build.sh --java-test org.yb.pgsql.TestPgRegressForeignKey

Reviewers: amartsinchyk, kramanathan

Reviewed By: amartsinchyk

Subscribers: yql

Differential Revision: https://phorge.dev.yugabyte.com/D40671
  • Loading branch information
arpang committed Dec 18, 2024
1 parent 38ffe64 commit 28c8fe5
Show file tree
Hide file tree
Showing 17 changed files with 136 additions and 204 deletions.
34 changes: 15 additions & 19 deletions src/postgres/src/backend/executor/execExpr.c
Original file line number Diff line number Diff line change
Expand Up @@ -521,8 +521,7 @@ ExecBuildUpdateProjection(List *targetList,
TupleDesc relDesc,
ExprContext *econtext,
TupleTableSlot *slot,
PlanState *parent,
bool ybUseScanTuple)
PlanState *parent)
{
ProjectionInfo *projInfo = makeNode(ProjectionInfo);
ExprState *state;
Expand Down Expand Up @@ -585,24 +584,21 @@ ExecBuildUpdateProjection(List *targetList,
assignedCols = bms_add_member(assignedCols, targetattnum);
}

if (ybUseScanTuple)
/*
* We need to insert EEOP_*_FETCHSOME steps to ensure the input tuples are
* sufficiently deconstructed. The scan tuple must be deconstructed at
* least as far as the last old column we need.
*/
for (int attnum = relDesc->natts; attnum > 0; attnum--)
{
/*
* We need to insert EEOP_*_FETCHSOME steps to ensure the input tuples are
* sufficiently deconstructed. The scan tuple must be deconstructed at
* least as far as the last old column we need.
*/
for (int attnum = relDesc->natts; attnum > 0; attnum--)
{
Form_pg_attribute attr = TupleDescAttr(relDesc, attnum - 1);
Form_pg_attribute attr = TupleDescAttr(relDesc, attnum - 1);

if (attr->attisdropped)
continue;
if (bms_is_member(attnum, assignedCols))
continue;
deform.last_scan = attnum;
break;
}
if (attr->attisdropped)
continue;
if (bms_is_member(attnum, assignedCols))
continue;
deform.last_scan = attnum;
break;
}

/*
Expand Down Expand Up @@ -708,7 +704,7 @@ ExecBuildUpdateProjection(List *targetList,
{
Form_pg_attribute attr = TupleDescAttr(relDesc, attnum - 1);

if (attr->attisdropped || (!ybUseScanTuple && !bms_is_member(attnum, assignedCols)))
if (attr->attisdropped)
{
/* Put a null into the ExprState's resvalue/resnull ... */
scratch.opcode = EEOP_CONST;
Expand Down
6 changes: 2 additions & 4 deletions src/postgres/src/backend/executor/execPartition.c
Original file line number Diff line number Diff line change
Expand Up @@ -831,8 +831,7 @@ ExecInitPartitionInfo(ModifyTableState *mtstate, EState *estate,
partrelDesc,
econtext,
onconfl->oc_ProjSlot,
&mtstate->ps,
node->ybUseScanTupleInUpdate);
&mtstate->ps);

/*
* If there is a WHERE clause, initialize state where it will
Expand Down Expand Up @@ -953,8 +952,7 @@ ExecInitPartitionInfo(ModifyTableState *mtstate, EState *estate,
RelationGetDescr(leaf_part_rri->ri_RelationDesc),
econtext,
leaf_part_rri->ri_newTupleSlot,
NULL,
node->ybUseScanTupleInUpdate);
NULL);
break;
case CMD_DELETE:
break;
Expand Down
116 changes: 53 additions & 63 deletions src/postgres/src/backend/executor/nodeModifyTable.c
Original file line number Diff line number Diff line change
Expand Up @@ -738,8 +738,7 @@ ExecInitUpdateProjection(ModifyTableState *mtstate,
relDesc,
mtstate->ps.ps_ExprContext,
resultRelInfo->ri_newTupleSlot,
&mtstate->ps,
node->ybUseScanTupleInUpdate);
&mtstate->ps);

resultRelInfo->ri_projectNewInfoValid = true;
}
Expand Down Expand Up @@ -3953,8 +3952,7 @@ ExecInitMerge(ModifyTableState *mtstate, EState *estate)
relationDesc,
econtext,
resultRelInfo->ri_newTupleSlot,
&mtstate->ps,
node->ybUseScanTupleInUpdate);
&mtstate->ps);
mtstate->mt_merge_subcommands |= MERGE_UPDATE;
break;
case CMD_DELETE:
Expand Down Expand Up @@ -4204,7 +4202,6 @@ static TupleTableSlot *
ExecModifyTable(PlanState *pstate)
{
ModifyTableState *node = castNode(ModifyTableState, pstate);
ModifyTable *plan = (ModifyTable *) node->ps.plan;
ModifyTableContext context;
EState *estate = node->ps.state;
CmdType operation = node->operation;
Expand Down Expand Up @@ -4414,48 +4411,43 @@ ExecModifyTable(PlanState *pstate)
bool isNull;

relkind = resultRelInfo->ri_RelationDesc->rd_rel->relkind;
/*
* For YugaByte relations extract the old row from the wholerow junk
* attribute if needed.
*/
if (plan->ybHasWholeRowAttribute)
{
AttrNumber resno;
Plan *subplan = outerPlan(node->ps.plan);

resno = ExecFindJunkAttributeInTlist(subplan->targetlist, "wholerow");
datum = ExecGetJunkAttribute(slot, resno, &isNull);

/* shouldn't ever get a null result... */
if (isNull)
elog(ERROR, "wholerow is NULL");

oldtupdata.t_data = DatumGetHeapTupleHeader(datum);
oldtupdata.t_len =
HeapTupleHeaderGetDatumLength(oldtupdata.t_data);
ItemPointerSetInvalid(&(oldtupdata.t_self));
/* Historically, view triggers see invalid t_tableOid. */
oldtupdata.t_tableOid =
(relkind == RELKIND_VIEW) ? InvalidOid :
RelationGetRelid(resultRelInfo->ri_RelationDesc);

resno = ExecFindJunkAttributeInTlist(subplan->targetlist, "ybctid");
datum = ExecGetJunkAttribute(slot, resno, &isNull);

if (IsYBRelation(relation))
{
/* ri_RowIdAttNo refers to a ybctid attribute */
Assert(AttributeNumberIsValid(resultRelInfo->ri_RowIdAttNo));
datum = ExecGetJunkAttribute(slot, resultRelInfo->ri_RowIdAttNo,
&isNull);
/* shouldn't ever get a null result... */
if (isNull)
elog(ERROR, "ybctid is NULL");

TABLETUPLE_YBCTID(context.planSlot) = datum;
HEAPTUPLE_YBCTID(&oldtupdata) = datum;

oldtuple = &oldtupdata;
if (AttributeNumberIsValid(resultRelInfo->ri_YbWholeRowAttNo))
{
/* Extract the old row from wholerow junk attribute. */
Datum wholerow = ExecGetJunkAttribute(
slot, resultRelInfo->ri_YbWholeRowAttNo, &isNull);

/* shouldn't ever get a null result... */
if (isNull)
elog(ERROR, "wholerow is NULL");

oldtupdata.t_data = DatumGetHeapTupleHeader(wholerow);
oldtupdata.t_len =
HeapTupleHeaderGetDatumLength(oldtupdata.t_data);
ItemPointerSetInvalid(&(oldtupdata.t_self));
oldtupdata.t_tableOid =
RelationGetRelid(resultRelInfo->ri_RelationDesc);
HEAPTUPLE_YBCTID(&oldtupdata) = datum;
oldtuple = &oldtupdata;
}
}
else if (relkind == RELKIND_RELATION ||
relkind == RELKIND_MATVIEW ||
relkind == RELKIND_PARTITIONED_TABLE)
{
/* ri_RowIdAttNo refers to a ctid/ybctid attribute */
/* ri_RowIdAttNo refers to a ctid attribute */
Assert(AttributeNumberIsValid(resultRelInfo->ri_RowIdAttNo));
datum = ExecGetJunkAttribute(slot,
resultRelInfo->ri_RowIdAttNo,
Expand All @@ -4478,22 +4470,10 @@ ExecModifyTable(PlanState *pstate)
ExecMerge(&context, node->resultRelInfo, NULL, node->canSetTag);
continue; /* no RETURNING support yet */
}

if(IsYBRelation(relation))
elog(ERROR, "ybctid is NULL");
else
elog(ERROR, "ctid is NULL");
}

if(IsYBRelation(relation))
{
TABLETUPLE_YBCTID(context.planSlot) = datum;
}
else
{
tupleid = (ItemPointer) DatumGetPointer(datum);
tuple_ctid = *tupleid; /* be sure we don't free ctid!! */
elog(ERROR, "ctid is NULL");
}
tupleid = (ItemPointer) DatumGetPointer(datum);
tuple_ctid = *tupleid; /* be sure we don't free ctid!! */
tupleid = &tuple_ctid;
}

Expand Down Expand Up @@ -4580,18 +4560,17 @@ ExecModifyTable(PlanState *pstate)
}
else
{
/*
* For UPDATE, oldtuple must be populated for YB
* relations.
*/
Assert(!IsYBRelation(relation));
/* Fetch the most recent version of old tuple. */
Relation relation = resultRelInfo->ri_RelationDesc;
bool row_found = false;
if (IsYBRelation(relation))
row_found = true;
else
{
row_found = table_tuple_fetch_row_version(
relation, tupleid, SnapshotAny, oldSlot);
}
Relation relation = resultRelInfo->ri_RelationDesc;

if (!row_found)
if (!table_tuple_fetch_row_version(relation, tupleid,
SnapshotAny,
oldSlot))
elog(ERROR, "failed to fetch tuple being updated");
}
slot = internalGetUpdateNewTuple(resultRelInfo, context.planSlot,
Expand Down Expand Up @@ -4888,12 +4867,24 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
relkind = resultRelInfo->ri_RelationDesc->rd_rel->relkind;
if (IsYBRelation(resultRelInfo->ri_RelationDesc))
{
/* YB note: MERGE command not supported yet. */
Assert(operation != CMD_MERGE);

if (mtstate->yb_fetch_target_tuple)
{
resultRelInfo->ri_RowIdAttNo =
ExecFindJunkAttributeInTlist(subplan->targetlist, "ybctid");
if (!AttributeNumberIsValid(resultRelInfo->ri_RowIdAttNo))
elog(ERROR, "could not find junk ybctid column");

resultRelInfo->ri_YbWholeRowAttNo =
ExecFindJunkAttributeInTlist(subplan->targetlist, "wholerow");

Assert(!YbWholeRowAttrRequired(
resultRelInfo->ri_RelationDesc, operation) ||
AttributeNumberIsValid(
resultRelInfo->ri_YbWholeRowAttNo));

}
}
else if (relkind == RELKIND_RELATION ||
Expand Down Expand Up @@ -5101,8 +5092,7 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
relationDesc,
econtext,
onconfl->oc_ProjSlot,
&mtstate->ps,
node->ybUseScanTupleInUpdate);
&mtstate->ps);

/* initialize state to evaluate the WHERE clause, if any */
if (node->onConflictWhere)
Expand Down
2 changes: 0 additions & 2 deletions src/postgres/src/backend/nodes/copyfuncs.c
Original file line number Diff line number Diff line change
Expand Up @@ -240,8 +240,6 @@ _copyModifyTable(const ModifyTable *from)
COPY_NODE_FIELD(yb_skip_entities);
COPY_NODE_FIELD(yb_update_affected_entities);
COPY_SCALAR_FIELD(no_row_trigger);
COPY_SCALAR_FIELD(ybUseScanTupleInUpdate);
COPY_SCALAR_FIELD(ybHasWholeRowAttribute);

return newnode;
}
Expand Down
2 changes: 0 additions & 2 deletions src/postgres/src/backend/nodes/outfuncs.c
Original file line number Diff line number Diff line change
Expand Up @@ -445,8 +445,6 @@ _outModifyTable(StringInfo str, const ModifyTable *node)
WRITE_NODE_FIELD(yb_skip_entities);
WRITE_NODE_FIELD(yb_update_affected_entities);
WRITE_BOOL_FIELD(no_row_trigger);
WRITE_BOOL_FIELD(ybUseScanTupleInUpdate);
WRITE_BOOL_FIELD(ybHasWholeRowAttribute);
}

static void
Expand Down
2 changes: 0 additions & 2 deletions src/postgres/src/backend/nodes/readfuncs.c
Original file line number Diff line number Diff line change
Expand Up @@ -1748,8 +1748,6 @@ _readModifyTable(void)
READ_NODE_FIELD(yb_skip_entities);
READ_NODE_FIELD(yb_update_affected_entities);
READ_BOOL_FIELD(no_row_trigger);
READ_BOOL_FIELD(ybUseScanTupleInUpdate);
READ_BOOL_FIELD(ybHasWholeRowAttribute);

READ_DONE();
}
Expand Down
19 changes: 0 additions & 19 deletions src/postgres/src/backend/optimizer/plan/createplan.c
Original file line number Diff line number Diff line change
Expand Up @@ -9100,25 +9100,6 @@ make_modifytable(PlannerInfo *root, Plan *subplan,
node->yb_skip_entities = NULL;
node->yb_update_affected_entities = NULL;
node->no_row_trigger = false;

/*
* It suffices to use the first entry of resultRelations to compute
* ybUseScanTupleInUpdate.
* resultRelations contains more than one entry for partitioned relations
* and ybUseScanTupleInUpdate is true for all partitioned relations.
*/
Index rti = linitial_int(resultRelations);
RangeTblEntry *rte = root->simple_rte_array[rti];
Relation relation = RelationIdGetRelation(rte->relid);
Bitmapset *updatedCols = bms_add_members(
get_dependent_generated_columns(root, rti, rte->updatedCols),
rte->updatedCols);
node->ybUseScanTupleInUpdate = YbUseScanTupleInUpdate(
relation, updatedCols, root->parse->returningList);
node->ybHasWholeRowAttribute = YbUseWholeRowJunkAttribute(
relation, updatedCols, operation, root->parse->returningList);
bms_free(updatedCols);
RelationClose(relation);
return node;
}

Expand Down
7 changes: 2 additions & 5 deletions src/postgres/src/backend/optimizer/util/appendinfo.c
Original file line number Diff line number Diff line change
Expand Up @@ -875,12 +875,9 @@ add_row_identity_columns(PlannerInfo *root, Index rtindex,
if (IsYBRelation(target_relation))
{
/*
* If there are secondary indices on the target table, or if we have a
* row-level trigger corresponding to the operations, then also return
* the whole row.
* Emit wholerow if required.
*/
if (YbUseWholeRowJunkAttribute(target_relation, target_rte->updatedCols,
commandType, root->parse->returningList))
if (YbWholeRowAttrRequired(target_relation, commandType))
{
var = makeVar(rtindex, InvalidAttrNumber, RECORDOID, -1, InvalidOid,
0);
Expand Down
Loading

0 comments on commit 28c8fe5

Please sign in to comment.