Generalize expiry based de-duplication, dsmr #1810

Merged
merged 80 commits into from
Jan 15, 2025


tsachiherman
Contributor

@tsachiherman tsachiherman commented Nov 22, 2024

What?

Integrate the generic de-duplication logic into dsmr.

@tsachiherman tsachiherman self-assigned this Nov 22, 2024
x/dsmr/block.go Outdated
x/dsmr/node.go Outdated
x/dsmr/node.go Outdated
x/dsmr/node.go Outdated
x/dsmr/node.go Outdated
@tsachiherman tsachiherman marked this pull request as ready for review November 25, 2024 18:20
x/dsmr/node.go Outdated
x/dsmr/node_test.go Outdated
x/dsmr/node_test.go Outdated
x/dsmr/node_test.go Outdated
@tsachiherman
Contributor Author

tsachiherman commented Nov 25, 2024 via email

Comment on lines 73 to 81
// make sure we have no repeats within the block itself.
blkTxsIDs := make(map[ids.ID]bool, len(blk.Txs()))
for _, tx := range blk.Txs() {
id := tx.GetID()
if _, has := blkTxsIDs[id]; has {
return fmt.Errorf("%w: duplicate in block", ErrDuplicateContainer)
}
blkTxsIDs[id] = true
}
Collaborator

We now do this check twice in chain, since we previously handled it within NewExecutionBlock.

We should only apply the check once; which do you think is the better place for it? The other addition within NewExecutionBlock is pre-populating the signature job. It would probably be better to remove that from the execution block and handle it in AsyncVerify. One good reason to do this is that handling it within ParseBlock can be a DoS vector.

Contributor Author

I don't see why testing this twice is an issue on its own, and I think I have a better solution here.
Testing it in NewExecutionBlock is not ideal from my perspective, as that should be a construction step (i.e., no errors).
How do you feel about the following?
In ExecutionBlock, we would add the following function:

func (b *ExecutionBlock) ValidateDuplicateTransactions() error {
    // A set smaller than the slice means at least one tx ID was repeated.
    if len(b.Txs) != b.txsSet.Len() {
        return ErrDuplicateTx
    }
    return nil
}

Then, in VerifyExpiryReplayProtection in validitywindow.go, we can just call this method:

   ...
   if err := blk.ValidateDuplicateTransactions(); err != nil {
      return err
   }
   ...

and remove the check from NewExecutionBlock.

x/dsmr/block.go Outdated
x/dsmr/node.go Outdated
x/dsmr/node.go Outdated
x/dsmr/node.go Outdated
Comment on lines 247 to 250
if block.Tmstmp <= parentBlock.Tmstmp && parentBlock.Hght > 0 {
return ErrTimestampNotMonotonicallyIncreasing
}

Collaborator

Can we remove this check from this PR?

We should enforce timestamp >= parent timestamp (allowing them to be equal in case a block builder pushes the timestamp ahead of wall clock time for some nodes).

We should not allow a malicious node to build a block whose timestamp we consider valid (less than 1s ahead of our wall clock time) but that is still ahead of our wall clock time, such that when we build a block on top of it, we fail because the parent's timestamp is ahead of our local time.

We should update the check applied in BuildBlock imo.

We should also never execute the genesis block, so the check for parentBlock.Hght > 0 should be removed.

if blk, has := ti.blocks[id]; has {
return blk, nil
}
return nil, nil
Collaborator

Fetching a non-existing block should return an error, can we return database.ErrNotFound if the block is not available?

Contributor

I find the UX of using the sentinel database.ErrNotFound to be very awkward. This is due to the existing interface definition and not the changes in this PR, though. If we had something like GetExecutionBlock() (ExecutionBlock[T], bool, error), we wouldn't have to introspect the error: we could just return it if it is non-nil, and check the bool if the block simply wasn't there.

@@ -69,6 +70,15 @@ func (v *TimeValidityWindow[Container]) VerifyExpiryReplayProtection(
if dup.Len() > 0 {
return fmt.Errorf("%w: duplicate in ancestry", ErrDuplicateContainer)
}
// make sure we have no repeats within the block itself.
blkTxsIDs := make(map[ids.ID]bool, len(blk.Txs()))
Contributor

Can we use set.Set here?

Comment on lines 58 to 60
func (c ChunkCertificate) GetExpiry() int64 { return c.Expiry }

func (c *ChunkCertificate) GetSlot() int64 { return c.Expiry }
Contributor

I think these are duplicated; GetSlot is meant to be the expiry.

x/dsmr/node.go Outdated
log logging.Logger,
tracer trace.Tracer,
chainIndex ChainIndex,
validityWindowDuration int64,
Contributor

Should we use time.Duration here?

x/dsmr/node.go Outdated
Comment on lines 204 to 207
oldestAllowed := timestamp - n.validityWindowDuration
if oldestAllowed < 0 {
oldestAllowed = 0
}
Contributor

nit: max(0, oldestAllowed)

blocks map[ids.ID]validitywindow.ExecutionBlock[*ChunkCertificate]
}

func (ti *testingChainIndex) GetExecutionBlock(_ context.Context, id ids.ID) (validitywindow.ExecutionBlock[*ChunkCertificate], error) {
Contributor

Nits:

  1. receiver name should just be t
  2. id -> blkID or blockID

x/dsmr/block.go Outdated
Comment on lines 127 to 131
func (b Block) GetID() ids.ID {
return b.blkID
}

func (b Block) Parent() ids.ID {
Contributor

I don't care which we pick, but we should be consistent on the naming of either Foo() or GetFoo().

Contributor Author

I'm going to dodge this one by saying that this method is no longer needed.

x/dsmr/block.go Outdated
Comment on lines 143 to 145
func (b Block) Txs() []*ChunkCertificate {
return b.ChunkCerts
}
Contributor

This seems weird, because these are not returning txs.

Contributor Author

Yes, I agree. Let's discuss this as a group?

Collaborator

Can we use the term containers?

x/dsmr/node.go Outdated
Comment on lines 58 to 59
log logging.Logger,
tracer trace.Tracer,
Contributor

nit: make these (logger + tracer) the first args in this fn

x/dsmr/node.go Outdated
Comment on lines 37 to 40
type (
ChainIndex = validitywindow.ChainIndex[*ChunkCertificate]
timeValidityWindow = *validitywindow.TimeValidityWindow[*ChunkCertificate]
)
Contributor

Seems like we're doing this to avoid the caller depending on an internal package. I'm wondering if it even makes sense for validitywindow to be in internal at all... should this just be merged into dsmr?

Contributor Author

@aaronbuchwald asked to remove this section completely.

x/dsmr/block.go Outdated
"github.com/ava-labs/hypersdk/utils"
)

const InitialChunkSize = 250 * 1024

type Tx interface {
GetID() ids.ID
GetExpiry() int64
emap.Item
Contributor

This exposes the internal/emap package to the caller. I think the previous wrapping pattern, where we wrapped this interface with a type that implemented the emap interface, actually looked cleaner.

Contributor Author

Once merged, we won't need the wrapping interface anymore, unless I'm missing something?

Collaborator

I think @joshua-kim's point is to implement the internal type as makes sense in this package, and then wrap it with another type that converts between that structure and the required interface when we need to use it in the validity window or expiry map, where a specific interface is required. Correct me if I'm wrong, @joshua-kim.

Contributor Author

done
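The adapter pattern described in this thread might look roughly like the sketch below. `Expirable`, `ChunkCertificate` and `emapChunkCertificate` are hypothetical names chosen for illustration; the point is that the package's own type never implements the internal interface, and a thin wrapper does so only where the expiry map or validity window needs it.

```go
package main

import "fmt"

// ID is a hypothetical stand-in for ids.ID.
type ID [32]byte

// Expirable is a hypothetical stand-in for the interface required by the
// expiry map / validity window.
type Expirable interface {
	GetID() ID
	GetExpiry() int64
}

// ChunkCertificate is shaped as it makes sense for this package; it does not
// implement Expirable itself.
type ChunkCertificate struct {
	ChunkID ID
	Expiry  int64
}

// emapChunkCertificate adapts a ChunkCertificate to Expirable at the point
// where the internal interface is needed.
type emapChunkCertificate struct {
	ChunkCertificate
}

// Compile-time check that the adapter satisfies the interface.
var _ Expirable = emapChunkCertificate{}

func (e emapChunkCertificate) GetID() ID        { return e.ChunkID }
func (e emapChunkCertificate) GetExpiry() int64 { return e.Expiry }

func main() {
	certs := []ChunkCertificate{{ChunkID: ID{1}, Expiry: 5}}
	items := make([]Expirable, 0, len(certs))
	for _, c := range certs {
		items = append(items, emapChunkCertificate{c})
	}
	fmt.Println(items[0].GetExpiry()) // 5
}
```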

Collaborator

@aaronbuchwald aaronbuchwald left a comment

As discussed offline, can we rewrite the tests with the following assumptions:

  • the validity window is a separate component from DSMR in practice and will be responsible for keeping itself up to date
  • therefore, it should be tested independently, with unit tests for the validity window and integration tests when used in conjunction with DSMR/chain
  • we should add test cases for the DSMR package that treat the validity window as a black box, i.e., return a mocked error from VerifyExpiryReplayProtection and IsRepeat, and a mocked value of duplicates from IsRepeat indicating that one of the chunks is a duplicate and should be eliminated
  • DSMR tests should not be responsible for setting the execution block index (treated as a separate component)


func int64ToID(n int64) ids.ID {
var id ids.ID
binary.LittleEndian.PutUint64(id[0:8], uint64(n))
Collaborator

Could we use BigEndian as we use it throughout the rest of the repo?

return false
}

func NewExecutionBlock(parent int64, timestamp int64, height uint64, contrainers ...int64) ExecutionBlock {
Collaborator

Can we fix the typo in contrainers?

Collaborator

This constructor is fairly difficult to read in the tests as it is a sequence of similar numbers including a variadic final parameter. Could we switch the last argument to being a slice, so that it's clearer to read which numbers are the containers?
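A sketch of the slice-based signature suggested here. The `executionBlock` type and the lowercase `newExecutionBlock` are hypothetical test helpers, not the code under review; the point is only that an explicit `[]int64` marks where the container list begins at each call site.

```go
package main

import "fmt"

// executionBlock is a hypothetical test helper type for this sketch.
type executionBlock struct {
	parent     int64
	timestamp  int64
	height     uint64
	containers []int64
}

// newExecutionBlock takes the containers as an explicit slice instead of a
// variadic tail, so call sites show where the container list begins.
func newExecutionBlock(parent, timestamp int64, height uint64, containers []int64) executionBlock {
	return executionBlock{
		parent:     parent,
		timestamp:  timestamp,
		height:     height,
		containers: containers,
	}
}

func main() {
	// Variadic form for comparison: newExecutionBlock(0, 5, 1, 4, 5)
	blk := newExecutionBlock(0, 5, 1, []int64{4, 5})
	fmt.Println(blk.containers) // [4 5]
}
```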

return c.Expiry
}

func NewContainer(expiry int64) Container {
Collaborator

Can we move this package to the validity window test file if this test package is not required externally?

Comment on lines 19 to 24
// method for returning an id of the item
func (c Container) GetID() ids.ID {
return c.ID
}

// method for returning this items timestamp
Collaborator

Can we remove comments that do not add any new information?

}
}

// testing structures.
Collaborator

Can we remove comments that do not add any new information?

x/dsmr/block.go Outdated
Comment on lines 115 to 130
func (e validityWindowBlock) Timestamp() int64 {
return e.Block.Timestamp
}

func (e validityWindowBlock) Height() uint64 {
return e.Block.Height
}

func (e validityWindowBlock) Contains(id ids.ID) bool {
return e.certs.Contains(id)
}

func (e validityWindowBlock) Parent() ids.ID {
return e.Block.ParentID
}

Collaborator

Can we add these functions directly to the block type? It will need to implement them anyway as part of the snow refactor, where it must fulfill the snow.Block type.
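Moving the accessors onto the block type could look like the sketch below (all names hypothetical). One Go constraint matters here: a struct cannot have a field and a method with the same name, so fields like `Height` and `Timestamp` must be renamed once `Height()` and `Timestamp()` become methods. The `Hght`/`Tmstmp` spelling in the node.go snippet earlier in this conversation is consistent with that approach.

```go
package main

import "fmt"

// ID is a hypothetical stand-in for ids.ID.
type ID [32]byte

// validityWindowBlock lists the methods the separate wrapper used to provide.
type validityWindowBlock interface {
	Parent() ID
	Height() uint64
	Timestamp() int64
	Contains(ID) bool
}

// Block is a simplified, hypothetical dsmr block. Go forbids a field and a
// method with the same name, so the fields cannot be called Height/Timestamp.
type Block struct {
	ParentID ID
	Hght     uint64
	Tmstmp   int64
	ChunkIDs []ID
	chunkSet map[ID]struct{} // built once by NewBlock for O(1) Contains
}

// Compile-time check that Block satisfies the interface directly.
var _ validityWindowBlock = (*Block)(nil)

func NewBlock(parent ID, height uint64, timestamp int64, chunkIDs []ID) *Block {
	set := make(map[ID]struct{}, len(chunkIDs))
	for _, id := range chunkIDs {
		set[id] = struct{}{}
	}
	return &Block{ParentID: parent, Hght: height, Tmstmp: timestamp, ChunkIDs: chunkIDs, chunkSet: set}
}

func (b *Block) Parent() ID       { return b.ParentID }
func (b *Block) Height() uint64   { return b.Hght }
func (b *Block) Timestamp() int64 { return b.Tmstmp }

func (b *Block) Contains(id ID) bool {
	_, ok := b.chunkSet[id]
	return ok
}

func main() {
	blk := NewBlock(ID{1}, 3, 42, []ID{{7}})
	fmt.Println(blk.Height(), blk.Timestamp(), blk.Contains(ID{7})) // 3 42 true
}
```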

4,
codec.Address{},
))
_, err = node.Accept(context.Background(), blk)
Collaborator

We should never call Accept on the same block twice.

Contributor Author

good point. This Accept was not needed, so I've just dropped it.

Comment on lines 1009 to 1023
r.NoError(node.BuildChunk(
context.Background(),
[]dsmrtest.Tx{
{
ID: ids.GenerateTestID(),
Expiry: 4,
},
},
4,
codec.Address{},
))

blk, err := node.BuildBlock(context.Background(), node.LastAccepted, 2)
r.NoError(err)
r.NoError(node.Verify(context.Background(), node.LastAccepted, blk))
Collaborator

I'm not sure this is testing what it's intended to. This should test that the validity window correctly eliminates chunks that are in the chunk pool from block building. However, we build, verify, and accept the block, which will remove the chunk from the chunk pool. Therefore, no chunks are actually "eliminated" by the validity window in this test.

Contributor Author

I added coverage for that in TestNode_BuildBlock_IncludesChunks.

Collaborator

The current flow here is build -> verify -> accept and then build another block and confirm that the chunks included in the already accepted block are correctly eliminated so BuildBlock returns the expected error.

At that point, the node is in a state of having an empty chunk pool and an updated last accepted block. I guess this is just testing that BuildBlock does not build an invalid block with the same chunk that was already included. Isn't that the same as the no available chunk certs in the test above?

@tsachiherman
Contributor Author

> DSMR tests should not be responsible for setting the execution block index (treated as a separate component)

Definitely doable, but it would require passing in the underlying TimeValidityWindow so that we could mock the success case. Then no indexer would be needed.

Comment on lines 73 to 81
// make sure we have no repeats within the block itself.
blkContainerIDs := set.NewSet[ids.ID](len(blk.Containers()))
for _, container := range blk.Containers() {
id := container.GetID()
if blkContainerIDs.Contains(id) {
return fmt.Errorf("%w: duplicate in block", ErrDuplicateContainer)
}
blkContainerIDs.Add(id)
}
Collaborator

This change makes it inconsistent across tx deduplication and chunk deduplication how this is handled. Currently tx deduplication will make sure that there are no duplicate transactions included in a block when it parses or creates it, so that the validitywindow package does not need to check for duplicates within the requested block.

We should align on one way or the other. I think it's a smaller change to this PR to go with the current style for execution block of checking for duplicates when parsing/creating the block, but think either version is fine.

Contributor Author

@tsachiherman tsachiherman Jan 14, 2025

You're correct that it's being performed twice: once in validitywindow.VerifyExpiryReplayProtection and once in chain.NewExecutionBlock. Yet the functionality is missing in dsmr.NewValidityWindowBlock.

This functionality is logically needed for the correctness of the replay protection, although an earlier check would work just as well.

My preference would be to keep it in validitywindow, so that we can claim that the entire replay protection is encapsulated in one place.

Collaborator

That makes sense, could we remove the check for this in ExecutionBlock in that case, so that we don't duplicate the check?

x/dsmr/block.go Outdated
x/dsmr/node_test.go Outdated
Comment on lines 21 to 25
Name string
Accepted []executionBlock
VerifyBlock executionBlock
OldestAllowed int64
ExpectedError error
Collaborator

Can we unexport the field names here?

Collaborator

Can we add an Accepted field to indicate the last block considered accepted as opposed to in processing as we do for TestValidityWindowIsRepeat below?

Contributor Author

Not sure exactly how to do that, since we really need the included chunks to be set correctly, as that is what's being tested.

Collaborator

Why can't we do exactly the same as we do below, and just test VerifyExpiryReplayProtection instead of IsRepeat?

Collaborator

			r := require.New(t)

			chainIndex := &testChainIndex{}
			tvw := NewTimeValidityWindow(&logging.NoLog{}, trace.Noop, chainIndex)
			r.NotNil(tvw)
			for i, blk := range test.blocks {
				if i <= int(test.accepted) {
					tvw.Accept(blk)
				}
				chainIndex.set(blk.GetID(), blk)
			}

Contributor Author

Oh, now I understand what you meant: you want the last entry in blocks to be the one used for VerifyExpiryReplayProtection.
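The "last entry is the block under verification" convention could be sketched as a table-driven test like this. It is a deliberately simplified stand-in, not the real validity window: `block`, `verifyLast` and `errDuplicate` are hypothetical, ancestry is just the preceding slice entries rather than a walk through the chain index, and the window is applied to ancestor block timestamps.

```go
package main

import (
	"errors"
	"fmt"
)

var errDuplicate = errors.New("duplicate container")

// block is a hypothetical, simplified test block: a timestamp plus the IDs of
// the containers it includes.
type block struct {
	timestamp  int64
	containers []int64
}

// verifyLast treats the final entry of blocks as the block under verification
// and every earlier entry as its ancestry. A container already included in an
// ancestor whose timestamp falls inside the window is reported as a replay.
func verifyLast(blocks []block, window int64) error {
	last := blocks[len(blocks)-1]
	oldest := last.timestamp - window
	seen := make(map[int64]bool)
	for _, b := range blocks[:len(blocks)-1] {
		if b.timestamp < oldest {
			continue // outside the window: cannot cause a replay
		}
		for _, c := range b.containers {
			seen[c] = true
		}
	}
	for _, c := range last.containers {
		if seen[c] {
			return errDuplicate
		}
	}
	return nil
}

func main() {
	tests := []struct {
		name    string
		blocks  []block
		wantErr error
	}{
		{"no repeat", []block{{1, []int64{1}}, {2, []int64{2}}}, nil},
		{"repeat inside window", []block{{1, []int64{1}}, {2, []int64{1}}}, errDuplicate},
		{"repeat outside window", []block{{1, []int64{1}}, {10, []int64{1}}}, nil},
	}
	for _, tt := range tests {
		err := verifyLast(tt.blocks, 5)
		fmt.Println(tt.name, errors.Is(err, tt.wantErr)) // each line ends in true
	}
}
```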

internal/validitywindow/validitywindow_test.go Outdated
internal/validitywindow/validitywindow_test.go Outdated
x/dsmr/node_test.go Outdated
x/dsmr/node_test.go Outdated
x/dsmr/node_test.go Outdated
@tsachiherman
Contributor Author

> The current flow here is build -> verify -> accept and then build another block and confirm that the chunks included in the already accepted block are correctly eliminated so BuildBlock returns the expected error.
>
> At that point, the node is in a state of having an empty chunk pool and an updated last accepted block. I guess this is just testing that BuildBlock does not build an invalid block with the same chunk that was already included. Isn't that the same as the no available chunk certs in the test above?

No, it's not exactly the same. Please note two things:

  1. In both node.BuildBlock calls, we asked for a block with a timestamp of 3. The state of the node wasn't altered by the failing call, since BuildBlock returned an error.
  2. After adding a new chunk and calling node.BuildBlock again, no error was returned.

That allows us to test that the source of the error is the absence of chunks (note that all the chunks in this test have an expiry of 4, far beyond the block time).

@aaronbuchwald aaronbuchwald merged commit 143eaea into main Jan 15, 2025
17 checks passed
@aaronbuchwald aaronbuchwald deleted the tsachi/refactor_validity_window3 branch January 15, 2025 18:13