Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: node skips some epoch length txs making bridgeState corrupt #349

Merged
merged 5 commits into from
Nov 3, 2019

Conversation

troggy
Copy link
Member

@troggy troggy commented Nov 1, 2019

Requires: leapdao/leap-core#155, #348
Resolves #340
Test to reproduce with: leapdao/leap-sandbox#69

Tendermint replay protection

Tendermint has a build-in replay protection which will reject the same tx from being added to the mempool twice. It seems to work by comparing raw tx bytes. Should two transactions have the same raw bytes, the second one will not make it to the mempool.

Now, EpochLength transaction in leap-core is merely a [type,epochLength] tuple. Tx.epochLength(X) will have the same raw bytes for the same X. Consider the following pretty valid transaction flow:

1. Tx.epochLength(2)
2. Tx.epochLength(4)
3. Tx.epochLength(2)

The third transaction will be rejected by tendermint's replay protection.

How it affects leap-node

The node listens for root chain events, including EpochLength event. Everytime event comes, the node
a) keeps record of the epoch length from event (in the bridgeState.epochLengths list). bridgeState
b) creates epochLength transaction and relays it. eventsRelay
c) handles epochLength tx, once received through ABCI from tendermint. checkEpochLength

The step (c) expects all the a,b,c steps always executed for every EpochLength event. However, if the node receives two EpochLength(X) events for the same epoch length, only steps (a) and (b) will be executed. Step (c) would never happen for the second event, because the transaction created on (b) will be rejected by Tendermint's replay protection. In the end, we have bridgeState.epochLength mutated by (a), which will make checkEpochLength fail for any further epoch length tx.

Proposed solution

Add more differentiation to EpochLength transactions. I suggest adding blockHeight to it. See: leapdao/leap-core#155

These two things needs to be merged first for the fix to work: leapdao/leap-core#155, #348

@codecov
Copy link

codecov bot commented Nov 1, 2019

Codecov Report

Merging #349 into master will increase coverage by 0.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #349      +/-   ##
==========================================
+ Coverage   95.16%   95.17%   +0.01%     
==========================================
  Files          72       72              
  Lines        1096     1099       +3     
  Branches      206      206              
==========================================
+ Hits         1043     1046       +3     
  Misses         47       47              
  Partials        6        6
Impacted Files Coverage Δ
src/tx/applyTx/checkEpochLength.js 100% <100%> (ø) ⬆️
src/bridgeState.js 69.23% <100%> (+0.29%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 628c596...1063dce. Read the comment docs.

@troggy
Copy link
Member Author

troggy commented Nov 1, 2019

temporary using [email protected] which includes the fix leapdao/leap-core#155

…rupt

Tendermint has a [build-in replay protection](https://tendermint.com/docs/app-dev/app-development.html#replay-protection) which will reject the same tx from being added to the mempool twice. It seems to work by comparing raw tx bytes. Should two transactions have the same raw bytes, the second one will not make it to the mempool.

Now, EpochLength transaction in leap-core is merely a `[type,epochLength]` tuple. `Tx.epochLength(X)` will have the same raw bytes for the same `X`. Consider the following pretty valid transaction flow:

```
1. Tx.epochLength(2)
2. Tx.epochLength(4)
3. Tx.epochLength(2)
```

The third transaction will be rejected by tendermint's replay protection.

The node listens for root chain events, including `EpochLength` event. Everytime event comes, the node
a) keeps record of the epoch length from event (in the `bridgeState.epochLengths` list). [bridgeState](https://github.com/leapdao/leap-node/blob/5195438c54b7c2670e7314d102b690255b4a203b/src/bridgeState.js#L135)
b) creates `epochLength` transaction and relays it. [eventsRelay](https://github.com/leapdao/leap-node/blob/5195438c54b7c2670e7314d102b690255b4a203b/src/eventsRelay.js#L80)
c) handles `epochLength` tx, once received through ABCI from tendermint. [checkEpochLength](https://github.com/leapdao/leap-node/blob/5195438c54b7c2670e7314d102b690255b4a203b/src/tx/applyTx/checkEpochLength.js#L10)

The step (c) expects all the a,b,c steps always executed for every `EpochLength` event. However, if the node receives to `EpochLength(X)` events for the same epoch length, only steps (a) and (b) will be executed. Step (c) would never happen for the second event, because the transaction created on (b) will be rejected by Tendermint's replay protection. In the end, we have `bridgeState.epochLength` mutated by (a), which will make `checkEpochLength` fail for any further epoch length tx.

Add more differentiation to EpochLength transactions. I suggest adding blockHeight to it.
@troggy troggy force-pushed the fix/340-epoch-length branch from a1aa10f to 3158b23 Compare November 1, 2019 18:09
@sunify
Copy link
Member

sunify commented Nov 2, 2019

Holy shit. I wonder how we ain't dead yet :D

I have a feeling we should create the new tx type and rename the old one to «Tx.epochLength_legacy» or something like that. If this PR will merged it'll make sync with testnet or mainnet impossible — Tx.fromRaw will fail with malformed epoch tx. error

@troggy
Copy link
Member Author

troggy commented Nov 2, 2019

Tx.fromRaw will fail with malformed epoch tx. error

great catch, totally overlooked this.

Copy link
Member

@sunify sunify left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have almost identical (in terms of code) tx type MIN_GAS_PRICE and It affected by the same problem. We can create the new issue or fix here — up to you

const { epochLength } = event.returnValues;
const tx = Tx.epochLength(Number(epochLength));
const tx = Tx.epochLength(Number(epochLength), Number(event.blockNumber));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can save blockNumber in bridgeState.epochLengths as well and use it for tx validation 🤔

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't get the idea. You propose to use these blockNumbers here when creating epochLength tx ?

Copy link
Member

@sunify sunify Nov 2, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In bridgeState we store all epochLength changes. We can add blockNumber into this array ([epochLength, blockNumber][]) and use it in this check

Btw, do not forget to update check anyway — it should handle two tx types (it handles zero currently :-))

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hm.. I've updated the check.

I don't see why we need to store blockNumber in bridgeState — we can just take it from the transaction (like I do in my code). If there is no blockNumber in epochLength, chance are that checkEpochLength won't be called at all, exactly the issue we are fixing here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or you mean, we can compare height from bridgeState with the one from tx ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes :-)

@troggy
Copy link
Member Author

troggy commented Nov 2, 2019

We have almost identical (in terms of code) tx type MIN_GAS_PRICE and It affected by the same problem. We can create the new issue or fix here — up to you

👍
Opened as issue for that: leapdao/leap-core#156

troggy added a commit to leapdao/leap-core that referenced this pull request Nov 2, 2019
* fix: add blockHeight to EpochLength tx to make them unique.

See leapdao/leap-node#349 for rationale

* 0.36.2-beta

* add new tx type EPOCH_LENGTH_V2.

Use it for epoch length txs with block height, keep the EPOCH_LENGTH type as it is to support
legacy transactions (so we can replay the old networks)

* 0.37.0-beta.1

* Tx.epochLength: make params mandatory

* 0.37.0-beta.2
@troggy
Copy link
Member Author

troggy commented Nov 2, 2019

If this PR will merged it'll make sync with testnet or mainnet impossible — Tx.fromRaw will fail with malformed epoch tx. error

addressed with new tx type. I've quickly checked with testnet sync — the sync goes beyond the height 4, no error.

@troggy troggy requested a review from sunify November 2, 2019 14:34
@sunify sunify force-pushed the fix/340-epoch-length branch from 1376ef2 to 1063dce Compare November 3, 2019 03:24
Copy link
Member

@sunify sunify left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great job!

image

(i've already merged integration-tests to run it here)

@sunify sunify merged commit 1a0173c into master Nov 3, 2019
@sunify sunify deleted the fix/340-epoch-length branch November 3, 2019 09:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Cannot increase epoch length
2 participants