Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue#622: change usize to u64 #641

Closed
wants to merge 18 commits into from

Conversation

steven004
Copy link
Contributor

Mainly focus on BytesAmount related usize type to u64 to support >4G sector_size, even on a 32-bit system.

This is not a comprehensive change at this moment.

There are many other places to consider, especially code in fr32.rs I did not touch. I want to only do limited change in one commit, to avoid introduce unexpected issues.
Next, based on comments from review, more changes could be done.

- add sectorbuilder implementation and interface links
- fix verifier links
- remove sectorbase interfaces since they are not needed at this moment.
sync the latest from original - 0416
There are some reasons here: 
1) the original definition of sector size in sector-base\src\api\disk_backed_storage.rs is u64 type;
2) the usize type for a 32-bit system, could only support < 1<<32 sector size, in this case, it will make the system does not work for the seal function (overflow). 
3) There are many place actually it will convert usize to u64, why do not use u64 at first?
@@ -251,7 +245,7 @@ fn setup_params(
"sector_bytes ({}) must be a multiple of 32",
sector_bytes,
);
let nodes = sector_bytes / 32;
let nodes = (sector_bytes / 32) as usize;
Copy link
Collaborator

@porcuquine porcuquine May 3, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With large enough sectors (> 128GiB), nodes will exceed u32 too.

Copy link
Contributor Author

@steven004 steven004 May 3, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not only nodes, and perhaps sector counts as well in some other files (when a big repo > 1EB, the sector number could > 4G), I would think it is better to do incremental changes in multiple commits, and make sure every step is correct.

How do you think about this? @porcuquine

@@ -103,7 +103,7 @@ impl SectorManager for DiskManager {
file.seek(SeekFrom::Start(start_offset))
.map_err(|err| SectorManagerErr::CallerError(format!("{:?}", err)))?;

let mut buf = vec![0; usize::from(num_bytes)];
let mut buf = vec![0; (u64::from(num_bytes)) as usize];
Copy link
Collaborator

@porcuquine porcuquine May 3, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I may be missing something, but I think that if usize is u32 (on a 32-bit system), then this will truncate so isn't any better than converting UnpaddedBytesAmount to usize would have been.

In other words, I think this is fundamentally a matter of us wanting to address more than 32-bits worth of memory; and trying to get around it only indirects the issue.

It seems to me that the only simple way to avoid this is to avoid the situation. That's probably done most gracefully by putting a bounds check into the (removed above) conversion to usize. I imagine there's some clever way we could get around all this, but I suspect it would involve jumping through a lot of hoops — and at the end of the day, we'd still never want to try replicate too-large sectors on a 32-bit machine.

I'm starting to think the main solution to this is to never create sectors larger than 4GiB (max) if we are on a 32-bit system. We could, and perhaps should, check that when the SectorBuilder is requested in the first place. @dignifiedquire @laser

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is one of places I remain there for future decision. Originally, I put an TODO comment there in the code to reflect that as you mentioned, but, not sure why it could not pass the fmt check, then deleted it in the commit b7818d0

Copy link
Collaborator

@porcuquine porcuquine May 3, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's how I see it:

Code like the above, where we are creating/indexing vectors/slices must use usize — as in your cast above. This is enforced by the compiler (for good reason).

For now, there is no plan to try to support replication with less RAM than required to hold a sector — so even if we can incrementally remove some instances of allocation/reference that require 64 bits, we still won't have changed the fundamental fact that the system can't handle the situation.

I think we should defer rippling partial changes which create the illusion that we can address sectors larger than the architecture will allow. Instead, we should use usize internally in any place that may eventually propagate to an index variable or vector size.

If we really want to expose u64 as the official type, we can either:

  • Always fail on 32-bit machines (seems like a bad choice).
  • Perform a bounds check and fail on 32-bit machines when attempting to create a SectorBuilder handling sectors of > 4GiB.

Given the second option, we could also just use usize as the externally visible type (and use the appropriate FFI equivalent) — so that callers cannot pass values we won't be able to handle. That pushes the failure completely outside the FPS and 'makes illegal states unrepresentable'.

Copy link
Contributor Author

@steven004 steven004 May 4, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we really want to expose u64 as the official type, we can either:

  • Always fail on 32-bit machines (seems like a bad choice).
  • Perform a bounds check and fail on 32-bit machines when attempting to create a SectorBuilder handling sectors of > 4GiB.

Agreed, we could add the boundary check in places where u64 to usize functions for 32-bit system. Do you prefer we put it into this commit, or we could do it in the next commit?

I am thinking the best way is to ask compiler to do this, i.e. we have the code in place, and insert some compiler indicators to let the compiler decide if the code should be in or not in the binary depending on the system type (32-bit or 64-bit). I am not that familiar to add this kind of compiler indicator in rust, anyone has any suggestions?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given the second option, we could also just use usize as the externally visible type (and use the appropriate FFI equivalent) — so that callers cannot pass values we won't be able to handle. That pushes the failure completely outside the FPS and 'makes illegal states unrepresentable'.

Interesting. Should a mining rig operator who is running a 32-bit system be able to verify a PoSt or PoRep proof generated for sectors with size > 4GiB (by some other miner)? Both verify_post and verify_seal FFI-exposed functions have a sector_size: u64 parameter.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. There's no reason why 32-bit systems should be unable to verify large sectors. As long as sectors are not greater than 128GiB, we should still be able to represent nodes in 32 bits. So we don't need to propagate a change into graph, etc. We should (I think) coerce to usize as early as possible and put in a check at that point, failing (for the proving path) if this will overflow usize.

If and when we need to verify sectors over 128GiB, we will need to revisit this. I suggest we code this in such a way that we can't silently fail to do so in that event.

@steven004
Copy link
Contributor Author

@laser @porcuquine Could you please make a call if this could be merged, or any specific things need to do before merge, or after merge?

I am afraid there will be more and more changes happen on master branch, and the merge will be difficult if keep it pending too long.

@porcuquine
Copy link
Collaborator

@laser @porcuquine Could you please make a call if this could be merged, or any specific things need to do before merge, or after merge?

I am afraid there will be more and more changes happen on master branch, and the merge will be difficult if keep it pending too long.

Yes, @steven004, sorry this has dragged out. We've decided to follow the plan outlined in comments above:

  • Use usize everywhere internally.
  • Use u64 for API calls across the FFI border.
  • Convert from u64 to usize in order to generate parameters required for both proving and verification.
  • Fail with error if API client tries to replicate a too-large sector on a 32-bit system. i.e. bounds-check at conversion time, as early as possible.
  • Note that because we only need to convert the number of 32-byte nodes to usize in order to verify, this will let us support up to 128GiB sectors on 32-bit systems (verification only). This is enough for now.

@porcuquine
Copy link
Collaborator

@steven004
Copy link
Contributor Author

Sounds good. Thanks.

@dignifiedquire
Copy link
Contributor

Closing in favor of #652

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants