Note that highest compression levels are not useful in cloud storage #182
Comments
Two important points:
Is each of these processes using more than one thread? If not, then one process = one core, right? I'm assuming the higher performance on the 96-core machine is due to more memory bandwidth or cache or something.
They are using more than one thread to fetch and multiple threads to decompress. The notebooks are in the airlock, so we can go over them on Monday, hopefully.
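For reference, a minimal sketch of that fetch/decompress overlap using two thread pools. The bucket, key names, pool sizes, and codec settings below are illustrative assumptions, not the setup from the airlocked notebooks:

```python
# Hypothetical sketch: overlap S3 fetches with decompression using two thread pools.
# Bucket name, chunk keys, and Blosc settings are illustrative, not the actual setup.
from concurrent.futures import ThreadPoolExecutor

import boto3
from numcodecs import Blosc

s3 = boto3.client("s3")
codec = Blosc(cname="zstd", clevel=7, shuffle=Blosc.BITSHUFFLE)

def fetch(key):
    # Several in-flight GET requests keep the network busy.
    return s3.get_object(Bucket="example-bucket", Key=key)["Body"].read()

def decompress(buf):
    # Decompression runs on separate threads (Blosc releases the GIL).
    return codec.decode(buf)

keys = [f"call_genotype/{i}" for i in range(1024)]  # illustrative chunk keys

with ThreadPoolExecutor(max_workers=16) as fetchers, \
     ThreadPoolExecutor(max_workers=32) as decoders:
    compressed = fetchers.map(fetch, keys)
    chunks = list(decoders.map(decompress, compressed))
```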
I should also add that the 96 is logical cores and only 48 physical.
This sounds great! I'm interested to see how this translates to processing with Cubed. Is the machine on EC2 (if so, which instance type is it?) or are you reading from S3 from a machine outside AWS? Also, what compressors are you using?
An interesting and counterintuitive observation we should make is that trying to achieve the highest possible compression for call_genotype is actually pointless. From @benjeffery's experiments on S3, we can decode to RAM at ~42 GiB/s using about 32 cores with the standard compressor settings, which emphasise the highest compression levels. This corresponds to a network throughput of only about 100 MiB/s, which is nowhere near what the instance is capable of. So, if we used a faster codec with slightly lower compression, I think we could increase that (or at least do it with fewer cores).
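One rough way to see the tradeoff is to benchmark decompression throughput of a high-clevel zstd configuration against a faster, lower-compression codec. The sketch below uses synthetic data and assumed chunk shapes and codec parameters, not the settings from the experiments above:

```python
# Sketch: compare decompression speed and ratio of a high-compression codec
# versus a faster one. Shapes and codec parameters are illustrative only.
import time

import numpy as np
from numcodecs import Blosc

# Synthetic stand-in for a genotype chunk (real call_genotype data compresses
# far better than random values, so ratios here are pessimistic).
rng = np.random.default_rng(42)
chunk = rng.integers(0, 2, size=(10_000, 1_000, 2), dtype=np.int8)

codecs = {
    "zstd, clevel=9 (max compression)": Blosc(cname="zstd", clevel=9, shuffle=Blosc.BITSHUFFLE),
    "lz4, clevel=1 (max speed)": Blosc(cname="lz4", clevel=1, shuffle=Blosc.BITSHUFFLE),
}

for name, codec in codecs.items():
    encoded = codec.encode(chunk)
    start = time.perf_counter()
    codec.decode(encoded)
    elapsed = time.perf_counter() - start
    ratio = chunk.nbytes / len(encoded)
    print(f"{name}: ratio={ratio:.0f}x, decode={chunk.nbytes / elapsed / 2**30:.1f} GiB/s")
```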
The neat thing is that S3 charges reads by object access (per request), not by the volume read, so it costs the same to fetch the slightly larger chunks as it does the smaller ones.
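To make the cost argument concrete, a back-of-envelope comparison; the prices below are assumed placeholders (check current S3 pricing), and the chunk counts and read rates are made up for illustration:

```python
# Back-of-envelope monthly cost for slightly larger vs smaller chunks.
# Prices are assumed placeholders, not current AWS pricing.
STORAGE_PER_GIB_MONTH = 0.023   # USD per GiB-month, assumed S3 Standard rate
GET_PER_1000 = 0.0004           # USD per 1000 GET requests, assumed rate

def monthly_cost(total_gib, n_objects, reads_per_object):
    storage = total_gib * STORAGE_PER_GIB_MONTH
    requests = n_objects * reads_per_object * GET_PER_1000 / 1000
    return storage + requests

# Same number of chunks, slightly larger because of a faster, lower-ratio codec:
# the request term is identical, only the storage term grows a little.
print(monthly_cost(total_gib=100, n_objects=1_000_000, reads_per_object=10))
print(monthly_cost(total_gib=120, n_objects=1_000_000, reads_per_object=10))
```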