-
Notifications
You must be signed in to change notification settings - Fork 93
Implementation of Enclave Key Refresh #358
Conversation
9654e0a
to
2c7d403
Compare
tc/sgx/trusted_worker_manager/enclave_untrusted/enclave_bridge/signup.cpp
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approved after you make corrections I suggested.
672f9e3
to
292c2c1
Compare
PROCESSING = 7 | ||
BUSY = 8 | ||
UNKNOWN_ERROR = 9 | ||
WORKER_KEY_REFRESHED = 10 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
WORKER_KEY_REFRESHED -> WORKER_ENCRYPTION_KEY_REFRESHED
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok
enclave_manager.enclave_id).encode("UTF-8") | ||
# Calculate sha256 of worker id to get 32 bytes. The TC spec proxy | ||
# model contracts expect byte32. Then take a hexdigest for hex str. | ||
worker_id = hashlib.sha256(worker_id).hexdigest() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Create a function generate_worker_id() and call it at multiple places instead of duplicating this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok
enclave_manager.process_work_orders(kv_helper) | ||
logger.info("Enclave manager sleeping for %d secs", sleep_interval) | ||
time.sleep(sleep_interval) | ||
key_refresh_time = key_refresh_time + sleep_interval |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Decouple key refresh from work order execution. You could use asyncio coroutine to create key refresh as a separate task. That's will also avoid using sleep interval which is used for work order polling.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will explore asyncio and use it to do key refresh
logger.error("\n Unable to get worker \n") | ||
sys.exit(-1) | ||
|
||
_start_client(config, options, jrpc_req_id, worker_id, worker_registry) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any specific reason why prepare worker is done outside start_client ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved worker prepare code to start_client function.
// to do this; for now, the constant will be good enough. | ||
static const size_t cMaxSealedDataSize = 8192; | ||
static const size_t cMaxPublicDataSize = 4096; | ||
//static const size_t cMaxPublicDataSize = 4096; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please remove commented line if it's not needed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
//static const size_t cMaxPublicDataSize = 4096; | ||
|
||
// size of RSA key signature is 2048 | ||
static const size_t cMaxPublicDataSize = 6144; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How did we arrive at 6144 ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you need a larger buffer because of base64, the formula is
(bytes * 4 / 3) + 2 +1
The +2 is because of the 0 to 2 "=" padding bytes. This can be computed exactly if needed.
The +1 is if there is a NUL terminator byte
So I come up with 2048*4/3 +2 + 1 = 2733
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added 2048 bytes(to account encryptionKeySignature) to cMaxPublicDataSize(before 4096 bytes) which sum up to 6144.
the base64 encoded version of the signature was less than 2048, hence i chose to add 2048 to the cMaxPublicDataSize buffer size.
@dan yes we can allot max buffer size equals to 2733 bytes as you suggested. thanks for the suggestion.
[out] size_t* outSealedEnclaveDataSize, | ||
[out] sgx_report_t* outEnclaveReport | ||
); | ||
public tcf_err_t ecall_RefreshEnclaveKey( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ecall_RefreshEnclaveKey -> ecall_RefreshWorkerEncryptionKey
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
Log(TCF_LOG_ERROR, "error::DecryptMessage - %d - %s", \ | ||
e.error_code(), e.what()); | ||
tcf::error::ThrowIf<tcf::error::KeyRefreshError>( | ||
true == true, "Failed to decrypt encrypted session key. " \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use throw tcf::error::KeyRefreshError(msg) instead of tcf::error::ThrowIf(true==true)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
} // tcf::enclave_api::base::CreateSignupData | ||
|
||
// XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX | ||
tcf_err_t tcf::enclave_api::enclave_data::RefreshEnclaveKey( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
RefreshEnclaveKey -> RefreshWorkerEncryptionKey
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
result = TCF_ERR_UNKNOWN; | ||
} catch (...) { | ||
tcf::enclave_api::base::SetLastError("Unexpected exception in (RefreshEnclaveKey)"); | ||
result = TCF_ERR_UNKNOWN; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
RefreshEnclaveKey -> RefreshWorkerEncryptionKey
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
f34bf46
to
efa7b39
Compare
"Setting sleep interval to 10 seconds: %s", str(err)) | ||
sleep_interval = 10 | ||
while True: | ||
await lock.acquire() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add a comment why we need a lock and why we need to synchronize between key refresh and work order processing if its required.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will add comment to explain the need to synchronization using lock
res["error"]["code"] == \ | ||
WorkOrderStatus.WORKER_ENCRYPT_KEY_REFRESHED: | ||
logger.error("Worker Key refreshed. Retrieving latest Worker details") | ||
_start_client(config, options, jrpc_req_id, worker_id, worker_registry) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So _start_client()
is being called recursively here ? If so, please modify the logs. The logs is only mentioning about retrieving latest worker details. _start_client()
also resubmits the work order.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Its not called recursively. Only if work order execution fails due to worker key refresh during the submission of work order, the error scenario is handle by retrieving latest worker details(after key refresh) and submit the work order again.
e6be4ea
to
4e27e62
Compare
enclave manager service | ||
""" | ||
try: | ||
sleep_interval = int(config["EnclaveManager"]["sleep_interval"]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need to integrate the change for synchronous processing via ZMQ instead of polling based on the config
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, will rebase my PR.
This feature initiates refresh of worker encryption key pair based on number of work orders processed in case of Singleton worker or number of pre-processed work orders in case of KME worker. A new pair of encryption key is generated in the enclave and the updated enclave signup details are stored in the KvStorage in workers table. Worker encryption key signature is re-computed when encryption key gets refreshed. When a worker key gets refreshed during the work order submission, a specific error code is returned to client to indicate worker key refresh. On receiving this error code, client retrieves the updated worker details and does work order submission again. Signed-off-by: manju956 <[email protected]>
4e27e62
to
1df89b3
Compare
This pull request introduces 5 alerts when merging 1df89b3 into ed424e7 - view on LGTM.com new alerts:
|
A new updated PR is raised for worker key refresh policy implementation #671. Hence closing this PR |
This feature initiates refresh of enclave encryption key pair based on timer value set in the config file. A new pair of encryption key is generated in the enclave and the updated enclave signup details are stored in the KvStorage in workers table.
Additioanlly, encryption key signature is also generated during enclave initialization and its made part of PublicEnclaveData. This encryption key signature is re-computed when encryption key gets refreshed.
When a worker key gets refreshed during the work order submission, a specific error code is returned to client to indicate worker key refresh happened at the enclave. On receiving this error code, client retrieves the updated worker details and does work order submission again.