[Draft] Confidentiality Analysis #10
# Automated Sensitivity Analysis

| Status        | Draft                           |
| :------------ | :------------------------------ |
| **Author(s)** | Morten Dahl ([email protected]) |
| **Sponsor**   |                                 |
| **Updated**   | 2019-10-23                      |
## Objective

In this proposal we outline an approach for automating the security analysis of TF Encrypted (TFE) computations based on the sensitivity property of values.

Our approach takes the form of an effect type system and can be used at both compile time and runtime; the former allows errors to be caught even in computations where type erasure is applied before execution (such as when encrypted computations are compiled down to e.g. raw TensorFlow graphs).

We add tools and syntactic constructs to the `tfe.analysis` module.
## Motivation

Even assuming perfect encryption, it can be hard for users to keep track of who gets to see which values in a computation, i.e. whether the desired security policy is in fact enforced. This proposal introduces a formal framework for reasoning about security policies, as well as tools to help automate the analysis process, thereby increasing users' confidence.
## Design Proposal

To be expanded:
- Type system based on the built-in tensor types and their sensitivity property.
- An error is raised if a plaintext tensor is ever found on a player that is *not* in its sensitivity set; this can be checked at compile time and, optionally, at runtime.
- Subtyping allows for implicitly restricting sensitivity by removing players from the set: `T(S) <: T'(S')` if `S'` is a subset of `S`.
- `tfe.analysis.broaden` must be used to broaden sensitivity by adding players to the set: `broaden_S(x) : T(S union S')` when `x : T(S')`; this makes it syntactically clear to the user where extra attention must be paid; it is a no-op used only by the type system, similar to type hints.
- `tfe.Tensor.with_sensitivity({})` is top and `tfe.Tensor.with_sensitivity(None)` is bottom; note that `None` here means the set of all players, hence `None != {}`.
- When encrypting a tensor the sensitivity of the encrypted tensor is copied from the plaintext tensor; likewise when decrypting.
- When combining tensors the sensitivities must match (after a potential application of subtyping).
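
To make the rules above concrete, here is a minimal sketch of how the sensitivity bookkeeping could look. Everything except the name `broaden` (which mirrors the intent of the proposed `tfe.analysis.broaden`) is a hypothetical illustration, not part of TFE:

```python
# Hypothetical sketch of the sensitivity rules above; the class and helper
# names are illustrative only.

class Sensitivity:
    """Set of players allowed to see a value; `players=None` means *all*
    players (bottom), the empty set means *no* players (top)."""

    def __init__(self, players):
        self.players = None if players is None else frozenset(players)

    def allows(self, player):
        # A plaintext value may only ever be located on players in its set.
        return self.players is None or player in self.players

    def restrict_to(self, other):
        # Subtyping: sensitivity may be implicitly *narrowed* by removing
        # players, i.e. T(S) <: T'(S') requires S' to be a subset of S.
        if self.players is None:
            return Sensitivity(other.players)
        if other.players is None or not other.players <= self.players:
            raise TypeError("implicit broadening of sensitivity is not allowed")
        return Sensitivity(other.players)


def broaden(sensitivity, extra_players):
    # Explicit, syntactically visible broadening: players are *added* to the
    # set; `extra_players=None` broadens to all players.
    if sensitivity.players is None or extra_players is None:
        return Sensitivity(None)
    return Sensitivity(sensitivity.players | frozenset(extra_players))


def combine(lhs, rhs):
    # When combining tensors the sensitivities must match after subtyping;
    # the common subtype is the intersection, with None (all players) acting
    # as the identity.
    if lhs.players is None:
        return Sensitivity(rhs.players)
    if rhs.players is None:
        return Sensitivity(lhs.players)
    return Sensitivity(lhs.players & rhs.players)
```

With this reading, `tfe.Tensor.with_sensitivity({})` would correspond to `Sensitivity(set())` and `tfe.Tensor.with_sensitivity(None)` to `Sensitivity(None)`.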
### Secure Aggregation Example

For secure aggregation in federated learning we obtain:
0) Model weights with type `PlaintextTensor({mo})`.

> **Comment:** Here's an example where my understanding was different, related to the above. Any tensor … I'm sure there are benefits to the approach you specify here over the one I'm describing -- what did you have in mind?
>
> **Comment:** Something like this will happen at runtime when the policy is being checked, but my thought was that we need something to check against, i.e. a way to express expectations. Your intuition is that it is safe to share the weights? The idea is that when a player creates a tensor we start out by assuming that it is a very sensitive value; if that is not the case then it needs to be specified one way or another.
>
> **Comment:** Agreed. I was expecting that instantiating a tensor … In the secure aggregation example, it seems necessary for the policy to broaden the sensitivity of the weights to at least include the data owners/clients.
>
> **Comment:** This makes sense to me; it's a good default in the absence of what I'm describing here and in the other thread. I suppose I'd assumed the existence of that API in my comment above, so will focus attention on that thread.
1) Model weights with type `PlaintextTensor(None)` after application of `broaden(None)`; this expresses the policy that it is okay for anyone to learn the model weights.
2) Local data with type `PlaintextTensor({do1})`, ..., `PlaintextTensor({doN})`.
3) Local gradients with type `PlaintextTensor({do1})`, ..., `PlaintextTensor({doN})` (after subtyping the weights).
4) Local encryptions with type `EncryptedTensor({do1})`, ..., `EncryptedTensor({doN})`.
5) Central encryptions with type `EncryptedTensor({})`, ..., `EncryptedTensor({})` (after subtyping).
6) Central encryption of aggregation with type `EncryptedTensor({})`.
7) Central encryption of aggregation with type `EncryptedTensor({mo})` after application of `broaden({mo})`; this expresses the policy that it is okay for the model owner to learn aggregated gradients.
8) Plaintext aggregated gradient of type `PlaintextTensor({mo})` on the model owner.
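
The following is a minimal, runnable trace of just the sensitivity bookkeeping in the steps above, using plain Python sets with `None` standing for the set of all players; the helper functions and player names are illustrative and not part of the TFE API:

```python
# Hypothetical trace of the sensitivity sets in steps 0)-8); this mirrors only
# the bookkeeping, not the actual TFE computation.

MO = "mo"
DATA_OWNERS = ["do1", "do2", "do3"]

def broaden(s, extra):
    # Explicit broadening; None means "all players".
    return None if (s is None or extra is None) else s | extra

def combine(a, b):
    # Sensitivities must match after subtyping; the common subtype is the
    # intersection, with None (all players) acting as the identity.
    if a is None:
        return b
    if b is None:
        return a
    return a & b

weights = {MO}                                    # 0) PlaintextTensor({mo})
weights = broaden(weights, None)                  # 1) PlaintextTensor(None)

# 2)-3) local gradients inherit the data owners' sensitivity
local_grads = [combine(weights, {do}) for do in DATA_OWNERS]
assert local_grads == [{"do1"}, {"do2"}, {"do3"}]

# 4) encryption copies sensitivity from the plaintext tensors
encrypted = list(local_grads)

# 5)-6) central aggregation: the sensitivities only agree on the empty set
aggregate = encrypted[0]
for e in encrypted[1:]:
    aggregate = combine(aggregate, e)
assert aggregate == set()                         # EncryptedTensor({})

aggregate = broaden(aggregate, {MO})              # 7) EncryptedTensor({mo})
assert aggregate == {MO}                          # 8) decrypts to PlaintextTensor({mo})
```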

Note that there is no check for transitive broadening, in the sense that the aggregated gradients are first broadened to `{mo}` and then indirectly to `None` later as part of the updated model weights. Likewise, we are not capturing whom a malicious player might share values with. Instead, this analysis is only meant to help users catch bugs in their programs.
### Private Prediction Example

In this case the model owner is okay with sharing the weights with the compute servers in plaintext, but not with the prediction client (in which case the computation could just happen locally).
1) Model weights with type `PlaintextTensor({mo})`; broadened to `PlaintextTensor({mo, s0, s1})` to indicate the security policy; note that sending these to the prediction client would hence raise an error.
2) Prediction input with type `PlaintextTensor({pc})`, encrypted to `EncryptedTensor({pc})` on the prediction client.
3) Central computation with inputs of type `PlaintextTensor({})` and `EncryptedTensor({})` after subtyping, and result of type `EncryptedTensor({})`.
4) `broaden({pc})` is applied to the result, indicating that it is okay to release it back to the prediction client (but no one else), and obtaining type `EncryptedTensor({pc})`.
5) The prediction client decrypts to obtain `PlaintextTensor({pc})`.
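
The same kind of hypothetical bookkeeping trace applies to the private prediction steps (again, the helpers and player names are illustrative only):

```python
# Hypothetical trace of the sensitivity sets in steps 1)-5) above.

MO, PC, S0, S1 = "mo", "pc", "s0", "s1"

def broaden(s, extra):
    return None if (s is None or extra is None) else s | extra

def combine(a, b):
    if a is None:
        return b
    if b is None:
        return a
    return a & b

weights = broaden({MO}, {S0, S1})            # 1) PlaintextTensor({mo, s0, s1})
prediction_input = {PC}                      # 2) EncryptedTensor({pc})

result = combine(weights, prediction_input)  # 3) EncryptedTensor({}) after subtyping
assert result == set()

result = broaden(result, {PC})               # 4) EncryptedTensor({pc})
assert result == {PC}                        # 5) decrypts to PlaintextTensor({pc})
```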
## Detailed Design
## Questions and Discussion Topics

> **Comment:** This seems to be defined over abstract notions of Plaintext & Encrypted -- does this mean that sensitivity would apply to e.g. an AdditivelySharedTensor as well as the component shares inside the AdditivelySharedTensor? Or would it just be at the higher level?
>
> **Comment:** Concrete encrypted tensors would inherit sensitivity from the abstract EncryptedTensor. I did not imagine that component/backing tensors would have their own sensitivity, although they would probably have their own placement.
>
> **Comment:** In this case, what qualifies these concrete encrypted tensors as being in violation of their sensitivity set? It seems like it would have to be semantically different from what it means for plaintext tensors. Encrypted might be something like "is never decrypted by a player outside of the sensitivity set" vs. plaintext might be something like "is never possessed by a player outside the sensitivity set" -- is this correct and intentional?
>
> **Comment:** Also, I think this accounts for the mismatch I describe above -- I was only thinking about sensitivity in the context of the plaintext description "is never possessed by a player outside the sensitivity set", and was thinking that the backing/component tensors would have their own sensitivity, in which case passing them through specific kernels might have more well-defined effects on sensitivity sets.
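
One possible reading of the distinction raised in this thread, as a hypothetical sketch (the class and method names are illustrative, not part of TFE): a plaintext tensor is in violation when it is *possessed* by a player outside its sensitivity set, while an encrypted tensor is in violation only when it is *decrypted* by such a player.

```python
# Hypothetical sketch of the two violation checks discussed above.

class PlaintextTensor:
    def __init__(self, sensitivity):
        self.sensitivity = sensitivity  # None means "all players"

    def check_placement(self, player):
        # Plaintext: may never be *possessed* by a player outside the set.
        if self.sensitivity is not None and player not in self.sensitivity:
            raise RuntimeError(f"plaintext tensor placed on '{player}'")


class EncryptedTensor:
    def __init__(self, sensitivity):
        self.sensitivity = sensitivity

    def check_placement(self, player):
        # Encrypted: possession by any player is fine.
        pass

    def check_decrypt(self, player):
        # Encrypted: may never be *decrypted* by a player outside the set.
        if self.sensitivity is not None and player not in self.sensitivity:
            raise RuntimeError(f"decryption attempted by '{player}'")
        # Decryption copies the sensitivity to the resulting plaintext tensor.
        return PlaintextTensor(self.sensitivity)
```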
> **Comment:** I do see some benefits of this level of transparency, but I'm not sure this dynamic casting of the sensitivity fits my mental model. Here's how I think of it: `broaden`ing is unnecessary because transformations of sensitivity sets are implicitly defined by each node (i.e. operation) in the computation graph. This is not to forbid the user from doing so, but just to discourage it.
>
> Put another way, should the sensitivity set be a property of each tensor type, or of each tensor?
> **Comment:** Maybe we're saying the same thing: I imagine that each tensor instance (and not each tensor type) has its own sensitivity, i.e. `sensitivity` is an instance member, and depends e.g. on where the tensor was created. I was thinking that most operations do/should not change sensitivity. For high-level functionalities (such as secure aggregation) the broadening could be an internal step that doesn't require any additional broadening by the user of these.
>
> Any more thoughts on this?
>
> Arbitrary broadening is an important part of the policy, say the fact that aggregation is enough to release otherwise sensitive values. As mentioned above, this policy can be baked into high-level functionalities, but for general computations I don't see how we can know upfront what policy the user wants (besides the default of copying).
>
> My thoughts are each tensor. Are we saying the same thing here?
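
A small hypothetical sketch of the two points above, i.e. sensitivity as an instance member that defaults to the creating player, and a high-level functionality with the broadening step baked in (names are illustrative, not TFE API):

```python
# Hypothetical sketch: per-instance sensitivity with a conservative default,
# and broadening performed internally by a high-level functionality.

class Tensor:
    def __init__(self, value, created_on):
        self.value = value
        # Default policy: a freshly created tensor is maximally sensitive,
        # i.e. visible only to the player that created it.
        self.sensitivity = {created_on}


def secure_aggregation(gradients, model_owner):
    # By default the combined sensitivity would be the intersection (here the
    # empty set), but aggregation is taken as sufficient justification to
    # release the result to the model owner, so the broadening happens here
    # rather than being written explicitly by the user.
    result = Tensor(sum(g.value for g in gradients), created_on=model_owner)
    result.sensitivity = set.intersection(*(g.sensitivity for g in gradients))
    result.sensitivity |= {model_owner}  # internal broaden step
    return result


grads = [Tensor(1.0, created_on="do1"), Tensor(2.0, created_on="do2")]
agg = secure_aggregation(grads, model_owner="mo")
assert agg.sensitivity == {"mo"}
```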
> **Comment:** I believe so -- the mention of "subtyping" was what confused me, I think; can you clarify what you mean by that?
>
> I've just noticed that in e.g. DP there is an allowable query set that maintains the DP bound, and suspect this might mean that operations can be similarly categorized for the kind of sensitivity we're considering here (although perhaps, as you say, that doesn't account for all the operations we'd want to be able to check).
>
> Where is this? I'm not sure I see it in the doc -- are you referring to the examples below?