Implement an utility for checking secrets leaks in logs, use it a CI stage at PR level #6191

igor-sirotin · 2024-12-10T23:34:27Z

Functional tests provide some secrets to status-backend through API (e.g. #6144).

When tearing down the test, it could automatically scan the geth.log for any leaks of such data. As it knows the values that should not have leaked. This includes:

any provider tokens and other secrets
any privacy-discoulsing data (display name, images, keys etc)

We can also expose this approach to desktop and mobile tests.

The text was updated successfully, but these errors were encountered:

igor-sirotin · 2024-12-10T23:35:02Z

cc @antdanchenko @anastasiyaig @churik

igor-sirotin · 2024-12-13T16:18:46Z

We discussed that this should probably be done as a separate CI stage, depending on how long it takes.

Making it as a separate app adds some simplicity and some difficulties:

Keeping it external from tests, we can reuse the same utility for unit, functional and e2e tests, even with desktop and mobile
Yet we need to pass not just log files, but also list of secrets for each log. Because e.g. we also want to check for display names, which can be generated randomly in tests.

Though it would be nice to detect such issues locally as well, just while running tests.

igor-sirotin · 2024-12-13T16:19:29Z

Another thought: we can even generate some fake tokens in tests, we don't always need to use real secrets.

igor-sirotin · 2024-12-13T16:22:28Z

Also, this utility could be used by developers (or users) who wish to share their logs on Github.
Before uploading, they can run the utility locally to check for known display name, local paths and other things.

This is a bit of a rare case, but still a benefit 🙂

ilmotta · 2024-12-13T17:28:21Z

Slightly off topic, but we also need to eventually define what is considered private data circulating in logs. A wallet account address I consider private data, as I wouldn't want that to be exposed in plain text logs.

One idea is to leverage the Go type system as you and I talked about @igor-sirotin. For instance, the common.Address type is very general, but we could have a specialized address type which would always be redacted, unless the log level was debug. In the app, we can also request user approval/confirmation that some private data will leak when they enable debug level logs.

Once we identify what is considered private data we can analyse which data types in status-go should be wrapped/modified to be auto-redacted in logs (either always or above a certain log level like the previous example about common.Address).

igor-sirotin · 2024-12-13T17:49:41Z

@ilmotta 💯
We just got the SensitiveString type, which I'm currently integrating:

status-go/internal/security/sensitive_string.go

Lines 9 to 16 in ef177c1

    
           // SensitiveString is a type for handling sensitive information securely. 
        
           // This helps to achieve the following goals: 
        
           //  1. Prevent accidental logging of sensitive information. 
        
           //  2. Provide controlled visibility (e.g., redacted output for String() or MarshalJSON()). 
        
           //  3. Enable controlled access to the sensitive value when needed. 
        
           type SensitiveString struct { 
        
           	value string 
        
           }

And we can surely make more specialized types later as well 🙌

igor-sirotin · 2024-12-17T23:30:29Z

Well, I just realised that the "utility" is:

cat geth.log | grep <my-secret>

😂

igor-sirotin · 2024-12-20T13:35:17Z

I've been discovering this a bit. It is very simple to implement, but we need to agree on the requirements.
Will write my thoughts below.

Goal

Prevent our secrets leaks
Prevent user's secrets leaks

Use cases

Run on CI after tests
Run locally before submitting logs with bug report (can be regular users)

What to look for?

Secrets - discussed as part o the goal No critical leaks of data
- Passwords
- Private keys
- Seed phrases
- RPC provider tokens/credentials
- Messages contents
- ...
Privacy-exposing data - discussed as part o the goal No private data leaks
- Profile info: display name, ens name, image, public key, etc.
- Wallet info: addresses, transactions, etc.
- Community info: community id,
- IP Addresses
- Local file paths
- Timezones

What exactly to report?

Having as much info as possible is best for faster reaction. At the same time, if we run this on CI (which logs are public), we might want to be careful about what this utility will report. I see several levels of redacting data:

report only a boolean flag "something was/wasn't found";
report lines of logs which contain a leak, but don't say which secret;
report lines of logs and the leaked secrets;
make it optional, to print or to not.

I feel that I am overthinking this. Because if the secret was leaked, it doesn't really matter if we print it one more time with the utility, right? 😄 It will only make it a bit harder for potential attacker to find.

And of course the real solution is not to use any real secrets on CI.
But I see this is just the protection of potential future mistake.

A not about implementation. If we implement our own utility (not just shell script around grep), we can parse the logs and actually report which code line caused the leak. We have this info in the logs.

igor-sirotin · 2024-12-20T13:35:28Z

Idea: redact secrets before writing the logs

One thing we could do: intercept the zap logger and simply redact any known secrets before even writing them to the log.

This simplifies things, as at runtime we have access to all of the given secrets, we just need to register certain values as secret ones in the logger. If this is a separate utility, we'd need to provide

This would be very robust, but might badly affect the performance.
If this sounds interesting, I can explore and make some benchmarks.

wdyt? cc @status-im/status-go-guild @ilmotta

ilmotta · 2024-12-20T21:30:17Z

Idea: redact secrets before writing the logs

I lean more on this idea @igor-sirotin because it seems to be the most reliable one. Have we considered implementing the Stringer interface on types that should have some of their fields redacted? zap supports that out of the box, just checked now from a unit test in status-go. We could also use a custom zap encoder sometimes, but that's where I think performance may be a problem as you said already.

igor-sirotin · 2025-01-06T11:32:14Z

@ilmotta

Have we considered implementing the Stringer interface on types that should have some of their fields redacted?

This should be done for sure 👍 But it's quite easy to make a mistake here: forget to implement the method, accidentally use the String() method directly, or smth like that.

On the other side, processing all strings before writing to file should be more robust. The only thing we need not to forget is to tell the logger about the new secret string.

We could also use a custom zap encoder sometimes, but that's where I think performance may be a problem as you said already.

We already have a custom core, so I think we just need to benchmark and see how bad it is. Might be very bad indeed, but I won't try to estimate without measuring 😄

ilmotta · 2025-01-06T18:44:27Z

Replying to your comment #6191 (comment) @igor-sirotin:

This should be done for sure 👍 But it's quite easy to make a mistake here: forget to implement the method, accidentally use the String() method directly, or smth like that.

Depends on the data and our guidelines. For example, if we establish that all address types shouldn't be primitive strings and that they should implement Stringer, then I think it's doable to keep the code correct (for new code at least). A custom linter could go a long way in helping us detect primitive strings used incorrectly (say fields containing the substring "address"). This could be enough to prevent most mistakes, and then code reviews do the rest.

For the critical data (e.g. API keys), I'd imagine we don't have that many and it would be reasonable to wrap in custom types and keep them in check in reviews since they don't change very often.

Probably a combination of both approaches can be the best long-term, as you already pointed out. I have to agree with you, applying some heuristics to redact on the final strings would be easier. Not so sure about the safety aspect because I imagine it would be more reliable for us to audit the code to find places not implementing the expected types than it is to hunt strings from the final output (which depend on runtime analysis and vary based on data). But the more I think about the idea of tiny types to get compile-time protection the more I think it's too much for us in status-go to handle, at least for now 🤷🏼

On the topic of benchmarks, for mobile it's complicated because there's a wide range of slow devices that could be using the Status app and they are sensitive to things we take for granted are harmless. That's why we assume on mobile, based on real experience, that the app is already quite slow for some devices. On the desktop client it's more likely to be fine whatever we do here.

I was expecting the recent audit to point out a direction about redaction of secret and private values. I see now it's up to us to identify the criticality of different types of data and how/if they should be logged/reported.

igor-sirotin added functional tests goal:status-go:P0:no-critical-leaks goal:status-go:P1:no-leaks-in-logs labels Dec 10, 2024

igor-sirotin added this to Status Desktop/Mobile Board Dec 10, 2024

antdanchenko self-assigned this Dec 11, 2024

igor-sirotin changed the title ~~Add automatic check for leaks in logs after each functional test case~~ Implement an utility for checking secrets leaks in logs, use it a CI stage at PR level Dec 13, 2024

igor-sirotin mentioned this issue Dec 13, 2024

Set the default log level to error instead of disabling it for the release build. status-im/status-mobile#21538

Open

iurimatias added this to the 2.33.0 milestone Dec 16, 2024

igor-sirotin unassigned antdanchenko Jan 6, 2025

igor-sirotin removed the functional tests label Jan 6, 2025

igor-sirotin moved this to Next in Status Desktop/Mobile Board Jan 6, 2025

igor-sirotin self-assigned this Jan 6, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Implement an utility for checking secrets leaks in logs, use it a CI stage at PR level #6191

Implement an utility for checking secrets leaks in logs, use it a CI stage at PR level #6191

igor-sirotin commented Dec 10, 2024 •

edited

Loading

igor-sirotin commented Dec 10, 2024

igor-sirotin commented Dec 13, 2024

igor-sirotin commented Dec 13, 2024

igor-sirotin commented Dec 13, 2024

ilmotta commented Dec 13, 2024

igor-sirotin commented Dec 13, 2024 •

edited

Loading

igor-sirotin commented Dec 17, 2024

igor-sirotin commented Dec 20, 2024 •

edited

Loading

igor-sirotin commented Dec 20, 2024

ilmotta commented Dec 20, 2024

igor-sirotin commented Jan 6, 2025

ilmotta commented Jan 6, 2025

Implement an utility for checking secrets leaks in logs, use it a CI stage at PR level #6191

Implement an utility for checking secrets leaks in logs, use it a CI stage at PR level #6191

Comments

igor-sirotin commented Dec 10, 2024 • edited Loading

igor-sirotin commented Dec 10, 2024

igor-sirotin commented Dec 13, 2024

igor-sirotin commented Dec 13, 2024

igor-sirotin commented Dec 13, 2024

ilmotta commented Dec 13, 2024

igor-sirotin commented Dec 13, 2024 • edited Loading

igor-sirotin commented Dec 17, 2024

igor-sirotin commented Dec 20, 2024 • edited Loading

Goal

Use cases

What to look for?

What exactly to report?

igor-sirotin commented Dec 20, 2024

Idea: redact secrets before writing the logs

ilmotta commented Dec 20, 2024

igor-sirotin commented Jan 6, 2025

ilmotta commented Jan 6, 2025

igor-sirotin commented Dec 10, 2024 •

edited

Loading

igor-sirotin commented Dec 13, 2024 •

edited

Loading

igor-sirotin commented Dec 20, 2024 •

edited

Loading