
✨ Support more SAST tools #1487

Closed
wants to merge 11 commits

Conversation

@laurentsimon commented Jan 18, 2022

This PR addresses #1420, where we came to the conclusion that we want more granularity in the SAST check.
This PR does not contain unit tests yet, because there's a lot to discuss and I want to see if we're all OK with these changes.

WARNING: please ignore the comments and the calls to Println(): I will clean them up once we have agreed everything else is fine.

This PR looks for:

  • Linter tools run on each PR; 3 points are awarded if that's the case
  • Code analysis tools run on each PR; 4 points are awarded if that's the case
  • Supply-chain analysis tools run on each PR; 3 points are awarded if that's the case. Since scorecard is the only supply-chain tool at the moment and we don't advertise PR support, we give 3 points if scorecard is enabled, even if it is not run on PRs.

This PR gives no additional points if a tool is declared in a workflow but not run on PRs - with the exception of scorecard above. A rough sketch of this scoring is shown below.

closes #1380
closes #1420
closes #1580
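
To make the proposed scoring concrete, here is a minimal sketch in Go (not the check's actual implementation; the boolean flags are hypothetical placeholders for the detection logic):

// Sketch of the scoring described above (not the PR's implementation).
// Each boolean stands in for the corresponding detection logic.
func proposedSASTScore(linterOnAllPRs, codeAnalysisOnAllPRs, scorecardEnabled bool) int {
	score := 0
	if linterOnAllPRs {
		score += 3 // linter tools run on each PR
	}
	if codeAnalysisOnAllPRs {
		score += 4 // code analysis tools run on each PR
	}
	if scorecardEnabled {
		score += 3 // scorecard counts even when not run on PRs
	}
	return score // one linter + one code analysis tool on PRs + scorecard = 10
}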

@@ -17,7 +17,6 @@ package checks
import (
Contributor Author

ignore this file, it's mostly typo changes.

return nil, err
}

req, err := handler.client.NewRequest("GET", u, nil)

Contributor

Could you file an issue and add a TODO comment here to point to it?


// Workflow represents a workflow.
type Workflow struct {
ID int64
Contributor Author

need discussion: pointer or not?

Contributor

Yes, let's use pointer (and also below for *string)

@naveensrinivasan left a comment

Great, But we need tests!

@laurentsimon commented Jan 18, 2022

Great, But we need tests!

There's so much to discuss before I add the tests. I'll add them at the end.

@laurentsimon

cc @evverx

@evverx commented Jan 19, 2022

Linter tools run on each PR; 3 points are awarded if that's the case
Supply-chain analysis tools run on each PR; 3 points are awarded if that's the case

I don't think that linters and supply-chain tools (combined) should outweigh SAST tools like LGTM, CodeQL or Coverity so it seems to me that they should be downgraded to 1 or something like that. I mean, systemd for example uses superlinter and it finds issues from time to time but I'm not sure it would be fair to say that in terms of security it's almost as important as LGTM or Coverity. It wouldn't be fair if projects using only LGTM received 4 while projects using only superlinter received 3 either.

I'll try to take a closer look later. Thanks!

@evverx commented Jan 19, 2022

This PR gives no additional points if a tool is declared in a workflow but not run on PRs

It's not always feasible to run SAST tools on every PR, but if they are run daily or weekly it's still useful and should be rewarded, I think. For example, it takes CodeQL about 40 minutes to analyze systemd, so it doesn't make much sense to clog its GHActions pool on every PR, but it's still run daily.

@evverx commented Jan 19, 2022

systemd runs LGTM and superlinter on every PR, and CodeQL and Coverity Scan daily, and with this PR applied its score went from 10 to 6. I'm not sure that's what it should get :-)

@evverx commented Jan 19, 2022

FWIW, the curl score went from 7 to 0 with this PR applied.

Anyway, on the whole I like the idea of putting various tools into separate groups and assigning different "scores" to them, though the way those "scores" are assigned depending on the group, frequency, and so on needs tweaking, I think. Thanks!

@azeemshaikh38

Will take a closer look at this tomorrow. Overall LGTM.

@laurentsimon

Thanks for the feedback @evverx. I totally agree we need to tweak the score, and also that tools that run on commits/schedules should be rewarded. Maybe give 70% of the points if run on commits vs PRs. Feel free to suggest.

@laurentsimon commented Jan 19, 2022

systemd runs LGTM and superlinter on every PR, and CodeQL and Coverity Scan daily, and with this PR applied its score went from 10 to 6. I'm not sure that's what it should get :-)

You'd actually get a 10 if you install scorecard as an action. It suffices for one linter and one static code analysis tool (LGTM in your case) to be run on all PRs, plus scorecard installed, to get 10 - we don't advertise scorecard on PRs because it requires a (read) PAT to be made available on PRs atm. That said, I don't disagree that we need to reward tools run on commits or on schedules.

@evverx commented Jan 19, 2022

Maybe give 70% of the points if run on commits vs PRs

I think it depends on how frequent those runs are. If projects run SAST tools daily (or a few times a day) I think they should be scored higher than projects with weekly or monthly runs.

you'd actually get a 10 if you install scorecard as an action.

I get that :-), but it's complicated: on the one hand I think scorecard is helpful, but on the other hand I don't want to promote OSSF. Other than that, looking at a couple of checks that have already been introduced and issues where new checks are outlined, it seems to me that scorecard is moving towards "consumers" instead of "producers" (which is understandable), but I'm not sure it will make sense for "producers" to keep it upstream at some point (time will tell, I think). Just to clarify, I have absolutely no problem with this direction, but tools catered to "consumers" somehow shift things that should be done by "consumers" to "producers" (the latest example would be google/oss-fuzz#7146), and generally I wouldn't want to annoy maintainers.

Conclusion string
URL string
App CheckRunApp
Name string
Contributor

Make Name a *string?

Member

Can you please add the reason as to why you would like to change to a pointer?

Every change requested should have a reason as to why it is requested. It helps others understand the thought process and the reasoning. This is applicable across all these comments.

Pointer semantics have drawbacks, so we have to be clear on why we need them. Go doesn't encourage pointer semantics; most of the standard library uses value semantics.

Contributor

Sure. As part of #1242, we are in the process of migrating the current data structure to a more consistent format. The consistent formatting lets us use a convention that implies nil == no data.
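
For clarity, here is a minimal sketch of what the nil == no data convention looks like with pointer fields; the type and field names below are illustrative stand-ins, not the PR's actual definitions:

// Illustrative only: pointer fields let nil mean "the API returned no data",
// which a plain string cannot distinguish from an empty value.
type checkRunExample struct {
	Conclusion *string // nil => no conclusion reported
	Name       *string // nil => field absent, "" => present but empty
}

func conclusionOrDefault(cr checkRunExample) string {
	if cr.Conclusion == nil {
		return "unknown" // no data
	}
	return *cr.Conclusion
}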

Contributor Author

If I make Name a pointer, shall I make everything else one too then?

App CheckRunApp
Name string
CheckSuiteID *int64
// PullRequests []PullRequest
Contributor

Remove comment?



}, nil
}

func addOptions(s string, opts interface{}) (string, error) {
Contributor

Does opts need to be an interface{}? Can it simply be *ListWorkflowRunOptions?

Member

Why pointer semantics (*ListWorkflowRunOptions)? I would recommend value semantics.

Contributor Author

I made it a pointer. Pointers are better for large structures, right? Otherwise it's copying data around for calls, etc., I think.
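
To make the trade-off being discussed concrete, here is a small sketch of the two signatures; the shape of ListWorkflowRunOptions is assumed for illustration and the bodies are stubs, not the PR's code:

// Assumed shape, for illustration only.
type listWorkflowRunOptionsExample struct {
	Branch string
	Status string
}

// Value semantics: the callee gets a copy, so callers can't be surprised
// by mutation; the zero value naturally means "no options".
func addOptionsValue(s string, opts listWorkflowRunOptionsExample) (string, error) {
	// ... encode opts as query parameters onto s ...
	return s, nil
}

// Pointer semantics: avoids copying a large struct and lets nil mean
// "no options", at the cost of shared mutable state.
func addOptionsPointer(s string, opts *listWorkflowRunOptionsExample) (string, error) {
	if opts == nil {
		return s, nil
	}
	// ... encode *opts as query parameters onto s ...
	return s, nil
}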


@azeemshaikh38

LGTM. Two open issues would be unit tests and addressing @evverx's comment.

@naveensrinivasan left a comment

Thanks. Few comments.

@@ -298,6 +298,10 @@ func IsWorkflowFile(pathfn string) bool {
}
}

func IsWorkflowFileCb(filename string) (bool, error) {
Member

Why have an error as part of the return for a func that never returns one?

Member

Could we remove the error?

One-line Go funcs are discouraged. Is there a specific reason to have this in a separate func?
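
A minimal sketch of what the suggestion would look like, assuming IsWorkflowFile keeps the signature shown in the hunk above (the body here is a simplified stand-in, not the real check):

// Simplified stand-in for the real IsWorkflowFile from the hunk above.
func isWorkflowFileExample(pathfn string) bool {
	return pathfn != "" // placeholder logic only
}

// The suggested shape: no error in the return, since none is ever produced.
func isWorkflowFileCbExample(filename string) bool {
	return isWorkflowFileExample(filename)
}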

checkRun.CheckSuite.ID != nil {
cr.CheckSuiteID = checkRun.CheckSuite.ID
}
/*
Member

Could we please remove this commented code?

WorkflowID: workflowRun.WorkflowID,
}

/*prs := workflowRun.PullRequests
Member

Could we please remove this commented code?

@@ -35,6 +35,8 @@ type RepoClient interface {
ListReleases() ([]Release, error)
ListContributors() ([]Contributor, error)
ListSuccessfulWorkflowRuns(filename string) ([]WorkflowRun, error)
ListWorkflowRuns(opts *ListWorkflowRunOptions) ([]WorkflowRun, error)
Member

Could this not be a pointer (*ListWorkflowRunOptions)? Makes the API cleaner.

@github-actions

Stale pull request message

@laurentsimon

Update: I've now tweaked the logic for score computation as follows (rough sketch below):

  • Linters are run on all PRs. Linters are cheap, so this seems acceptable. 1 point is awarded if that's the case.
  • Supply-chain tools are used. We don't check if they're run on all PRs. So long as one is defined in a workflow and/or run on at least one commit, we award 2 points.
  • Static analysis. Same as for supply-chain, and we award 5 points. If one tool is run on all PRs, we give an additional 2 points.

Wdut?
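
Rough sketch of the revised scoring above; as before, the boolean flags are hypothetical placeholders for the detection logic, not the PR's code:

// Sketch of the revised scoring (not the PR's implementation).
func revisedSASTScore(linterOnAllPRs, supplyChainToolUsed, staticAnalysisUsed, staticAnalysisOnAllPRs bool) int {
	score := 0
	if linterOnAllPRs {
		score += 1 // linters run on all PRs
	}
	if supplyChainToolUsed {
		score += 2 // defined in a workflow and/or run on at least one commit
	}
	if staticAnalysisUsed {
		score += 5 // static analysis tool in use
		if staticAnalysisOnAllPRs {
			score += 2 // bonus when it runs on all PRs
		}
	}
	return score // maximum 1 + 2 + 5 + 2 = 10
}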

@evverx commented Feb 3, 2022

Supply-chain tools are used. We don't check if they're run on all PRs. So long as one is defined in a workflow and/or run on at least one commit, we award 2 points.

I think supply chain tools should follow the same rules as static analyzers in the sense that if they can't be run on PRs they should be downgraded to 1. It should help to prompt the developers of those tools to make them compatible with the PR workflow (which is supposed to catch issues as early as possible).

@laurentsimon

Supply-chain tools are used. We don't check if they're run on all PRs. So long as one is defined in a workflow and/or run on at least one commit, we award 2 points.

I think supply chain tools should follow the same rules as static analyzers in the sense that if they can't be run on PRs they should be downgraded to 1. It should help to prompt the developers of those tools to make them compatible with the PR workflow (which is supposed to catch issues as early as possible).

good point. I'll do that then. Thank you for your feedback!

@laurentsimon

Thoughts on rate limiting:

https://docs.github.com/en/rest/overview/resources-in-the-rest-api#requests-from-github-actions

When using GITHUB_TOKEN, the rate limit is 1,000 requests per hour per repository. For requests to resources that belong to an enterprise account on GitHub.com, GitHub Enterprise Cloud's rate limit applies, and the limit is 15,000 requests per hour per repository.

That's fine since each commit should have a different token.

For PAT, https://docs.github.com/en/developers/apps/building-github-apps/rate-limits-for-github-apps

GitHub Apps making server-to-server requests use the installation's minimum rate limit of 5,000 requests per hour. If an application is installed on an organization with more than 20 users, the application receives another 50 requests per hour for each user. Installations that have more than 20 repositories receive another 50 requests per hour for each repository. The maximum rate limit for an installation is 12,500 requests per hour.

For a repository like systemd/systemd, we'd need 30*6 = ~200 API requests, so let's say 300 for each push. 10-15 commits per hour should be fine.

FYI, we hope to be able to support GITHUB_TOKEN in the future. For the cron job, it's highly unlikely we'll ever be able to run the SAST check if we apply the changes proposed here.

@azeemshaikh38 any further thoughts?
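
As a quick sanity check on the arithmetic above (a sketch only: the 300-requests-per-push figure is the rough estimate from this comment, and the 5,000/hour figure is the GitHub App installation minimum quoted earlier):

// Back-of-the-envelope check of the request budget discussed above.
func pushesPerHourWithinBudget() int {
	const requestsPerPush = 300 // rough estimate for a repo like systemd/systemd
	const hourlyLimit = 5000    // GitHub App installation minimum, per hour
	return hourlyLimit / requestsPerPush // 16, consistent with "10-15 commits per hour"
}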


@laurentsimon

@azeemshaikh38 take a final look before I merge. Ignore the comments, I'm going to remove them before the merge.

@evverx commented Apr 15, 2022

FWIW, looking at #1816, I think by that logic the SAST check should be inconclusive as well when scorecard can't find static analyzers it's aware of. Coverity is often hidden in bash scripts sending data to the scanner, and it's unlikely that scorecard can ever detect that reliably.

@azeemshaikh38

@laurentsimon are we blocked on anything to get this merged?

@laurentsimon

No real blocker, mostly delayed to fix the e2e tests.

naveensrinivasan added a commit that referenced this pull request May 17, 2022
- removed the old json format from cron
fix #1487

Signed-off-by: naveensrinivasan <[email protected]>
@laurentsimon reopened this May 18, 2022
@laurentsimon

Please don't close, I'm going to get to this PR sometime after I've migrated other checks to raw results.

@naveensrinivasan

Please don't close, I'm going to get to this PR sometime after I've migrated other checks to raw results.

I apologize, it was a mistake.

@laurentsimon commented Jun 2, 2022

Please don't close, I'm going to get to this PR sometime after I've migrated other checks to raw results.

I apologize, it was a mistake.

np, happens to me all the time :)

@naveensrinivasan

@laurentsimon What do we want to do about this PR?

@laurentsimon

Let's keep it open if you don't mind. The code uses too many API calls, but I think if we relax some of the checks, a large part of the code can be re-used. I'm not working on it atm, though.

@naveensrinivasan

Let's keep it open if you don't mind. The code uses too many API calls, but I think if we relax some of the checks, a large part of the code can be re-used. I'm not working on it atm, though.

OK, sounds good! Thanks

@naveensrinivasan

Hi @laurentsimon,

I wondered if keeping this Pull Request open would be beneficial, considering it is over a year old.

@naveensrinivasan

Hi @laurentsimon,

I wondered if keeping this Pull Request open would be beneficial, considering it is over a year old.

A friendly ping @laurentsimon

@laurentsimon

Hi @laurentsimon,
I wondered if keeping this Pull Request open would be beneficial, considering it is over a year old.

A friendly ping @laurentsimon

Up to you. There's a lot of useful code someone may be able to re-use to implement the SAST check. But we can close it and search for it if need be.

@naveensrinivasan

Hi @laurentsimon,
I wondered if keeping this Pull Request open would be beneficial, considering it is over a year old.

A friendly ping @laurentsimon

Up to you. There's a lot of useful code someone may be able to re-use to implement the SAST check. But we can close it and search for it if need be.

Thanks for keeping it clean. I will close it.
