[Bug][gitextractor] subtask Clone Git Repo ended unexpectedly #7958
Comments
I found a similar issue that might be relevant to your problem. The issue titled "[Bug][Git Extractor] Git Extractor task failed in a GitHub blueprint" was closed, and the suggested solution was to change the Docker base image [1]. The "Invalid username or password" error in the gitextractor subtask of your pipeline is likely caused by your token's permissions. Specifically, it could be due to an unsupported authentication method or a token that lacks the necessary permissions for the entities you want to collect [2]. To resolve this issue, make sure you are using a supported authentication method and that your token has the necessary permissions for the repositories you are trying to clone. Additionally, if you are dealing with a private GitLab server with a self-signed certificate, you might need to add your root CA to the [...]. If the issue persists, consider checking the rate limits and throttling mechanisms that could affect cloning many repositories in a single pipeline run. DevLake uses a dynamic rate limit of around 12,000 requests/hour to collect GitLab data, and tokens under the same IP address share this rate limit [3].
No, it is not similar. My repositories are private, and I have already deployed the latest version of DevLake. Also, in my pipeline some repositories clone successfully while others fail to clone. It is GitHub Enterprise Cloud, not on-premises.
We are also running into this. It happened when all pipelines ran at the same time, but when I gave them different schedules it worked fine. We are using a private GitHub org with GitHub App auth.
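If staggering schedules is the workaround, the start times have to be spread out deliberately. A minimal sketch, assuming each blueprint takes a standard five-field cron expression (minute, hour, day of month, month, day of week); `staggered_crons` is a hypothetical helper, not part of DevLake, and the two-hour gap is an arbitrary example value that should really be longer than one pipeline's run time.

```python
def staggered_crons(count: int, start_hour: int = 0, gap_hours: int = 2) -> list[str]:
    """Return one daily cron expression per blueprint, gap_hours apart.

    Format: minute hour day-of-month month day-of-week. Start hours wrap past
    midnight via % 24, so long lists can land on hours already used.
    """
    if gap_hours <= 0:
        raise ValueError("gap_hours must be positive")
    return [f"0 {(start_hour + i * gap_hours) % 24} * * *" for i in range(count)]


# Three hypothetical blueprints starting at 01:00, 03:00 and 05:00 every day.
for expr in staggered_crons(3, start_hour=1):
    print(expr)
```

Spacing by whole hours keeps the expressions easy to read in the UI; a finer split (minutes) works the same way if pipelines are short.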
UPDATE: I changed the rate limit manually, for example to 10,000 requests per hour, and now the pipeline starts failing after stage 38. Before, it started failing after stage 20, 21, or 22. I think it exceeds the rate limit, so some stages end with errors. But I don't see any rate-limit errors in the logs.
It’s quite likely you're encountering this error because of rate limiting. We introduced the rate-limit setting to let users reduce request frequency and avoid 429 errors. 😅 I recommend trying the approach shared by @KyriosGN0; unfortunately, there’s not much else we can do to prevent this kind of issue (likely the data source or some point in the network closes the connection).
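For a sense of scale, here is the plain arithmetic behind an hourly limit like the 10,000 requests/hour mentioned above, and what happens when several pipelines draw on one shared limit. This is a back-of-the-envelope sketch only: it assumes requests are spread evenly over the hour and does not model DevLake's actual (dynamic) rate limiter.

```python
SECONDS_PER_HOUR = 3600


def min_interval_seconds(requests_per_hour: int) -> float:
    """Smallest average gap between requests that stays under an hourly limit."""
    return SECONDS_PER_HOUR / requests_per_hour


def share_per_pipeline(requests_per_hour: int, concurrent_pipelines: int) -> float:
    """Requests per hour each pipeline gets if they split one shared limit evenly."""
    return requests_per_hour / concurrent_pipelines


# 10,000 requests/hour works out to one request every 0.36 s on average.
print(min_interval_seconds(10_000))
# Three pipelines running at once on that limit get roughly 3,333 requests/hour each.
print(share_per_pipeline(10_000, 3))
```

The second function is why running every pipeline at the same moment can behave very differently from running them one after another on the same credentials.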
@klesh does it make sense to introduce a retry in this plugin? (Or, more generally, to give plugins the ability to rerun a failed task automatically?)
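A sketch of what an automatic retry could look like, assuming a subtask can be wrapped as a zero-argument callable. `run_with_retry` and its parameters are hypothetical, not an existing DevLake API. The design choice worth noting: only exception types treated as transient (here `ConnectionError` and `TimeoutError`) are retried, because a clone rejected for bad credentials would most likely fail the same way on every attempt.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")

TRANSIENT: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError)


def run_with_retry(subtask: Callable[[], T], attempts: int = 3,
                   base_delay: float = 5.0,
                   retry_on: Tuple[Type[BaseException], ...] = TRANSIENT) -> T:
    """Run subtask, retrying only retry_on errors with exponential backoff.

    Delays between attempts are base_delay, 2*base_delay, 4*base_delay, ...
    Any other exception propagates immediately; the last failure is re-raised.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return subtask()
        except retry_on:
            if attempt == attempts:
                raise
            time.sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable: the loop always returns or raises")


# Usage: a fake clone that drops the connection twice, then succeeds.
calls = {"n": 0}


def flaky_clone() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("connection reset by peer")
    return "cloned"


print(run_with_retry(flaky_clone, base_delay=0))
```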
@KyriosGN0
If the application returned an error about the rate limit, it would be clearer for users. Also, we should be able to change the duration of the pipeline; for example, if the pipeline runs more slowly, it won't exceed the rate limit.
To clarify, there are 2 things here:
@klesh is trace a supported log level? I really want to see what went wrong in gitextractor.
@klesh this seems to have gotten worse in the new
I updated to the latest version, v1.0.1-beta9, but I have the same issue. Only gitextractor fails; the API sync client works normally with the GitHub API.
time="2024-09-08 23:21:14" level=debug msg=" [pipeline service] [pipeline #140] [task #11941] [Clone Git Repo] [gitcli] err: exit status 128, output: Cloning into bare repository '/tmp/gitextractor1157925462'...\nremote: Invalid username or password.\nfatal: Authentication failed for 'https://github.com/PB-***/adapter-***.git/'\n"
time="2024-09-08 23:21:14" level=info msg=" [pipeline service] [pipeline #140] [task #11941] subtask Clone Git Repo finished in 655 ms"
time="2024-09-08 23:21:14" level=error msg=" [pipeline service] [pipeline #140] [task #11941] subtask Clone Git Repo ended unexpectedly\n\tWraps: (2) git cmd [git clone https://git:**********************************@github.com/PB-/adapter-.git /tmp/gitextractor1157925462 --depth=1 --bare] in failed: Cloning into bare repository '/tmp/gitextractor1157925462'...\n\t | remote: Invalid username or password.\n\t | fatal: Authentication failed for 'https://github.com/PB-***/adapter-***.git/'\n\tError types: (1) *hintdetail.withDetail (2) *errors.errorString"
It seems to be working fine on my end. |
@klesh from the project or the data source?
No, that is not a solution, because the error can occur on different repositories.
@KyriosGN0 To clarify, did you mean the token assigned to the
@klesh yes |
@KyriosGN0 Do you know how to reproduce the problem? It is working as expected on my local machine 😂 |
@klesh, I can reproduce it on my DevLake instance, but it will take me a couple of hours.
@KyriosGN0 Oh, GitHub App tokens expire after 1 hour; it's a known problem. Try using a PAT as a workaround for the moment. See #7655. @realhuseyn Are you using GitHub App authentication as well?
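Because installation tokens are only valid for an hour, anything long-running that holds one needs to check the token's age before each use. A minimal sketch, assuming the expiry timestamp is known when the token is issued (GitHub returns an `expires_at` value alongside each installation token); `InstallationToken` and `needs_refresh` are hypothetical names, and the five-minute margin is an arbitrary choice.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class InstallationToken:
    value: str
    expires_at: datetime  # GitHub App installation tokens are valid for one hour


def needs_refresh(token: InstallationToken, now: datetime,
                  margin: timedelta = timedelta(minutes=5)) -> bool:
    """True if the token is expired or will expire within `margin`.

    Refreshing a little early avoids handing a git clone a token that
    expires while the clone is still running.
    """
    return now >= token.expires_at - margin


issued = datetime(2024, 9, 8, 22, 0, tzinfo=timezone.utc)
tok = InstallationToken("example-token", issued + timedelta(hours=1))
print(needs_refresh(tok, issued + timedelta(minutes=20)))  # False: still fresh
print(needs_refresh(tok, issued + timedelta(minutes=58)))  # True: inside the margin
```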
@klesh yeah, I am using a GitHub App. I tried a PAT once, and it resulted in the same error. The GitHub App is more usable than a PAT because it has a rate limit of 15,000 requests per hour.
@realhuseyn Same error as the following?
Try to take the following steps:
@klesh yes, the same error. I tried again and again, but it doesn't work. It starts failing after different steps: for example, after the 20th, the 23rd, or the 29th. When I use a PAT, gitextractor finishes successfully, but steps in the middle of the pipeline fail. When I use the GitHub App, gitextractor fails, but the steps in the middle finish successfully.
What was the error message when using PAT? |
Response: {"message":"API rate limit exceeded for user ID 31611637. If you reach out to GitHub Support for help, please include the request ID 720F:309301:21A54C4:221F619:66E26312 and timestamp 2024-09-12 03:42:10 UTC.","documentation_url":"https://docs.github.com/rest/overview/rate-limits-for-the-rest-api","status":"403"} (403) Error types: (1) *hintdetail.withDetail (2) *errors.errorString |
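This 403 comes from the REST API, whereas the GitHub App runs fail inside `git clone` with "Invalid username or password", so the two symptoms point to different causes. A sketch of how they could be told apart and reported clearly, as requested earlier in the thread. Both helpers are hypothetical; the first relies on GitHub's documented behavior that an exhausted primary rate limit returns 403 or 429 with `x-ratelimit-remaining: 0`, and both otherwise use plain substring matches on the messages seen in this issue.

```python
def is_rate_limited(status: int, headers: dict, body: str) -> bool:
    """Heuristic for GitHub rate-limit responses (403 or 429)."""
    lowered = {k.lower(): v for k, v in headers.items()}  # header names are case-insensitive
    return status in (403, 429) and (
        lowered.get("x-ratelimit-remaining") == "0" or "rate limit" in body.lower()
    )


def is_git_auth_failure(git_output: str) -> bool:
    """Heuristic: git over HTTPS reports rejected credentials with these phrases."""
    text = git_output.lower()
    return "invalid username or password" in text or "authentication failed" in text


# Messages taken from this issue (shortened).
api_body = '{"message":"API rate limit exceeded for user ID 31611637.","status":"403"}'
git_out = "remote: Invalid username or password.\nfatal: Authentication failed"

print(is_rate_limited(403, {}, api_body))   # True
print(is_git_auth_failure(git_out))         # True
```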
Hey @klesh, is there any update on this? I see that the GitHub App issue is resolved, but I still get the error in the gitextractor plugin.
@KyriosGN0 No, why do you think the GitHub App issue is resolved? I don't think it is, though 😂
@KyriosGN0 You are correct. The GitHub App Auth is still problematic and no one is working on it AFAIK. |
@klesh can you point me to where we initialize the gitextractor plugin? I will try to open a PR to fix this.
@KyriosGN0 The pipeline task generated by the github plugin is located at https://github.com/apache/incubator-devlake/blob/main/backend/plugins/github/api/blueprint_v200.go#L130 |
@klesh I think I have a clue. This pipeline is rather long (6-7 hours), and the first 2 stages take around an hour, by which point the GitHub App installation token expires. I assume that because the pipeline generates all of its steps in advance, it will be rather hard to update the token at runtime (i.e., while the pipeline is running), correct?
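The hypothesis here is that a token captured while the pipeline is planned has expired by the time later clone tasks run. A small simulation of the difference between resolving the token at plan time and at run time, using a fake clock in seconds; `Token`, `FakeIssuer`, and `CloneTask` are hypothetical stand-ins and do not reflect how DevLake actually structures its tasks.

```python
from dataclasses import dataclass
from typing import Callable

TOKEN_TTL = 3600.0  # seconds; GitHub App installation tokens last one hour


@dataclass
class Token:
    value: str
    issued_at: float  # seconds since the pipeline started

    def valid_at(self, t: float) -> bool:
        return t < self.issued_at + TOKEN_TTL


class FakeIssuer:
    """Hypothetical stand-in for the GitHub App installation-token endpoint."""

    def __init__(self) -> None:
        self.issued = 0

    def issue(self, now: float) -> Token:
        self.issued += 1
        return Token(f"token-{self.issued}", issued_at=now)


@dataclass
class CloneTask:
    repo: str
    get_token: Callable[[float], Token]  # called when the task runs

    def run(self, now: float) -> str:
        token = self.get_token(now)
        if not token.valid_at(now):
            raise PermissionError(f"{self.repo}: Invalid username or password")
        return token.value


issuer = FakeIssuer()
baked = issuer.issue(now=0.0)                       # token fetched once, at plan time
eager = CloneTask("org/repo-a", lambda now: baked)  # always reuses the plan-time token
lazy = CloneTask("org/repo-a", issuer.issue)        # asks for a token at run time

print(lazy.run(now=7200.0))  # works two hours in; eager.run(now=7200.0) would raise
```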
@KyriosGN0 You are 100% correct. From what I can tell, we need a new mechanism to support such a scenario, which is not an easy task. 😂
Hi everyone! What do you think about this temporary solution for this issue? #8136 I'm open to suggestions and how to evolve this solution. |
Search before asking
What happened
My pipeline results in errors. One of them is related to cloning GitHub repositories. My pipeline includes 60+ repositories. At the beginning of the pipeline, tasks finish successfully, but after the 15th or 20th task, all remaining tasks fail with git clone errors.
time="2024-08-26 13:18:26" level=info msg=" [pipeline service] [pipeline #119] [task #7340] start executing task: 7340"
time="2024-08-26 13:18:26" level=info msg=" [pipeline service] [pipeline #119] [task #7340] start plugin"
time="2024-08-26 13:18:26" level=info msg=" [pipeline service] [pipeline #119] [task #7340] [gitextractor.PrepareTaskData] UseGoGit: false"
time="2024-08-26 13:18:26" level=info msg=" [pipeline service] [pipeline #119] [task #7340] [gitextractor.PrepareTaskData] SkipCommitStat: false"
time="2024-08-26 13:18:26" level=info msg=" [pipeline service] [pipeline #119] [task #7340] [gitextractor.PrepareTaskData] SkipCommitFiles: true"
time="2024-08-26 13:18:26" level=info msg=" [pipeline service] [pipeline #119] [task #7340] total step: 4"
time="2024-08-26 13:18:26" level=info msg=" [pipeline service] [pipeline #119] [task #7340] executing subtask Clone Git Repo"
time="2024-08-26 13:18:27" level=error msg=" [pipeline service] [pipeline #119] [task #7340] subtask Clone Git Repo ended unexpectedly\n\tWraps: (2) git cmd [git clone https://git:*************************************@github.com/Pl/adapter--.git /tmp/gitextractor3060880397 --depth=1 --bare] in failed: Cloning into bare repository '/tmp/gitextractor3060880397'...\n\t | remote: Invalid username or password.\n\t | fatal: Authentication failed for 'https://github.com/P***l/adapter-**-**.git/'\n\tError types: (1) *hintdetail.withDetail (2) *errors.errorString"
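The failing command in the log passes the token as the password for the placeholder user `git` inside the clone URL, which is why an expired or rejected token surfaces as "Invalid username or password". A sketch that builds a command of the same shape and masks the secret before logging; `clone_args` and `mask_credentials` are hypothetical helpers, not gitextractor's implementation, and the repository path and token are placeholders.

```python
import re
from urllib.parse import quote


def clone_args(repo_path: str, token: str, dest: str) -> list[str]:
    """Build a shallow, bare clone command shaped like the one in the log.

    quote(..., safe="") percent-encodes characters such as "@" or ":" so a
    token containing them cannot break the URL.
    """
    url = f"https://git:{quote(token, safe='')}@github.com/{repo_path}.git"
    return ["git", "clone", url, dest, "--depth=1", "--bare"]


# Matches "scheme://user:" then the secret up to "@".
_CRED = re.compile(r"(https?://[^:/@\s]+:)[^@\s]+@")


def mask_credentials(text: str) -> str:
    """Replace the password part of any user:password@ URL with asterisks."""
    return _CRED.sub(r"\1****@", text)


args = clone_args("example-org/example-repo", "not-a-real-token", "/tmp/gitextractor-demo")
print(mask_credentials(" ".join(args)))
```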
What do you expect to happen
The pipeline uses one GitHub connection. Why do some repositories clone successfully while all the others fail with an "Invalid username or password" error?
How to reproduce
Rerun the pipeline.
Anything else
No response
Version
v1.0.1-beta7
Are you willing to submit PR?
Code of Conduct