fix exec hang when tx channel is full #316

Open
wants to merge 1 commit into base: main

Conversation

@ningmingxiao (Contributor) commented Sep 27, 2024

@Burning1020 @mxpv This is another way to fix #270; this PR doesn't need to use unbounded_channel.
Can you review this PR? Thank you.
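For reference, a minimal sketch of the tradeoff, assuming tokio's bounded mpsc channel (a hypothetical demo, not the shim's actual code): with a bounded channel, send().await parks until the receiver frees a slot, which is the hang this PR targets; unbounded_channel sidesteps the wait at the cost of unbounded memory growth.

// Hypothetical demo of bounded-channel backpressure; not shim code.
use tokio::sync::mpsc;

#[tokio::main]
async fn main() {
    let (tx, mut rx) = mpsc::channel::<u32>(2); // bounded, capacity 2
    tx.send(1).await.unwrap();
    tx.send(2).await.unwrap();

    // The channel is now full: this send waits until the receiver frees
    // a slot. If the receiver can never run, the sender hangs, which is
    // the failure mode behind #270.
    let blocked = tokio::spawn(async move { tx.send(3).await });

    assert_eq!(rx.recv().await, Some(1)); // frees capacity, unblocks the send
    blocked.await.unwrap().unwrap();
    assert_eq!(rx.recv().await, Some(2));
    assert_eq!(rx.recv().await, Some(3));
}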

@mxpv (Member) commented Oct 3, 2024

Could you please rebase against the latest main to fix the CI?

@ningmingxiao (Contributor, Author)
done

@mxpv (Member) commented Oct 4, 2024

Trying to understand this change. Can you elaborate how exactly this fixes the issue?

You spawn an extra thread on each iteration and each new thread still writes to the channel. There are no consumers on this side, right? So why does this help?

@ningmingxiao (Contributor, Author) commented Oct 4, 2024

> Trying to understand this change. Can you elaborate how exactly this fixes the issue?
>
> You spawn an extra thread on each iteration and each new thread still writes to the channel. There are no consumers on this side, right? So why does this help?

I added some logging and found that containers.lock() can't acquire the lock while the channel is full (the logs didn't show who was holding it).
So I split the work into two tokio::spawn calls to make sure the consumer can drain the channel quickly.
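To illustrate the pattern, a minimal sketch with hypothetical names (containers, event); the real shim code differs in detail. The point is that the locked section finishes and releases the guard before any channel send, and the send runs in its own tokio::spawn, so nothing holds the lock while waiting for channel capacity.

// Hypothetical sketch of "don't hold the lock across a full-channel send".
use std::sync::Arc;
use tokio::sync::{mpsc, Mutex};

#[tokio::main(flavor = "multi_thread", worker_threads = 2)]
async fn main() {
    let (tx, mut rx) = mpsc::channel::<String>(16);
    let containers = Arc::new(Mutex::new(Vec::<String>::new()));

    // Do the work that needs the lock, then drop the guard...
    {
        let mut guard = containers.lock().await;
        guard.push("exec-123".to_string());
    } // guard dropped here, before any channel send

    // ...and send from a separate task: it may wait for channel capacity,
    // but it holds no lock while waiting, so the consumer can still run.
    let event = "exec-exit".to_string();
    tokio::spawn(async move {
        let _ = tx.send(event).await;
    });

    println!("forwarded: {:?}", rx.recv().await);
}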

❯ cat container.json 
{
  "metadata": {
    "name": "gpu"
  },
  "image": {
    "image": "docker.io/library/busybox:1.28"
  },
  "command": [
    "top"
  ],
  "tty": true,
  "linux": {
  }
}

❯ cat pod.json 
{
  "metadata": {
    "name": "sandbox-a10798895196912d15189a32112",
    "namespace": "default",
    "attempt": 1,
    "uid": "hdishd83djaidwnduwk28bcsb"
  },
  "log_directory": "/tmp",
  "linux": {
  }
}

crictl run --no-pull container.json pod.json

main.go

package main

import (
	"fmt"
	"math/rand"
	"os/exec"
	"sync"
	"time"
)

func main() {
	test()
	fmt.Println("done")
}

// runCmd shells out and reports any failure without aborting the run.
func runCmd(command string) {
	cmd := exec.Command("bash", "-c", command)
	if err := cmd.Run(); err != nil {
		fmt.Println(err)
	}
}

// test fires 500 concurrent `crictl exec` calls, each sleeping a random
// 1-10 seconds inside the container, to flood the shim's exec path.
func test() {
	rand.Seed(time.Now().UnixNano()) // seed once, not per goroutine
	var wg sync.WaitGroup
	const numContainers = 500
	for i := 1; i <= numContainers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			randomNumber := rand.Intn(10) + 1
			cmdcri := fmt.Sprintf("crictl exec 2380a sleep %d", randomNumber)
			fmt.Println(cmdcri)
			runCmd(cmdcri)
		}()
	}
	wg.Wait()
}

Limiting worker_threads to 2 makes it easier to reproduce:

// Shim entrypoint with the tokio worker pool capped at two threads.
#[tokio::main(flavor = "multi_thread", worker_threads = 2)]
async fn main() {
    parse_version();
    run::<Service>("io.containerd.runc.v2-rs", None).await;
}

@ningmingxiao (Contributor, Author)

ping @mxpv @Burning1020

@github-actions (bot)

This PR is stale because it has been open 90 days with no activity. This PR will be closed in 7 days unless new comments are made or the stale label is removed.

github-actions (bot) added the Stale label on Feb 11, 2025
Labels: C-runc-shim (Runc shim), C-shim (Containerd shim), Stale