
[VPP-1979] VCL app with worker hangs on exit #3441

Closed
vvalderrv opened this issue Feb 2, 2025 · 9 comments

@vvalderrv
Contributor

Description

Compile the following VCL app:

g++ -I /home/vadim/vpp/build-root/install-vpp_debug-native/vpp/include main.cpp -L/home/vadim/vpp/build-root/install-vpp_debug-native/vpp/lib -lpthread -lvppcom

#include <cstdio>
#include <thread>

#include <vcl/vppcom.h>

int func()
{
    if (vppcom_worker_register() != VPPCOM_OK)
    {
        printf("failed to initialize worker");
        return 1;
    }
    return 0;
}

int main()
{
    if (vppcom_app_create("test_vcl_server") != VPPCOM_OK)
    {
        printf("failed to initialize vppcom_app");
        return 1;
    }

    std::thread thread1(func);
    thread1.join();
    vppcom_app_destroy();

    return 0;
}

vadim@vadim-lenovo:~/hybrid-ngfw/sources/pdk/network_interceptor/test_vcl_server$ sudo LD_LIBRARY_PATH=/home/vadim/vpp/build-root/install-vpp_debug-native/vpp/lib VCL_CONFIG=/home/vadim/config/vcl.conf gdb a.out
GNU gdb (Ubuntu 9.2-0ubuntu1~20.04) 9.2
Copyright (C) 2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from a.out...
(No debugging symbols found in a.out)
(gdb) r
Starting program: /home/vadim/hybrid-ngfw/sources/pdk/network_interceptor/test_vcl_server/a.out
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
VCL<91647>: allocated VCL heap = 0x7fffe786a000, size 268435456 (0x10000000)
VCL<91647>: configured rx_fifo_size 1048576 (0x100000)
VCL<91647>: configured tx_fifo_size 1048576 (0x100000)
VCL<91647>: configured api-socket-name (/run/vpp/api.sock)
VCL<91647>: configured with mq with eventfd
VCL<91647>: configured event_queue_size 524288 (0x80000)
VCL<91647>: completed parsing vppcom config!
[New Thread 0x7fffe7869700 (LWP 91651)]
vcl_bapi_connect_to_vpp:504: vcl<91647:0>: app (test_vcl_server-wrk-0) is connected to VPP!
vcl_bapi_init:607: vcl<91647:0>: sending session enable
vppcom_app_create:1267: vcl<91647:0>: app_name 'test_vcl_server', my_client_index 512 (0x200)
[New Thread 0x7fffe7068700 (LWP 91652)]
[New Thread 0x7fffe6867700 (LWP 91653)]
vcl_bapi_connect_to_vpp:504: vcl<91647:1>: app (test_vcl_server-wrk-1) is connected to VPP!
vl_api_app_worker_add_del_reply_t_handler:227: vcl<0:-1>: worker 1 vpp-worker 1 added
vcl_worker_register_with_vpp:218: vcl<91647:1>: added worker 1
vcl_worker_cleanup_cb:145: vcl<0:-1>: cleaned up worker 1
[Thread 0x7fffe7068700 (LWP 91652) exited]
[Thread 0x7fffe7869700 (LWP 91651) exited]
^C
Thread 1 "a.out" received signal SIGINT, Interrupt.
__pthread_clockjoin_ex (threadid=140737060960000, thread_return=0x7fffffffe150, clockid=<optimized out>, abstime=<optimized out>, block=<optimized out>) at pthread_join_common.c:145
145 pthread_join_common.c: No such file or directory.
(gdb) thread apply all bt

Thread 4 (Thread 0x7fffe6867700 (LWP 91653)):
#0 futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x7f814202c950) at ../sysdeps/nptl/futex-internal.h:183
#1 __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0x7f814202c900, cond=0x7f814202c928) at pthread_cond_wait.c:508
#2 __pthread_cond_wait (cond=0x7f814202c928, mutex=0x7f814202c900) at pthread_cond_wait.c:638
#3 0x00007ffff7c5928d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:82
#4 0x4000000000000000 in ?? ()
#5 0x0000000000000000 in ?? ()

Thread 1 (Thread 0x7ffff786a780 (LWP 91647)):
#0 __pthread_clockjoin_ex (threadid=140737060960000, thread_return=0x7fffffffe150, clockid=<optimized out>, abstime=<optimized out>, block=<optimized out>) at pthread_join_common.c:145
#1 0x00007ffff7b197c1 in disconnect_from_vlib_internal (do_unmap=0 '\000') at /home/vadim/vpp/src/vlibmemory/memory_client.c:518
#2 0x00007ffff7b19827 in vl_client_disconnect_from_vlib_no_unmap () at /home/vadim/vpp/src/vlibmemory/memory_client.c:538
#3 0x00007ffff7b20aaa in vl_socket_client_disconnect2 (scm=0x7fffe786ba18) at /home/vadim/vpp/src/vlibmemory/socket_client.c:237
#4 0x00007ffff7f50569 in vcl_bapi_disconnect_from_vpp () at /home/vadim/vpp/src/vcl/vcl_bapi.c:519
#5 0x00007ffff7f39bb8 in vcl_api_detach (wrk=0x7fffe786b8c0) at /home/vadim/vpp/src/vcl/vppcom.c:1228
#6 0x00007ffff7f3a1d4 in vppcom_app_destroy () at /home/vadim/vpp/src/vcl/vppcom.c:1293
#7 0x000055555555534b in main ()
(gdb)

 

Actual

Hangs on exit

Expected

No hang on exit

Assignee

Florin Coras

Reporter

Vadim Platonov

Comments

  • florin.coras (Thu, 8 Apr 2021 15:41:11 +0000): Fixed by https://gerrit.fd.io/r/c/vpp/+/31922
  • florin.coras (Thu, 8 Apr 2021 15:40:29 +0000): Awesome! Thanks for the report!
  • JIRAUSER14500 (Thu, 8 Apr 2021 07:12:48 +0000): Hi Florin,

Working fine, thanks!

  • florin.coras (Thu, 8 Apr 2021 01:24:30 +0000): Could you try this out https://gerrit.fd.io/r/c/vpp/+/31922 ? The simple vcl app above seems to be working fine for me now.
  • florin.coras (Wed, 7 Apr 2021 17:56:29 +0000): Hi Vadim,

That's unfortunately a different type of issue, which has to do with the fact that the app is multi-threaded and not all workers have been stopped. Consequently, vcl asks vpp to clean up the binary api registration for the worker, but doesn't clean it up locally, as it assumes the process will soon exit (not the case in your app). Without going too much into the weeds, that's an optimization mainly aimed at supporting sshd + ldp (it has to do with the seccomp sandbox).

The good news is that this should not be present if the app socket api is in use (as opposed to the binary api). Still, I'd like to also support your use case, so let me see if I can come up with a solution.

  • JIRAUSER14500 (Wed, 7 Apr 2021 15:17:56 +0000): Hi, Florin,

With your fix, vppcom_worker_unregister works fine, but the original issue is not fixed: the app still hangs on exit.

I believe the problem is that Thread 4 (rx_thread_fn) is not joined in vppcom_app_destroy, and vppcom_app_destroy waits for this thread indefinitely.

Starting program: /home/vadim/hybrid-ngfw/sources/pdk/network_interceptor/test_vcl_server/a.out 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
VCL<16492>: allocated VCL heap = 0x7fffe7869000, size 268435456 (0x10000000)
VCL<16492>: configured rx_fifo_size 1048576 (0x100000)
VCL<16492>: configured tx_fifo_size 1048576 (0x100000)
VCL<16492>: configured api-socket-name (/run/vpp/api.sock)
VCL<16492>: configured with mq with eventfd
VCL<16492>: configured event_queue_size 524288 (0x80000)
VCL<16492>: completed parsing vppcom config!
[New Thread 0x7fffe7868700 (LWP 16496)]
vcl_bapi_connect_to_vpp:504: vcl<16492:0>: app (test_vcl_server-wrk-0) is connected to VPP!
vcl_bapi_init:607: vcl<16492:0>: sending session enable
vppcom_app_create:1267: vcl<16492:0>: app_name 'test_vcl_server', my_client_index 256 (0x100)
[New Thread 0x7fffe7067700 (LWP 16497)]
[New Thread 0x7fffe6866700 (LWP 16498)]
vcl_bapi_connect_to_vpp:504: vcl<16492:1>: app (test_vcl_server-wrk-1) is connected to VPP!
vl_api_app_worker_add_del_reply_t_handler:227: vcl<0:-1>: worker 1 vpp-worker 1 added
vcl_worker_register_with_vpp:224: vcl<16492:1>: added worker 1
[Thread 0x7fffe7067700 (LWP 16497) exited]

Thread 1 "a.out" hit Breakpoint 1, vppcom_app_destroy () at /home/vadim/vpp/src/vcl/vppcom.c:1275
1275 {
(gdb) thread apply all bt

Thread 4 (Thread 0x7fffe6866700 (LWP 16498)):
#0 futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x7ef563a2d950) at ../sysdeps/nptl/futex-internal.h:183
#1 __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0x7ef563a2d900, cond=0x7ef563a2d928) at pthread_cond_wait.c:508
#2 __pthread_cond_wait (cond=0x7ef563a2d928, mutex=0x7ef563a2d900) at pthread_cond_wait.c:638
#3 0x00007ffff7ae9c6d in svm_queue_wait_inline (q=0x7ef563a2d900) at /home/vadim/vpp/src/svm/queue.c:157
#4 0x00007ffff7aea407 in svm_queue_sub (q=0x7ef563a2d900, elem=0x7fffe6865e68 "xǡc\365~", cond=SVM_Q_WAIT, time=0) at /home/vadim/vpp/src/svm/queue.c:408
#5 0x00007ffff7b2a4a1 in vl_msg_api_queue_handler (q=0x7ef563a2d900) at /home/vadim/vpp/src/vlibapi/api_shared.c:907
#6 0x00007ffff7b1766c in rx_thread_fn (arg=0x7fffd8000b60) at /home/vadim/vpp/src/vlibmemory/memory_client.c:77
#7 0x00007ffff7f90609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#8 0x00007ffff7c58293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 2 (Thread 0x7fffe7868700 (LWP 16496)):
#0 futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x7ef567a2e950) at ../sysdeps/nptl/futex-internal.h:183
#1 __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0x7ef567a2e900, cond=0x7ef567a2e928) at pthread_cond_wait.c:508
#2 __pthread_cond_wait (cond=0x7ef567a2e928, mutex=0x7ef567a2e900) at pthread_cond_wait.c:638
#3 0x00007ffff7ae9c6d in svm_queue_wait_inline (q=0x7ef567a2e900) at /home/vadim/vpp/src/svm/queue.c:157
#4 0x00007ffff7aea407 in svm_queue_sub (q=0x7ef567a2e900, elem=0x7fffe7867e68 "xסg\365~", cond=SVM_Q_WAIT, time=0) at /home/vadim/vpp/src/svm/queue.c:408
#5 0x00007ffff7b2a4a1 in vl_msg_api_queue_handler (q=0x7ef567a2e900) at /home/vadim/vpp/src/vlibapi/api_shared.c:907
#6 0x00007ffff7b1766c in rx_thread_fn (arg=0x55555556c220) at /home/vadim/vpp/src/vlibmemory/memory_client.c:77
#7 0x00007ffff7f90609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#8 0x00007ffff7c58293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 1 (Thread 0x7ffff7869780 (LWP 16492)):
#0 vppcom_app_destroy () at /home/vadim/vpp/src/vcl/vppcom.c:1275
#1 0x0000555555555380 in main ()
(gdb) c
Continuing.
[Thread 0x7fffe7868700 (LWP 16496) exited]
^C
Thread 1 "a.out" received signal SIGINT, Interrupt.
__pthread_clockjoin_ex (threadid=140737060955904, thread_return=0x7fffffffe150, clockid=<optimized out>, abstime=<optimized out>, block=<optimized out>) at pthread_join_common.c:145
145 pthread_join_common.c: No such file or directory.
(gdb) thread apply all bt

Thread 4 (Thread 0x7fffe6866700 (LWP 16498)):
#0 futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x7ef563a2d950) at ../sysdeps/nptl/futex-internal.h:183
#1 __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0x7ef563a2d900, cond=0x7ef563a2d928) at pthread_cond_wait.c:508
#2 __pthread_cond_wait (cond=0x7ef563a2d928, mutex=0x7ef563a2d900) at pthread_cond_wait.c:638
#3 0x00007ffff7c5828d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:82
#4 0x4000000000000000 in ?? ()
#5 0x0000000000000000 in ?? ()

Thread 1 (Thread 0x7ffff7869780 (LWP 16492)):
#0 __pthread_clockjoin_ex (threadid=140737060955904, thread_return=0x7fffffffe150, clockid=<optimized out>, abstime=<optimized out>, block=<optimized out>) at pthread_join_common.c:145
#1 0x00007ffff7b187c1 in disconnect_from_vlib_internal (do_unmap=0 '\000') at /home/vadim/vpp/src/vlibmemory/memory_client.c:518
#2 0x00007ffff7b18827 in vl_client_disconnect_from_vlib_no_unmap () at /home/vadim/vpp/src/vlibmemory/memory_client.c:538
#3 0x00007ffff7b1faaa in vl_socket_client_disconnect2 (scm=0x7fffe786aa18) at /home/vadim/vpp/src/vlibmemory/socket_client.c:237
#4 0x00007ffff7f4f569 in vcl_bapi_disconnect_from_vpp () at /home/vadim/vpp/src/vcl/vcl_bapi.c:519
#5 0x00007ffff7f38bb8 in vcl_api_detach (wrk=0x7fffe786a8c0) at /home/vadim/vpp/src/vcl/vppcom.c:1228
#6 0x00007ffff7f391d4 in vppcom_app_destroy () at /home/vadim/vpp/src/vcl/vppcom.c:1293
#7 0x0000555555555380 in main ()

  • florin.coras (Wed, 7 Apr 2021 07:37:48 +0000): That's definitely an issue! Could you check with the same code on top of this https://gerrit.fd.io/r/c/vpp/+/31915?

Now, regarding the purpose of the cb, it's there because we wanted to proactively clean up worker state in vpp, in case apps forget to clean up their workers or the workers are registered through ldp (apps are not aware of it). In particular, the binary api takes up to 30s until it cleans up a client, and that could lead to issues if, for instance, such a worker listens on some ports that need to be reused.

Going forward, if apps attach using the explicit socket api (session { use-app-socket-api } in startup.conf), vpp should be able to detect app worker cleanups the moment the attachment socket is closed. So we might be able to get rid of that cb at one point but, for now, the extra checks in the patch above should be enough.
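
For illustration, a minimal startup.conf fragment for the explicit socket api attach mentioned above (a sketch only; any other session options depend on the deployment):

session {
  use-app-socket-api
}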

  • JIRAUSER14500 (Wed, 7 Apr 2021 06:07:40 +0000): Hi Florin,

with vppcom_worker_unregister() at the end of func(), there is an assert triggered by the vcl_worker_cleanup_cb() callback at thread exit. By the way, I think it would be better to remove this callback and let VCL users call vppcom_worker_unregister() explicitly.

Starting program: /home/vadim/test_vcl_server/a.out 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
VCL<5712>: allocated VCL heap = 0x7fffe786a000, size 268435456 (0x10000000)
VCL<5712>: configured rx_fifo_size 1048576 (0x100000)
VCL<5712>: configured tx_fifo_size 1048576 (0x100000)
VCL<5712>: configured api-socket-name (/run/vpp/api.sock)
VCL<5712>: configured with mq with eventfd
VCL<5712>: configured event_queue_size 524288 (0x80000)
VCL<5712>: completed parsing vppcom config!
[New Thread 0x7fffe7869700 (LWP 5716)]
vcl_bapi_connect_to_vpp:504: vcl<5712:0>: app (test_vcl_server-wrk-0) is connected to VPP!
vcl_bapi_init:607: vcl<5712:0>: sending session enable
vppcom_app_create:1267: vcl<5712:0>: app_name 'test_vcl_server', my_client_index 0 (0x0)
[New Thread 0x7fffe7068700 (LWP 5717)]
[New Thread 0x7fffe6867700 (LWP 5718)]
vcl_bapi_connect_to_vpp:504: vcl<5712:1>: app (test_vcl_server-wrk-1) is connected to VPP!
vl_api_app_worker_add_del_reply_t_handler:227: vcl<0:-1>: worker 1 vpp-worker 1 added
vcl_worker_register_with_vpp:218: vcl<5712:1>: added worker 1
/home/vadim/vpp/src/vcl/vcl_private.h:650 (vcl_worker_get) assertion `! pool_is_free (vcm->workers, _e)' fails

Thread 3 "a.out" received signal SIGABRT, Aborted.
[Switching to Thread 0x7fffe7068700 (LWP 5717)]
__GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
50 ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1 0x00007ffff7b5c859 in __GI_abort () at abort.c:79
#2 0x00007ffff7a997e6 in os_panic () at /home/vadim/vpp/src/vppinfra/unix-misc.c:177
#3 0x00007ffff79fae5c in debugger () at /home/vadim/vpp/src/vppinfra/error.c:84
#4 0x00007ffff79fb253 in _clib_error (how_to_die=2, function_name=0x0, line_number=0, fmt=0x7ffff7f79d38 "%s:%d (%s) assertion `%s' fails") at /home/vadim/vpp/src/vppinfra/error.c:143
#5 0x00007ffff7f5a8a9 in vcl_worker_get (wrk_index=4294967295) at /home/vadim/vpp/src/vcl/vcl_private.h:650
#6 0x00007ffff7f5a8c3 in vcl_worker_get_current () at /home/vadim/vpp/src/vcl/vcl_private.h:664
#7 0x00007ffff7f5b567 in vcl_worker_cleanup_cb (arg=0x7fffe786bdd0) at /home/vadim/vpp/src/vcl/vcl_private.c:141
#8 0x00007ffff7f8f5a1 in __nptl_deallocate_tsd () at pthread_create.c:301
#9 0x00007ffff7f9062a in __nptl_deallocate_tsd () at pthread_create.c:256
#10 start_thread (arg=<optimized out>) at pthread_create.c:488
#11 0x00007ffff7c59293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

  • florin.coras (Tue, 6 Apr 2021 17:53:41 +0000): Hi Vadim! Could you try with vppcom_worker_unregister() at the end of func()?
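
For reference, a minimal sketch of what the suggested change to func() could look like (same includes as the example above; an illustration, not the exact code the reporter ran):

int func()
{
    if (vppcom_worker_register() != VPPCOM_OK)
    {
        printf("failed to initialize worker");
        return 1;
    }

    /* explicitly unregister the VCL worker before the thread exits,
       as suggested in the comment above */
    vppcom_worker_unregister();
    return 0;
}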

Original issue: https://jira.fd.io/browse/VPP-1979

