[VPP-178] VPP stuck at boot due to SVM deadlock #1434

Closed
vvalderrv opened this issue Jan 31, 2025 · 2 comments
Description

Sometimes, after VPP has crashed, it cannot be restarted.

It boots, then hangs partway through startup.

Here are some logs. Notice this part:

'svm_map_region:580: region /global_vm mutex held by dead pid 3464, tag 4, force unlock

svm_map_region:588: recovery: attempt to re-lock region

svm_map_region:595: recovery: attempt svm_data_region_map

svm_map_region:603: unlock and continue'

VPP detects that something is wrong, but apparently fails to recover from it and does not notify the user.
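For context on the "mutex held by dead pid" message: a common POSIX way to tell whether a recorded lock owner still exists is `kill(pid, 0)`, which sends no signal and only performs the existence and permission checks. The sketch below is an illustration of that technique, not a copy of the code in svm.c; the function names `pid_is_alive`, `owner_is_dead`, and `spawn_and_reap_child` are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if a process with this pid exists, else 0. */
int pid_is_alive(pid_t pid)
{
    assert(pid > 0);            /* 0 and negative values address process groups */
    if (kill(pid, 0) == 0)      /* signal 0: existence/permission check only */
        return 1;
    return errno != ESRCH;      /* EPERM means it exists but belongs to someone else */
}

/* Hypothetical helper: the recorded owner is another process and it is gone. */
int owner_is_dead(pid_t owner)
{
    return owner != getpid() && !pid_is_alive(owner);
}

/* Demo helper: fork a child that exits at once, reap it, and return its pid,
 * which at that point no longer refers to a running process. */
pid_t spawn_and_reap_child(void)
{
    pid_t child = fork();
    if (child == 0)
        _exit(0);
    if (child < 0)
        return -1;
    if (waitpid(child, NULL, 0) != child)
        return -1;
    return child;
}
```

A check like this is inherently racy, since the kernel may reuse a pid, so it can only justify a forced unlock as a best-effort recovery step.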

Here are the full logs.

Running: sudo gdb -x /tmp/vpp.sh.gdbinit --args /home/ppfister/vpp-dev/build/open-vpp-mirror/build-root/install-vpp-native/vpp/bin/vpp cpu { workers 1 } unix { interactive cli-history-limit 500 } api-trace { on } dpdk { socket-mem 1024,1024 coremask 3 no-pci }

GNU gdb (Ubuntu 7.11-0ubuntu1) 7.11

Copyright (C) 2016 Free Software Foundation, Inc.

License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>

This is free software: you are free to change and redistribute it.

There is NO WARRANTY, to the extent permitted by law. Type "show copying"

and "show warranty" for details.

This GDB was configured as "x86_64-linux-gnu".

Type "show configuration" for configuration details.

For bug reporting instructions, please see:

<http://www.gnu.org/software/gdb/bugs/>.

Find the GDB manual and other documentation resources online at:

<http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".

Type "apropos word" to search for commands related to "word"...

Reading symbols from /home/ppfister/vpp-dev/build/open-vpp-mirror/build-root/install-vpp-native/vpp/bin/vpp...done.

[Thread debugging using libthread_db enabled]

Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".

vlib_plugin_early_init:206: plugin path /usr/lib/vpp_plugins

svm_map_region:580: region /global_vm mutex held by dead pid 3464, tag 4, force unlock

svm_map_region:588: recovery: attempt to re-lock region

svm_map_region:595: recovery: attempt svm_data_region_map

svm_map_region:603: unlock and continue

EAL: Detected lcore 0 as core 0 on socket 0

EAL: Detected lcore 1 as core 1 on socket 0

EAL: Detected lcore 2 as core 2 on socket 0

EAL: Detected lcore 3 as core 3 on socket 0

EAL: Detected lcore 4 as core 4 on socket 0

EAL: Detected lcore 5 as core 5 on socket 0

EAL: Detected lcore 6 as core 0 on socket 1

EAL: Detected lcore 7 as core 1 on socket 1

EAL: Detected lcore 8 as core 2 on socket 1

EAL: Detected lcore 9 as core 3 on socket 1

EAL: Detected lcore 10 as core 4 on socket 1

EAL: Detected lcore 11 as core 5 on socket 1

EAL: Detected lcore 12 as core 0 on socket 0

EAL: Detected lcore 13 as core 1 on socket 0

EAL: Detected lcore 14 as core 2 on socket 0

EAL: Detected lcore 15 as core 3 on socket 0

EAL: Detected lcore 16 as core 4 on socket 0

EAL: Detected lcore 17 as core 5 on socket 0

EAL: Detected lcore 18 as core 0 on socket 1

EAL: Detected lcore 19 as core 1 on socket 1

EAL: Detected lcore 20 as core 2 on socket 1

EAL: Detected lcore 21 as core 3 on socket 1

EAL: Detected lcore 22 as core 4 on socket 1

EAL: Detected lcore 23 as core 5 on socket 1

EAL: Support maximum 256 logical core(s) by configuration.

EAL: Detected 24 lcore(s)

EAL: No free hugepages reported in hugepages-1048576kB

EAL: Setting up physically contiguous memory...

EAL: Ask a virtual area of 0xa800000 bytes

EAL: Virtual area found at 0x7fff8c400000 (size = 0xa800000)

EAL: Ask a virtual area of 0x200000 bytes

EAL: Virtual area found at 0x7fff8c000000 (size = 0x200000)

EAL: Ask a virtual area of 0x3e800000 bytes

EAL: Virtual area found at 0x7fff4d600000 (size = 0x3e800000)

EAL: Ask a virtual area of 0x200000 bytes

EAL: Virtual area found at 0x7fff4d200000 (size = 0x200000)

EAL: Ask a virtual area of 0x1b6c00000 bytes

EAL: Virtual area found at 0x7ffd96400000 (size = 0x1b6c00000)

EAL: Ask a virtual area of 0x1ff800000 bytes

EAL: Virtual area found at 0x7ffb96a00000 (size = 0x1ff800000)

EAL: Ask a virtual area of 0x400000 bytes

EAL: Virtual area found at 0x7ffb96400000 (size = 0x400000)

EAL: Ask a virtual area of 0x200000 bytes

EAL: Virtual area found at 0x7ffb96000000 (size = 0x200000)

EAL: Ask a virtual area of 0x200000 bytes

EAL: Virtual area found at 0x7ffb95c00000 (size = 0x200000)

EAL: Requesting 512 pages of size 2MB from socket 0

EAL: Requesting 512 pages of size 2MB from socket 1

EAL: TSC frequency is ~3391783 KHz

EAL: Master lcore 0 is ready (tid=f7fe08c0;cpuset=[0])

EAL: lcore 1 is ready (tid=9d09e700;cpuset=[1])

DPDK physical memory layout:

Segment 0: phys:0x2ac00000, len:176160768, virt:0x7fff8c400000, socket_id:0, hugepage_sz:2097152, nchannel:0, nrank:0

Segment 1: phys:0x35600000, len:2097152, virt:0x7fff8c000000, socket_id:0, hugepage_sz:2097152, nchannel:0, nrank:0

Segment 2: phys:0x36400000, len:895483904, virt:0x7fff4d600000, socket_id:0, hugepage_sz:2097152, nchannel:0, nrank:0

Segment 3: phys:0x1e2cc00000, len:1073741824, virt:0x7ffb96a00000, socket_id:1, hugepage_sz:2097152, nchannel:0, nrank:0

Thread 1 "vpp_main" received signal SIGINT, Interrupt.

__lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135

135 ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S: No such file or directory.

(gdb) bt

#0 __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135

#1 0x00007ffff62d1dfd in __GI___pthread_mutex_lock (mutex=mutex@entry=0x30444008) at ../nptl/pthread_mutex_lock.c:80

#2 0x00007ffff6e7ea26 in region_lock (rp=rp@entry=0x30444000, tag=tag@entry=2)

at /home/ppfister/vpp-dev/build/open-vpp-mirror/build-data/../svm/svm.c:62

#3 0x00007ffff6e7f946 in svm_map_region (a=a@entry=0x7fffc5477e60) at /home/ppfister/vpp-dev/build/open-vpp-mirror/build-data/../svm/svm.c:590

#4 0x00007ffff6e800b7 in svm_region_find_or_create (a=a@entry=0x7fffc5477e60)

at /home/ppfister/vpp-dev/build/open-vpp-mirror/build-data/../svm/svm.c:724

#5 0x00007ffff79b0c34 in vl_map_shmem (region_name=0x5b1b28 "/vpe-api", is_vlib=is_vlib@entry=1)

at /home/ppfister/vpp-dev/build/open-vpp-mirror/build-data/../vlib-api/vlibmemory/memory_shared.c:235

#6 0x00007ffff79b70df in memory_api_init (region_name=)

at /home/ppfister/vpp-dev/build/open-vpp-mirror/build-data/../vlib-api/vlibmemory/memory_vlib.c:310

#7 memclnt_process (vm=0x8d4080 <vlib_global_main>, node=0x7fffc546f000, f=)

at /home/ppfister/vpp-dev/build/open-vpp-mirror/build-data/../vlib-api/vlibmemory/memory_vlib.c:365

#8 0x00007ffff7541436 in vlib_process_bootstrap (_a=)

at /home/ppfister/vpp-dev/build/open-vpp-mirror/build-data/../vlib/vlib/main.c:1177

#9 0x00007ffff68167d0 in clib_calljmp () at /home/ppfister/vpp-dev/build/open-vpp-mirror/build-data/../vppinfra/vppinfra/longjmp.S:110

#10 0x00007fffc59a0e30 in ?? ()

#11 0x00007ffff75423e9 in vlib_process_startup (f=0x0, p=0x7fffc546f000, vm=0x8d4080 <vlib_global_main>)

at /home/ppfister/vpp-dev/build/open-vpp-mirror/build-data/../vlib/vlib/main.c:1201

#12 dispatch_process (vm=0x8d4080 <vlib_global_main>, p=0x7fffc546f000, last_time_stamp=878648214215779, f=0x0)

at /home/ppfister/vpp-dev/build/open-vpp-mirror/build-data/../vlib/vlib/main.c:1246

#13 0x0000000000000004 in ?? ()

#14 0x0000000000000000 in ?? ()

(gdb)
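The backtrace shows the main thread blocked indefinitely in `pthread_mutex_lock`, reached from `region_lock` during the recovery re-lock. One way to turn a silent hang like this into a visible error is to bound the wait with `pthread_mutex_timedlock`. The following is a minimal sketch of that idea, under the assumption that a timeout is acceptable at this point in startup; `lock_with_timeout` and the message text are hypothetical and not part of VPP.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical helper: try to take `m`, giving up after `timeout_ms`.
 * Returns 0 on success, or the error number from pthread_mutex_timedlock
 * (ETIMEDOUT if the lock could not be acquired in time). */
int lock_with_timeout(pthread_mutex_t *m, long timeout_ms)
{
    struct timespec deadline;
    int rv;

    assert(timeout_ms >= 0);

    /* pthread_mutex_timedlock takes an absolute CLOCK_REALTIME deadline. */
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += timeout_ms / 1000;
    deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec += 1;
        deadline.tv_nsec -= 1000000000L;
    }

    rv = pthread_mutex_timedlock(m, &deadline);
    if (rv == ETIMEDOUT)
        fprintf(stderr, "lock not acquired within %ld ms, giving up\n", timeout_ms);
    return rv;
}
```

With a bounded wait, the caller can log which region could not be locked and exit with an error instead of leaving the process stuck at boot.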

Assignee

Unassigned

Reporter

Pierre Pfister

Comments

  • ppfister (Tue, 5 Jul 2016 11:19:03 +0000): Thanks for the update regarding coremask.

socket-mem may be the same story. It comes from the old days. Should I remove it?

I will try to reproduce without these elements.

Although I think it also appears in other configurations.

  • dmarion (Tue, 5 Jul 2016 11:13:29 +0000):

    What is the VPP version?

What is the reason for specifying "socket-mem 1024,1024"?

"coremask 3" is deprecated and should not be used. Use cpu { ... } commands instead.

Original issue: https://jira.fd.io/browse/VPP-178
