IP cameras that are restarted do not come back online. #723

Open
curtishall opened this issue Dec 30, 2024 · 0 comments

Customer report:

Hi, I am running Bluecherry 3.1.8. For a few point versions now, the server has failed to restore a stream after a camera disconnects or reboots. I've disabled maintenance restarts on my cameras to minimise the problem, but resuming the failed streams still requires a server restart. I did see this mentioned somewhere as an issue under investigation, and I hope it will be resolved soon.
The previous Bluecherry server and old client worked reliably for me, but this issue makes the new version problematic.

Suggestions:

Add proper retry logic for handling temporary connection failures
Implement connection state tracking to manage reconnection attempts
Add TCP keepalive options to detect connection failures earlier
Add session monitoring and recovery mechanisms
Add better logging of connection state changes
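
The retry, state tracking, and logging suggestions above could be combined in a small per-camera tracker. The following is a minimal sketch under stated assumptions, not code from the Bluecherry sources: the `ReconnectTracker` class, the `ConnState` enum, and the delay values are hypothetical names chosen for illustration. It doubles the retry delay after each consecutive failure, up to a cap, and logs every state change.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <iostream>

// Hypothetical names throughout; nothing here is taken from Bluecherry.
enum class ConnState { Disconnected, Backoff, Connected };

class ReconnectTracker {
public:
    using ms = std::chrono::milliseconds;

    ReconnectTracker(ms base, ms cap) : base_(base), cap_(cap) {
        assert(base.count() > 0 && cap >= base);
    }

    // Call when a connect or read attempt fails. Returns how long to wait
    // before the next attempt: base, 2*base, 4*base, ... capped at cap.
    ms on_failure() {
        ms delay = base_;
        for (int i = 0; i < failures_ && delay < cap_; ++i)
            delay *= 2;
        delay = std::min(delay, cap_);
        ++failures_;
        transition(ConnState::Backoff);
        return delay;
    }

    // Call once packets are flowing again; resets the backoff.
    void on_success() {
        failures_ = 0;
        transition(ConnState::Connected);
    }

    ConnState state() const { return state_; }

private:
    // Logs only real state changes, so repeated failures do not flood the log.
    void transition(ConnState next) {
        if (next != state_) {
            std::clog << "camera connection: " << name(state_)
                      << " -> " << name(next) << '\n';
            state_ = next;
        }
    }

    static const char* name(ConnState s) {
        switch (s) {
        case ConnState::Connected: return "connected";
        case ConnState::Backoff:   return "backoff";
        default:                   return "disconnected";
        }
    }

    ms base_;
    ms cap_;
    int failures_ = 0;
    ConnState state_ = ConnState::Disconnected;
};
```

A recording thread might call `on_failure()` whenever reading a packet fails, sleep for the returned delay, try to reopen the stream, and call `on_success()` once packets arrive again.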

curtishall added this to the 3.1.10 milestone Dec 30, 2024
andrey-utkin added a commit to andrey-utkin/bluecherry-apps that referenced this issue Feb 12, 2025
In the 3.1.2 release of Bluecherry server, we upgraded the bundled FFmpeg
from 4.2.1 to 6.1.1. (Later, in 3.1.3, we upgraded to FFmpeg 6.1.2.)

Between 4.2.1 and 6.1.1, FFmpeg's RTSP demuxer renamed a parameter from
"stimeout" to "timeout", but we did not update lib/lavf_device.cpp
accordingly. As a result, a camera whose RTSP connection breaks stops
working until Bluecherry server is restarted or the camera is
reconfigured in the database. We have received many such complaints
from users.

Link: bluecherrydvr#723
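
The rename described in this commit message can be guarded against by choosing the option key from the libavformat version. The sketch below is illustrative only and makes two assumptions that the commit message does not state: that the socket I/O timeout is called "stimeout" before libavformat major version 59 and "timeout" from 59 onward, and that its value is given in microseconds. A `std::map` stands in for FFmpeg's `AVDictionary` so the example compiles without FFmpeg headers; the helper names are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <string>

// Stand-in for AVDictionary so that the sketch needs no FFmpeg headers.
using OptionMap = std::map<std::string, std::string>;

// Assumption: "stimeout" applies before libavformat major 59 and "timeout"
// from 59 on. The commit message only says the rename happened somewhere
// between FFmpeg 4.2.1 and 6.1.1.
std::string rtsp_io_timeout_key(int lavf_major) {
    return lavf_major >= 59 ? "timeout" : "stimeout";
}

// Hypothetical helper: stores the timeout (assumed to be in microseconds)
// under whichever key the given libavformat major version understands.
void set_rtsp_io_timeout(OptionMap& opts, int lavf_major, long long timeout_us) {
    assert(timeout_us > 0);
    opts[rtsp_io_timeout_key(lavf_major)] = std::to_string(timeout_us);
}
```

In code that links against FFmpeg, the version would come from the `LIBAVFORMAT_VERSION_MAJOR` macro and the pair would be stored with `av_dict_set` before the input is opened.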
andrey-utkin added a commit to andrey-utkin/bluecherry-apps that referenced this issue Feb 12, 2025
In the 3.1.2 release of Bluecherry server, we upgraded the bundled FFmpeg
from 4.2.1 to 6.1.1. (Later, in 3.1.3, we upgraded to FFmpeg 6.1.2.)

Between 4.2.1 and 6.1.1, FFmpeg's RTSP demuxer renamed a parameter from
"stimeout" to "timeout", but we did not update lib/lavf_device.cpp
accordingly. As a result, a camera whose RTSP connection breaks stops
working until Bluecherry server is restarted or the camera is
reconfigured in the database. We have received many such complaints
from users.

The cause was established by observing the following backtrace in the
debugger after the affected camera was manually rebooted.

 (gdb) bt
 #0  0x000072f4cbb43bbf in __GI___poll (fds=fds@entry=0x72f4bc9f3e40, nfds=nfds@entry=1, timeout=timeout@entry=100) at ../sysdeps/unix/sysv/linux/poll.c:29
 #1  0x000072f4ce5e8230 in poll (__timeout=100, __nfds=1, __fds=0x72f4bc9f3e40) at /usr/include/x86_64-linux-gnu/bits/poll2.h:46
 #2  ff_network_wait_fd (fd=<optimized out>, write=<optimized out>) at libavformat/network.c:74
 #3  0x000072f4ce5e82a2 in ff_network_wait_fd_timeout (fd=14, write=write@entry=0, timeout=0, int_cb=0x72f4a4003d30) at libavformat/network.c:86
 #4  0x000072f4ce610b32 in tcp_read (h=<optimized out>, buf=0x72f4bc9f3f77 "$", size=1) at libavformat/tcp.c:292
 #5  0x000072f4ce5716c6 in retry_transfer_wrapper (cbuf=<optimized out>, read=<optimized out>, size_min=<optimized out>, size=<optimized out>, buf=<optimized out>, h=<optimized out>) at libavformat/avio.c:363
 #6  ffurl_read_complete (h=0x72f4a4003d00, buf=buf@entry=0x72f4bc9f3f77 "$", size=size@entry=1) at libavformat/avio.c:408
 #7  0x000072f4ce6066b1 in ff_rtsp_read_reply (s=s@entry=0x72f4a4000fc0, reply=reply@entry=0x72f4bc9f8550, content_ptr=content_ptr@entry=0x0, return_on_interleaved_data=return_on_interleaved_data@entry=1,
     method=method@entry=0x0) at libavformat/rtsp.c:1205
 #8  0x000072f4ce60bb1d in ff_rtsp_tcp_read_packet (s=s@entry=0x72f4a4000fc0, prtsp_st=prtsp_st@entry=0x72f4bc9fa128,
     buf=0x72f4a4106800 "\240\340`Q\353\267\331\332\060\005\361<|A\032\251\021\360\310\305i\f\r\020\330?\237\264*\370=Wk>|\210E\223-J\bn-\"\207\064w\317\tU\vU\305\375h\303\341|\362\304L\301\356\027>\025\201-\234\005\221\bZtz\321\323\206'2\006k\215\020\063xI/\221\037\336X\f\361\232\374\246\064.\324\246\365u\340\060\242\255\304\246\353IR,(9^\305\216\364\312\232V&\375\216E\177\236\334\n\036\212&]\215\367\252^\353\204\265\325\300\350\264\334\224\217K\307~\212\016\247J\205\210Q\257\235uU\025\017\351\340\344\227\305\255p\212\211Y\262\202K", buf_size=buf_size@entry=81920) at libavformat/rtspdec.c:795
 #9  0x000072f4ce608643 in read_packet (wait_end=0, first_queue_st=0x0, rtsp_st=0x72f4bc9fa128, s=0x72f4a4000fc0) at libavformat/rtsp.c:2179
 #10 ff_rtsp_fetch_packet (s=s@entry=0x72f4a4000fc0, pkt=pkt@entry=0x606cb7bf63d0) at libavformat/rtsp.c:2270
 #11 0x000072f4ce609e8b in rtsp_read_packet (s=0x72f4a4000fc0, pkt=0x606cb7bf63d0) at libavformat/rtspdec.c:912
 #12 0x000072f4ce57bbb6 in ff_read_packet (s=s@entry=0x72f4a4000fc0, pkt=pkt@entry=0x606cb7bf63d0) at libavformat/demux.c:576
 #13 0x000072f4ce57c45b in read_frame_internal (s=0x72f4a4000fc0, pkt=0x606cb7bf63d0) at libavformat/demux.c:1264
 #14 0x000072f4ce57d18d in av_read_frame (s=0x72f4a4000fc0, pkt=pkt@entry=0x606cb7bf63d0) at libavformat/demux.c:1500
 #15 0x000072f4ce6b6916 in lavf_device::read_packet (this=0x606cb7bf5d70) at lavf_device.cpp:227
 #16 0x0000606cb5bef080 in bc_record::run (this=0x606cb7bf18b0) at bc-thread.cpp:368
 #17 0x0000606cb5bf145d in bc_device_thread (data=0x606cb7bf18b0) at bc-thread.cpp:142
 #18 0x000072f4cba16609 in start_thread (arg=<optimized out>) at pthread_create.c:477
 #19 0x000072f4cbb50353 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Note the frame

    ff_network_wait_fd_timeout (..., ..., timeout=0, ...)

which shows that no overall timeout is set for reading from the socket.

Link: bluecherrydvr#723
andrey-utkin added a commit that referenced this issue Feb 13, 2025
In the 3.1.2 release of Bluecherry server, we upgraded the bundled FFmpeg
from 4.2.1 to 6.1.1. (Later, in 3.1.3, we upgraded to FFmpeg 6.1.2.)

Between 4.2.1 and 6.1.1, FFmpeg's RTSP demuxer renamed a parameter from
"stimeout" to "timeout", but we did not update lib/lavf_device.cpp
accordingly. As a result, a camera whose RTSP connection breaks stops
working until Bluecherry server is restarted or the camera is
reconfigured in the database. We have received many such complaints
from users.

The cause was established by observing the following backtrace in the
debugger after the affected camera was manually rebooted.

 (gdb) bt
 #0  0x000072f4cbb43bbf in __GI___poll (fds=fds@entry=0x72f4bc9f3e40, nfds=nfds@entry=1, timeout=timeout@entry=100) at ../sysdeps/unix/sysv/linux/poll.c:29
 #1  0x000072f4ce5e8230 in poll (__timeout=100, __nfds=1, __fds=0x72f4bc9f3e40) at /usr/include/x86_64-linux-gnu/bits/poll2.h:46
 #2  ff_network_wait_fd (fd=<optimized out>, write=<optimized out>) at libavformat/network.c:74
 #3  0x000072f4ce5e82a2 in ff_network_wait_fd_timeout (fd=14, write=write@entry=0, timeout=0, int_cb=0x72f4a4003d30) at libavformat/network.c:86
 #4  0x000072f4ce610b32 in tcp_read (h=<optimized out>, buf=0x72f4bc9f3f77 "$", size=1) at libavformat/tcp.c:292
 #5  0x000072f4ce5716c6 in retry_transfer_wrapper (cbuf=<optimized out>, read=<optimized out>, size_min=<optimized out>, size=<optimized out>, buf=<optimized out>, h=<optimized out>) at libavformat/avio.c:363
 #6  ffurl_read_complete (h=0x72f4a4003d00, buf=buf@entry=0x72f4bc9f3f77 "$", size=size@entry=1) at libavformat/avio.c:408
 #7  0x000072f4ce6066b1 in ff_rtsp_read_reply (s=s@entry=0x72f4a4000fc0, reply=reply@entry=0x72f4bc9f8550, content_ptr=content_ptr@entry=0x0, return_on_interleaved_data=return_on_interleaved_data@entry=1,
     method=method@entry=0x0) at libavformat/rtsp.c:1205
 #8  0x000072f4ce60bb1d in ff_rtsp_tcp_read_packet (s=s@entry=0x72f4a4000fc0, prtsp_st=prtsp_st@entry=0x72f4bc9fa128,
     buf=0x72f4a4106800 "\240\340`Q\353\267\331\332\060\005\361<|A\032\251\021\360\310\305i\f\r\020\330?\237\264*\370=Wk>|\210E\223-J\bn-\"\207\064w\317\tU\vU\305\375h\303\341|\362\304L\301\356\027>\025\201-\234\005\221\bZtz\321\323\206'2\006k\215\020\063xI/\221\037\336X\f\361\232\374\246\064.\324\246\365u\340\060\242\255\304\246\353IR,(9^\305\216\364\312\232V&\375\216E\177\236\334\n\036\212&]\215\367\252^\353\204\265\325\300\350\264\334\224\217K\307~\212\016\247J\205\210Q\257\235uU\025\017\351\340\344\227\305\255p\212\211Y\262\202K", buf_size=buf_size@entry=81920) at libavformat/rtspdec.c:795
 #9  0x000072f4ce608643 in read_packet (wait_end=0, first_queue_st=0x0, rtsp_st=0x72f4bc9fa128, s=0x72f4a4000fc0) at libavformat/rtsp.c:2179
 #10 ff_rtsp_fetch_packet (s=s@entry=0x72f4a4000fc0, pkt=pkt@entry=0x606cb7bf63d0) at libavformat/rtsp.c:2270
 #11 0x000072f4ce609e8b in rtsp_read_packet (s=0x72f4a4000fc0, pkt=0x606cb7bf63d0) at libavformat/rtspdec.c:912
 #12 0x000072f4ce57bbb6 in ff_read_packet (s=s@entry=0x72f4a4000fc0, pkt=pkt@entry=0x606cb7bf63d0) at libavformat/demux.c:576
 #13 0x000072f4ce57c45b in read_frame_internal (s=0x72f4a4000fc0, pkt=0x606cb7bf63d0) at libavformat/demux.c:1264
 #14 0x000072f4ce57d18d in av_read_frame (s=0x72f4a4000fc0, pkt=pkt@entry=0x606cb7bf63d0) at libavformat/demux.c:1500
 #15 0x000072f4ce6b6916 in lavf_device::read_packet (this=0x606cb7bf5d70) at lavf_device.cpp:227
 #16 0x0000606cb5bef080 in bc_record::run (this=0x606cb7bf18b0) at bc-thread.cpp:368
 #17 0x0000606cb5bf145d in bc_device_thread (data=0x606cb7bf18b0) at bc-thread.cpp:142
 #18 0x000072f4cba16609 in start_thread (arg=<optimized out>) at pthread_create.c:477
 #19 0x000072f4cbb50353 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Note the frame

    ff_network_wait_fd_timeout (..., ..., timeout=0, ...)

which shows that no overall timeout is set for reading from the socket.

Link: #723
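
To illustrate why `timeout=0` in that frame leaves the recording thread stuck, here is a simplified model of a wait loop that polls in short slices (100 ms in the backtrace) and checks a deadline only when a positive timeout was supplied. It is a sketch modelled on what the backtrace suggests, not FFmpeg's actual code: `wait_readable`, `WaitResult`, and the `max_slices` guard are hypothetical and exist only so that the model terminates and can be exercised without a socket.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

enum class WaitResult { Ready, TimedOut, GaveUp };

// poll_slice: waits one short slice and returns true if the socket is readable.
// now_us:     monotonic clock in microseconds.
// timeout_us: overall deadline; 0 means "no deadline", as in the backtrace.
// max_slices: demo-only guard; a real loop without a deadline never gives up.
WaitResult wait_readable(const std::function<bool()>& poll_slice,
                         const std::function<int64_t()>& now_us,
                         int64_t timeout_us,
                         int max_slices) {
    assert(max_slices > 0);
    const int64_t start = now_us();
    for (int i = 0; i < max_slices; ++i) {
        if (poll_slice())
            return WaitResult::Ready;
        // The deadline is checked only when a positive timeout was given.
        if (timeout_us > 0 && now_us() - start > timeout_us)
            return WaitResult::TimedOut;
    }
    // With timeout_us == 0, a real loop would keep polling here forever,
    // which matches a thread that never notices the camera went away.
    return WaitResult::GaveUp;
}
```

With a fake clock that advances 100 ms per slice and a socket that never becomes readable, a 1 s timeout ends the wait after eleven slices, while a timeout of 0 runs until the demo guard stops it.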