Jiping Ma bf940a84c8 rcu: Avoid RCU-related unexpected reboot
We encountered an unexpected system reboot during a stress test, and
based on the kernel warning backtrace, it looks like a race condition in
the RCU subsystem caused this issue.

It is similar to the issue reported at
https://lore.kernel.org/all/20210917211148.GU4156@paulmck-ThinkPad-P17-Gen-1/#t
Guillaume Morin applied two patches and could no longer reproduce the
issue.
The first, commit 2431774f04 [rcu: Mark accesses to rcu_state.n_force_qs],
is already in the upstream Linux kernel.
The second, [rcu: Tighten rcu_advance_cbs_nowake() checks], was provided
by Paul E. McKenney and has not yet been merged into mainline (by
default it is slated for the v5.17 merge window):
https://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git/commit/?h=rcu/next&id=325a2030b90376d179a129794e2fae2b24d73923

In short, the rcu_advance_cbs() == true warning in
rcu_advance_cbs_nowake() fires, and then everything eventually gets
stuck on RCU synchronization.

WARNING: CPU: 35 PID: 2743975 at kernel/rcu/tree.c:1589
rcu_advance_cbs_nowake+0x78/0x80
......
Call Trace:
call_rcu+0x173/0x5c0
task_work_run+0x6d/0xa0
exit_to_user_mode_prepare+0x130/0x140
syscall_exit_to_user_mode+0x27/0x1d0
entry_SYSCALL_64_after_hwframe+0x44/0xa9

Verification:
Formal regression tests were carried out by colleagues on the test
team at Wind River. They covered userspace packages, LTP, POSIX, and
basic networking tests, among others.

Closes-Bug: #1952710

Signed-off-by: Jiping Ma <jiping.ma2@windriver.com>
Change-Id: I4ee7d3d007edced81c9ac43c5850f941f1d393ee
2021-12-02 17:05:51 +00:00
2020-04-21 16:01:28 -04:00
Description
StarlingX Linux kernel