
When the system is stressed by running pods on isolated cores (using
stress-ng [1], for instance) while the Power Metrics App [2] is also
running, the system hangs.

[1] https://github.com/ColinIanKing/stress-ng
[2] https://opendev.org/starlingx/app-power-metrics

dmesg shows the following output:

WARNING: CPU: 16 PID: 207561 at kernel/events/core.c:868 perf_cgroup_switch+0x222/0x230
RIP: 0010:perf_cgroup_switch+0x222/0x230
Call Trace:
 ? __warn+0x79/0xc0
 ? perf_cgroup_switch+0x222/0x230
 ? report_bug+0x9e/0xc0
 ? handle_bug+0x41/0x90
 ? exc_invalid_op+0x14/0x70
 ? asm_exc_invalid_op+0x12/0x20
 ? perf_cgroup_switch+0x222/0x230
 ? perf_cgroup_switch+0xff/0x230
 __perf_event_task_sched_in+0x169/0x330
 ? __perf_event_task_sched_out+0x27c/0x6d0
 ? newidle_balance+0x3fd/0x480
 finish_task_switch.isra.0+0x118/0x4b0
 __schedule+0x2ae/0x930
 ? hrtimer_start_range_ns+0x2fc/0x420
 schedule+0xa7/0x110
 do_nanosleep+0x7c/0x1a0
 hrtimer_nanosleep+0x9b/0x140
 ? __hrtimer_init+0xe0/0xe0
 __x64_sys_nanosleep+0xad/0xe0
 do_syscall_64+0x30/0x40
 entry_SYSCALL_64_after_hwframe+0x61/0xc6

There is an upstream patch set that fixes a race condition in
perf_cgroup_switch. Applying these patches to the stx kernel solved the
issue.

* commit a0827713e298 ("perf/core: Don't pass task around when ctx sched
  in") (v5.18-rc2~8^2~3)
* commit 6875186aea5c ("perf/core: Use perf_cgroup_info->active to check
  if cgroup is active") (v5.18-rc2~8^2~2)
* commit 96492a6c558a ("perf/core: Fix perf_cgroup_switch()")
  (v5.18-rc2~8^2~1)
* commit e19cd0b6fa59 ("perf/core: Always set cpuctx cgrp when enable
  cgroup event") (v5.18-rc2~8^2)

Note: It was verified that there are no "Fixes" commits in the mainline
kernel that target the commits mentioned above.

Test plan:
PASS: Build the ISO successfully for both rt and std kernels.
PASS: Install successfully on an AIO-SX lab with both rt and std kernels.
PASS: Apply the power-metrics app, launch stress pods, and confirm the
      system is stable.

Closes-Bug: 2035124
Change-Id: I30fcb63e4564a23cdb26794f4dfefa748eaa0cee
Signed-off-by: Alyson Deives Pereira <alyson.deivespereira@windriver.com>
From 020b859a2c3ff55c35d5789a41286b0f4f520ced Mon Sep 17 00:00:00 2001
From: Chengming Zhou <zhouchengming@bytedance.com>
Date: Tue, 29 Mar 2022 23:45:21 +0800
Subject: [PATCH 72/74] perf/core: Use perf_cgroup_info->active to check if
 cgroup is active

Since we use perf_cgroup_set_timestamp() to start cgroup time and
set active to 1, then use update_cgrp_time_from_cpuctx() to stop
cgroup time and set active to 0.

We can use info->active directly to check if cgroup is active.

Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220329154523.86438-3-zhouchengming@bytedance.com
(cherry picked from commit 6875186aea5ce09a644758d9193265da1cc187c7)
Signed-off-by: Alyson Deives Pereira <alyson.deivespereira@windriver.com>
---
 kernel/events/core.c | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index a8b758ec7be0..6f9cc6d32a73 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -779,7 +779,6 @@ static inline void update_cgrp_time_from_cpuctx(struct perf_cpu_context *cpuctx,
 static inline void update_cgrp_time_from_event(struct perf_event *event)
 {
 	struct perf_cgroup_info *info;
-	struct perf_cgroup *cgrp;
 
 	/*
 	 * ensure we access cgroup data only when needed and
@@ -788,14 +787,12 @@ static inline void update_cgrp_time_from_event(struct perf_event *event)
 	if (!is_cgroup_event(event))
 		return;
 
-	cgrp = perf_cgroup_from_task(current, event->ctx);
+	info = this_cpu_ptr(event->cgrp->info);
 	/*
 	 * Do not update time when cgroup is not active
 	 */
-	if (cgroup_is_descendant(cgrp->css.cgroup, event->cgrp->css.cgroup)) {
-		info = this_cpu_ptr(event->cgrp->info);
+	if (info->active)
 		__update_cgrp_time(info, perf_clock(), true);
-	}
 }
 
 static inline void
-- 
2.25.1
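
Reviewer note: the idea behind the patch above can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not kernel code: struct cgrp_info_model and every model_* function are hypothetical stand-ins invented for illustration. They mimic the roles the commit message assigns to perf_cgroup_set_timestamp() (start the clock, set active to 1) and update_cgrp_time_from_cpuctx() (stop the clock, set active to 0), and show why checking the flag alone is enough to decide whether to accumulate time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical user-space stand-in for the kernel's per-CPU cgroup info. */
struct cgrp_info_model {
	uint64_t time;      /* accumulated cgroup time */
	uint64_t timestamp; /* clock value at the last update */
	bool active;        /* true between start and stop */
};

/* Plays the role of perf_cgroup_set_timestamp(): start time, set active. */
static void model_start(struct cgrp_info_model *info, uint64_t now)
{
	info->timestamp = now;
	info->active = true;
}

/* Accumulate the time elapsed since the last update. */
static void model_update(struct cgrp_info_model *info, uint64_t now)
{
	info->time += now - info->timestamp;
	info->timestamp = now;
}

/* Plays the role of update_cgrp_time_from_cpuctx(): stop time, clear active. */
static void model_stop(struct cgrp_info_model *info, uint64_t now)
{
	model_update(info, now);
	info->active = false;
}

/* Mirrors the patched check: update only while the cgroup is active. */
static void model_update_from_event(struct cgrp_info_model *info, uint64_t now)
{
	if (info->active)
		model_update(info, now);
}

/* Replays a start/update/stop sequence and returns the accumulated time. */
uint64_t model_demo(void)
{
	struct cgrp_info_model info = {0};

	model_update_from_event(&info, 50);  /* inactive: ignored */
	model_start(&info, 100);
	model_update_from_event(&info, 130); /* active: adds 30 */
	model_stop(&info, 150);              /* adds 20, then inactive */
	model_update_from_event(&info, 400); /* inactive: ignored */

	assert(!info.active);
	return info.time;
}
```

In this model, model_demo() accumulates 50 time units: updates that arrive before the start or after the stop are dropped purely on the strength of the flag, with no need to look up the current task's cgroup or walk the cgroup hierarchy.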