kernel/kernel-std/debian/patches/0072-perf-core-Use-perf_cgroup_info-active-to-check-if-cg.patch
Alyson Deives Pereira 2f15f5cb6c perf/core: Fix perf_cgroup_switch()
When the system is stressed running pods on isolated cores (using
stress-ng for instance [1]) and the Power Metrics App [2] is also
being executed, the system hangs.

[1] https://github.com/ColinIanKing/stress-ng
[2] https://opendev.org/starlingx/app-power-metrics

Dmesg shows the following output:
WARNING: CPU: 16 PID: 207561 at
  kernel/events/core.c:868 perf_cgroup_switch+0x222/0x230
RIP: 0010:perf_cgroup_switch+0x222/0x230
Call Trace:
 ? __warn+0x79/0xc0
 ? perf_cgroup_switch+0x222/0x230
 ? report_bug+0x9e/0xc0
 ? handle_bug+0x41/0x90
 ? exc_invalid_op+0x14/0x70
 ? asm_exc_invalid_op+0x12/0x20
 ? perf_cgroup_switch+0x222/0x230
 ? perf_cgroup_switch+0xff/0x230
 __perf_event_task_sched_in+0x169/0x330
 ? __perf_event_task_sched_out+0x27c/0x6d0
 ? newidle_balance+0x3fd/0x480
 finish_task_switch.isra.0+0x118/0x4b0
 __schedule+0x2ae/0x930
 ? hrtimer_start_range_ns+0x2fc/0x420
 schedule+0xa7/0x110
 do_nanosleep+0x7c/0x1a0
 hrtimer_nanosleep+0x9b/0x140
 ? __hrtimer_init+0xe0/0xe0
 __x64_sys_nanosleep+0xad/0xe0
 do_syscall_64+0x30/0x40
 entry_SYSCALL_64_after_hwframe+0x61/0xc6

There is an upstream patch set that fixes a race condition in
perf_cgroup_switch(). Applying these patches to the stx kernel solved
the issue.

* commit a0827713e298
  ("perf/core: Don't pass task around when ctx sched in")
  (v5.18-rc2~8^2~3)

* commit 6875186aea5c
  ("perf/core: Use perf_cgroup_info->active to check if cgroup is
  active") (v5.18-rc2~8^2~2)

* commit 96492a6c558a
  ("perf/core: Fix perf_cgroup_switch()") (v5.18-rc2~8^2~1)

* commit e19cd0b6fa59
  ("perf/core: Always set cpuctx cgrp when enable cgroup event")
  (v5.18-rc2~8^2)

Note: It was verified that there are no "fixes" commits in the mainline
kernel that reference the commits mentioned above.

Test plan:
PASS: Build ISO successfully for rt and std.
PASS: Install successfully onto an AIO-SX lab with both rt and std
      kernels.
PASS: Apply power-metrics app, launch stress pods and confirm the
      system is stable.

Closes-Bug: 2035124
Change-Id: I30fcb63e4564a23cdb26794f4dfefa748eaa0cee
Signed-off-by: Alyson Deives Pereira <alyson.deivespereira@windriver.com>
2023-09-13 10:16:15 -03:00

From 020b859a2c3ff55c35d5789a41286b0f4f520ced Mon Sep 17 00:00:00 2001
From: Chengming Zhou <zhouchengming@bytedance.com>
Date: Tue, 29 Mar 2022 23:45:21 +0800
Subject: [PATCH 72/74] perf/core: Use perf_cgroup_info->active to check if
 cgroup is active

Since we use perf_cgroup_set_timestamp() to start cgroup time and
set active to 1, then use update_cgrp_time_from_cpuctx() to stop
cgroup time and set active to 0.

We can use info->active directly to check if cgroup is active.

Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220329154523.86438-3-zhouchengming@bytedance.com
(cherry picked from commit 6875186aea5ce09a644758d9193265da1cc187c7)
Signed-off-by: Alyson Deives Pereira <alyson.deivespereira@windriver.com>
---
 kernel/events/core.c | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index a8b758ec7be0..6f9cc6d32a73 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -779,7 +779,6 @@ static inline void update_cgrp_time_from_cpuctx(struct perf_cpu_context *cpuctx,
 static inline void update_cgrp_time_from_event(struct perf_event *event)
 {
 	struct perf_cgroup_info *info;
-	struct perf_cgroup *cgrp;
 
 	/*
 	 * ensure we access cgroup data only when needed and
@@ -788,14 +787,12 @@ static inline void update_cgrp_time_from_event(struct perf_event *event)
 	if (!is_cgroup_event(event))
 		return;
 
-	cgrp = perf_cgroup_from_task(current, event->ctx);
+	info = this_cpu_ptr(event->cgrp->info);
 	/*
 	 * Do not update time when cgroup is not active
 	 */
-	if (cgroup_is_descendant(cgrp->css.cgroup, event->cgrp->css.cgroup)) {
-		info = this_cpu_ptr(event->cgrp->info);
+	if (info->active)
 		__update_cgrp_time(info, perf_clock(), true);
-	}
 }
 
 static inline void
--
2.25.1