Zhixiong Chi 134d5d2fbd Add the pci reboot quirk in DMI table for Dell PowerEdge R750
Problem:
The Dell R750 hangs after the following command is executed:
$sudo -i /bin/bash -c 'echo b > /proc/sysrq-trigger'
The issue can usually be reproduced within 5 test cycles.

The active controller sends a reboot command to mtcClient on the
standby controller because of the SM failure (missed heartbeat), and
mtcClient then tries to reboot the system gracefully. If the standby
controller has not rebooted within 120s, mtcClient tries to force a
reboot with the command "echo b > /proc/sysrq-trigger".
Unfortunately, the Dell PowerEdge R750 then gets stuck and the BMC
console shows nothing.

Solution:
A search for relevant clues about this machine turned up nothing
except the kernel parameter 'reboot=p', which changes the reboot type
to pci_reboot for the sysrq magic key. After repeating the test cycle
multiple times with this kernel option, the issue no longer occurred
and the system rebooted properly, which is the expected behavior.
So this approach is helpful for the Dell R750 reset.
Because this kernel option should not be applied to all target
machines, we instead change the reboot type only for the R750 through
a DMI table quirk. Other machines still use the default reboot type;
this commit affects only the R750.

Based on the above, we add a pci reboot quirk to the DMI table that
changes the reboot_type to pci_reboot, so that the kernel on the Dell
PowerEdge R750 reboots properly.
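
The quirk selection described above can be illustrated with a small
userspace sketch. It is not the kernel patch itself: struct quirk,
reboot_quirks and pick_reboot_type are hypothetical names that only
mimic how a DMI quirk table maps a vendor/product pair to a reboot
type. It assumes substring matching, as the kernel's DMI_MATCH macro
does, and takes the vendor and product strings from the dmidecode
output of the R750.

```c
#include <stddef.h>
#include <string.h>

/* Illustrative stand-ins for the kernel's reboot types (hypothetical names). */
enum reboot_type { REBOOT_DEFAULT, REBOOT_PCI };

/* Simplified, hypothetical analogue of one DMI quirk table entry. */
struct quirk {
    const char *ident;         /* human-readable label for the entry */
    const char *sys_vendor;    /* substring expected in the DMI system vendor */
    const char *product_name;  /* substring expected in the DMI product name */
    enum reboot_type type;     /* reboot type to select on a match */
};

/* Only the R750 is listed, so every other machine keeps the default. */
static const struct quirk reboot_quirks[] = {
    { "Dell PowerEdge R750", "Dell Inc.", "PowerEdge R750", REBOOT_PCI },
};

/* Return the reboot type for a machine identified by its DMI strings. */
enum reboot_type pick_reboot_type(const char *vendor, const char *product)
{
    size_t n = sizeof(reboot_quirks) / sizeof(reboot_quirks[0]);

    for (size_t i = 0; i < n; i++) {
        const struct quirk *q = &reboot_quirks[i];

        /* Substring comparison: both fields must match the entry. */
        if (strstr(vendor, q->sys_vendor) && strstr(product, q->product_name))
            return q->type;
    }
    return REBOOT_DEFAULT;
}
```

With this sketch, pick_reboot_type("Dell Inc.", "PowerEdge R750")
selects REBOOT_PCI, while any other product falls through to
REBOOT_DEFAULT. Because the comparison is a substring match, a product
name that merely contains "PowerEdge R750" would also match in this
sketch.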

On the R750 target we can see the following dmidecode information:
$sudo dmidecode |grep 'Product Name'
	Product Name: PowerEdge R750
$sudo dmidecode |grep 'Vendor'
	Vendor: Dell Inc.

TestPlan:
PASS: downloader && build-pkgs && build-image
PASS: Jenkins Installation on R750 machine and the other labs.
PASS: Execute the following testing cycle more than 20 times:
       $sudo -i /bin/bash -c 'echo b > /proc/sysrq-trigger'
       The system rebooted properly in every test cycle.
       The hang after reset was no longer seen.

Closes-Bug: 2041606

Signed-off-by: Zhixiong Chi <zhixiong.chi@windriver.com>
Change-Id: I05467cc6d5105aa813852dca0c935278741b043f
2023-10-30 22:30:42 -04:00