kernel/kernel-std/debian/patches/0030-x86-Enumerate-AVX512-FP16-CPUID-feature-flag.patch
M. Vefa Bicakci d642a320e3 kernel: Add SPR-5G-ISA/AVX512-FP16 support
This commit adds Sapphire Rapids 5G Instruction Set Architecture
(SPR-5G-ISA) support to the CentOS-based StarlingX kernel. This involves
AVX512-FP16 instructions, but note that these instructions are not
directly used by the kernel.

The benefits for StarlingX users are the ability to enumerate CPUs'
AVX512-FP16 capabilities and the ability to start KVM-based virtual
machines that can use those capabilities. (Please note that supporting
AVX512-FP16 in KVM virtual machines requires patching StarlingX's
qemu-kvm-ev package in addition to this commit.)

The cherry-picked commits were acquired from the v5.11 kernel release,
and all of them applied cleanly. The only modification was to the third
patch, which no longer references a CPU feature
(X86_FEATURE_VM_PAGE_FLUSH) that StarlingX's v5.10 kernel baseline does
not support.

Test plan:

- CentOS-based StarlingX
  - Standard and preempt-rt kernels and all out-of-tree kernel modules
    were successfully built using a monolithic build procedure.
  - An ISO image was successfully built with this change.
  - The changes were confirmed to not negatively affect installation and
    Ansible bootstrap procedures in All-in-One Simplex virtual machines
    using standard and low-latency profiles.
  - Using a Sapphire Rapids-based server in All-in-One Simplex
    configuration, the aforementioned ISO image was installed and
    Ansible-bootstrapped, and the enumeration of the "avx512_fp16" CPU
    feature in /proc/cpuinfo was verified with the low-latency and
    standard kernels.

- Debian-based StarlingX
  - An ISO image was successfully built (in an incremental manner) with
    this change.
  - The changes were confirmed to not negatively affect installation and
    Ansible bootstrap procedures in All-in-One Simplex virtual machines
    using standard and low-latency profiles. (Due to time constraints,
    Debian-based StarlingX tests were carried out with virtual machines
    only.)
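
The /proc/cpuinfo verification mentioned in the test plan can be automated. The following is a minimal sketch in C, not the procedure the author used: the helper names `flags_line_has` and `cpuinfo_has_flag` are hypothetical, and the 8192-byte line buffer is an assumption about the maximum length of a "flags" line.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Return 1 if `flag` occurs in `line` as a whole whitespace-delimited
 * token, so that "avx512_fp16" does not match "avx512_fp16x". */
int flags_line_has(const char *line, const char *flag)
{
    size_t n = strlen(flag);
    const char *p = line;

    if (n == 0)
        return 0;
    while ((p = strstr(p, flag)) != NULL) {
        int start_ok = (p == line) || isspace((unsigned char)p[-1]);
        int end_ok = (p[n] == '\0') || isspace((unsigned char)p[n]);

        if (start_ok && end_ok)
            return 1;
        p++; /* keep searching after a partial match */
    }
    return 0;
}

/* Scan a cpuinfo-style stream and report whether any "flags" line
 * lists `flag`. Assumption: each line fits in the 8192-byte buffer. */
int cpuinfo_has_flag(FILE *fp, const char *flag)
{
    char line[8192];

    while (fgets(line, sizeof(line), fp) != NULL) {
        if (strncmp(line, "flags", 5) == 0 && flags_line_has(line, flag))
            return 1;
    }
    return 0;
}
```

A caller would open /proc/cpuinfo with fopen() and pass the stream together with "avx512_fp16" to cpuinfo_has_flag().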

Story: 2010247
Task: 46073

Change-Id: I430de20651b6c4a0aa0d854d295b1760cb7b889c
Signed-off-by: M. Vefa Bicakci <vefa.bicakci@windriver.com>
2022-08-23 13:16:17 -04:00

From 91f5728b6794550f08febea3bfe4018071520727 Mon Sep 17 00:00:00 2001
From: Kyung Min Park <kyung.min.park@intel.com>
Date: Mon, 7 Dec 2020 19:34:40 -0800
Subject: [PATCH] x86: Enumerate AVX512 FP16 CPUID feature flag
Enumerate AVX512 Half-precision floating point (FP16) CPUID feature
flag. Compared with using FP32, using FP16 cut the number of bits
required for storage in half, reducing the exponent from 8 bits to 5,
and the mantissa from 23 bits to 10. Using FP16 also enables developers
to train and run inference on deep learning models fast when all
precision or magnitude (FP32) is not needed.
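
To make the bit layout above concrete, here is a minimal C sketch that decodes a raw 16-bit FP16 pattern into a float. It is illustrative only and not part of the patch: the function name `fp16_bits_to_float` is hypothetical, and the exponent bias of 15 and the subnormal handling follow the IEEE 754 binary16 format, which the commit message does not spell out.

```c
#include <math.h>   /* only for the INFINITY and NAN macros */
#include <stdint.h>

/* Decode an FP16 bit pattern: 1 sign bit, 5 exponent bits, 10 mantissa bits. */
float fp16_bits_to_float(uint16_t h)
{
    unsigned sign = (h >> 15) & 0x1u;
    unsigned exp = (h >> 10) & 0x1Fu;  /* 5-bit exponent, bias 15 */
    unsigned mant = h & 0x3FFu;        /* 10-bit mantissa */
    float value;

    if (exp == 0x1Fu) {
        /* All-ones exponent: infinity or NaN. */
        value = mant ? NAN : INFINITY;
    } else if (exp == 0) {
        /* Zero or subnormal: mant * 2^-24. */
        value = (float)mant / 16777216.0f;
    } else {
        /* Normal number: (1 + mant/1024) * 2^(exp - 15). */
        float frac = 1.0f + (float)mant / 1024.0f;
        float scale = (exp >= 15)
            ? (float)(1u << (exp - 15))
            : 1.0f / (float)(1u << (15 - exp));

        value = frac * scale;
    }
    return sign ? -value : value;
}
```

For example, the pattern 0x3C00 decodes to 1.0, and 0x7BFF decodes to 65504.0, the largest finite FP16 value.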
A processor supports AVX512 FP16 if CPUID.(EAX=7,ECX=0):EDX[bit 23]
is present. The AVX512 FP16 requires AVX512BW feature be implemented
since the instructions for manipulating 32bit masks are associated with
AVX512BW.
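
The CPUID check described above can be reproduced from user space. The sketch below is an assumption-laden illustration, not kernel code: it relies on the `__get_cpuid_count` helper from the compiler-provided <cpuid.h> header (GCC and Clang), and the function names `edx_has_avx512_fp16` and `cpu_has_avx512_fp16` are hypothetical. It reports the raw CPUID bit only, not whether the running kernel enumerates the feature.

```c
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/* Bit position within EDX of CPUID.(EAX=7,ECX=0), per the text above. */
#define AVX512_FP16_EDX_BIT 23

/* Pure helper: test the AVX512 FP16 bit in an EDX value. */
int edx_has_avx512_fp16(uint32_t edx)
{
    return (int)((edx >> AVX512_FP16_EDX_BIT) & 1u);
}

/* Query the current CPU; returns 0 on non-x86 builds or when
 * CPUID leaf 7 is not available. */
int cpu_has_avx512_fp16(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;

    if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
        return 0;
    return edx_has_avx512_fp16(edx);
#else
    return 0;
#endif
}
```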
The only in-kernel usage of this is kvm passthrough. The CPU feature
flag is shown as "avx512_fp16" in /proc/cpuinfo.
Signed-off-by: Kyung Min Park <kyung.min.park@intel.com>
Acked-by: Dave Hansen <dave.hansen@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Message-Id: <20201208033441.28207-2-kyung.min.park@intel.com>
Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit e1b35da5e624f8b09d2e98845c2e4c84b179d9a4)
Signed-off-by: M. Vefa Bicakci <vefa.bicakci@windriver.com>
---
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/kernel/cpu/cpuid-deps.c | 1 +
2 files changed, 2 insertions(+)
diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 3b407f46f1a0..b5252fd26682 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -374,6 +374,7 @@
#define X86_FEATURE_TSXLDTRK (18*32+16) /* TSX Suspend Load Address Tracking */
#define X86_FEATURE_PCONFIG (18*32+18) /* Intel PCONFIG */
#define X86_FEATURE_ARCH_LBR (18*32+19) /* Intel ARCH LBR */
+#define X86_FEATURE_AVX512_FP16 (18*32+23) /* AVX512 FP16 */
#define X86_FEATURE_SPEC_CTRL (18*32+26) /* "" Speculation Control (IBRS + IBPB) */
#define X86_FEATURE_INTEL_STIBP (18*32+27) /* "" Single Thread Indirect Branch Predictors */
#define X86_FEATURE_FLUSH_L1D (18*32+28) /* Flush L1D cache */
diff --git a/arch/x86/kernel/cpu/cpuid-deps.c b/arch/x86/kernel/cpu/cpuid-deps.c
index d502241995a3..42af31b64c2c 100644
--- a/arch/x86/kernel/cpu/cpuid-deps.c
+++ b/arch/x86/kernel/cpu/cpuid-deps.c
@@ -69,6 +69,7 @@ static const struct cpuid_dep cpuid_deps[] = {
{ X86_FEATURE_CQM_MBM_TOTAL, X86_FEATURE_CQM_LLC },
{ X86_FEATURE_CQM_MBM_LOCAL, X86_FEATURE_CQM_LLC },
{ X86_FEATURE_AVX512_BF16, X86_FEATURE_AVX512VL },
+ { X86_FEATURE_AVX512_FP16, X86_FEATURE_AVX512BW },
{ X86_FEATURE_ENQCMD, X86_FEATURE_XSAVES },
{ X86_FEATURE_PER_THREAD_MBA, X86_FEATURE_MBA },
{}
--
2.29.2
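
The two hunks above can be read as follows: the feature number (18*32+23) packs a capability word index and a bit index into one integer, and the new cpuid_deps entry records that AVX512_FP16 depends on AVX512BW. The C sketch below is a simplified user-space model of that idea, not the kernel's implementation. All `demo_` and `DEMO_` names are hypothetical, and the value used for AVX512BW (9*32+30) is an assumption that does not appear in this diff.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_NCAPINTS    20              /* hypothetical word count for this demo */
#define DEMO_AVX512BW    (9 * 32 + 30)   /* assumption: not shown in the diff */
#define DEMO_AVX512_FP16 (18 * 32 + 23)  /* value added by the diff */

/* A feature number encodes (word * 32 + bit). */
unsigned feature_word(unsigned f) { return f / 32; }
unsigned feature_bit(unsigned f)  { return f % 32; }

/* Simplified stand-in for a dependency table entry. */
struct demo_dep {
    unsigned feature;  /* dependent feature */
    unsigned depends;  /* feature it requires */
};

static const struct demo_dep demo_deps[] = {
    { DEMO_AVX512_FP16, DEMO_AVX512BW },
};

int demo_test(const uint32_t *caps, unsigned f)
{
    return (int)((caps[feature_word(f)] >> feature_bit(f)) & 1u);
}

void demo_set(uint32_t *caps, unsigned f)
{
    caps[feature_word(f)] |= 1u << feature_bit(f);
}

/* Clear a feature and, recursively, every feature that depends on it. */
void demo_clear(uint32_t *caps, unsigned f)
{
    size_t i;

    caps[feature_word(f)] &= ~(1u << feature_bit(f));
    for (i = 0; i < sizeof(demo_deps) / sizeof(demo_deps[0]); i++) {
        if (demo_deps[i].depends == f && demo_test(caps, demo_deps[i].feature))
            demo_clear(caps, demo_deps[i].feature);
    }
}
```

In this model, clearing DEMO_AVX512BW in a capability array also clears DEMO_AVX512_FP16, which mirrors the intent of the dependency entry added by the patch.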