
Commit 49d5759

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm updates from Paolo Bonzini:
 "ARM:

   - Provide a virtual cache topology to the guest to avoid
     inconsistencies with migration on heterogeneous systems. Non secure
     software has no practical need to traverse the caches by set/way in
     the first place

   - Add support for taking stage-2 access faults in parallel. This was
     an accidental omission in the original parallel faults
     implementation, but should provide a marginal improvement to
     machines w/o FEAT_HAFDBS (such as hardware from the fruit company)

   - A preamble to adding support for nested virtualization to KVM,
     including vEL2 register state, rudimentary nested exception
     handling and masking unsupported features for nested guests

   - Fixes to the PSCI relay that avoid an unexpected host SVE trap when
     resuming a CPU when running pKVM

   - VGIC maintenance interrupt support for the AIC

   - Improvements to the arch timer emulation, primarily aimed at
     reducing the trap overhead of running nested

   - Add CONFIG_USERFAULTFD to the KVM selftests config fragment in the
     interest of CI systems

   - Avoid VM-wide stop-the-world operations when a vCPU accesses its
     own redistributor

   - Serialize when toggling CPACR_EL1.SMEN to avoid unexpected
     exceptions in the host

   - Aesthetic and comment/kerneldoc fixes

   - Drop the vestiges of the old Columbia mailing list and add [Oliver]
     as co-maintainer

  RISC-V:

   - Fix wrong usage of PGDIR_SIZE instead of PUD_SIZE

   - Correctly place the guest in S-mode after redirecting a trap to the
     guest

   - Redirect illegal instruction traps to guest

   - SBI PMU support for guest

  s390:

   - Sort out confusion between virtual and physical addresses, which
     currently are the same on s390

   - A new ioctl that performs cmpxchg on guest memory

   - A few fixes

  x86:

   - Change tdp_mmu to a read-only parameter

   - Separate TDP and shadow MMU page fault paths

   - Enable Hyper-V invariant TSC control

   - Fix a variety of APICv and AVIC bugs, some of them real-world, some
     of them affecting behavior that is architecturally legal but
     unlikely to happen in practice

   - Mark APIC timer as expired if it's in one-shot mode and the count
     underflows while the vCPU task was being migrated

   - Advertise support for Intel's new fast REP string features

   - Fix a double-shootdown issue in the emergency reboot code

   - Ensure GIF=1 and disable SVM during an emergency reboot, i.e. give
     SVM similar treatment to VMX

   - Update Xen's TSC info CPUID sub-leaves as appropriate

   - Add support for Hyper-V's extended hypercalls, where "support" at
     this point is just forwarding the hypercalls to userspace

   - Clean up the kvm->lock vs. kvm->srcu sequences when updating the
     PMU and MSR filters

   - One-off fixes and cleanups

   - Fix and clean up the range-based TLB flushing code, used when KVM
     is running on Hyper-V

   - Add support for filtering PMU events using a mask. If userspace
     wants to restrict heavily what events the guest can use, it can now
     do so without needing an absurd number of filter entries

   - Clean up KVM's handling of "PMU MSRs to save", especially when vPMU
     support is disabled

   - Add PEBS support for Intel Sapphire Rapids

   - Fix a mostly benign overflow bug in SEV's
     send|receive_update_data()

   - Move several SVM-specific flags into vcpu_svm

  x86 Intel:

   - Handle NMI VM-Exits before leaving the noinstr region

   - A few trivial cleanups in the VM-Enter flows

   - Stop enabling VMFUNC for L1 purely to document that KVM doesn't
     support EPTP switching (or any other VM function) for L1

   - Fix a crash when using eVMCS's enlightened MSR bitmaps

  Generic:

   - Clean up the hardware enable and initialization flow, which was
     scattered around multiple arch-specific hooks. Instead, just let
     the arch code call into generic code. Both x86 and ARM should
     benefit from not having to fight common KVM code's notion of how to
     do initialization

   - Account allocations in generic kvm_arch_alloc_vm()

   - Fix a memory leak if coalesced MMIO unregistration fails

  selftests:

   - On x86, cache the CPU vendor (AMD vs. Intel) and use the info to
     emit the correct hypercall instruction instead of relying on KVM to
     patch in VMMCALL

   - Use TAP interface for kvm_binary_stats_test and tsc_msrs_test"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (325 commits)
  KVM: SVM: hyper-v: placate modpost section mismatch error
  KVM: x86/mmu: Make tdp_mmu_allowed static
  KVM: arm64: nv: Use reg_to_encoding() to get sysreg ID
  KVM: arm64: nv: Only toggle cache for virtual EL2 when SCTLR_EL2 changes
  KVM: arm64: nv: Filter out unsupported features from ID regs
  KVM: arm64: nv: Emulate EL12 register accesses from the virtual EL2
  KVM: arm64: nv: Allow a sysreg to be hidden from userspace only
  KVM: arm64: nv: Emulate PSTATE.M for a guest hypervisor
  KVM: arm64: nv: Add accessors for SPSR_EL1, ELR_EL1 and VBAR_EL1 from virtual EL2
  KVM: arm64: nv: Handle SMCs taken from virtual EL2
  KVM: arm64: nv: Handle trapped ERET from virtual EL2
  KVM: arm64: nv: Inject HVC exceptions to the virtual EL2
  KVM: arm64: nv: Support virtual EL2 exceptions
  KVM: arm64: nv: Handle HCR_EL2.NV system register traps
  KVM: arm64: nv: Add nested virt VCPU primitives for vEL2 VCPU state
  KVM: arm64: nv: Add EL2 system registers to vcpu context
  KVM: arm64: nv: Allow userspace to set PSR_MODE_EL2x
  KVM: arm64: nv: Reset VCPU to EL2 registers if VCPU nested virt is set
  KVM: arm64: nv: Introduce nested virtualization VCPU feature
  KVM: arm64: Use the S2 MMU context to iterate over S2 table
  ...
2 parents 01687e7 + 45dd9bc commit 49d5759

200 files changed, +7274 -2928 lines changed

Documentation/admin-guide/kernel-parameters.txt

Lines changed: 6 additions & 1 deletion
@@ -2536,9 +2536,14 @@
 protected: nVHE-based mode with support for guests whose
 state is kept private from the host.
 
+nested: VHE-based mode with support for nested
+virtualization. Requires at least ARMv8.3
+hardware.
+
 Defaults to VHE/nVHE based on hardware support. Setting
 mode to "protected" will disable kexec and hibernation
-for the host.
+for the host. "nested" is experimental and should be
+used with extreme caution.
 
 kvm-arm.vgic_v3_group0_trap=
 [KVM,ARM] Trap guest accesses to GICv3 group-0

Documentation/virt/kvm/api.rst

Lines changed: 108 additions & 16 deletions
@@ -3736,7 +3736,7 @@ The fields in each entry are defined as follows:
 :Parameters: struct kvm_s390_mem_op (in)
 :Returns: = 0 on success,
 < 0 on generic error (e.g. -EFAULT or -ENOMEM),
-> 0 if an exception occurred while walking the page tables
+16 bit program exception code if the access causes such an exception
 
 Read or write data from/to the VM's memory.
 The KVM_CAP_S390_MEM_OP_EXTENSION capability specifies what functionality is
@@ -3754,6 +3754,8 @@ Parameters are specified via the following structure::
 struct {
 __u8 ar; /* the access register number */
 __u8 key; /* access key, ignored if flag unset */
+__u8 pad1[6]; /* ignored */
+__u64 old_addr; /* ignored if flag unset */
 };
 __u32 sida_offset; /* offset into the sida */
 __u8 reserved[32]; /* ignored */
@@ -3781,6 +3783,7 @@ Possible operations are:
 * ``KVM_S390_MEMOP_ABSOLUTE_WRITE``
 * ``KVM_S390_MEMOP_SIDA_READ``
 * ``KVM_S390_MEMOP_SIDA_WRITE``
+* ``KVM_S390_MEMOP_ABSOLUTE_CMPXCHG``
 
 Logical read/write:
 ^^^^^^^^^^^^^^^^^^^
@@ -3829,15 +3832,34 @@ the checks required for storage key protection as one operation (as opposed to
 user space getting the storage keys, performing the checks, and accessing
 memory thereafter, which could lead to a delay between check and access).
 Absolute accesses are permitted for the VM ioctl if KVM_CAP_S390_MEM_OP_EXTENSION
-is > 0.
+has the KVM_S390_MEMOP_EXTENSION_CAP_BASE bit set.
 Currently absolute accesses are not permitted for VCPU ioctls.
 Absolute accesses are permitted for non-protected guests only.
 
 Supported flags:
 * ``KVM_S390_MEMOP_F_CHECK_ONLY``
 * ``KVM_S390_MEMOP_F_SKEY_PROTECTION``
 
-The semantics of the flags are as for logical accesses.
+The semantics of the flags common with logical accesses are as for logical
+accesses.
+
+Absolute cmpxchg:
+^^^^^^^^^^^^^^^^^
+
+Perform cmpxchg on absolute guest memory. Intended for use with the
+KVM_S390_MEMOP_F_SKEY_PROTECTION flag.
+Instead of doing an unconditional write, the access occurs only if the target
+location contains the value pointed to by "old_addr".
+This is performed as an atomic cmpxchg with the length specified by the "size"
+parameter. "size" must be a power of two up to and including 16.
+If the exchange did not take place because the target value doesn't match the
+old value, the value "old_addr" points to is replaced by the target value.
+User space can tell if an exchange took place by checking if this replacement
+occurred. The cmpxchg op is permitted for the VM ioctl if
+KVM_CAP_S390_MEM_OP_EXTENSION has flag KVM_S390_MEMOP_EXTENSION_CAP_CMPXCHG set.
+
+Supported flags:
+* ``KVM_S390_MEMOP_F_SKEY_PROTECTION``
 
 SIDA read/write:
 ^^^^^^^^^^^^^^^^
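Reviewer note: the "Absolute cmpxchg" text in this hunk describes success detection indirectly, through whether the buffer behind "old_addr" was overwritten. The sketch below is a plain userspace model of that contract, not the ioctl itself. The names model_absolute_cmpxchg and cmpxchg_size_valid are hypothetical, the model only covers the 8-byte case, and it assumes a GCC or Clang compiler for the __atomic_compare_exchange_n builtin.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: "size" must be a power of two up to and including 16. */
static bool cmpxchg_size_valid(uint32_t size)
{
    return size != 0 && size <= 16 && (size & (size - 1)) == 0;
}

/*
 * Hypothetical userspace model of the documented semantics. This is NOT
 * the KVM ioctl: "target" stands in for the absolute guest location and
 * "old" for the buffer that old_addr points to.
 */
static bool model_absolute_cmpxchg(uint64_t *target, uint64_t *old,
                                   uint64_t new_val)
{
    assert(target && old);

    uint64_t expected = *old;

    /*
     * On a mismatch the builtin stores the current target value into
     * *old, mirroring "the value old_addr points to is replaced by the
     * target value".
     */
    __atomic_compare_exchange_n(target, old, new_val, false,
                                __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);

    /* User space infers success from *old being left unchanged. */
    return *old == expected;
}
```

A caller would typically retry with the refreshed old value when the function reports that no exchange took place.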
@@ -4457,6 +4479,18 @@ not holding a previously reported uncorrected error).
 :Parameters: struct kvm_s390_cmma_log (in, out)
 :Returns: 0 on success, a negative value on error
 
+Errors:
+
+====== =============================================================
+ENOMEM not enough memory can be allocated to complete the task
+ENXIO  if CMMA is not enabled
+EINVAL if KVM_S390_CMMA_PEEK is not set but migration mode was not enabled
+EINVAL if KVM_S390_CMMA_PEEK is not set but dirty tracking has been
+       disabled (and thus migration mode was automatically disabled)
+EFAULT if the userspace address is invalid or if no page table is
+       present for the addresses (e.g. when using hugepages).
+====== =============================================================
+
 This ioctl is used to get the values of the CMMA bits on the s390
 architecture. It is meant to be used in two scenarios:
 
@@ -4537,12 +4571,6 @@ mask is unused.
 
 values points to the userspace buffer where the result will be stored.
 
-This ioctl can fail with -ENOMEM if not enough memory can be allocated to
-complete the task, with -ENXIO if CMMA is not enabled, with -EINVAL if
-KVM_S390_CMMA_PEEK is not set but migration mode was not enabled, with
--EFAULT if the userspace address is invalid or if no page table is
-present for the addresses (e.g. when using hugepages).
-
 4.108 KVM_S390_SET_CMMA_BITS
 ----------------------------
 
@@ -5005,6 +5033,15 @@ using this ioctl.
 :Parameters: struct kvm_pmu_event_filter (in)
 :Returns: 0 on success, -1 on error
 
+Errors:
+
+====== ============================================================
+EFAULT args[0] cannot be accessed
+EINVAL args[0] contains invalid data in the filter or filter events
+E2BIG  nevents is too large
+EBUSY  not enough memory to allocate the filter
+====== ============================================================
+
 ::
 
 struct kvm_pmu_event_filter {
@@ -5016,14 +5053,69 @@ using this ioctl.
 __u64 events[0];
 };
 
-This ioctl restricts the set of PMU events that the guest can program.
-The argument holds a list of events which will be allowed or denied.
-The eventsel+umask of each event the guest attempts to program is compared
-against the events field to determine whether the guest should have access.
-The events field only controls general purpose counters; fixed purpose
-counters are controlled by the fixed_counter_bitmap.
+This ioctl restricts the set of PMU events the guest can program by limiting
+which event select and unit mask combinations are permitted.
+
+The argument holds a list of filter events which will be allowed or denied.
+
+Filter events only control general purpose counters; fixed purpose counters
+are controlled by the fixed_counter_bitmap.
+
+Valid values for 'flags'::
+
+``0``
+
+To use this mode, clear the 'flags' field.
+
+In this mode each event will contain an event select + unit mask.
+
+When the guest attempts to program the PMU the guest's event select +
+unit mask is compared against the filter events to determine whether the
+guest should have access.
+
+``KVM_PMU_EVENT_FLAG_MASKED_EVENTS``
+:Capability: KVM_CAP_PMU_EVENT_MASKED_EVENTS
+
+In this mode each filter event will contain an event select, mask, match, and
+exclude value. To encode a masked event use::
+
+KVM_PMU_ENCODE_MASKED_ENTRY()
+
+An encoded event will follow this layout::
+
+Bits   Description
+----   -----------
+7:0    event select (low bits)
+15:8   umask match
+31:16  unused
+35:32  event select (high bits)
+36:54  unused
+55     exclude bit
+63:56  umask mask
+
+When the guest attempts to program the PMU, these steps are followed in
+determining if the guest should have access:
+
+1. Match the event select from the guest against the filter events.
+2. If a match is found, match the guest's unit mask to the mask and match
+values of the included filter events.
+I.e. (unit mask & mask) == match && !exclude.
+3. If a match is found, match the guest's unit mask to the mask and match
+values of the excluded filter events.
+I.e. (unit mask & mask) == match && exclude.
+4.
+a. If an included match is found and an excluded match is not found, filter
+the event.
+b. For everything else, do not filter the event.
+5.
+a. If the event is filtered and it's an allow list, allow the guest to
+program the event.
+b. If the event is filtered and it's a deny list, do not allow the guest to
+program the event.
 
-No flags are defined yet, the field must be zero.
+When setting a new pmu event filter, -EINVAL will be returned if any of the
+unused fields are set or if any of the high bits (35:32) in the event
+select are set when called on Intel.
 
 Valid values for 'action'::
 
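Reviewer note: the masked-event layout and the five matching steps documented in this hunk can be modeled in a few lines of C. The sketch below is illustrative only. encode_masked_entry, event_is_filtered and guest_may_program are hypothetical names (real callers would use KVM_PMU_ENCODE_MASKED_ENTRY() from the uapi headers), and the handling of an event that is not filtered (denied under an allow list, allowed under a deny list) is my reading of step 5, which the text does not spell out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical encoder that follows the documented bit layout. */
static uint64_t encode_masked_entry(uint16_t event_select, uint8_t mask,
                                    uint8_t match, bool exclude)
{
    return (uint64_t)(event_select & 0xff) |               /* bits 7:0   */
           ((uint64_t)match << 8) |                        /* bits 15:8  */
           ((uint64_t)((event_select >> 8) & 0xf) << 32) | /* bits 35:32 */
           ((uint64_t)exclude << 55) |                     /* bit 55     */
           ((uint64_t)mask << 56);                         /* bits 63:56 */
}

static uint16_t entry_event_select(uint64_t e)
{
    return (uint16_t)((e & 0xff) | (((e >> 32) & 0xf) << 8));
}

/* Steps 1-4: filtered means an included match and no excluded match. */
static bool event_is_filtered(const uint64_t *entries, size_t n,
                              uint16_t event_select, uint8_t umask)
{
    bool included = false, excluded = false;

    assert(entries || n == 0);

    for (size_t i = 0; i < n; i++) {
        uint64_t e = entries[i];
        uint8_t match = (e >> 8) & 0xff;
        uint8_t mask = (e >> 56) & 0xff;

        if (entry_event_select(e) != event_select)
            continue;                  /* step 1: event select differs */
        if ((umask & mask) != match)
            continue;                  /* steps 2 and 3: unit mask test */
        if ((e >> 55) & 1)
            excluded = true;
        else
            included = true;
    }
    return included && !excluded;      /* step 4 */
}

/* Step 5, plus the assumed complement for events that are not filtered. */
static bool guest_may_program(const uint64_t *entries, size_t n,
                              bool is_allow_list,
                              uint16_t event_select, uint8_t umask)
{
    bool filtered = event_is_filtered(entries, n, event_select, umask);

    return is_allow_list ? filtered : !filtered;
}
```

With one entry that includes every unit mask of an event (mask 0, match 0) and a second entry that excludes a single unit mask, two filter entries cover what would otherwise need one entry per permitted unit mask.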
Documentation/virt/kvm/devices/vm.rst

Lines changed: 4 additions & 0 deletions
@@ -302,6 +302,10 @@ Allows userspace to start migration mode, needed for PGSTE migration.
 Setting this attribute when migration mode is already active will have
 no effects.
 
+Dirty tracking must be enabled on all memslots, else -EINVAL is returned. When
+dirty tracking is disabled on any memslot, migration mode is automatically
+stopped.
+
 :Parameters: none
 :Returns: -ENOMEM if there is not enough free memory to start migration mode;
 -EINVAL if the state of the VM is invalid (e.g. no memory defined);

Documentation/virt/kvm/locking.rst

Lines changed: 16 additions & 9 deletions
@@ -9,6 +9,8 @@ KVM Lock Overview
 
 The acquisition orders for mutexes are as follows:
 
+- cpus_read_lock() is taken outside kvm_lock
+
 - kvm->lock is taken outside vcpu->mutex
 
 - kvm->lock is taken outside kvm->slots_lock and kvm->irq_lock
@@ -226,15 +228,10 @@ time it will be set using the Dirty tracking mechanism described above.
 :Type: mutex
 :Arch: any
 :Protects: - vm_list
-
-``kvm_count_lock``
-^^^^^^^^^^^^^^^^^^
-
-:Type: raw_spinlock_t
-:Arch: any
-:Protects: - hardware virtualization enable/disable
-:Comment: 'raw' because hardware enabling/disabling must be atomic /wrt
-migration.
+- kvm_usage_count
+- hardware virtualization enable/disable
+:Comment: KVM also disables CPU hotplug via cpus_read_lock() during
+enable/disable.
 
 ``kvm->mn_invalidate_lock``
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -292,3 +289,13 @@ time it will be set using the Dirty tracking mechanism described above.
 wakeup notification event since external interrupts from the
 assigned devices happens, we will find the vCPU on the list to
 wakeup.
+
+``vendor_module_lock``
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+:Type: mutex
+:Arch: x86
+:Protects: loading a vendor module (kvm_amd or kvm_intel)
+:Comment: Exists because using kvm_lock leads to deadlock. cpu_hotplug_lock is
+taken outside of kvm_lock, e.g. in KVM's CPU online/offline callbacks, and
+many operations need to take cpu_hotplug_lock when loading a vendor module,
+e.g. updating static calls.

Documentation/virt/kvm/x86/errata.rst

Lines changed: 11 additions & 0 deletions
@@ -37,3 +37,14 @@ Nested virtualization features
 ------------------------------
 
 TBD
+
+x2APIC
+------
+When KVM_X2APIC_API_USE_32BIT_IDS is enabled, KVM activates a hack/quirk that
+allows sending events to a single vCPU using its x2APIC ID even if the target
+vCPU has legacy xAPIC enabled, e.g. to bring up hotplugged vCPUs via INIT-SIPI
+on VMs with > 255 vCPUs. A side effect of the quirk is that, if multiple vCPUs
+have the same physical APIC ID, KVM will deliver events targeting that APIC ID
+only to the vCPU with the lowest vCPU ID. If KVM_X2APIC_API_USE_32BIT_IDS is
+not enabled, KVM follows x86 architecture when processing interrupts (all vCPUs
+matching the target APIC ID receive the interrupt).
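Reviewer note: the side effect described in the x2APIC erratum is easiest to see with duplicate APIC IDs. The sketch below is a toy model of the two delivery behaviors as the text states them, not KVM's APIC map code. struct model_vcpu and model_deliver are hypothetical names introduced only for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified vCPU record used only for this illustration. */
struct model_vcpu {
    uint32_t vcpu_id;
    uint32_t apic_id;
};

/*
 * Collect the vCPU IDs that receive an event targeting dest_apic_id.
 * "receivers" must have room for n entries. Returns the receiver count.
 */
static size_t model_deliver(const struct model_vcpu *vcpus, size_t n,
                            uint32_t dest_apic_id, bool use_32bit_ids,
                            uint32_t *receivers)
{
    size_t count = 0;

    assert(vcpus && receivers);

    for (size_t i = 0; i < n; i++) {
        if (vcpus[i].apic_id != dest_apic_id)
            continue;

        if (!use_32bit_ids) {
            /* Architectural behavior: every matching vCPU receives it. */
            receivers[count++] = vcpus[i].vcpu_id;
        } else if (count == 0) {
            receivers[count++] = vcpus[i].vcpu_id;
        } else if (vcpus[i].vcpu_id < receivers[0]) {
            /* Quirk: keep only the match with the lowest vCPU ID. */
            receivers[0] = vcpus[i].vcpu_id;
        }
    }
    return count;
}
```

For vCPUs 5 and 2 sharing APIC ID 3, the model delivers to both when the quirk is off and only to vCPU 2 when it is on.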

MAINTAINERS

Lines changed: 1 addition & 2 deletions
@@ -11269,13 +11269,12 @@ F: virt/kvm/*
 
 KERNEL VIRTUAL MACHINE FOR ARM64 (KVM/arm64)
 M: Marc Zyngier <[email protected]>
+M: Oliver Upton <[email protected]>
 R: James Morse <[email protected]>
 R: Suzuki K Poulose <[email protected]>
-R: Oliver Upton <[email protected]>
 R: Zenghui Yu <[email protected]>
 L: [email protected] (moderated for non-subscribers)
-L: [email protected] (deprecated, moderated for non-subscribers)
 S: Maintained
 T: git git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm.git
 F: arch/arm64/include/asm/kvm*

arch/arm64/include/asm/cache.h

Lines changed: 9 additions & 0 deletions
@@ -16,6 +16,15 @@
 #define CLIDR_LOC(clidr) (((clidr) >> CLIDR_LOC_SHIFT) & 0x7)
 #define CLIDR_LOUIS(clidr) (((clidr) >> CLIDR_LOUIS_SHIFT) & 0x7)
 
+/* Ctypen, bits[3(n - 1) + 2 : 3(n - 1)], for n = 1 to 7 */
+#define CLIDR_CTYPE_SHIFT(level) (3 * (level - 1))
+#define CLIDR_CTYPE_MASK(level) (7 << CLIDR_CTYPE_SHIFT(level))
+#define CLIDR_CTYPE(clidr, level) \
+(((clidr) & CLIDR_CTYPE_MASK(level)) >> CLIDR_CTYPE_SHIFT(level))
+
+/* Ttypen, bits [2(n - 1) + 34 : 2(n - 1) + 33], for n = 1 to 7 */
+#define CLIDR_TTYPE_SHIFT(level) (2 * ((level) - 1) + CLIDR_EL1_Ttypen_SHIFT)
+
 /*
  * Memory returned by kmalloc() may be used for DMA, so we must make
  * sure that all such allocations are cache aligned. Otherwise,
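Reviewer note: a small standalone check of the Ctype<n> field arithmetic introduced in this hunk. The macros below are local copies for illustration with two deliberate deviations from the patch: the level argument is parenthesized and the mask uses a 64-bit constant. clidr_ctype and the sample register value 0x23 are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Local copies of the Ctype<n> macros, hardened for standalone use. */
#define CLIDR_CTYPE_SHIFT(level)    (3 * ((level) - 1))
#define CLIDR_CTYPE_MASK(level)     (UINT64_C(7) << CLIDR_CTYPE_SHIFT(level))
#define CLIDR_CTYPE(clidr, level)   \
    (((clidr) & CLIDR_CTYPE_MASK(level)) >> CLIDR_CTYPE_SHIFT(level))

/* Hypothetical wrapper: return the 3-bit cache type for levels 1 to 7. */
static unsigned int clidr_ctype(uint64_t clidr, unsigned int level)
{
    assert(level >= 1 && level <= 7);
    return (unsigned int)CLIDR_CTYPE(clidr, level);
}
```

For a made-up CLIDR value of 0x23, the wrapper reads 0b011 (separate instruction and data caches) at level 1, 0b100 (unified cache) at level 2, and 0b000 (no cache) at level 3.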
