--
tel. 790 202 300
Tytus Rogalewski
Dolina Krzemowa 6A
83-010 Jagatowo
NIP: 9570976234
On 3/5/26 09:11, Tytus Rogalewski wrote:
> Hi David,
Hi,
let me CC Jens.
@Jens, thread starts here:
https://marc.info/?l=linux-mm&m=176961245128225&w=2
For some reason, I cannot find that mail on lore, only my reply here:
https://lore.kernel.org/all/5948f3a6-8f30-4c45-9b86-2af9a6b37405@kernel.org/
>
> This is strange, but the issue stopped when I changed Async IO from
> io_uring to threads. Would that make any sense?
On the screenshot I can see that "Async IO" is changed from Default
(io_uring) to "threads". I assume that changes QEMU's behavior to not
use io_uring for I/O.
> We also changed a few other smaller things, but honestly the issues stopped.
> Was it fixed, or could Async IO have caused it? (The qcow2 image is on
> a FUSE mount.)
>
> I am not certain, but if that makes any sense, should I report this to
> someone working on io_uring, or should it have nothing to do with that bug?
Good question. It seems unrelated at first, but maybe it's related to
memory consumption with io_uring (below).
>
> CleanShot 2026-03-05 at 09.08.15@2x.png
>
> On all those kernels I have seen no such bug for a few weeks after
> changing Async IO.
>
What is the latest kernel you are running?
> CleanShot 2026-03-05 at 09.09.03@2x.png
>
Looking back at the original report, I can see that the system has extremely
little free memory. Maybe that gets eaten by io_uring somehow?
Do you still see that memory consumption when not using io_uring?
Let me copy the original information for Jens, maybe something could
give him a clue whether io_uring could be involved here.
"It happens on multiple servers (less on 6.18.6, more on 6.19-rc4+).
All servers are running KVM with vfio GPU PCIe passthrough, and it
happens when I am using 1GB hugepages + QEMU. Basically, I am allocating
970GB into hugepages, leaving 37GB for KVM. In normal operation I have
about 20GB of free memory, but when this issue occurs, all RAM is taken, and
even when I added 100GB of swap, it was also consumed. It can work
for days or a week without issue and
I have not seen that issue with hugepages disabled (with normal 4KB
page allocation in KVM). And I am using hugepages because it is otherwise
impossible to boot a VM with >200GB of RAM. When the issue happens, the ps
process hangs and only top shows something, but the machine needs to be
rebooted due to many zombie processes.
Hardware:
Motherboard: ASRockRack GENOA2D24G-2L
CPU: 2x AMD EPYC 9654 96-Core Processor
System ram: 1024 GB
GPUs: 8x RTX5090 vfio passthrough
root@pve14:~# uname -a
Linux pve14 6.18.6-pbk #1 SMP PREEMPT_DYNAMIC Mon Jan 19 20:59:46 UTC 2026 x86_64 GNU/Linux
[171053.341288] BUG: unable to handle page fault for address: ff469ae640000000
[171053.341310] #PF: supervisor read access in kernel mode
[171053.341319] #PF: error_code(0x0000) - not-present page
[171053.341328] PGD 4602067 P4D 0
[171053.341337] Oops: Oops: 0000 [#1] SMP NOPTI
[171053.341348] CPU: 16 UID: 0 PID: 3250869 Comm: qm Not tainted 6.18.6-pbk #1 PREEMPT(voluntary)
[171053.341362] Hardware name: TURIN2D24G-2L+/500W/TURIN2D24G-2L+/500W, BIOS 10.20 05/05/2025
[171053.341373] RIP: 0010:walk_pgd_range+0x6ff/0xbb0
[171053.341386] Code: 08 49 39 dd 0f 84 8c 01 00 00 49 89 de 49 8d 9e 00 00 20 00 48 8b 75 b8 48 81 e3 00 00 e0 ff 48 8d 43 ff 48 39 f0 49 0f 43 dd <49> f7 04 24 9f ff ff ff 0f 84 e2 fd ff ff 48 8b 45 c0 41 c7 47 20
[171053.341406] RSP: 0018:ff59d95d70e6b748 EFLAGS: 00010287
[171053.341416] RAX: 00007a22401fffff RBX: 00007a2240200000 RCX: 0000000000000000
[171053.341425] RDX: 0000000000000000 RSI: 00007a227fffffff RDI: 800008dfc00002b7
[171053.341435] RBP: ff59d95d70e6b828 R08: 0000000000000080 R09: 0000000000000000
[171053.341444] R10: ffffffff8de588c0 R11: 0000000000000000 R12: ff469ae640000000
[171053.341454] R13: 00007a2280000000 R14: 00007a2240000000 R15: ff59d95d70e6b8a8
[171053.341464] FS: 00007d4e8ec94b80(0000) GS:ff4692876ae7e000(0000) knlGS:0000000000000000
[171053.341476] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[171053.341485] CR2: ff469ae640000000 CR3: 0000008241eed006 CR4: 0000000000f71ef0
[171053.341495] PKRU: 55555554
[171053.341501] Call Trace:
[171053.341508] <TASK>
[171053.341518] __walk_page_range+0x8e/0x220
[171053.341529] ? sysvec_apic_timer_interrupt+0x57/0xc0
[171053.341541] walk_page_vma+0x92/0xe0
[171053.341551] smap_gather_stats.part.0+0x8c/0xd0
[171053.341563] show_smaps_rollup+0x258/0x420
[171053.341577] seq_read_iter+0x137/0x4c0
[171053.341588] seq_read+0xf5/0x140
[171053.341596] ? __x64_sys_move_mount+0x11/0x40
[171053.341607] vfs_read+0xbb/0x350
[171053.341617] ? do_syscall_64+0xb8/0xd00
[171053.341627] ? srso_alias_return_thunk+0x5/0xfbef5
[171053.341637] ? strncpy_from_user+0x27/0x130
[171053.341649] ksys_read+0x69/0xf0
[171053.341658] __x64_sys_read+0x19/0x30
[171053.341666] x64_sys_call+0x2180/0x25a0
[171053.341855] do_syscall_64+0x80/0xd00
[171053.342029] ? srso_alias_return_thunk+0x5/0xfbef5
[171053.342198] ? __x64_sys_ioctl+0x83/0x100
[171053.342367] ? srso_alias_return_thunk+0x5/0xfbef5
[171053.342532] ? srso_alias_return_thunk+0x5/0xfbef5
[171053.342696] ? x64_sys_call+0xac0/0x25a0
[171053.342857] ? srso_alias_return_thunk+0x5/0xfbef5
[171053.343019] ? do_syscall_64+0xb8/0xd00
[171053.343181] ? seq_read+0xf5/0x140
[171053.343341] ? __x64_sys_move_mount+0x11/0x40
[171053.343504] ? srso_alias_return_thunk+0x5/0xfbef5
[171053.343662] ? vfs_read+0xbb/0x350
[171053.343819] ? srso_alias_return_thunk+0x5/0xfbef5
[171053.343973] ? ksys_read+0x69/0xf0
[171053.344126] ? srso_alias_return_thunk+0x5/0xfbef5
[171053.344280] ? generic_file_llseek+0x21/0x40
[171053.344432] ? srso_alias_return_thunk+0x5/0xfbef5
[171053.344582] ? kernfs_fop_llseek+0x7b/0xd0
[171053.344730] ? srso_alias_return_thunk+0x5/0xfbef5
[171053.344873] ? ksys_lseek+0x4f/0xd0
[171053.345010] ? srso_alias_return_thunk+0x5/0xfbef5
[171053.345144] ? __x64_sys_lseek+0x18/0x30
[171053.345275] ? srso_alias_return_thunk+0x5/0xfbef5
[171053.345407] ? x64_sys_call+0x1abe/0x25a0
[171053.345535] ? srso_alias_return_thunk+0x5/0xfbef5
[171053.345665] ? do_syscall_64+0xb8/0xd00
[171053.345792] ? srso_alias_return_thunk+0x5/0xfbef5
[171053.345919] ? irqentry_exit+0x43/0x50
[171053.346044] ? srso_alias_return_thunk+0x5/0xfbef5
[171053.346169] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[171053.346292] RIP: 0033:0x7d4e8ed61687
[171053.346417] Code: 48 89 fa 4c 89 df e8 58 b3 00 00 8b 93 08 03 00 00 59 5e 48 83 f8 fc 74 1a 5b c3 0f 1f 84 00 00 00 00 00 48 8b 44 24 10 0f 05 <5b> c3 0f 1f 80 00 00 00 00 83 e2 39 83 fa 08 75 de e8 23 ff ff ff
[171053.346687] RSP: 002b:00007ffdd7c76000 EFLAGS: 00000202 ORIG_RAX: 0000000000000000
[171053.346828] RAX: ffffffffffffffda RBX: 00007d4e8ec94b80 RCX: 00007d4e8ed61687
[171053.346969] RDX: 0000000000002000 RSI: 000061ff84297ce0 RDI: 0000000000000006
[171053.347111] RBP: 0000000000002000 R08: 0000000000000000 R09: 0000000000000000
[171053.347253] R10: 0000000000000000 R11: 0000000000000202 R12: 000061ff84297ce0
[171053.347394] R13: 000061ff7d3d62a0 R14: 0000000000000006 R15: 000061ff842478c0
[171053.347542] </TASK>
[171053.347684] Modules linked in: sctp ip6_udp_tunnel udp_tunnel nf_tables bridge stp llc softdog bonding sunrpc binfmt_misc nfnetlink_log amd_atl intel_rapl_msr intel_rapl_common nls_iso8859_1 amd64_edac edac_mce_amd kvm_amd snd_pcm snd_timer dax_hmem ipmi_ssif kvm cxl_acpi snd polyval_clmulni ghash_clmulni_intel cxl_port soundcore aesni_intel rapl cxl_core acpi_ipmi einj pcspkr ast ipmi_si spd5118 ipmi_devintf k10temp ccp ipmi_msghandler joydev input_leds mac_hid sch_fq_codel msr vhost_net vhost vhost_iotlb tap vfio_pci vfio_pci_core irqbypass vfio_iommu_type1 vfio iommufd efi_pstore nfnetlink dmi_sysfs autofs4 btrfs blake2b_generic xor raid6_pq mlx5_ib ib_uverbs macsec ib_core dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio nvme mlx5_core nvme_core cdc_ether nvme_keyring igb mlxfw psample usbnet nvme_auth i2c_algo_bit usbkbd mii hid_generic hkdf tls dca ahci i2c_piix4 libahci i2c_smbus usbmouse usbhid hid
[171053.349092] CR2: ff469ae640000000
[171053.349269] ---[ end trace 0000000000000000 ]---
[171054.248409] RIP: 0010:walk_pgd_range+0x6ff/0xbb0
[171054.248750] Code: 08 49 39 dd 0f 84 8c 01 00 00 49 89 de 49 8d 9e 00 00 20 00 48 8b 75 b8 48 81 e3 00 00 e0 ff 48 8d 43 ff 48 39 f0 49 0f 43 dd <49> f7 04 24 9f ff ff ff 0f 84 e2 fd ff ff 48 8b 45 c0 41 c7 47 20
[171054.249177] RSP: 0018:ff59d95d70e6b748 EFLAGS: 00010287
[171054.249392] RAX: 00007a22401fffff RBX: 00007a2240200000 RCX: 0000000000000000
[171054.249820] RDX: 0000000000000000 RSI: 00007a227fffffff RDI: 800008dfc00002b7
[171054.250036] RBP: ff59d95d70e6b828 R08: 0000000000000080 R09: 0000000000000000
[171054.250253] R10: ffffffff8de588c0 R11: 0000000000000000 R12: ff469ae640000000
[171054.250471] R13: 00007a2280000000 R14: 00007a2240000000 R15: ff59d95d70e6b8a8
[171054.250691] FS: 00007d4e8ec94b80(0000) GS:ff4692876ae7e000(0000) knlGS:0000000000000000
[171054.250914] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[171054.251137] CR2: ff469ae640000000 CR3: 0000008241eed006 CR4: 0000000000f71ef0
[171054.251375] PKRU: 55555554
[171054.251601] note: qm[3250869] exited with irqs disabled
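
As a reading aid for the register dump above, here is a rough sketch that
checks which general-purpose register holds the faulting address reported in
CR2. The function name registers_matching_cr2 is made up for illustration,
and the regular expression only assumes the "NAME: 16-hex-digits" layout seen
in this particular log.

```python
import re


def registers_matching_cr2(oops_text):
    """Return names of general-purpose registers whose value equals CR2."""
    regs = {}
    # Matches pairs such as "RAX: 00007a22401fffff" or "CR2: ff469ae640000000".
    # Entries like "RIP: 0010:..." or "ORIG_RAX: ..." do not match.
    pattern = r"\b(R[A-Z0-9]{1,2}|CR2):\s*([0-9a-f]{16})\b"
    for name, value in re.findall(pattern, oops_text):
        # Keep the first occurrence only, i.e. the kernel-mode dump.
        regs.setdefault(name, int(value, 16))
    cr2 = regs.get("CR2")
    if cr2 is None:
        return []
    return sorted(n for n, v in regs.items() if n != "CR2" and v == cr2)
```

Fed the oops text above, this should point at R12, which carries the same
value as CR2 (ff469ae640000000) at the time of the fault in walk_pgd_range.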
--
Cheers,
David