linux-mm.kvack.org archive mirror
From: Felix Kuehling <felix.kuehling@amd.com>
To: David Hildenbrand <david@redhat.com>,
	Alistair Popple <apopple@nvidia.com>,
	Matthew Wilcox <willy@infradead.org>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>,
	linux-mm@kvack.org, Andrew Morton <akpm@linux-foundation.org>,
	Thomas Gleixner <tglx@linutronix.de>
Subject: Re: BUG_ON() in pfn_swap_entry_to_page()
Date: Thu, 25 Apr 2024 10:33:13 -0400
Message-ID: <e170427d-5388-45eb-a3cb-d6cdf22fa160@amd.com>
In-Reply-To: <25b39ce9-9631-45fd-a067-d806ff64e640@redhat.com>



On 2024-04-25 5:32, David Hildenbrand wrote:
> On 24.04.24 21:45, Felix Kuehling wrote:
>> Sorry for top-posting. I'm resurrecting an old thread here because I 
>> think I ran into the same problem with this assertion failing on Linux 
>> 6.7:
>>
>> static inline struct page *pfn_swap_entry_to_page(swp_entry_t entry)
>> {
>>          struct page *p = pfn_to_page(swp_offset_pfn(entry));
>>
>>          /*
>>           * Any use of migration entries may only occur while the
>>           * corresponding page is locked
>>           */
>> -->     BUG_ON(is_migration_entry(entry) && !PageLocked(p));
>>
>>          return p;
>> }
>>
>> It looks like this thread just fizzled two years ago. Did anything 
>> ever come of this?
>>
>> Maybe I should add that I saw this in a pre-silicon test environment. 
>> I've never seen this on real hardware. Maybe something timing-sensitive.
> 
> In the past, it indicated swap PTE corruption that would, e.g., mess up
> the stored PFN or the swap entry type.
> 
> On which call chain do you see that?
> 
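David's point about swap PTE corruption can be illustrated with a toy model of how a swap entry packs a type field and a PFN into one word. This is a simplified user-space sketch that assumes a 5-bit type field above a 58-bit offset. The `sketch_*` names and the type numbers are hypothetical and do not match the kernel's real constants. The model shows that a single flipped bit can turn a device-private entry into what looks like a migration entry, or make a migration entry name a different PFN. Either result could make the BUG_ON quoted above test the lock of a page that is not under migration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Toy swap-entry layout (assumption): 5 type bits above a 58-bit offset,
 * loosely modeled on x86-64. In this model the whole offset is the PFN.
 */
#define SKETCH_TYPE_BITS   5
#define SKETCH_TYPE_SHIFT  58
#define SKETCH_OFFSET_MASK ((UINT64_C(1) << SKETCH_TYPE_SHIFT) - 1)

/* Hypothetical special type numbers, NOT the kernel's actual values. */
#define SKETCH_MIGRATION_READ       26u
#define SKETCH_MIGRATION_WRITE      27u
#define SKETCH_DEVICE_PRIVATE_WRITE 30u

typedef struct { uint64_t val; } sketch_swp_entry_t;

/* Pack a type and a PFN into one entry word. */
static sketch_swp_entry_t sketch_swp_entry(unsigned type, uint64_t pfn)
{
    assert(type < (1u << SKETCH_TYPE_BITS));
    sketch_swp_entry_t e = {
        ((uint64_t)type << SKETCH_TYPE_SHIFT) | (pfn & SKETCH_OFFSET_MASK)
    };
    return e;
}

static unsigned sketch_swp_type(sketch_swp_entry_t e)
{
    return (unsigned)(e.val >> SKETCH_TYPE_SHIFT);
}

static uint64_t sketch_swp_pfn(sketch_swp_entry_t e)
{
    return e.val & SKETCH_OFFSET_MASK;
}

static bool sketch_is_migration_entry(sketch_swp_entry_t e)
{
    unsigned t = sketch_swp_type(e);
    return t == SKETCH_MIGRATION_READ || t == SKETCH_MIGRATION_WRITE;
}

/* Simulate corruption: flip a single bit of the stored entry. */
static sketch_swp_entry_t sketch_flip_bit(sketch_swp_entry_t e, unsigned bit)
{
    assert(bit < 64);
    e.val ^= UINT64_C(1) << bit;
    return e;
}
```

For example, flipping bit `SKETCH_TYPE_SHIFT + 2` of a device-private entry (type 30) yields type 26, which this model classifies as a migration entry while the PFN stays the same. Flipping a low bit of a migration entry leaves it a migration entry but changes the PFN it names.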

This is the backtrace; it's coming from hmm_range_fault(). It looks like the
swap entries are from migrated DEVICE_PRIVATE pages.

[Apr 3 20:11] ------------[ cut here ]------------
[  +0.000041] kernel BUG at include/linux/swapops.h:466!
[  +0.000691] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
[  +0.000342] CPU: 2 PID: 49 Comm: kworker/2:1 Not tainted 6.7.0-kfd-compute-rocm-npi-186 #1
[  +0.000556] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[  +0.000703] Workqueue: events amdgpu_irq_handle_ih_soft [amdgpu]
[  +0.000501] RIP: 0010:migration_entry_wait_on_locked+0x26b/0x2b0
[  +0.000389] Code: fe ff ff 48 8d 7c 24 07 e8 02 7e f0 ff e9 58 fe ff ff 48 8b 43 08 a8 01 75 3f 66 90 48 89 d8 48 8b 00 a8 01 0f 85 f1 fd ff ff <0f> 0b 48 8d 58 ff e9 f7 fd ff ff 48 89 d8 f7 c3 ff 0f 00 00 75 df
[  +0.001161] RSP: 0018:ffffb211c01bb788 EFLAGS: 00010246
[  +0.000339] RAX: 017fff8000080018 RBX: fffff682c40ce8c0 RCX: 0000000000000001
[  +0.000463] RDX: 0000000000000000 RSI: ffff977a45034840 RDI: 000000000000001a
[  +0.000454] RBP: ffff977a45034840 R08: 68000000001033a3 R09: 0000000000000030
[  +0.000451] R10: ffffb211c01bb6a8 R11: 0000000000000001 R12: ffff977a46bd1318
[  +0.000461] R13: 0000000000000003 R14: 4000000000000000 R15: ffffb211c01bb9b8
[  +0.000454] FS:  0000000000000000(0000) GS:ffff977dafd00000(0000) knlGS:0000000000000000
[  +0.000518] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  +0.000372] CR2: 00007fa2d1cba000 CR3: 00000001030d2004 CR4: 0000000000770ef0
[  +0.000453] PKRU: 55555554
[  +0.000182] Call Trace:
[  +0.000171]  <TASK>
[  +0.000147]  ? die+0x37/0x90
[  +0.000211]  ? do_trap+0xe0/0x110
[  +0.000221]  ? migration_entry_wait_on_locked+0x26b/0x2b0
[  +0.000351]  ? do_error_trap+0x98/0x120
[  +0.000252]  ? migration_entry_wait_on_locked+0x26b/0x2b0
[  +0.000346]  ? migration_entry_wait_on_locked+0x26b/0x2b0
[  +0.000355]  ? exc_invalid_op+0x52/0x70
[  +0.000254]  ? migration_entry_wait_on_locked+0x26b/0x2b0
[  +0.000345]  ? asm_exc_invalid_op+0x1a/0x20
[  +0.000274]  ? migration_entry_wait_on_locked+0x26b/0x2b0
[  +0.000361]  ? migration_entry_wait+0x4e/0x160
[  +0.000293]  ? lock_release+0x119/0x260
[  +0.000255]  migration_entry_wait+0x105/0x160
[  +0.000290]  hmm_vma_walk_pmd+0x822/0x8a0
[  +0.000263]  walk_pgd_range+0x40b/0x900
[  +0.000268]  __walk_page_range+0x205/0x220
[  +0.000267]  walk_page_range+0x13a/0x250
[  +0.000259]  hmm_range_fault+0x5d/0xb0
[  +0.000247]  amdgpu_hmm_range_get_pages+0x144/0x240 [amdgpu]
[  +0.000491]  svm_range_validate_and_map+0x2e5/0x1310 [amdgpu]
[  +0.000479]  ? svm_migrate_ram_to_vram+0x360/0x630 [amdgpu]
[  +0.000453]  svm_range_restore_pages+0xd1e/0x11b0 [amdgpu]
[  +0.000462]  amdgpu_vm_handle_fault+0xc0/0x370 [amdgpu]
[  +0.000428]  gmc_v9_0_process_interrupt+0x10d/0x670 [amdgpu]
[  +0.000463]  ? __wake_up+0x21/0x60
[  +0.000427]  ? find_held_lock+0x2b/0x80
[  +0.000435]  ? process_one_work+0x16a/0x4b0
[  +0.000446]  ? amdgpu_irq_dispatch+0xc2/0x220 [amdgpu]
[  +0.000596]  amdgpu_irq_dispatch+0xc2/0x220 [amdgpu]
[  +0.000579]  amdgpu_ih_process+0x7d/0xe0 [amdgpu]
[  +0.000561]  process_one_work+0x1d1/0x4b0
[  +0.000435]  worker_thread+0x1d3/0x3d0
[  +0.000400]  ? rescuer_thread+0x360/0x360
[  +0.000410]  kthread+0xee/0x120
[  +0.000367]  ? kthread_complete_and_exit+0x20/0x20
[  +0.000452]  ret_from_fork+0x31/0x50
[  +0.000371]  ? kthread_complete_and_exit+0x20/0x20
[  +0.000448]  ret_from_fork_asm+0x11/0x20
[  +0.000390]  </TASK>
[  +0.000281] Modules linked in: amdgpu drm_ttm_helper ttm video wmi 
drm_exec drm_suballoc_helper amdxcp drm_buddy gpu_sched 
drm_display_helper fuse ip_tables x_tables virtio_gpu virtio_dma_buf 
drm_shmem_helper drm_kms_helper drm drm_panel_orientation_quirks
[  +0.002319] ---[ end trace 0000000000000000 ]---
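The BUG in this trace (swapops.h:466, which per this thread is the BUG_ON in pfn_swap_entry_to_page()) enforces the rule quoted earlier: a migration entry may only be used while its page is locked. A toy model of that check makes two hypothetical failure modes easy to see: a corrupted PFN that names an unlocked page, or a stale migration entry that is still present after the page has been unlocked. The `toy_*` types are invented for illustration and track nothing but a lock bit and a PFN; the check returns false where the kernel would BUG(). This illustrates the invariant only and is not a diagnosis of the crash above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins: a page only tracks its lock bit, and an entry
 * only records whether it is a migration entry and which pfn it names. */
struct toy_page  { bool locked; };
struct toy_entry { bool is_migration; size_t pfn; };

/*
 * Mirrors the condition in the quoted BUG_ON:
 *   BUG_ON(is_migration_entry(entry) && !PageLocked(p));
 * but reports a violation by returning false instead of crashing.
 */
static bool toy_entry_to_page_ok(const struct toy_entry *e,
                                 const struct toy_page *pages, size_t npages)
{
    assert(e != NULL && pages != NULL);
    if (e->pfn >= npages)
        return false;   /* pfn outside the toy "memory map": treat as corrupt */
    return !(e->is_migration && !pages[e->pfn].locked);
}
```

With page 1 locked and page 2 unlocked, a migration entry for page 1 passes, a migration entry for page 2 fails, and a non-migration entry for page 2 passes. Unlocking page 1 while its migration entry is still in place makes that entry fail as well.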


Thread overview: 11+ messages
2022-03-22 17:25 Sebastian Andrzej Siewior
2022-03-22 17:41 ` Matthew Wilcox
2022-03-23  0:29   ` Alistair Popple
2022-03-24  3:24     ` Matthew Wilcox
2022-03-24  3:51       ` Alistair Popple
2024-04-24 19:45         ` Felix Kuehling
2024-04-25  9:32           ` David Hildenbrand
2024-04-25 14:33             ` Felix Kuehling [this message]
2024-04-26  8:49               ` David Hildenbrand
2024-04-26 14:56                 ` Felix Kuehling
2022-03-22 18:53 ` Ritesh Harjani
