From: Raghavendra K T <raghavendra.kt@amd.com>
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
gourry@gourry.net, nehagholkar@meta.com, abhishekd@meta.com,
david@redhat.com, ying.huang@intel.com, nphamcs@gmail.com,
akpm@linux-foundation.org, hannes@cmpxchg.org,
feng.tang@intel.com, kbusch@meta.com, bharata@amd.com,
Hasan.Maruf@amd.com, sj@kernel.org, willy@infradead.org,
kirill.shutemov@linux.intel.com, mgorman@techsingularity.net,
vbabka@suse.cz, hughd@google.com, rientjes@google.com,
shy828301@gmail.com, Liam.Howlett@Oracle.com,
peterz@infradead.org, mingo@redhat.com
Subject: Re: [RFC PATCH V0 0/10] mm: slowtier page promotion based on PTE A bit
Date: Thu, 13 Feb 2025 11:09:37 +0530
Message-ID: <0f9a6f66-7cbc-4c0d-b12e-9eaacdf1bda8@amd.com>
In-Reply-To: <20250212170212.f5coa462p75fuqj6@offworld>
On 2/12/2025 10:32 PM, Davidlohr Bueso wrote:
> On Sun, 01 Dec 2024, Raghavendra K T wrote:
>
>> 6. Holding PTE lock before migration.
>
> fyi I tried testing this series with 'perf-bench numa mem' and got a
> soft lockup, unable to take the PTL (and lost the machine to debug
> further atm), ie:
>
> [ 3852.217675] CPU: 127 UID: 0 PID: 12537 Comm: watch-numa-sche Tainted:
> G D L 6.14.0-rc2-kmmscand-v1+ #3
> [ 3852.217677] Tainted: [D]=DIE, [L]=SOFTLOCKUP
> [ 3852.217678] RIP: 0010:native_queued_spin_lock_slowpath+0x64/0x290
> [ 3852.217683] Code: 77 7b f0 0f ba 2b 08 0f 92 c2 8b 03 0f b6 d2 c1 e2
> 08 30 e4 09 d0 3d ff 00 00 00 77 57 85 c0 74 10 0f b6 03 84 c0 74 09 f3
> 90 <0f> b6 03 84 c0 75 f7 b8 01 00 00 00 66 89 03 5b 5d 41 5c 41 5d c3
> [ 3852.217684] RSP: 0018:ff274259b3c9f988 EFLAGS: 00000202
> [ 3852.217685] RAX: 0000000000000001 RBX: ffbd2efd8c08c9a8 RCX:
> 000ffffffffff000
> [ 3852.217686] RDX: 0000000000000000 RSI: 0000000000000001 RDI:
> ffbd2efd8c08c9a8
> [ 3852.217687] RBP: ff161328422c1328 R08: ff274259b3c9fb90 R09:
> ff161328422c1000
> [ 3852.217688] R10: 00000000ffffffff R11: 0000000000000004 R12:
> 00007f52cca00000
> [ 3852.217688] R13: ff274259b3c9fa00 R14: ff16132842326000 R15:
> ff161328422c1328
> [ 3852.217689] FS: 00007f32b6f92b80(0000) GS:ff161423bfd80000(0000)
> knlGS:0000000000000000
> [ 3852.217691] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 3852.217692] CR2: 0000564ddbf68008 CR3: 00000080a81cc005 CR4:
> 0000000000773ef0
> [ 3852.217693] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
> 0000000000000000
> [ 3852.217694] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7:
> 0000000000000400
> [ 3852.217694] PKRU: 55555554
> [ 3852.217695] Call Trace:
> [ 3852.217696] <IRQ>
> [ 3852.217697] ? watchdog_timer_fn+0x21b/0x2a0
> [ 3852.217699] ? __pfx_watchdog_timer_fn+0x10/0x10
> [ 3852.217702] ? __hrtimer_run_queues+0x10f/0x2a0
> [ 3852.217704] ? hrtimer_interrupt+0xfb/0x240
> [ 3852.217706] ? __sysvec_apic_timer_interrupt+0x4e/0x110
> [ 3852.217709] ? sysvec_apic_timer_interrupt+0x68/0x90
> [ 3852.217712] </IRQ>
> [ 3852.217712] <TASK>
> [ 3852.217713] ? asm_sysvec_apic_timer_interrupt+0x16/0x20
> [ 3852.217717] ? native_queued_spin_lock_slowpath+0x64/0x290
> [ 3852.217720] _raw_spin_lock+0x25/0x30
> [ 3852.217723] __pte_offset_map_lock+0x9a/0x110
> [ 3852.217726] gather_pte_stats+0x1e3/0x2c0
> [ 3852.217730] walk_pgd_range+0x528/0xbb0
> [ 3852.217733] __walk_page_range+0x71/0x1d0
> [ 3852.217736] walk_page_vma+0x98/0xf0
> [ 3852.217738] show_numa_map+0x11a/0x3a0
> [ 3852.217741] seq_read_iter+0x2a6/0x470
> [ 3852.217745] seq_read+0x12b/0x170
> [ 3852.217748] vfs_read+0xe0/0x370
> [ 3852.217751] ? syscall_exit_to_user_mode+0x49/0x210
> [ 3852.217755] ? do_syscall_64+0x8a/0x190
> [ 3852.217758] ksys_read+0x6a/0xe0
> [ 3852.217762] do_syscall_64+0x7e/0x190
> [ 3852.217765] ? __memcg_slab_free_hook+0xd4/0x120
> [ 3852.217768] ? __x64_sys_close+0x38/0x80
> [ 3852.217771] ? kmem_cache_free+0x3bf/0x3e0
> [ 3852.217774] ? syscall_exit_to_user_mode+0x49/0x210
> [ 3852.217777] ? do_syscall_64+0x8a/0x190
> [ 3852.217780] ? do_syscall_64+0x8a/0x190
> [ 3852.217783] ? __irq_exit_rcu+0x3e/0xe0
> [ 3852.217785] entry_SYSCALL_64_after_hwframe+0x76/0x7e
Hello David,

Thanks for the report and the details. Reproducer information helps me
stabilize the code quickly. The micro-benchmark I used did not show any
issues. I will add the PTL lock and also check the issue on my side.

(With multiple scanning threads, this could cause even more issues because
of higher migration pressure. I am wondering if I should go ahead with a
more stable single-threaded scanning version in the next posting.)

Thanks and Regards
- Raghu