From: Johannes Weiner <hannes@cmpxchg.org>
To: Christoph Hellwig <hch@infradead.org>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>,
Baolin Wang <baolin.wang@linux.alibaba.com>,
linux-mm@kvack.org
Subject: Re: page type is 3, passed migratetype is 1 (nr=512)
Date: Tue, 28 May 2024 12:47:56 -0400
Message-ID: <20240528164756.GA2820@cmpxchg.org>
In-Reply-To: <ZlSHJGOH-zrJmPU-@infradead.org>
Hello,
On Mon, May 27, 2024 at 06:14:12AM -0700, Christoph Hellwig wrote:
> On Mon, May 27, 2024 at 01:58:25AM -0700, Christoph Hellwig wrote:
> > Hi all,
> >
> > when running xfstests on nfs against a local server I see warnings like
> > the ones above, which appear to have been added in commit
> > e0932b6c1f94 ("mm: page_alloc: consolidate free page accounting").
>
> I've also reproduced this with xfstests on local xfs and no nfs in the
> loop:
>
> generic/176 214s ... [ 1204.507931] run fstests generic/176 at 2024-05-27 12:52:30
> [ 1204.969286] XFS (nvme0n1): Mounting V5 Filesystem cd936307-415f-48a3-b99d-a2d52ae1f273
> [ 1204.993621] XFS (nvme0n1): Ending clean mount
> [ 1205.387032] XFS (nvme1n1): Mounting V5 Filesystem ab3ee1a4-af62-4934-9a6a-6c2fde321850
> [ 1205.412322] XFS (nvme1n1): Ending clean mount
> [ 1205.440388] XFS (nvme1n1): Unmounting Filesystem ab3ee1a4-af62-4934-9a6a-6c2fde321850
> [ 1205.808063] XFS (nvme1n1): Mounting V5 Filesystem 7099b02d-9c58-4d1d-be1d-2cc472d12cd9
> [ 1205.827290] XFS (nvme1n1): Ending clean mount
> [ 1208.058931] ------------[ cut here ]------------
> [ 1208.059613] page type is 3, passed migratetype is 1 (nr=512)
> [ 1208.060402] WARNING: CPU: 0 PID: 509870 at mm/page_alloc.c:645 expand+0x1c5/0x1f0
> [ 1208.061352] Modules linked in: i2c_i801 crc32_pclmul i2c_smbus [last unloaded: scsi_debug]
> [ 1208.062344] CPU: 0 PID: 509870 Comm: xfs_io Not tainted 6.10.0-rc1+ #2437
> [ 1208.063150] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014

Thanks for the report.

Could you please send me your .config? I'll try to reproduce it
locally.

> [ 1208.064204] RIP: 0010:expand+0x1c5/0x1f0
> [ 1208.064625] Code: 05 16 70 bf 02 01 e8 ca fc ff ff 8b 54 24 34 44 89 e1 48 c7 c7 80 a2 28 83 48 89 c6 b8 01 00 3
> [ 1208.066555] RSP: 0018:ffffc90003b2b968 EFLAGS: 00010082
> [ 1208.067111] RAX: 0000000000000000 RBX: ffffffff83fa9480 RCX: 0000000000000000
> [ 1208.067872] RDX: 0000000000000005 RSI: 0000000000000027 RDI: 00000000ffffffff
> [ 1208.068629] RBP: 00000000001f2600 R08: 00000000fffeffff R09: 0000000000000001
> [ 1208.069336] R10: 0000000000000000 R11: ffffffff83676200 R12: 0000000000000009
> [ 1208.070038] R13: 0000000000000200 R14: 0000000000000001 R15: ffffea0007c98000
> [ 1208.070750] FS: 00007f72ca3d5780(0000) GS:ffff8881f9c00000(0000) knlGS:0000000000000000
> [ 1208.071552] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 1208.072121] CR2: 00007f72ca1fff38 CR3: 00000001aa0c6002 CR4: 0000000000770ef0
> [ 1208.072829] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 1208.073527] DR3: 0000000000000000 DR6: 00000000ffff07f0 DR7: 0000000000000400
> [ 1208.074225] PKRU: 55555554
> [ 1208.074507] Call Trace:
> [ 1208.074758] <TASK>
> [ 1208.074977] ? __warn+0x7b/0x120
> [ 1208.075308] ? expand+0x1c5/0x1f0
> [ 1208.075652] ? report_bug+0x191/0x1c0
> [ 1208.076043] ? handle_bug+0x3c/0x80
> [ 1208.076400] ? exc_invalid_op+0x17/0x70
> [ 1208.076782] ? asm_exc_invalid_op+0x1a/0x20
> [ 1208.077203] ? expand+0x1c5/0x1f0
> [ 1208.077543] ? expand+0x1c5/0x1f0
> [ 1208.077878] __rmqueue_pcplist+0x3a9/0x730

Ok so the allocator is taking a larger buddy off the freelist to
satisfy a smaller request, then puts the remainder back on the list.

There is no warning from the del_page_from_free_list(), so the buddy
type and the type of the list it was taken from are coherent.

The warning happens when it expands the remainder of the buddy and
finds the tail block to be of a different type.

Specifically, it takes a movable buddy (type 1) off the movable list,
but finds a tail block of it marked highatomic (type 3).
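
To make the split concrete, here is a toy model of the check that
fires. It is a sketch only, not the kernel's expand(): toy_expand(),
the toy_* names and the flat block_type[] array are all hypothetical,
and it assumes 512-page pageblocks as on x86-64. It halves a free
buddy repeatedly and counts the upper halves whose pageblock type
disagrees with the type the caller passed in.

```c
#include <assert.h>

/* Toy constants: 512-page pageblocks; type values as printed in the report. */
enum { TOY_PAGEBLOCK_ORDER = 9 };
enum toy_type { TOY_MOVABLE = 1, TOY_HIGHATOMIC = 3 };

/*
 * Split a free buddy of order `high` starting at `pfn` down to order `low`.
 * Each step halves the buddy and hands the upper half back to the free
 * list under the type `passed`. block_type[] holds one entry per pageblock.
 *
 * Returns the number of returned halves whose pageblock type differs from
 * `passed`, i.e. how many times a "page type is X, passed migratetype is Y"
 * style check would trip in this model.
 */
static int toy_expand(const enum toy_type *block_type, unsigned long pfn,
                      int low, int high, enum toy_type passed)
{
    unsigned long size = 1UL << high;
    int mismatches = 0;

    assert(low >= 0 && low <= high);

    while (high > low) {
        unsigned long tail;

        high--;
        size >>= 1;
        tail = pfn + size;      /* first page of the upper half */
        if (block_type[tail >> TOY_PAGEBLOCK_ORDER] != passed)
            mismatches++;
    }
    return mismatches;
}
```

With block_type[] = { TOY_MOVABLE, TOY_HIGHATOMIC }, splitting the
order-10 buddy at pfn 0 down to order 0 as TOY_MOVABLE flags exactly
one half: the order-9 tail of 512 pages, which matches the nr=512 in
the report. All smaller halves fall inside the first pageblock and
pass.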

I don't see how we could have merged those during freeing, because the
highatomic buddy would have failed migratetype_is_mergeable().

Ah, but there DOES seem to be an issue with how we reserve
highatomics: reserving and unreserving happens one pageblock at a
time, but MAX_ORDER is usually bigger than pageblock_order. If we
rmqueue() an order-10 request, reserve_highatomic_pageblock() will
only convert the first order-9 block in it; the tail will remain the
original type, which will produce a buddy of mixed-type blocks upon
freeing.

This doesn't fully explain the warning here. We'd expect to see it the
other way round - passing an assumed type of 3 (HIGHATOMIC) for the
remainder that is actually 1 (MOVABLE). But the pageblock-based
reservations look fishy. I'll cook up a patch to make this
range-based. It might just fix it in a way I'm not seeing just yet.

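
As a rough illustration of the reservation problem, the toy model
below contrasts converting a single pageblock with converting every
pageblock an allocation covers. It is a sketch under assumptions, not
the kernel code and not the eventual patch: the toy_* functions and
the block_type[] array are hypothetical, and pageblocks are assumed to
be order-9.

```c
#include <assert.h>

/* Toy constants: 512-page pageblocks; type values as printed in the report. */
enum { TOY_PAGEBLOCK_ORDER = 9 };
enum toy_type { TOY_MOVABLE = 1, TOY_HIGHATOMIC = 3 };

/* Per-pageblock reservation: only the block containing `pfn` is converted. */
static void toy_reserve_first_block(enum toy_type *block_type,
                                    unsigned long pfn)
{
    block_type[pfn >> TOY_PAGEBLOCK_ORDER] = TOY_HIGHATOMIC;
}

/* Range-based reservation: convert every block the allocation covers. */
static void toy_reserve_range(enum toy_type *block_type,
                              unsigned long pfn, int order)
{
    unsigned long first, last, b;

    assert(order >= 0);
    first = pfn >> TOY_PAGEBLOCK_ORDER;
    last = (pfn + (1UL << order) - 1) >> TOY_PAGEBLOCK_ORDER;
    for (b = first; b <= last; b++)
        block_type[b] = TOY_HIGHATOMIC;
}

/* Returns 1 if all pageblocks covered by the buddy share one type, else 0. */
static int toy_buddy_is_uniform(const enum toy_type *block_type,
                                unsigned long pfn, int order)
{
    unsigned long first = pfn >> TOY_PAGEBLOCK_ORDER;
    unsigned long last = (pfn + (1UL << order) - 1) >> TOY_PAGEBLOCK_ORDER;
    unsigned long b;

    for (b = first + 1; b <= last; b++)
        if (block_type[b] != block_type[first])
            return 0;
    return 1;
}
```

Starting from two movable blocks, toy_reserve_first_block() on an
order-10 allocation at pfn 0 leaves the buddy mixed (highatomic head,
movable tail), while toy_reserve_range() over the same allocation
leaves it uniform.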
> [ 1208.078285] get_page_from_freelist+0x7a0/0xf00
> [ 1208.078745] __alloc_pages_noprof+0x153/0x2e0
> [ 1208.079181] __folio_alloc_noprof+0x10/0xa0
> [ 1208.079603] __filemap_get_folio+0x16b/0x370
> [ 1208.080030] iomap_write_begin+0x496/0x680
> [ 1208.080441] iomap_file_buffered_write+0x17f/0x440
> [ 1208.080916] xfs_file_buffered_write+0x7e/0x2a0
> [ 1208.081374] vfs_write+0x262/0x440
> [ 1208.081717] __x64_sys_pwrite64+0x8f/0xc0
> [ 1208.082112] do_syscall_64+0x4f/0x120
> [ 1208.082487] entry_SYSCALL_64_after_hwframe+0x76/0x7e
> [ 1208.082982] RIP: 0033:0x7f72ca4ce2b7
> [ 1208.083350] Code: 08 89 3c 24 48 89 4c 24 18 e8 15 f4 f8 ff 4c 8b 54 24 18 48 8b 54 24 10 41 89 c0 48 8b 74 24 b
> [ 1208.085126] RSP: 002b:00007ffe56d1a930 EFLAGS: 00000293 ORIG_RAX: 0000000000000012
> [ 1208.085867] RAX: ffffffffffffffda RBX: 0000000154400000 RCX: 00007f72ca4ce2b7
> [ 1208.086560] RDX: 0000000000400000 RSI: 00007f72c9401000 RDI: 0000000000000003
> [ 1208.087248] RBP: 0000000154400000 R08: 0000000000000000 R09: 00007ffe56d1a9d0
> [ 1208.087946] R10: 0000000154400000 R11: 0000000000000293 R12: 00000000ffffffff
> [ 1208.088639] R13: 00000000abc00000 R14: 0000000000000000 R15: 0000000000000551
> [ 1208.089340] </TASK>
> [ 1208.089565] ---[ end trace 0000000000000000 ]---