linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* Re: mm/huge_memory: do not clobber swp_entry_t during THP split
@ 2022-10-24 13:04 Tvrtko Ursulin
  2022-10-24 14:23 ` Mel Gorman
  0 siblings, 1 reply; 8+ messages in thread
From: Tvrtko Ursulin @ 2022-10-24 13:04 UTC (permalink / raw)
  To: Mel Gorman; +Cc: Matthew Wilcox (Oracle), Linux MM, Matthew Auld, Intel-gfx


Hi Mel, mm experts,

With 6.1-rc2 we started hitting the WARN_ON added in 71e2d666ef85 ("mm/huge_memory: do not clobber swp_entry_t during THP split") in i915 automated CI:

It looks like this:

<4> [259.367534] page:ffffea0008850000 refcount:0 mapcount:0 mapping:ffff88811a756a00 index:0x0 pfn:0x221400
<4> [259.367593] head:ffffea0008850000 order:9 compound_mapcount:0 compound_pincount:0
<4> [259.367596] aops:shmem_aops ino:2 dentry name:"i915"
<4> [259.367600] flags: 0x80000000000d003f(locked|referenced|uptodate|dirty|lru|active|head|reclaim|swapbacked|zone=2)
<4> [259.367604] raw: 80000000000d003f ffffea00042d08c8 ffffea0008855248 ffff88811a756a00
<4> [259.367606] raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
<4> [259.367607] page dumped because: VM_WARN_ON_ONCE_PAGE(page_tail->private != 0)
<4> [259.367613] ------------[ cut here ]------------
<4> [259.367614] WARNING: CPU: 2 PID: 5515 at mm/huge_memory.c:2465 split_huge_page_to_list+0x12de/0x1760
<4> [259.367619] Modules linked in: i915(+) drm_display_helper drm_kms_helper vgem drm_shmem_helper snd_intel_dspcfg snd_hda_codec snd_hwdep snd_hda_core snd_pcm prime_numbers ttm drm_buddy syscopyarea sysfillrect sysimgblt fb_sys_fops fuse x86_pkg_temp_thermal coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel igb ptp mei_me pps_core i2c_i801 i2c_smbus mei video acpi_power_meter wmi [last unloaded: i915]
<4> [259.367663] CPU: 2 PID: 5515 Comm: i915_selftest Tainted: G     U    I        6.1.0-rc2-CI_DRM_12280-g7bb7f55322b3+ #1
<4> [259.367666] Hardware name: Intel Corporation S1200SP/S1200SP, BIOS S1200SP.86B.03.01.0026.092720170729 09/27/2017
<4> [259.367667] RIP: 0010:split_huge_page_to_list+0x12de/0x1760
<4> [259.367670] Code: 86 00 e9 31 fa ff ff 80 3d b8 3a 5a 01 00 0f 85 bb f4 ff ff 48 c7 c6 60 8c 2c 82 48 89 df e8 39 26 f9 ff c6 05 9c 3a 5a 01 01 <0f> 0b e9 9e f4 ff ff 48 83 e8 01 e9 a9 f8 ff ff 48 8b 45 08 49 89
<4> [259.367672] RSP: 0018:ffffc9000146b738 EFLAGS: 00010046
<4> [259.367675] RAX: 0000000000000042 RBX: ffffea0008850000 RCX: 0000000000000003
<4> [259.367677] RDX: 0000000000000000 RSI: ffffffff822ca515 RDI: 00000000ffffffff
<4> [259.367678] RBP: ffffea0008855200 R08: 0000000000000000 R09: c0000000ffffdf42
<4> [259.367680] R10: 00000000001a5e18 R11: ffffc9000146b5d8 R12: ffffea0008850000
<4> [259.367681] R13: ffffea0008850000 R14: ffff88811a756a00 R15: 0000000000000200
<4> [259.367683] FS:  00007f42aafecc00(0000) GS:ffff88826b300000(0000) knlGS:0000000000000000
<4> [259.367685] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4> [259.367687] CR2: 000055792f60d9f0 CR3: 0000000107514005 CR4: 00000000003706e0
<4> [259.367688] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
<4> [259.367690] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
<4> [259.367691] Call Trace:
<4> [259.367692]  <TASK>
<4> [259.367706]  shmem_writepage+0x53/0x5e0
<4> [259.367710]  ? folio_mapping+0x47/0x80
<4> [259.367715]  __shmem_writeback+0x1f2/0x510 [i915]
<4> [259.367853]  shmem_shrink+0x3a/0x50 [i915]
<4> [259.367961]  i915_gem_shrink+0x57c/0x860 [i915]
<4> [259.368083]  igt_shrink_thp+0x362/0x490 [i915]
<4> [259.368209]  __i915_subtests.cold.7+0x42/0x92 [i915]
<4> [259.368345]  ? __i915_nop_teardown+0x10/0x10 [i915]
<4> [259.368495]  ? __i915_live_setup+0x30/0x30 [i915]
<4> [259.368612]  __run_selftests.part.3+0xfa/0x158 [i915]
<4> [259.368747]  i915_live_selftests.cold.5+0x1f/0x4f [i915]
<4> [259.368878]  i915_pci_probe+0xd6/0x240 [i915]
<4> [259.368965]  ? _raw_spin_unlock_irqrestore+0x3d/0x70
<4> [259.368971]  pci_device_probe+0x98/0x110
<4> [259.368976]  really_probe+0xd9/0x350
<4> [259.368979]  ? pm_runtime_barrier+0x4b/0xa0
<4> [259.368985]  __driver_probe_device+0x73/0x170
<4> [259.368989]  driver_probe_device+0x1a/0x90
<4> [259.368992]  __driver_attach+0xbc/0x190
<4> [259.368995]  ? __device_attach_driver+0x110/0x110
<4> [259.368998]  ? __device_attach_driver+0x110/0x110
<4> [259.369001]  bus_for_each_dev+0x75/0xc0
<4> [259.369006]  bus_add_driver+0x1bb/0x210
<4> [259.369012]  driver_register+0x66/0xc0
<4> [259.369015]  i915_init+0x22/0x82 [i915]
<4> [259.369098]  ? 0xffffffffa0860000
<4> [259.369101]  do_one_initcall+0x56/0x2f0
<4> [259.369105]  ? rcu_read_lock_sched_held+0x51/0x80
<4> [259.369109]  ? kmalloc_trace+0xae/0x100
<4> [259.369113]  do_init_module+0x45/0x1c0
<4> [259.369117]  load_module+0x1d5e/0x1e90
<4> [259.369134]  ? __do_sys_finit_module+0xaf/0x120
<4> [259.369137]  __do_sys_finit_module+0xaf/0x120
<4> [259.369150]  do_syscall_64+0x3a/0x90
<4> [259.369154]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
<4> [259.369157] RIP: 0033:0x7f42ad7bd89d
<4> [259.369159] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d c3 f5 0c 00 f7 d8 64 89 01 48
<4> [259.369161] RSP: 002b:00007ffd788a2268 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
<4> [259.369164] RAX: ffffffffffffffda RBX: 000055d219ecf5a0 RCX: 00007f42ad7bd89d
<4> [259.369166] RDX: 0000000000000000 RSI: 000055d219edb390 RDI: 0000000000000006
<4> [259.369167] RBP: 0000000000000020 R08: 00007ffd788a1040 R09: 000055d219ec3470
<4> [259.369168] R10: 00007ffd788a23b0 R11: 0000000000000246 R12: 000055d219edb390
<4> [259.369170] R13: 0000000000000000 R14: 000055d219eda730 R15: 000055d219ecf5a0
<4> [259.369180]  </TASK>
<4> [259.369181] irq event stamp: 65138522
<4> [259.369183] hardirqs last  enabled at (65138521): [<ffffffff81b73764>] _raw_spin_unlock_irqrestore+0x54/0x70
<4> [259.369186] hardirqs last disabled at (65138522): [<ffffffff8129ca93>] split_huge_page_to_list+0x5f3/0x1760
<4> [259.369189] softirqs last  enabled at (65138410): [<ffffffff81e0031e>] __do_softirq+0x31e/0x48a
<4> [259.369191] softirqs last disabled at (65138403): [<ffffffff810c1b58>] irq_exit_rcu+0xb8/0xe0
<4> [259.369194] ---[ end trace 0000000000000000 ]---

At the point of the warn it should have been a single huge page and then we entered i915 shrinker which does this:

void __shmem_writeback(size_t size, struct address_space *mapping)
{
	struct writeback_control wbc = {
		.sync_mode = WB_SYNC_NONE,
		.nr_to_write = SWAP_CLUSTER_MAX,
		.range_start = 0,
		.range_end = LLONG_MAX,
		.for_reclaim = 1,
	};
	unsigned long i;

	/*
	 * Leave mmapings intact (GTT will have been revoked on unbinding,
	 * leaving only CPU mmapings around) and add those pages to the LRU
	 * instead of invoking writeback so they are aged and paged out
	 * as normal.
	 */

	/* Begin writeback on each dirty page */
	for (i = 0; i < size >> PAGE_SHIFT; i++) {
		struct page *page;

		page = find_lock_page(mapping, i);
		if (!page)
			continue;

		if (!page_mapped(page) && clear_page_dirty_for_io(page)) {
			int ret;

			SetPageReclaim(page);
			ret = mapping->a_ops->writepage(page, &wbc);
			if (!PageWriteback(page))
				ClearPageReclaim(page);
			if (!ret)
				goto put;
		}
		unlock_page(page);
put:
		put_page(page);
	}
}

Not sure if this loop is doing something incorrectly or what is going on. Help and suggestions would be appreciated.

Regards,

Tvrtko


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: mm/huge_memory: do not clobber swp_entry_t during THP split
  2022-10-24 13:04 mm/huge_memory: do not clobber swp_entry_t during THP split Tvrtko Ursulin
@ 2022-10-24 14:23 ` Mel Gorman
  2022-10-25  8:50   ` Tvrtko Ursulin
  0 siblings, 1 reply; 8+ messages in thread
From: Mel Gorman @ 2022-10-24 14:23 UTC (permalink / raw)
  To: Tvrtko Ursulin
  Cc: Matthew Wilcox (Oracle), Hugh Dickins, Linux MM, Matthew Auld, Intel-gfx

On Mon, Oct 24, 2022 at 02:04:50PM +0100, Tvrtko Ursulin wrote:
> 
> Hi Mel, mm experts,
> 
> With 6.1-rc2 we started hitting the WARN_ON added in 71e2d666ef85 ("mm/huge_memory: do not clobber swp_entry_t during THP split") in i915 automated CI:
> 

Thanks for the report. As shmem pages pages are allocated via vma_alloc_folio
and are compound pages, can you try the following patch please?  If it
still triggers, please post the new oops as it'll include the tail page
information.

--8<--
From: Hugh Dickins <hughd@google.com>
Subject: [PATCH] mm: prep_compound_tail() clear page->private

Although page allocation always clears page->private in the first page
or head page of an allocation, it has never made a point of clearing
page->private in the tails (though 0 is often what is already there).

But now commit 71e2d666ef85 ("mm/huge_memory: do not clobber swp_entry_t
during THP split") issues a warning when page_tail->private is found to
be non-0 (unless it's swapcache).

Change that warning to dump page_tail (which also dumps head), instead
of just the head: so far we have seen dead000000000122, dead000000000003,
dead000000000001 or 0000000000000002 in the raw output for tail private.

We could just delete the warning, but today's consensus appears to want
page->private to be 0, unless there's a good reason for it to be set:
so now clear it in prep_compound_tail() (more general than just for THP;
but not for high order allocation, which makes no pass down the tails).

Fixes: 71e2d666ef85 ("mm/huge_memory: do not clobber swp_entry_t during THP split")
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: <stable@vger.kernel.org>
---
 mm/huge_memory.c | 2 +-
 mm/page_alloc.c  | 1 +
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 03fc7e5edf07..561a42567477 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2462,7 +2462,7 @@ static void __split_huge_page_tail(struct page *head, int tail,
 	 * Fix up and warn once if private is unexpectedly set.
 	 */
 	if (!folio_test_swapcache(page_folio(head))) {
-		VM_WARN_ON_ONCE_PAGE(page_tail->private != 0, head);
+		VM_WARN_ON_ONCE_PAGE(page_tail->private != 0, page_tail);
 		page_tail->private = 0;
 	}
 
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index b5a6c815ae28..218b28ee49ed 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -807,6 +807,7 @@ static void prep_compound_tail(struct page *head, int tail_idx)
 
 	p->mapping = TAIL_MAPPING;
 	set_compound_head(p, head);
+	set_page_private(p, 0);
 }
 
 void prep_compound_page(struct page *page, unsigned int order)
-- 
2.35.3


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: mm/huge_memory: do not clobber swp_entry_t during THP split
  2022-10-24 14:23 ` Mel Gorman
@ 2022-10-25  8:50   ` Tvrtko Ursulin
  2022-10-25 10:03     ` Mel Gorman
  0 siblings, 1 reply; 8+ messages in thread
From: Tvrtko Ursulin @ 2022-10-25  8:50 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Matthew Wilcox (Oracle), Hugh Dickins, Linux MM, Matthew Auld, Intel-gfx


On 24/10/2022 15:23, Mel Gorman wrote:
> On Mon, Oct 24, 2022 at 02:04:50PM +0100, Tvrtko Ursulin wrote:
>>
>> Hi Mel, mm experts,
>>
>> With 6.1-rc2 we started hitting the WARN_ON added in 71e2d666ef85 ("mm/huge_memory: do not clobber swp_entry_t during THP split") in i915 automated CI:
>>
> 
> Thanks for the report. As shmem pages pages are allocated via vma_alloc_folio
> and are compound pages, can you try the following patch please?  If it
> still triggers, please post the new oops as it'll include the tail page
> information.
> 
> --8<--
> From: Hugh Dickins <hughd@google.com>
> Subject: [PATCH] mm: prep_compound_tail() clear page->private
> 
> Although page allocation always clears page->private in the first page
> or head page of an allocation, it has never made a point of clearing
> page->private in the tails (though 0 is often what is already there).
> 
> But now commit 71e2d666ef85 ("mm/huge_memory: do not clobber swp_entry_t
> during THP split") issues a warning when page_tail->private is found to
> be non-0 (unless it's swapcache).
> 
> Change that warning to dump page_tail (which also dumps head), instead
> of just the head: so far we have seen dead000000000122, dead000000000003,
> dead000000000001 or 0000000000000002 in the raw output for tail private.
> 
> We could just delete the warning, but today's consensus appears to want
> page->private to be 0, unless there's a good reason for it to be set:
> so now clear it in prep_compound_tail() (more general than just for THP;
> but not for high order allocation, which makes no pass down the tails).
> 
> Fixes: 71e2d666ef85 ("mm/huge_memory: do not clobber swp_entry_t during THP split")
> Signed-off-by: Hugh Dickins <hughd@google.com>
> Cc: Mel Gorman <mgorman@techsingularity.net>
> Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
> Cc: <stable@vger.kernel.org>
> ---
>   mm/huge_memory.c | 2 +-
>   mm/page_alloc.c  | 1 +
>   2 files changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 03fc7e5edf07..561a42567477 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -2462,7 +2462,7 @@ static void __split_huge_page_tail(struct page *head, int tail,
>   	 * Fix up and warn once if private is unexpectedly set.
>   	 */
>   	if (!folio_test_swapcache(page_folio(head))) {
> -		VM_WARN_ON_ONCE_PAGE(page_tail->private != 0, head);
> +		VM_WARN_ON_ONCE_PAGE(page_tail->private != 0, page_tail);
>   		page_tail->private = 0;
>   	}
>   
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index b5a6c815ae28..218b28ee49ed 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -807,6 +807,7 @@ static void prep_compound_tail(struct page *head, int tail_idx)
>   
>   	p->mapping = TAIL_MAPPING;
>   	set_compound_head(p, head);
> +	set_page_private(p, 0);
>   }
>   
>   void prep_compound_page(struct page *page, unsigned int order)

The patch seems to fix our CI runs. Is it considered final version? If 
so I can temporarily put it in until it arrives via the next rc - 
assuming that would be the flow from upstream pov?

Regards,

Tvrtko


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: mm/huge_memory: do not clobber swp_entry_t during THP split
  2022-10-25  8:50   ` Tvrtko Ursulin
@ 2022-10-25 10:03     ` Mel Gorman
  2022-10-25 15:26       ` Hugh Dickins
  2022-10-26  3:04       ` Andrew Morton
  0 siblings, 2 replies; 8+ messages in thread
From: Mel Gorman @ 2022-10-25 10:03 UTC (permalink / raw)
  To: Tvrtko Ursulin
  Cc: Matthew Wilcox (Oracle),
	Hugh Dickins, Linux MM, Matthew Auld, Andrew Morton, Intel-gfx

Cc'ing Andrew for awareness. Andrew, this bug report is almost identical to
the one Hugh already reported and fixed in "[PATCH] mm: prep_compound_tail()
clear page->private". Nothing wrong with the patch AFAIK and only the last
paragraph is relevant to you.

On Tue, Oct 25, 2022 at 09:50:14AM +0100, Tvrtko Ursulin wrote:
> 
> On 24/10/2022 15:23, Mel Gorman wrote:
> > On Mon, Oct 24, 2022 at 02:04:50PM +0100, Tvrtko Ursulin wrote:
> > > 
> > > Hi Mel, mm experts,
> > > 
> > > With 6.1-rc2 we started hitting the WARN_ON added in 71e2d666ef85 ("mm/huge_memory: do not clobber swp_entry_t during THP split") in i915 automated CI:
> > > 
> > 
> > Thanks for the report. As shmem pages pages are allocated via vma_alloc_folio
> > and are compound pages, can you try the following patch please?  If it
> > still triggers, please post the new oops as it'll include the tail page
> > information.
> > 
> > --8<--
> > From: Hugh Dickins <hughd@google.com>
> > Subject: [PATCH] mm: prep_compound_tail() clear page->private
> > 
> > Although page allocation always clears page->private in the first page
> > or head page of an allocation, it has never made a point of clearing
> > page->private in the tails (though 0 is often what is already there).
> > 
> > But now commit 71e2d666ef85 ("mm/huge_memory: do not clobber swp_entry_t
> > during THP split") issues a warning when page_tail->private is found to
> > be non-0 (unless it's swapcache).
> > 
> > Change that warning to dump page_tail (which also dumps head), instead
> > of just the head: so far we have seen dead000000000122, dead000000000003,
> > dead000000000001 or 0000000000000002 in the raw output for tail private.
> > 
> > We could just delete the warning, but today's consensus appears to want
> > page->private to be 0, unless there's a good reason for it to be set:
> > so now clear it in prep_compound_tail() (more general than just for THP;
> > but not for high order allocation, which makes no pass down the tails).
> > 
> > Fixes: 71e2d666ef85 ("mm/huge_memory: do not clobber swp_entry_t during THP split")
> > Signed-off-by: Hugh Dickins <hughd@google.com>
> > Cc: Mel Gorman <mgorman@techsingularity.net>
> > Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
> > Cc: <stable@vger.kernel.org>
> > ---
> >   mm/huge_memory.c | 2 +-
> >   mm/page_alloc.c  | 1 +
> >   2 files changed, 2 insertions(+), 1 deletion(-)
> > 
> > diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> > index 03fc7e5edf07..561a42567477 100644
> > --- a/mm/huge_memory.c
> > +++ b/mm/huge_memory.c
> > @@ -2462,7 +2462,7 @@ static void __split_huge_page_tail(struct page *head, int tail,
> >   	 * Fix up and warn once if private is unexpectedly set.
> >   	 */
> >   	if (!folio_test_swapcache(page_folio(head))) {
> > -		VM_WARN_ON_ONCE_PAGE(page_tail->private != 0, head);
> > +		VM_WARN_ON_ONCE_PAGE(page_tail->private != 0, page_tail);
> >   		page_tail->private = 0;
> >   	}
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index b5a6c815ae28..218b28ee49ed 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -807,6 +807,7 @@ static void prep_compound_tail(struct page *head, int tail_idx)
> >   	p->mapping = TAIL_MAPPING;
> >   	set_compound_head(p, head);
> > +	set_page_private(p, 0);
> >   }
> >   void prep_compound_page(struct page *page, unsigned int order)
> 
> The patch seems to fix our CI runs.

Thanks for letting me know.

> Is it considered final version?

AFAIK, yes.

> If so I
> can temporarily put it in until it arrives via the next rc - assuming that
> would be the flow from upstream pov?
> 

I expect it to. It's currently in the akpm/mm.git branch
mm/mm-hotfixes-unstable where I expect it to flow to mm/mm-hotfixes-stable
in due course before sending to Linus. I can't make promises about the
timing as that's determined by Andrew.

-- 
Mel Gorman
SUSE Labs


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: mm/huge_memory: do not clobber swp_entry_t during THP split
  2022-10-25 10:03     ` Mel Gorman
@ 2022-10-25 15:26       ` Hugh Dickins
  2022-10-26  9:25         ` Mel Gorman
  2022-10-26  3:04       ` Andrew Morton
  1 sibling, 1 reply; 8+ messages in thread
From: Hugh Dickins @ 2022-10-25 15:26 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Tvrtko Ursulin, Matthew Wilcox (Oracle),
	Hugh Dickins, Linux MM, Matthew Auld, Andrew Morton, Intel-gfx

On Tue, 25 Oct 2022, Mel Gorman wrote:

> Cc'ing Andrew for awareness. Andrew, this bug report is almost identical to
> the one Hugh already reported and fixed in "[PATCH] mm: prep_compound_tail()
> clear page->private". Nothing wrong with the patch AFAIK and only the last
> paragraph is relevant to you.
> 
> On Tue, Oct 25, 2022 at 09:50:14AM +0100, Tvrtko Ursulin wrote:
> > 
> > On 24/10/2022 15:23, Mel Gorman wrote:
> > > On Mon, Oct 24, 2022 at 02:04:50PM +0100, Tvrtko Ursulin wrote:
> > > > 
> > > > Hi Mel, mm experts,
> > > > 
> > > > With 6.1-rc2 we started hitting the WARN_ON added in 71e2d666ef85 ("mm/huge_memory: do not clobber swp_entry_t during THP split") in i915 automated CI:
> > > > 
> > > 
> > > Thanks for the report. As shmem pages pages are allocated via vma_alloc_folio
> > > and are compound pages, can you try the following patch please?  If it
> > > still triggers, please post the new oops as it'll include the tail page
> > > information.
> > > 
> > > --8<--
> > > From: Hugh Dickins <hughd@google.com>
> > > Subject: [PATCH] mm: prep_compound_tail() clear page->private
> > > 
> > > Although page allocation always clears page->private in the first page
> > > or head page of an allocation, it has never made a point of clearing
> > > page->private in the tails (though 0 is often what is already there).
> > > 
> > > But now commit 71e2d666ef85 ("mm/huge_memory: do not clobber swp_entry_t
> > > during THP split") issues a warning when page_tail->private is found to
> > > be non-0 (unless it's swapcache).
> > > 
> > > Change that warning to dump page_tail (which also dumps head), instead
> > > of just the head: so far we have seen dead000000000122, dead000000000003,
> > > dead000000000001 or 0000000000000002 in the raw output for tail private.
> > > 
> > > We could just delete the warning, but today's consensus appears to want
> > > page->private to be 0, unless there's a good reason for it to be set:
> > > so now clear it in prep_compound_tail() (more general than just for THP;
> > > but not for high order allocation, which makes no pass down the tails).
> > > 
> > > Fixes: 71e2d666ef85 ("mm/huge_memory: do not clobber swp_entry_t during THP split")
> > > Signed-off-by: Hugh Dickins <hughd@google.com>
> > > Cc: Mel Gorman <mgorman@techsingularity.net>
> > > Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
> > > Cc: <stable@vger.kernel.org>
> > > ---
> > >   mm/huge_memory.c | 2 +-
> > >   mm/page_alloc.c  | 1 +
> > >   2 files changed, 2 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> > > index 03fc7e5edf07..561a42567477 100644
> > > --- a/mm/huge_memory.c
> > > +++ b/mm/huge_memory.c
> > > @@ -2462,7 +2462,7 @@ static void __split_huge_page_tail(struct page *head, int tail,
> > >   	 * Fix up and warn once if private is unexpectedly set.
> > >   	 */
> > >   	if (!folio_test_swapcache(page_folio(head))) {
> > > -		VM_WARN_ON_ONCE_PAGE(page_tail->private != 0, head);
> > > +		VM_WARN_ON_ONCE_PAGE(page_tail->private != 0, page_tail);
> > >   		page_tail->private = 0;
> > >   	}
> > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > > index b5a6c815ae28..218b28ee49ed 100644
> > > --- a/mm/page_alloc.c
> > > +++ b/mm/page_alloc.c
> > > @@ -807,6 +807,7 @@ static void prep_compound_tail(struct page *head, int tail_idx)
> > >   	p->mapping = TAIL_MAPPING;
> > >   	set_compound_head(p, head);
> > > +	set_page_private(p, 0);
> > >   }
> > >   void prep_compound_page(struct page *page, unsigned int order)
> > 
> > The patch seems to fix our CI runs.
> 
> Thanks for letting me know.
> 
> > Is it considered final version?
> 
> AFAIK, yes.

AFAItooK, yes - modulo akpm's signoff and final SHA1.

> 
> > If so I
> > can temporarily put it in until it arrives via the next rc - assuming that
> > would be the flow from upstream pov?

The right thing for now is for GregKH to drop Mel's from 6.0.4:
I've just sent a mail asking for that (I would have asked yesterday,
but mistook that GregKH was not in Cc).

Of course Mel's fix is much more important than the harmless
(unless panic on warn) warning, but let's delay it a few more days,
it just flowed into stable too quickly.

Thanks Mel: I never knowingly hit the THP_SWAP issue which your patch
is fixing, but it now looks like it was also responsible for mysterious
occasional OOM kills that I had been chasing for weeks.

Hugh

> > 
> 
> I expect it to. It's currently in the akpm/mm.git branch
> mm/mm-hotfixes-unstable where I expect it to flow to mm/mm-hotfixes-stable
> in due course before sending to Linus. I can't make promises about the
> timing as that's determined by Andrew.
> 
> -- 
> Mel Gorman
> SUSE Labs


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: mm/huge_memory: do not clobber swp_entry_t during THP split
  2022-10-25 10:03     ` Mel Gorman
  2022-10-25 15:26       ` Hugh Dickins
@ 2022-10-26  3:04       ` Andrew Morton
  2022-10-26  3:35         ` Hugh Dickins
  1 sibling, 1 reply; 8+ messages in thread
From: Andrew Morton @ 2022-10-26  3:04 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Tvrtko Ursulin, Matthew Wilcox (Oracle),
	Hugh Dickins, Linux MM, Matthew Auld, Intel-gfx

On Tue, 25 Oct 2022 11:03:38 +0100 Mel Gorman <mgorman@techsingularity.net> wrote:

> > If so I
> > can temporarily put it in until it arrives via the next rc - assuming that
> > would be the flow from upstream pov?
> > 
> 
> I expect it to. It's currently in the akpm/mm.git branch
> mm/mm-hotfixes-unstable where I expect it to flow to mm/mm-hotfixes-stable
> in due course before sending to Linus. I can't make promises about the
> timing as that's determined by Andrew.

This is now in mainline, 71e2d666ef85.


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: mm/huge_memory: do not clobber swp_entry_t during THP split
  2022-10-26  3:04       ` Andrew Morton
@ 2022-10-26  3:35         ` Hugh Dickins
  0 siblings, 0 replies; 8+ messages in thread
From: Hugh Dickins @ 2022-10-26  3:35 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Mel Gorman, Tvrtko Ursulin, Matthew Wilcox (Oracle),
	Hugh Dickins, Linux MM, Matthew Auld, Intel-gfx

On Tue, 25 Oct 2022, Andrew Morton wrote:
> On Tue, 25 Oct 2022 11:03:38 +0100 Mel Gorman <mgorman@techsingularity.net> wrote:
> 
> > > If so I
> > > can temporarily put it in until it arrives via the next rc - assuming that
> > > would be the flow from upstream pov?
> > > 
> > 
> > I expect it to. It's currently in the akpm/mm.git branch
> > mm/mm-hotfixes-unstable where I expect it to flow to mm/mm-hotfixes-stable
> > in due course before sending to Linus. I can't make promises about the
> > timing as that's determined by Andrew.
> 
> This is now in mainline, 71e2d666ef85.

No, that one is Mel's commit, which GregKH already picked up for stable.
But what we're waiting for here is my fix to the warning that brings,
my fix currently lurking in mm-hotfixes-unstable as
45ba9c269874 ("mm: prep_compound_tail() clear page->private")

Hugh


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: mm/huge_memory: do not clobber swp_entry_t during THP split
  2022-10-25 15:26       ` Hugh Dickins
@ 2022-10-26  9:25         ` Mel Gorman
  0 siblings, 0 replies; 8+ messages in thread
From: Mel Gorman @ 2022-10-26  9:25 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Tvrtko Ursulin, Matthew Wilcox (Oracle),
	Linux MM, Matthew Auld, Andrew Morton, Intel-gfx

On Tue, Oct 25, 2022 at 08:26:06AM -0700, Hugh Dickins wrote:
> > 
> > > If so I
> > > can temporarily put it in until it arrives via the next rc - assuming that
> > > would be the flow from upstream pov?
> 
> The right thing for now is for GregKH to drop Mel's from 6.0.4:
> I've just sent a mail asking for that (I would have asked yesterday,
> but mistook that GregKH was not in Cc).
> 

Thanks for catching that, I only saw the mail this morning that it had been
picked up as a stable candidate and was internally screaming "no no no"
until I saw your mail :P. I added the warning thinking "we have almost a
full rc cycle to catch any additional fallout".

> Of course Mel's fix is much more important than the harmless
> (unless panic on warn) warning, but let's delay it a few more days,
> it just flowed into stable too quickly.
> 
> Thanks Mel: I never knowingly hit the THP_SWAP issue which your patch
> is fixing, but it now looks like it was also responsible for mysterious
> occasional OOM kills that I had been chasing for weeks.
> 

I'm glad it helped! I worried that the additional warning would trigger an
excessive number of new bugs but it served its intended purpose -- catch
fallout from clobbering page->private causing subtle bugs later that are
hard to debug.

-- 
Mel Gorman
SUSE Labs


^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2022-10-26  9:25 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-10-24 13:04 mm/huge_memory: do not clobber swp_entry_t during THP split Tvrtko Ursulin
2022-10-24 14:23 ` Mel Gorman
2022-10-25  8:50   ` Tvrtko Ursulin
2022-10-25 10:03     ` Mel Gorman
2022-10-25 15:26       ` Hugh Dickins
2022-10-26  9:25         ` Mel Gorman
2022-10-26  3:04       ` Andrew Morton
2022-10-26  3:35         ` Hugh Dickins

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox