From: Andrew Morton <akpm@linux-foundation.org>
To: Hugh Dickins <hughd@google.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>,
Nhat Pham <nphamcs@gmail.com>, Yang Shi <shy828301@gmail.com>,
Zi Yan <ziy@nvidia.com>, Barry Song <baohua@kernel.org>,
Kefeng Wang <wangkefeng.wang@huawei.com>,
David Hildenbrand <david@redhat.com>,
Matthew Wilcox <willy@infradead.org>,
linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH hotfix] mm: fix crashes from deferred split racing folio migration
Date: Wed, 3 Jul 2024 19:35:36 -0700
Message-ID: <20240703193536.78bce768a9330da3a361ca8a@linux-foundation.org>
In-Reply-To: <29c83d1a-11ca-b6c9-f92e-6ccb322af510@google.com>
On Tue, 2 Jul 2024 00:40:55 -0700 (PDT) Hugh Dickins <hughd@google.com> wrote:
> Even on 6.10-rc6, I've been seeing elusive "Bad page state"s (often on
> flags when freeing, yet the flags shown are not bad: PG_locked had been
> set and cleared??), and VM_BUG_ON_PAGE(page_ref_count(page) == 0)s from
> deferred_split_scan()'s folio_put(), and a variety of other BUG and WARN
> symptoms implying double free by deferred split and large folio migration.
>
> 6.7 commit 9bcef5973e31 ("mm: memcg: fix split queue list crash when large
> folio migration") was right to fix the memcg-dependent locking broken in
> 85ce2c517ade ("memcontrol: only transfer the memcg data for migration"),
> but missed a subtlety of deferred_split_scan(): it moves folios to its own
> local list to work on them without split_queue_lock, during which time
> folio->_deferred_list is not empty, but even the "right" lock does nothing
> to secure the folio and the list it is on.
>
> Fortunately, deferred_split_scan() is careful to use folio_try_get(): so
> folio_migrate_mapping() can avoid the race by folio_undo_large_rmappable()
> while the old folio's reference count is temporarily frozen to 0 - adding
> such a freeze in the !mapping case too (originally, folio lock and
> unmapping and no swap cache left an anon folio unreachable, so no freezing
> was needed there: but the deferred split queue offers a way to reach it).
There's a conflict when applying Kefeng's "mm: refactor
folio_undo_large_rmappable()"
(https://lkml.kernel.org/r/20240521130315.46072-1-wangkefeng.wang@huawei.com)
on top of this hotfix.
--- mm/memcontrol.c~mm-refactor-folio_undo_large_rmappable
+++ mm/memcontrol.c
@@ -7832,8 +7832,7 @@ void mem_cgroup_migrate(struct folio *ol
* In addition, the old folio is about to be freed after migration, so
* removing from the split queue a bit earlier seems reasonable.
*/
- if (folio_test_large(old) && folio_test_large_rmappable(old))
- folio_undo_large_rmappable(old);
+ folio_undo_large_rmappable(old);
old->memcg_data = 0;
}
I'm resolving this by simply dropping the above mm/memcontrol.c hunk,
so Kefeng's patch is now as below. Please check.
--- a/mm/huge_memory.c~mm-refactor-folio_undo_large_rmappable
+++ a/mm/huge_memory.c
@@ -3258,22 +3258,11 @@ out:
return ret;
}
-void folio_undo_large_rmappable(struct folio *folio)
+void __folio_undo_large_rmappable(struct folio *folio)
{
struct deferred_split *ds_queue;
unsigned long flags;
- if (folio_order(folio) <= 1)
- return;
-
- /*
- * At this point, there is no one trying to add the folio to
- * deferred_list. If folio is not in deferred_list, it's safe
- * to check without acquiring the split_queue_lock.
- */
- if (data_race(list_empty(&folio->_deferred_list)))
- return;
-
ds_queue = get_deferred_split_queue(folio);
spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
if (!list_empty(&folio->_deferred_list)) {
--- a/mm/internal.h~mm-refactor-folio_undo_large_rmappable
+++ a/mm/internal.h
@@ -622,7 +622,22 @@ static inline void folio_set_order(struc
#endif
}
-void folio_undo_large_rmappable(struct folio *folio);
+void __folio_undo_large_rmappable(struct folio *folio);
+static inline void folio_undo_large_rmappable(struct folio *folio)
+{
+ if (folio_order(folio) <= 1 || !folio_test_large_rmappable(folio))
+ return;
+
+ /*
+ * At this point, there is no one trying to add the folio to
+ * deferred_list. If folio is not in deferred_list, it's safe
+ * to check without acquiring the split_queue_lock.
+ */
+ if (data_race(list_empty(&folio->_deferred_list)))
+ return;
+
+ __folio_undo_large_rmappable(folio);
+}
static inline struct folio *page_rmappable_folio(struct page *page)
{
--- a/mm/page_alloc.c~mm-refactor-folio_undo_large_rmappable
+++ a/mm/page_alloc.c
@@ -2661,8 +2661,7 @@ void free_unref_folios(struct folio_batc
unsigned long pfn = folio_pfn(folio);
unsigned int order = folio_order(folio);
- if (order > 0 && folio_test_large_rmappable(folio))
- folio_undo_large_rmappable(folio);
+ folio_undo_large_rmappable(folio);
if (!free_pages_prepare(&folio->page, order))
continue;
/*
--- a/mm/swap.c~mm-refactor-folio_undo_large_rmappable
+++ a/mm/swap.c
@@ -123,8 +123,7 @@ void __folio_put(struct folio *folio)
}
page_cache_release(folio);
- if (folio_test_large(folio) && folio_test_large_rmappable(folio))
- folio_undo_large_rmappable(folio);
+ folio_undo_large_rmappable(folio);
mem_cgroup_uncharge(folio);
free_unref_page(&folio->page, folio_order(folio));
}
@@ -1021,10 +1020,7 @@ void folios_put_refs(struct folio_batch
free_huge_folio(folio);
continue;
}
- if (folio_test_large(folio) &&
- folio_test_large_rmappable(folio))
- folio_undo_large_rmappable(folio);
-
+ folio_undo_large_rmappable(folio);
__page_cache_release(folio, &lruvec, &flags);
if (j != i)
--- a/mm/vmscan.c~mm-refactor-folio_undo_large_rmappable
+++ a/mm/vmscan.c
@@ -1439,9 +1439,7 @@ free_it:
*/
nr_reclaimed += nr_pages;
- if (folio_test_large(folio) &&
- folio_test_large_rmappable(folio))
- folio_undo_large_rmappable(folio);
+ folio_undo_large_rmappable(folio);
if (folio_batch_add(&free_folios, folio) == 0) {
mem_cgroup_uncharge_folios(&free_folios);
try_to_unmap_flush();
@@ -1848,9 +1846,7 @@ static unsigned int move_folios_to_lru(s
if (unlikely(folio_put_testzero(folio))) {
__folio_clear_lru_flags(folio);
- if (folio_test_large(folio) &&
- folio_test_large_rmappable(folio))
- folio_undo_large_rmappable(folio);
+ folio_undo_large_rmappable(folio);
if (folio_batch_add(&free_folios, folio) == 0) {
spin_unlock_irq(&lruvec->lru_lock);
mem_cgroup_uncharge_folios(&free_folios);
_