From: Yosry Ahmed <yosryahmed@google.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Yu Zhao <yuzhao@google.com>,
"Jan Alexander Steffens (heftig)" <heftig@archlinux.org>,
Steven Barrett <steven@liquorix.net>,
Brian Geffon <bgeffon@google.com>,
"T.J. Alumbaugh" <talumbau@google.com>,
Gaosheng Cui <cuigaosheng1@huawei.com>,
Suren Baghdasaryan <surenb@google.com>,
"Matthew Wilcox (Oracle)" <willy@infradead.org>,
"Liam R. Howlett" <Liam.Howlett@oracle.com>,
David Hildenbrand <david@redhat.com>,
Jason Gunthorpe <jgg@ziepe.ca>,
Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
David Howells <dhowells@redhat.com>,
Hugh Dickins <hughd@google.com>,
Greg Thelen <gthelen@google.com>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org,
Yosry Ahmed <yosryahmed@google.com>
Subject: [RFC PATCH 2/5] mm/mlock: fixup mlock_count during unmap
Date: Sun, 18 Jun 2023 06:57:56 +0000
Message-ID: <20230618065756.1364399-1-yosryahmed@google.com>
In the rare case where an mlocked order-0 folio is mapped 2^20 or more
times, the high mapcount can be mistakenly interpreted by munlock() as
an mlock_count, causing PG_mlocked to not be cleared and possibly
leaving the folio stranded as unevictable indefinitely.
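
For illustration, a minimal userspace sketch (not kernel code; the
MLOCK_COUNT_SHIFT value mirrors the one defined in mm/mlock.c below)
of how a raw mapcount decodes once it crosses 2^20:

  #include <stdio.h>

  /* Same bit split as mm/mlock.c: upper bits are read as mlock_count */
  #define MLOCK_COUNT_SHIFT 20

  int main(void)
  {
          /* 2^20 real mappings of an mlocked order-0 folio ... */
          int mapcount = 1 << MLOCK_COUNT_SHIFT;

          /* ... decode as a nonzero mlock_count, so munlock() backs off */
          int mlock_count = mapcount >> MLOCK_COUNT_SHIFT;

          printf("decoded mlock_count = %d\n", mlock_count); /* prints 1 */
          return 0;
  }

Once the number of mappings falls back below MLOCK_COUNT_BIAS, the
decoded mlock_count reads 0 again; that window is what the hook added
below uses to catch the stale PG_mlocked.
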
To fix this, add a hook during unmapping that checks whether the bits
used for the mlock_count are all zeros while PG_mlocked is still set.
In that case, perform the missed munlock operation.
Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
---
include/linux/mm.h | 4 ++++
mm/mlock.c | 18 +++++++++++++++++-
mm/rmap.c | 1 +
3 files changed, 22 insertions(+), 1 deletion(-)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 3994580772b3..b341477a83e8 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1050,6 +1050,7 @@ unsigned long vmalloc_to_pfn(const void *addr);
extern bool is_vmalloc_addr(const void *x);
extern int is_vmalloc_or_module_addr(const void *x);
extern int folio_mlocked_mapcount(struct folio *folio);
+extern void folio_mlock_unmap_check(struct folio *folio);
#else
static inline bool is_vmalloc_addr(const void *x)
{
@@ -1063,6 +1064,9 @@ static inline int folio_mlocked_mapcount(struct folio *folio)
{
return 0;
}
+static inline void folio_mlock_unmap_check(struct folio *folio)
+{
+}
#endif
/*
diff --git a/mm/mlock.c b/mm/mlock.c
index 5c5462627391..8261df11d6a6 100644
--- a/mm/mlock.c
+++ b/mm/mlock.c
@@ -66,7 +66,8 @@ EXPORT_SYMBOL(can_do_mlock);
* (1) The mapcount will be incorrect (underestimated). It will be correct again
* once the number of mappings falls below MLOCK_COUNT_BIAS.
* (2) munlock() can misinterpret the large number of mappings as an mlock_count
- * and leave PG_mlocked set.
+ * and leave PG_mlocked set. This will be fixed by folio_mlock_unmap_check()
+ * once the number of mappings falls below MLOCK_COUNT_BIAS.
*/
#define MLOCK_COUNT_SHIFT 20
#define MLOCK_COUNT_BIAS (1U << MLOCK_COUNT_SHIFT)
@@ -139,6 +140,21 @@ static int folio_mlock_count_dec(struct folio *folio)
return mlock_count - 1;
}
+/*
+ * Call after decrementing the mapcount. If the mapcount previously overflowed
+ * beyond the lower 20 bits for an order-0 mlocked folio, munlock() may have
+ * mistakenly left the folio mlocked. Fix it here.
+ */
+void folio_mlock_unmap_check(struct folio *folio)
+{
+ int mapcount = atomic_read(&folio->_mapcount) + 1;
+ int mlock_count = mapcount >> MLOCK_COUNT_SHIFT;
+
+ if (unlikely(!folio_test_large(folio) && folio_test_mlocked(folio) &&
+ mlock_count == 0))
+ munlock_folio(folio);
+}
+
/*
* Mlocked folios are marked with the PG_mlocked flag for efficient testing
* in vmscan and, possibly, the fault path; and to support semi-accurate
diff --git a/mm/rmap.c b/mm/rmap.c
index 19392e090bec..02e558551f15 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1392,6 +1392,7 @@ void page_remove_rmap(struct page *page, struct vm_area_struct *vma,
nr = atomic_dec_return_relaxed(mapped);
nr = (nr < COMPOUND_MAPPED);
}
+ folio_mlock_unmap_check(folio);
} else if (folio_test_pmd_mappable(folio)) {
/* That test is redundant: it's for safety or to optimize out */
--
2.41.0.162.gfafddb0af9-goog