From: Zi Yan <ziy@nvidia.com>
To: "Matthew Wilcox (Oracle)" <willy@infradead.org>,
Song Liu <songliubraving@fb.com>
Cc: Chris Mason <clm@fb.com>, David Sterba <dsterba@suse.com>,
Alexander Viro <viro@zeniv.linux.org.uk>,
Christian Brauner <brauner@kernel.org>, Jan Kara <jack@suse.cz>,
Andrew Morton <akpm@linux-foundation.org>,
David Hildenbrand <david@kernel.org>,
Lorenzo Stoakes <ljs@kernel.org>, Zi Yan <ziy@nvidia.com>,
Baolin Wang <baolin.wang@linux.alibaba.com>,
"Liam R. Howlett" <Liam.Howlett@oracle.com>,
Nico Pache <npache@redhat.com>,
Ryan Roberts <ryan.roberts@arm.com>, Dev Jain <dev.jain@arm.com>,
Barry Song <baohua@kernel.org>, Lance Yang <lance.yang@linux.dev>,
Vlastimil Babka <vbabka@kernel.org>,
Mike Rapoport <rppt@kernel.org>,
Suren Baghdasaryan <surenb@google.com>,
Michal Hocko <mhocko@suse.com>, Shuah Khan <shuah@kernel.org>,
linux-btrfs@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
linux-kselftest@vger.kernel.org
Subject: [PATCH 7.2 v3 02/12] mm/khugepaged: add folio dirty check after try_to_unmap()
Date: Fri, 17 Apr 2026 22:44:19 -0400 [thread overview]
Message-ID: <20260418024429.4055056-3-ziy@nvidia.com> (raw)
In-Reply-To: <20260418024429.4055056-1-ziy@nvidia.com>
This check ensures the correctness of collapse read-only THPs for FSes
after READ_ONLY_THP_FOR_FS is enabled by default for all FSes supporting
PMD THP pagecache.
READ_ONLY_THP_FOR_FS only supports read-only fd and uses mapping->nr_thps
and inode->i_writecount to prevent any write to read-only to-be-collapsed
folios. In upcoming commits, READ_ONLY_THP_FOR_FS will be removed and the
aforementioned mechanism will go away too. To ensure khugepaged functions
as expected after the changes, skip if any folio is dirty after
try_to_unmap(), since a dirty folio means this read-only folio
got some writes via mmap can happen between try_to_unmap() and
try_to_unmap_flush() via cached TLB entries and khugepaged does not support
writable pagecache folio collapse yet.
Signed-off-by: Zi Yan <ziy@nvidia.com>
---
mm/khugepaged.c | 25 +++++++++++++++++++++----
1 file changed, 21 insertions(+), 4 deletions(-)
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 3eb5d982d3d3..1c0fdc81d276 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1979,8 +1979,7 @@ static enum scan_result collapse_file(struct mm_struct *mm, unsigned long addr,
}
} else if (folio_test_dirty(folio)) {
/*
- * khugepaged only works on read-only fd,
- * so this page is dirty because it hasn't
+ * This page is dirty because it hasn't
* been flushed since first write. There
* won't be new dirty pages.
*
@@ -2038,8 +2037,8 @@ static enum scan_result collapse_file(struct mm_struct *mm, unsigned long addr,
if (!is_shmem && (folio_test_dirty(folio) ||
folio_test_writeback(folio))) {
/*
- * khugepaged only works on read-only fd, so this
- * folio is dirty because it hasn't been flushed
+ * khugepaged only works on clean file-backed folios,
+ * so this folio is dirty because it hasn't been flushed
* since first write.
*/
result = SCAN_PAGE_DIRTY_OR_WRITEBACK;
@@ -2083,6 +2082,24 @@ static enum scan_result collapse_file(struct mm_struct *mm, unsigned long addr,
goto out_unlock;
}
+ /*
+ * At this point, the folio is locked, unmapped. Make sure the
+ * folio is clean, so that no one else is able to write to it,
+ * since that would require taking the folio lock first.
+ * Otherwise that means the folio was pointed by a dirty PTE and
+ * some CPU might have a valid TLB entry with dirty bit set
+ * still pointing to this folio and writes can happen without
+ * causing a page table walk and folio lock acquisition before
+ * the try_to_unmap_flush() below is done. After the collapse,
+ * file-backed folio is not set as dirty and can be discarded
+ * before any new write marks the folio dirty, causing data
+ * corruption.
+ */
+ if (!is_shmem && folio_test_dirty(folio)) {
+ result = SCAN_PAGE_DIRTY_OR_WRITEBACK;
+ goto out_unlock;
+ }
+
/*
* Accumulate the folios that are being collapsed.
*/
--
2.43.0
next prev parent reply other threads:[~2026-04-18 2:45 UTC|newest]
Thread overview: 13+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-04-18 2:44 [PATCH 7.2 v3 00/12] Remove read-only THP support for FSes without large folio support Zi Yan
2026-04-18 2:44 ` [PATCH 7.2 v3 01/12] mm/khugepaged: remove READ_ONLY_THP_FOR_FS check Zi Yan
2026-04-18 2:44 ` Zi Yan [this message]
2026-04-18 2:44 ` [PATCH 7.2 v3 03/12] mm/huge_memory: remove READ_ONLY_THP_FOR_FS from file_thp_enabled() Zi Yan
2026-04-18 2:44 ` [PATCH 7.2 v3 04/12] mm/khugepaged: remove READ_ONLY_THP_FOR_FS check in hugepage_pmd_enabled() Zi Yan
2026-04-18 2:44 ` [PATCH 7.2 v3 05/12] mm: remove READ_ONLY_THP_FOR_FS Kconfig option Zi Yan
2026-04-18 2:44 ` [PATCH 7.2 v3 06/12] mm: fs: remove filemap_nr_thps*() functions and their users Zi Yan
2026-04-18 2:44 ` [PATCH 7.2 v3 07/12] fs: remove nr_thps from struct address_space Zi Yan
2026-04-18 2:44 ` [PATCH 7.2 v3 08/12] mm/huge_memory: remove folio split check for READ_ONLY_THP_FOR_FS Zi Yan
2026-04-18 2:44 ` [PATCH 7.2 v3 09/12] mm/truncate: use folio_split() in truncate_inode_partial_folio() Zi Yan
2026-04-18 2:44 ` [PATCH 7.2 v3 10/12] fs/btrfs: remove a comment referring to READ_ONLY_THP_FOR_FS Zi Yan
2026-04-18 2:44 ` [PATCH 7.2 v3 11/12] selftests/mm: remove READ_ONLY_THP_FOR_FS in khugepaged Zi Yan
2026-04-18 2:44 ` [PATCH 7.2 v3 12/12] selftests/mm: remove READ_ONLY_THP_FOR_FS code from guard-regions Zi Yan
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20260418024429.4055056-3-ziy@nvidia.com \
--to=ziy@nvidia.com \
--cc=Liam.Howlett@oracle.com \
--cc=akpm@linux-foundation.org \
--cc=baohua@kernel.org \
--cc=baolin.wang@linux.alibaba.com \
--cc=brauner@kernel.org \
--cc=clm@fb.com \
--cc=david@kernel.org \
--cc=dev.jain@arm.com \
--cc=dsterba@suse.com \
--cc=jack@suse.cz \
--cc=lance.yang@linux.dev \
--cc=linux-btrfs@vger.kernel.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-kselftest@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=ljs@kernel.org \
--cc=mhocko@suse.com \
--cc=npache@redhat.com \
--cc=rppt@kernel.org \
--cc=ryan.roberts@arm.com \
--cc=shuah@kernel.org \
--cc=songliubraving@fb.com \
--cc=surenb@google.com \
--cc=vbabka@kernel.org \
--cc=viro@zeniv.linux.org.uk \
--cc=willy@infradead.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox