linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Zi Yan <ziy@nvidia.com>
To: "David Hildenbrand (Arm)" <david@kernel.org>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>,
	Song Liu <songliubraving@fb.com>, Chris Mason <clm@fb.com>,
	David Sterba <dsterba@suse.com>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	Christian Brauner <brauner@kernel.org>, Jan Kara <jack@suse.cz>,
	Andrew Morton <akpm@linux-foundation.org>,
	Lorenzo Stoakes <ljs@kernel.org>,
	Baolin Wang <baolin.wang@linux.alibaba.com>,
	"Liam R. Howlett" <Liam.Howlett@oracle.com>,
	Nico Pache <npache@redhat.com>,
	Ryan Roberts <ryan.roberts@arm.com>, Dev Jain <dev.jain@arm.com>,
	Barry Song <baohua@kernel.org>, Lance Yang <lance.yang@linux.dev>,
	Vlastimil Babka <vbabka@kernel.org>,
	Mike Rapoport <rppt@kernel.org>,
	Suren Baghdasaryan <surenb@google.com>,
	Michal Hocko <mhocko@suse.com>, Shuah Khan <shuah@kernel.org>,
	<linux-btrfs@vger.kernel.org>, <linux-kernel@vger.kernel.org>,
	<linux-fsdevel@vger.kernel.org>, <linux-mm@kvack.org>,
	<linux-kselftest@vger.kernel.org>
Subject: Re: [PATCH 7.2 v2 02/12] mm/khugepaged: add folio dirty check after try_to_unmap_flush()
Date: Thu, 16 Apr 2026 22:09:05 -0400	[thread overview]
Message-ID: <C83282EC-F3C3-44FB-A8A4-615F67F9CEEC@nvidia.com> (raw)
In-Reply-To: <842D3AD2-32A0-41E1-AC09-08EC793EAF7F@nvidia.com>

On 14 Apr 2026, at 11:55, Zi Yan wrote:

> On 14 Apr 2026, at 6:38, David Hildenbrand (Arm) wrote:
>
>> On 4/13/26 21:20, Zi Yan wrote:
>>> This check ensures the correctness of collapse read-only THPs for FSes
>>> after READ_ONLY_THP_FOR_FS is enabled by default for all FSes supporting
>>> PMD THP pagecache.
>>>
>>> READ_ONLY_THP_FOR_FS only supports read-only fd and uses mapping->nr_thps
>>> and inode->i_writecount to prevent any write to read-only to-be-collapsed
>>> folios. In upcoming commits, READ_ONLY_THP_FOR_FS will be removed and the
>>> aforementioned mechanism will go away too. To ensure khugepaged functions
>>> as expected after the changes, rollback if any folio is dirty after
>>> try_to_unmap_flush() to , since a dirty folio means this read-only folio
>>> got some writes via mmap can happen between try_to_unmap() and
>>> try_to_unmap_flush() via cached TLB entries and khugepaged does not support
>>> collapse writable pagecache folios.
>>>
>>> Signed-off-by: Zi Yan <ziy@nvidia.com>
>>> ---
>>>  mm/khugepaged.c | 18 ++++++++++++++++++
>>>  1 file changed, 18 insertions(+)
>>>
>>> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
>>> index d2f0acd2dac2..ec609e53082e 100644
>>> --- a/mm/khugepaged.c
>>> +++ b/mm/khugepaged.c
>>> @@ -2121,6 +2121,24 @@ static enum scan_result collapse_file(struct mm_struct *mm, unsigned long addr,
>>>  	 */
>>>  	try_to_unmap_flush();
>>>
>>> +	/*
>>> +	 * At this point, all folios are locked, unmapped, and all cached
>>> +	 * mappings in TLBs are flushed. No one else is able to write to these
>>> +	 * folios, since
>>> +	 * 1. writes via FS ops require folio locks (see write_begin_get_folio());
>>> +	 * 2. writes via mmap require taking a fault and locking folio locks.
>>> +	 *
>>
>> maybe simplify to "folios, since that would require taking the folio lock first."
>
> Sure.
>
>>
>>> +	 * khugepaged only works for read-only fd, make sure all folios are
>>> +	 * clean, since writes via mmap can happen between try_to_unmap() and
>>> +	 * try_to_unmap_flush() via cached TLB entries.
>>
>>
>> IIRC, after successful try_to_unmap() the PTE dirty bit would be synced to the
>> folio. That's what you care about, not about any stale TLB entries.
>
> Right. I missed the PTE dirty bit to folio dirtiness part.
>
>>
>> The important part is that the
>>
>> So can't we simply test for dirty folios after the refcount check (where
>> we made sure the folio is no longer mapped)?
>>
>
> I think so.

After more thoughts, this still works but the reasoning is more complicated.

After try_to_unmap(), PTE dirty bit is transferred to folio dirty flag if exists.
The code bails out if the folio is dirty. A clean folio means the removed
PTEs were clean, thus TLB entries caching these PTEs have dirty bit not set.
Any write to the folio after try_to_unmap() causes a page table walk and
a page fault due to invalid PTEs, and they cannot proceed since collapse_file()
is holding the folio lock. As a result, we can skip marking folio dirty
after the collapse.

If we allow a dirty folio after try_to_unmap(), we need to mark the folio dirty
after the collapse like shmem. And a potential optimization is that we only
mark the after-collapse folio dirty if any old folio is dirty after try_to_unmap().

>
>>
>> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
>> index b2ac28ddd480..920e16067134 100644
>> --- a/mm/khugepaged.c
>> +++ b/mm/khugepaged.c
>> @@ -2089,6 +2089,14 @@ static enum scan_result collapse_file(struct mm_struct *mm, unsigned long addr,
>>                         goto out_unlock;
>>                 }
>>
>> +               /* ... */
>> +               if (!is_shmem && folio_test_dirty(folio)) {
>> +                       result = SCAN_PAGE_DIRTY_OR_WRITEBACK;
>> +                       xas_unlock_irq(&xas);
>> +                       folio_putback_lru(folio);
>> +                       goto out_unlock;
>> +               }
>> +
>>                 /*
>>                  * Accumulate the folios that are being collapsed.
>>
>>
>> I guess we don't have to recheck folio_test_writeback() ?
>
> Right, writeback needs to take folio lock and we are holding it, so
> others can only dirty the folio but are not able to initiate write backs.
>
> Best Regards,
> Yan, Zi


--
Best Regards,
Yan, Zi


  reply	other threads:[~2026-04-17  2:09 UTC|newest]

Thread overview: 64+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-04-13 19:20 [PATCH 7.2 v2 00/12] Remove read-only THP support for FSes without large folio support Zi Yan
2026-04-13 19:20 ` [PATCH 7.2 v2 01/12] mm/khugepaged: remove READ_ONLY_THP_FOR_FS check Zi Yan
2026-04-13 20:20   ` Matthew Wilcox
2026-04-13 20:34     ` Zi Yan
2026-04-14 10:19       ` David Hildenbrand (Arm)
2026-04-14 10:20       ` David Hildenbrand (Arm)
2026-04-15  6:09       ` Baolin Wang
2026-04-14 10:29   ` David Hildenbrand (Arm)
2026-04-14 15:37     ` Lance Yang
2026-04-14 15:43       ` Lance Yang
2026-04-14 15:59         ` Zi Yan
2026-04-13 19:20 ` [PATCH 7.2 v2 02/12] mm/khugepaged: add folio dirty check after try_to_unmap_flush() Zi Yan
2026-04-13 20:23   ` Matthew Wilcox
2026-04-13 20:28     ` Zi Yan
2026-04-14 10:38   ` David Hildenbrand (Arm)
2026-04-14 15:55     ` Zi Yan
2026-04-17  2:09       ` Zi Yan [this message]
2026-04-13 19:20 ` [PATCH 7.2 v2 03/12] mm/huge_memory: remove READ_ONLY_THP_FOR_FS from file_thp_enabled() Zi Yan
2026-04-14 10:40   ` David Hildenbrand (Arm)
2026-04-14 15:59     ` Zi Yan
2026-04-15  6:17   ` Baolin Wang
2026-04-13 19:20 ` [PATCH 7.2 v2 04/12] mm: remove READ_ONLY_THP_FOR_FS Kconfig option Zi Yan
2026-04-14 10:40   ` David Hildenbrand (Arm)
2026-04-15  6:20   ` Baolin Wang
2026-04-13 19:20 ` [PATCH 7.2 v2 05/12] mm/khugepaged: remove READ_ONLY_THP_FOR_FS check in hugepage_pmd_enabled() Zi Yan
2026-04-13 20:33   ` Matthew Wilcox
2026-04-13 20:42     ` Zi Yan
2026-04-14 11:02       ` David Hildenbrand (Arm)
2026-04-14 16:30         ` Zi Yan
2026-04-14 18:14           ` David Hildenbrand (Arm)
2026-04-14 18:25             ` Zi Yan
2026-04-15  6:36               ` Baolin Wang
2026-04-15  8:00                 ` David Hildenbrand (Arm)
2026-04-15  9:21                   ` Baolin Wang
2026-04-15 18:01                     ` Zi Yan
2026-04-16  0:49                       ` Baolin Wang
2026-04-16  8:47                       ` David Hildenbrand (Arm)
2026-04-16 13:56                         ` Zi Yan
2026-04-13 19:20 ` [PATCH 7.2 v2 06/12] mm: fs: remove filemap_nr_thps*() functions and their users Zi Yan
2026-04-13 20:35   ` Matthew Wilcox
2026-04-14 11:02   ` David Hildenbrand (Arm)
2026-04-15  6:53   ` Baolin Wang
2026-04-13 19:20 ` [PATCH 7.2 v2 07/12] fs: remove nr_thps from struct address_space Zi Yan
2026-04-13 20:38   ` Matthew Wilcox
2026-04-15  6:44   ` Baolin Wang
2026-04-13 19:20 ` [PATCH 7.2 v2 08/12] mm/huge_memory: remove folio split check for READ_ONLY_THP_FOR_FS Zi Yan
2026-04-13 20:41   ` Matthew Wilcox
2026-04-13 20:46     ` Zi Yan
2026-04-14 11:03       ` David Hildenbrand (Arm)
2026-04-15  6:47   ` Baolin Wang
2026-04-13 19:20 ` [PATCH 7.2 v2 09/12] mm/truncate: use folio_split() in truncate_inode_partial_folio() Zi Yan
2026-04-13 19:20 ` [PATCH 7.2 v2 10/12] fs/btrfs: remove a comment referring to READ_ONLY_THP_FOR_FS Zi Yan
2026-04-14 11:06   ` David Hildenbrand (Arm)
2026-04-13 19:20 ` [PATCH 7.2 v2 11/12] selftests/mm: remove READ_ONLY_THP_FOR_FS in khugepaged Zi Yan
2026-04-14 11:06   ` David Hildenbrand (Arm)
2026-04-13 19:20 ` [PATCH 7.2 v2 12/12] selftests/mm: remove READ_ONLY_THP_FOR_FS from comments in guard-regions Zi Yan
2026-04-13 20:47   ` Matthew Wilcox
2026-04-13 20:51     ` Zi Yan
2026-04-13 22:28       ` Matthew Wilcox
2026-04-14 11:09         ` David Hildenbrand (Arm)
2026-04-14 16:45           ` Zi Yan
2026-04-14 17:40             ` Matthew Wilcox
2026-04-14 17:53               ` Zi Yan
2026-04-14 11:07   ` David Hildenbrand (Arm)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=C83282EC-F3C3-44FB-A8A4-615F67F9CEEC@nvidia.com \
    --to=ziy@nvidia.com \
    --cc=Liam.Howlett@oracle.com \
    --cc=akpm@linux-foundation.org \
    --cc=baohua@kernel.org \
    --cc=baolin.wang@linux.alibaba.com \
    --cc=brauner@kernel.org \
    --cc=clm@fb.com \
    --cc=david@kernel.org \
    --cc=dev.jain@arm.com \
    --cc=dsterba@suse.com \
    --cc=jack@suse.cz \
    --cc=lance.yang@linux.dev \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-kselftest@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=ljs@kernel.org \
    --cc=mhocko@suse.com \
    --cc=npache@redhat.com \
    --cc=rppt@kernel.org \
    --cc=ryan.roberts@arm.com \
    --cc=shuah@kernel.org \
    --cc=songliubraving@fb.com \
    --cc=surenb@google.com \
    --cc=vbabka@kernel.org \
    --cc=viro@zeniv.linux.org.uk \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox