From: Zi Yan <ziy@nvidia.com>
To: David Hildenbrand <david@redhat.com>
Cc: "Pankaj Raghav (Samsung)" <kernel@pankajraghav.com>,
Matthew Wilcox <willy@infradead.org>,
Luis Chamberlain <mcgrof@kernel.org>,
Jinjiang Tu <tujinjiang@huawei.com>,
Oscar Salvador <osalvador@suse.de>,
akpm@linux-foundation.org, linmiaohe@huawei.com,
mhocko@kernel.org, linux-mm@kvack.org,
wangkefeng.wang@huawei.com
Subject: Re: [PATCH v2 2/2] mm/memory_hotplug: fix hwpoisoned large folio handling in do_migrate_range
Date: Mon, 14 Jul 2025 11:44:54 -0400
Message-ID: <E0D87F2B-20AD-4AE4-943E-C126395C3CE5@nvidia.com>
In-Reply-To: <c0f5492c-f9fe-48c8-98bc-d8cc8e7e00b3@redhat.com>
On 14 Jul 2025, at 11:33, David Hildenbrand wrote:
> On 14.07.25 17:28, Zi Yan wrote:
>> On 14 Jul 2025, at 11:25, Zi Yan wrote:
>>
>>> On 14 Jul 2025, at 11:14, David Hildenbrand wrote:
>>>
>>>> On 14.07.25 17:09, Pankaj Raghav (Samsung) wrote:
>>>>>>>>> So we will need to take care of madvise cold or pageout case?
>>>>>>>>>
>>>>>>>>> Hi Matthew, Pankaj, and Luis,
>>>>>>>>>
>>>>>>>>> Is it possible to partially map a min-order folio in a fs with LBS? Based on my
>>>>>>>>
>>>>>>>> Typically, FSs match the min order with the blocksize of the filesystem.
>>>>>>>> As a filesystem block is the smallest unit of data that the filesystem uses
>>>>>>>> to store file data on the disk, we cannot partially map them.
>>>>>>>>
>>>>>>>> So if I understand your question correctly, the answer is no.
>>>>>>
>>>>>> I'm confused. Shouldn't this be trivially possible?
>>>>>>
>>>>> Hmm, maybe I misunderstood the question?
>>>>>
>>>>>> E.g., just mmap() a single page of such a file? Who would make that fail?
>>>>>>
>>>>>
>>>>> My point was, even if you try to mmap a single page of a file, page
>>>>> cache will read the whole block (that corresponds to min order folio).
>>>>>
>>>>> Technically we can mmap a single page of file, but FS will always read
>>>>> and write **at least** in min folio order chunks.
>>>>
>>>> Okay, so it can be partially mapped into page tables :) What happens in the background (page cache management) is a different story
>>>
>>> David, thanks for getting to the bottom of this.
>>>
>>> OK. So we will see deadlock looping in madvise cold or pageout case.
>>> I wonder how to proceed with this. Since the folio is seen as a whole
>>> by fs, it should be marked cold/paged out as a whole. Maybe we should
>>> skip the partially mapped region?
>>
>> Actually, it is skipped, since split_folio() bumps new_order to the min
>> order, and if the folio order is already at min order, the split code returns
>> -EINVAL. This makes the madvise cold or pageout code move to the next
>> address.
>
> But what if the folio order is 2x min_order etc?

If the folio order is greater than min_order, the code is suboptimal.
It first splits the original folio to min_order, then loops through
the resulting folios. Each one that is fully mapped gets the madvise
operation. Each one that is still partially mapped triggers another
split attempt, which fails with -EINVAL because the folio is already
at min_order, and the walk then skips that partially mapped range.
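
The sequence above can be illustrated with a rough user-space model. This is not kernel code, and model_split_folio(), model_walk() and struct walk_stats are made-up names. The model assumes only the behavior described in this thread:

- split_folio() targets min_order and returns -EINVAL for a folio already at min_order.
- A failed split makes the walk move on to the next address.

It counts split attempts for one folio of 2^order pages, of which pages [lo, hi) fall inside the madvise range.

```c
#include <assert.h>
#include <errno.h>

/*
 * User-space model, not kernel code: the madvise cold/pageout PTE walk
 * meeting one large file folio that is only partly inside the range.
 * All identifiers are hypothetical.
 */
struct walk_stats {
	int split_attempts;	/* calls to model_split_folio() */
	int split_failures;	/* attempts that returned -EINVAL */
	int advised;		/* folios that got the madvise operation */
	int skipped;		/* folios left alone after a failed split */
};

/* Splits go to min_order; a folio already at (or below) it cannot split. */
static int model_split_folio(unsigned int order, unsigned int min_order)
{
	return min_order >= order ? -EINVAL : 0;
}

/* Folio of 2^order pages; pages [lo, hi) (folio-relative) are in range. */
static void model_one_folio(unsigned int order, unsigned int min_order,
			    unsigned long lo, unsigned long hi,
			    struct walk_stats *st)
{
	unsigned long nr_pages = 1UL << order;
	unsigned long chunk = 1UL << min_order;
	unsigned long start;

	if (lo == 0 && hi == nr_pages) {
		st->advised++;		/* fully mapped: operate on it */
		return;
	}
	st->split_attempts++;
	if (model_split_folio(order, min_order) < 0) {
		st->split_failures++;
		st->skipped++;		/* move on to the next address */
		return;
	}
	/* Revisit each min_order folio that overlaps [lo, hi). */
	for (start = lo - lo % chunk; start < hi; start += chunk) {
		unsigned long clo = lo > start ? lo - start : 0;
		unsigned long chi = hi < start + chunk ? hi - start : chunk;

		model_one_folio(min_order, min_order, clo, chi, st);
	}
}

struct walk_stats model_walk(unsigned int order, unsigned int min_order,
			     unsigned long lo, unsigned long hi)
{
	struct walk_stats st = { 0 };

	assert(min_order <= order && lo < hi && hi <= (1UL << order));
	model_one_folio(order, min_order, lo, hi, &st);
	return st;
}
```

Consider model_walk(4, 2, 2, 16): a 16-page folio with min order 2 whose first two pages lie outside the range. In that model it records two split attempts:

- The first attempt splits the folio into four 4-page folios, three of which are then handled.
- The second attempt fails on the partially covered first chunk, which is skipped.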

An improvement would be to skip the folio up front if it is partially
mapped and its order is already min_order, so that no split is attempted.
Something like:

diff --git a/mm/madvise.c b/mm/madvise.c
index e61e32b2cd91..545ab7920a81 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -499,6 +499,13 @@ static int madvise_cold_or_pageout_pte_range(pmd_t *pmd,
 			if (nr < folio_nr_pages(folio)) {
 				int err;
 
+				/*
+				 * Skip partially mapped folios that are
+				 * already at their min order
+				 */
+				if (folio_order(folio) ==
+						min_order_for_split(folio))
+					continue;
 				if (folio_maybe_mapped_shared(folio))
 					continue;
 				if (pageout_anon_only_filter && !folio_test_anon(folio))

Best Regards,
Yan, Zi