From: David Hildenbrand <david@redhat.com>
To: Lance Yang <ioworker0@gmail.com>
Cc: akpm@linux-foundation.org, willy@infradead.org, sj@kernel.org,
baolin.wang@linux.alibaba.com, maskray@google.com,
ziy@nvidia.com, ryan.roberts@arm.com, 21cnbao@gmail.com,
mhocko@suse.com, fengwei.yin@intel.com, zokeefe@google.com,
shy828301@gmail.com, xiehuan09@gmail.com, libang.li@antgroup.com,
wangkefeng.wang@huawei.com, songmuchun@bytedance.com,
peterx@redhat.com, minchan@kernel.org, linux-mm@kvack.org,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH v6 2/3] mm/rmap: integrate PMD-mapped folio splitting into pagewalk loop
Date: Thu, 6 Jun 2024 10:01:43 +0200
Message-ID: <758f7be7-c17e-46d1-879f-83340ec85749@redhat.com>
In-Reply-To: <CAK1f24kKra71RSQdFOpQecU6+yMELC748irKUt54Kg64-P=4-A@mail.gmail.com>

On 06.06.24 05:55, Lance Yang wrote:
> On Wed, Jun 5, 2024 at 10:28 PM David Hildenbrand <david@redhat.com> wrote:
>>
>> On 05.06.24 16:20, Lance Yang wrote:
>>> Hi David,
>>>
>>> On Wed, Jun 5, 2024 at 8:46 PM David Hildenbrand <david@redhat.com> wrote:
>>>>
>>>> On 21.05.24 06:02, Lance Yang wrote:
>>>>> In preparation for supporting try_to_unmap_one() to unmap PMD-mapped
>>>>> folios, start the pagewalk first, then call split_huge_pmd_address() to
>>>>> split the folio.
>>>>>
>>>>> Since the TTU_SPLIT_HUGE_PMD split will no longer be performed immediately, we might
>>>>> encounter a PMD-mapped THP missing the mlock in the VM_LOCKED range during
>>>>> the page walk. It’s probably necessary to mlock this THP to prevent it from
>>>>> being picked up during page reclaim.
>>>>>
>>>>> Suggested-by: David Hildenbrand <david@redhat.com>
>>>>> Suggested-by: Baolin Wang <baolin.wang@linux.alibaba.com>
>>>>> Signed-off-by: Lance Yang <ioworker0@gmail.com>
>>>>> ---
>>>>
>>>> [...] again, sorry for the late review.
>>>
>>> No worries at all, thanks for taking time to review!
>>>
>>>>
>>>>> diff --git a/mm/rmap.c b/mm/rmap.c
>>>>> index ddffa30c79fb..08a93347f283 100644
>>>>> --- a/mm/rmap.c
>>>>> +++ b/mm/rmap.c
>>>>> @@ -1640,9 +1640,6 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
>>>>>  	if (flags & TTU_SYNC)
>>>>>  		pvmw.flags = PVMW_SYNC;
>>>>>
>>>>> -	if (flags & TTU_SPLIT_HUGE_PMD)
>>>>> -		split_huge_pmd_address(vma, address, false, folio);
>>>>> -
>>>>>  	/*
>>>>>  	 * For THP, we have to assume the worse case ie pmd for invalidation.
>>>>>  	 * For hugetlb, it could be much worse if we need to do pud
>>>>> @@ -1668,20 +1665,35 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
>>>>>  	mmu_notifier_invalidate_range_start(&range);
>>>>>
>>>>>  	while (page_vma_mapped_walk(&pvmw)) {
>>>>> -		/* Unexpected PMD-mapped THP? */
>>>>> -		VM_BUG_ON_FOLIO(!pvmw.pte, folio);
>>>>> -
>>>>>  		/*
>>>>>  		 * If the folio is in an mlock()d vma, we must not swap it out.
>>>>>  		 */
>>>>>  		if (!(flags & TTU_IGNORE_MLOCK) &&
>>>>>  		    (vma->vm_flags & VM_LOCKED)) {
>>>>>  			/* Restore the mlock which got missed */
>>>>> -			if (!folio_test_large(folio))
>>>>> +			if (!folio_test_large(folio) ||
>>>>> +			    (!pvmw.pte && (flags & TTU_SPLIT_HUGE_PMD)))
>>>>>  				mlock_vma_folio(folio, vma);
>>>>
>>>> Can you elaborate why you think this would be required? If we would have
>>>> performed the split_huge_pmd_address() beforehand, we would still be
>>>> left with a large folio, no?
>>>
>>> Yep, there would still be a large folio, but it wouldn't be PMD-mapped.
>>>
>>> After Weifeng's series[1], the kernel supports mlock for PTE-mapped large
>>> folio, but there are a few scenarios where we don't mlock a large folio, such
>>> as when it crosses a VM_LOCKed VMA boundary.
>>>
>>> - if (!folio_test_large(folio))
>>> + if (!folio_test_large(folio) ||
>>> + (!pvmw.pte && (flags & TTU_SPLIT_HUGE_PMD)))
>>>
>>> And this check is just future-proofing and likely unnecessary. If encountering a
>>> PMD-mapped THP missing the mlock for some reason, we can mlock this
>>> THP to prevent it from being picked up during page reclaim, since it is fully
>>> mapped and doesn't cross the VMA boundary, IIUC.
>>>
>>> What do you think?
>>> I would appreciate any suggestions regarding this check ;)
>>
>> Reading this patch only, I wonder if this change makes sense in the
>> context here.
>
> Allow me to try explaining it again ;)
>
>>
>> Before this patch, we would have PTE-mapped the PMD-mapped THP before
>> reaching this call and skipped it due to "!folio_test_large(folio)".
>
> Yes, there is only a PTE-mapped THP when doing the "!folio_test_large(folio)"
> check, as we will first conditionally split the PMD via
> split_huge_pmd_address().
>
>>
>> After this patch, we either
>
> Things will change. We'll first do the "!folio_test_large(folio)" check, then
> conditionally split the PMD via split_huge_pmd_address().
>
>>
>> a) PTE-remap the THP after this check, but retry and end-up here again,
>> whereby we would skip it due to "!folio_test_large(folio)".
>
> Hmm...
>
> IIUC, we will skip it after this check, stop the page walk, and not
> PTE-remap the THP.
>
>>
>> b) Discard the PMD-mapped THP due to lazyfree directly. Can that
>> co-exist with mlock and what would be the problem here with mlock?
>
> Before discarding a PMD-mapped THP as a whole, as patch #3 did,
> we also perform the "!folio_test_large(folio)" check. If the THP coexists
> with mlock, we will skip it, stop the page walk, and not discard it. IIUC.

But "!folio_test_large(folio)" would *skip* the THP and not consider it
regarding mlock.

I'm probably missing something and should try current mm/mm-unstable
with MADV_FREE + mlock() on a PMD-mapped THP.
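A minimal userspace sketch of such an experiment might look like the following. It is only an illustration under stated assumptions: a 2 MiB PMD size, THP enabled for madvise, and MADV_FREE issued before mlock() (to my understanding MADV_FREE is rejected on an already locked range). The function name thp_lazyfree_mlock_probe() and its return codes are made up for this sketch.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>

/* Assumption: 2 MiB PMD-sized THP (e.g. x86-64 with 4K base pages). */
#define THP_SIZE (2UL << 20)

/*
 * Hypothetical probe: returns 0 if every step succeeded, or a negative
 * step number (-1 mmap, -2 MADV_HUGEPAGE, -3 MADV_FREE, -4 mlock).
 */
int thp_lazyfree_mlock_probe(void)
{
	/* Over-allocate so that a PMD-aligned 2 MiB window fits inside. */
	size_t len = 2 * THP_SIZE;
	char *raw = mmap(NULL, len, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	char *buf;
	int rc = 0;

	if (raw == MAP_FAILED)
		return -1;

	/* Round up to the next THP_SIZE boundary inside the mapping. */
	buf = (char *)(((uintptr_t)raw + THP_SIZE - 1) & ~(THP_SIZE - 1));

	if (madvise(buf, THP_SIZE, MADV_HUGEPAGE)) {
		rc = -2;
	} else {
		/* Touch the range so the kernel may back it with a THP. */
		memset(buf, 1, THP_SIZE);

		/* Mark the range lazyfree first, then lock it. */
		if (madvise(buf, THP_SIZE, MADV_FREE))
			rc = -3;
		else if (mlock(buf, THP_SIZE))
			rc = -4;
	}

	munmap(raw, len);
	return rc;
}
```

Whether the range really ended up PMD-mapped is not checked here; AnonHugePages in /proc/self/smaps would have to be inspected before the munmap(), and mlock() may fail simply because of RLIMIT_MEMLOCK.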
>
>>
>>
>> So if the check is required in this patch, we really have to understand
>> why. If not, we should better drop it from this patch.
>
> I added the "!pvmw.pte && (flags & TTU_SPLIT_HUGE_PMD))" check
> in this patch just to future-proof mlock for a PMD-mapped THP missing
> the mlock, to prevent it from being picked up during page reclaim.
>
> But is this really required? It seems like nothing should really be broken
> without this check.
>
> Perhaps, we should drop it from this patch until we fully understand the
> reason for it. Could you give me some suggestions?

We should drop it from this patch, agreed. We might need it
("!pvmw.pte") in patch #3, but I still have to understand if there
really would be a problem.
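To make the difference between the two variants of the check concrete, here is a small standalone model. It is not kernel code: struct walk_state, the two helper names and the flag value are hypothetical stand-ins for folio_test_large(folio), pvmw.pte and the TTU flags.

```c
#include <stdbool.h>

/* Hypothetical stand-in; the real TTU flags are defined in the kernel. */
enum { TTU_SPLIT_HUGE_PMD = 1 << 0 };

/* Hypothetical snapshot of the state the mlock check looks at. */
struct walk_state {
	bool folio_large;	/* models folio_test_large(folio) */
	bool pte_mapped;	/* models pvmw.pte != NULL */
	unsigned int flags;	/* models the TTU flags */
};

/* The check without the extra condition (i.e. with the hunk dropped). */
bool restore_mlock_base(const struct walk_state *s)
{
	return !s->folio_large;
}

/* The check as posted in v6 2/3. */
bool restore_mlock_v6(const struct walk_state *s)
{
	return !s->folio_large ||
	       (!s->pte_mapped && (s->flags & TTU_SPLIT_HUGE_PMD));
}
```

The two helpers only disagree for a large folio that is still PMD-mapped while TTU_SPLIT_HUGE_PMD is set, which is exactly the case whose necessity is still open.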
--
Cheers,
David / dhildenb