Re: [RFC PATCH v2 3/3] mm: mlock: update mlock_pte_range to handle large folio

From: Yu Zhao <yuzhao@google.com>
To: Yin Fengwei <fengwei.yin@intel.com>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	 akpm@linux-foundation.org, willy@infradead.org,
	david@redhat.com,  ryan.roberts@arm.com, shy828301@gmail.com
Subject: Re: [RFC PATCH v2 3/3] mm: mlock: update mlock_pte_range to handle large folio
Date: Wed, 26 Jul 2023 10:57:27 -0600
Message-ID: <CAOUHufY_b2skiEXSukpOLnpbzrifFiwxY8HA0W_z9aZbVome4Q@mail.gmail.com>
In-Reply-To: <3bd7b290-91ad-347f-b1b5-5d45ac566f69@intel.com>

On Wed, Jul 26, 2023 at 6:49 AM Yin Fengwei <fengwei.yin@intel.com> wrote:
>
>
>
> On 7/15/23 14:06, Yu Zhao wrote:
> > On Wed, Jul 12, 2023 at 12:31 AM Yu Zhao <yuzhao@google.com> wrote:
> >>
> >> On Wed, Jul 12, 2023 at 12:02 AM Yin Fengwei <fengwei.yin@intel.com> wrote:
> >>>
> >>> The current kernel only mlocks base-size folios during the mlock syscall.
> >>> Add large folio support with the following rules:
> >>>   - Only mlock a large folio when it is within the VM_LOCKED VMA range.
> >>>
> >>>   - If there is a COW folio, mlock the COW folio, as it
> >>>     is also in the VM_LOCKED VMA range.
> >>>
> >>>   - munlock applies to a large folio that is within the VMA range
> >>>     or crosses the VMA boundary.
> >>>
> >>> The last rule handles the case where a large folio is
> >>> mlocked, the VMA is later split in the middle of the large folio,
> >>> and the large folio then crosses a VMA boundary.
> >>>
> >>> Signed-off-by: Yin Fengwei <fengwei.yin@intel.com>
> >>> ---
> >>>  mm/mlock.c | 104 ++++++++++++++++++++++++++++++++++++++++++++++++++---
> >>>  1 file changed, 99 insertions(+), 5 deletions(-)
> >>>
> >>> diff --git a/mm/mlock.c b/mm/mlock.c
> >>> index 0a0c996c5c214..f49e079066870 100644
> >>> --- a/mm/mlock.c
> >>> +++ b/mm/mlock.c
> >>> @@ -305,6 +305,95 @@ void munlock_folio(struct folio *folio)
> >>>         local_unlock(&mlock_fbatch.lock);
> >>>  }
> >>>
> >>> +static inline bool should_mlock_folio(struct folio *folio,
> >>> +                                       struct vm_area_struct *vma)
> >>> +{
> >>> +       if (vma->vm_flags & VM_LOCKED)
> >>> +               return (!folio_test_large(folio) ||
> >>> +                               folio_within_vma(folio, vma));
> >>> +
> >>> +       /*
> >>> +        * For unlock, allow munlock of a large folio that is only
> >>> +        * partially mapped to the VMA, since the large folio may
> >>> +        * have been mlocked before the VMA was split.
> >>> +        *
> >>> +        * Under memory pressure, such a large folio can be split,
> >>> +        * and the pages that are not in a VM_LOCKED VMA can then
> >>> +        * be reclaimed.
> >>> +        */
> >>> +
> >>> +       return true;
> >>
> >> Looks good, or just
> >>
> >> should_mlock_folio() // or whatever name you see fit, can_mlock_folio()?
> >> {
> >>   return !(vma->vm_flags & VM_LOCKED) || folio_within_vma(folio, vma);
> >> }
> >>
> >>> +}
> >>> +
> >>> +static inline unsigned int get_folio_mlock_step(struct folio *folio,
> >>> +                       pte_t pte, unsigned long addr, unsigned long end)
> >>> +{
> >>> +       unsigned int nr;
> >>> +
> >>> +       nr = folio_pfn(folio) + folio_nr_pages(folio) - pte_pfn(pte);
> >>> +       return min_t(unsigned int, nr, (end - addr) >> PAGE_SHIFT);
> >>> +}
> >>> +
> >>> +void mlock_folio_range(struct folio *folio, struct vm_area_struct *vma,
> >>> +               pte_t *pte, unsigned long addr, unsigned int nr)
> >>> +{
> >>> +       struct folio *cow_folio;
> >>> +       unsigned int step = 1;
> >>> +
> >>> +       mlock_folio(folio);
> >>> +       if (nr == 1)
> >>> +               return;
> >>> +
> >>> +       for (; nr > 0; pte += step, addr += (step << PAGE_SHIFT), nr -= step) {
> >>> +               pte_t ptent;
> >>> +
> >>> +               step = 1;
> >>> +               ptent = ptep_get(pte);
> >>> +
> >>> +               if (!pte_present(ptent))
> >>> +                       continue;
> >>> +
> >>> +               cow_folio = vm_normal_folio(vma, addr, ptent);
> >>> +               if (!cow_folio || cow_folio == folio) {
> >>> +                       continue;
> >>> +               }
> >>> +
> >>> +               mlock_folio(cow_folio);
> >>> +               step = get_folio_mlock_step(folio, ptent,
> >>> +                               addr, addr + (nr << PAGE_SHIFT));
> >>> +       }
> >>> +}
> >>> +
> >>> +void munlock_folio_range(struct folio *folio, struct vm_area_struct *vma,
> >>> +               pte_t *pte, unsigned long addr, unsigned int nr)
> >>> +{
> >>> +       struct folio *cow_folio;
> >>> +       unsigned int step = 1;
> >>> +
> >>> +       munlock_folio(folio);
> >>> +       if (nr == 1)
> >>> +               return;
> >>> +
> >>> +       for (; nr > 0; pte += step, addr += (step << PAGE_SHIFT), nr -= step) {
> >>> +               pte_t ptent;
> >>> +
> >>> +               step = 1;
> >>> +               ptent = ptep_get(pte);
> >>> +
> >>> +               if (!pte_present(ptent))
> >>> +                       continue;
> >>> +
> >>> +               cow_folio = vm_normal_folio(vma, addr, ptent);
> >>> +               if (!cow_folio || cow_folio == folio) {
> >>> +                       continue;
> >>> +               }
> >>> +
> >>> +               munlock_folio(cow_folio);
> >>> +               step = get_folio_mlock_step(folio, ptent,
> >>> +                               addr, addr + (nr << PAGE_SHIFT));
> >>> +       }
> >>> +}
> >>
> >> I'll finish the above later.
> >
> > There is a problem here that I didn't have the time to elaborate: we
> > can't mlock() a folio that is within the range but not fully mapped
> > because this folio can be on the deferred split queue. When the split
> > happens, those unmapped folios (not mapped by this vma but mapped
> > into other vmas) will be stranded on the unevictable lru.
> I checked the remap case over the past few days, and I agree we shouldn't
> treat a folio that is in the range but not fully mapped as an in-range folio.
>
> In the remap case, it's possible that the folio is not on the deferred split
> queue, but part of the folio is mapped to a VM_LOCKED vma and another part
> is mapped to a non-VM_LOCKED vma. In this case, the folio won't be split by
> the deferred split mechanism, as it's not on the queue. So page reclaim
> should be allowed to pick this folio up, split it and reclaim the pages in
> the non-VM_LOCKED vma. So we can't mlock this kind of folio.
>
> The same thing can happen with madvise_cold_or_pageout_pte_range().
> I will update folio_in_vma() to check the PTE also.
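
For readers following along, the rule agreed on above can be modelled in
userspace roughly as below. This is only a sketch under stated assumptions,
not the patch's code: struct sketch_vma, struct sketch_folio_map, the
fully_mapped flag (standing in for the proposed PTE check) and
SKETCH_VM_LOCKED are all hypothetical names invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Placeholder flag bit for this sketch; the kernel defines its own VM_LOCKED. */
#define SKETCH_VM_LOCKED 0x1UL

/* Hypothetical stand-in for the few VMA fields the rule looks at. */
struct sketch_vma {
	unsigned long vm_start;	/* first address covered (inclusive) */
	unsigned long vm_end;	/* end address (exclusive) */
	unsigned long vm_flags;
};

/* Hypothetical description of how one folio is mapped through one VMA. */
struct sketch_folio_map {
	unsigned long start;	/* virtual address of the folio's first page */
	unsigned long end;	/* virtual address just past its last page */
	bool fully_mapped;	/* assumption: result of a per-PTE check */
};

/* True when the folio's whole virtual range lies inside the VMA. */
static bool sketch_within_vma(const struct sketch_folio_map *m,
			      const struct sketch_vma *vma)
{
	assert(m->start < m->end);
	return m->start >= vma->vm_start && m->end <= vma->vm_end;
}

/*
 * munlock (VMA not VM_LOCKED) is always allowed, so a folio left
 * crossing a boundary after a VMA split can still be unlocked.
 * mlock requires the folio to be within the VMA and fully mapped.
 */
static bool sketch_should_mlock(const struct sketch_folio_map *m,
				const struct sketch_vma *vma)
{
	if (!(vma->vm_flags & SKETCH_VM_LOCKED))
		return true;
	return sketch_within_vma(m, vma) && m->fully_mapped;
}
```

With a locked VMA covering [0x400000, 0x600000), a fully mapped folio at
[0x400000, 0x410000) qualifies, while the same folio with fully_mapped
unset, or a folio straddling 0x600000, does not.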

Thanks, and I think we should move forward with this series and fix
the potential mlock race problem separately since it's not caused by
this series.

WDYT?
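
As a side note on the quoted get_folio_mlock_step(), the step arithmetic can
be checked in userspace with a small model like the one below. It is a sketch
only: struct sketch_folio and SKETCH_PAGE_SHIFT (4 KiB pages) are assumptions
made for illustration, and the model assumes the PTE's pfn belongs to the
folio that is passed in.

```c
#include <assert.h>

#define SKETCH_PAGE_SHIFT 12	/* assumption: 4 KiB base pages */

/* Hypothetical stand-in for the two folio properties the helper reads. */
struct sketch_folio {
	unsigned long pfn;	/* first page frame number of the folio */
	unsigned long nr_pages;	/* number of base pages in the folio */
};

/*
 * Number of PTEs that can be skipped in one step: the pages left in
 * the folio starting at pte_pfn, capped by the pages left in the
 * [addr, end) range being walked.
 */
static unsigned int sketch_mlock_step(const struct sketch_folio *folio,
				      unsigned long pte_pfn,
				      unsigned long addr, unsigned long end)
{
	unsigned long left_in_folio, left_in_range;

	/* The subtraction below is only meaningful under these conditions. */
	assert(pte_pfn >= folio->pfn);
	assert(pte_pfn < folio->pfn + folio->nr_pages);
	assert(end >= addr);

	left_in_folio = folio->pfn + folio->nr_pages - pte_pfn;
	left_in_range = (end - addr) >> SKETCH_PAGE_SHIFT;

	return (unsigned int)(left_in_folio < left_in_range ?
			      left_in_folio : left_in_range);
}
```

For a 16-page folio starting at pfn 0x1000 and a PTE pointing at pfn 0x1004,
12 pages remain in the folio, so the step is 12 when the range is large
enough and is clamped to 4 when only four pages remain before end.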
