linux-mm.kvack.org archive mirror
From: "Yin, Fengwei" <fengwei.yin@intel.com>
To: Yu Zhao <yuzhao@google.com>, Matthew Wilcox <willy@infradead.org>
Cc: David Hildenbrand <david@redhat.com>, <linux-mm@kvack.org>,
	<linux-kernel@vger.kernel.org>, <ryan.roberts@arm.com>,
	<shy828301@gmail.com>, <akpm@linux-foundation.org>
Subject: Re: [RFC PATCH 0/3] support large folio for mlock
Date: Sat, 8 Jul 2023 12:40:36 +0800	[thread overview]
Message-ID: <0e3a7331-bf91-5d3b-0f05-634f1cbb60d5@intel.com> (raw)
In-Reply-To: <CAOUHufbjiW00jd_DWanfW0ps1o8KstZYbvkbYcJia1L=jVojMw@mail.gmail.com>



On 7/8/2023 12:35 PM, Yu Zhao wrote:
> On Fri, Jul 7, 2023 at 10:02 PM Matthew Wilcox <willy@infradead.org> wrote:
>>
>> On Sat, Jul 08, 2023 at 11:52:23AM +0800, Yin, Fengwei wrote:
>>>> Oh, I agree, there are always going to be circumstances where we realise
>>>> we've made a bad decision and can't (easily) undo it.  Unless we have a
>>>> per-page pincount, and I Would Rather Not Do That.  But we should _try_
>>>> to do that because it's the right model -- that's what I meant by "Tell
>>>> me why I'm wrong"; what scenarios do we have where a user temporarily
>>>> mlocks (or mprotects or ...) a range of memory, but wants that memory
>>>> to be aged in the LRU exactly the same way as the adjacent memory that
>>>> wasn't mprotected?
>>> From the man page of mlock(2):
>>>        mlock(),  mlock2(), and mlockall() lock part or all of the calling process's virtual address space into RAM, preventing that memory
>>>        from being paged to the swap area.
>>>
>>> So my understanding is that it's OK for the mlocked memory to be aged
>>> along with the adjacent memory that is not mlocked, as long as it is
>>> never paged out to swap.
>>
>> Right, it doesn't break anything; it's just a similar problem to
>> internal fragmentation.  The pages of the folio which aren't mlocked
>> will also be locked in RAM and never paged out.
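As a minimal user-space illustration of the case being discussed (only part of a mapping mlocked, the mapping possibly backed by a large folio), the sketch below locks a 64 KiB subrange of a 2 MiB anonymous mapping. Per mlock(2), only that subrange must stay resident; the sizes and offsets here are arbitrary examples.

#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

int main(void)
{
        size_t len = 2UL << 20;         /* 2 MiB; may be backed by a large folio */
        char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (buf == MAP_FAILED) {
                perror("mmap");
                return 1;
        }

        memset(buf, 0, len);            /* fault the whole range in */

        /*
         * Lock only a 64 KiB subrange.  Per mlock(2), just this subrange
         * must never be paged out to swap; the rest of the mapping stays
         * reclaimable and is aged on the LRU like adjacent memory.
         */
        if (mlock(buf + (1UL << 20), 64 << 10)) {
                perror("mlock");
                return 1;
        }

        /* ... use the buffer ... */

        munlock(buf + (1UL << 20), 64 << 10);
        munmap(buf, len);
        return 0;
}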
> 
> I don't think this is the case: since partially locking a
> non-pmd-mappable large folio is a nop, it remains on one of the
> evictable LRUs. The rmap walk by folio_referenced() should already be
> able to find the VMA and the PTEs mapping the unlocked portion. So the
> page reclaim should be able to correctly age the unlocked portion even
> though the folio contains a locked portion too. And when it tries to
> reclaim the entire folio, it first tries to split it into a list of
> base folios in shrink_folio_list(), and if that succeeds, it walks the
> rmap of each base folio on that list to unmap (not age). Unmapping
> doesn't have TTU_IGNORE_MLOCK, so it should correctly call
> mlock_vma_folio() on the locked base folios and bail out. And finally
> those locked base folios are put back to the unevictable list.
Yes. This is exactly my understanding as well. It's also why this
patchset keeps large folios that cross a VMA boundary munlocked.
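To make the flow Yu describes concrete, here is a toy, user-space model of the per-base-folio decision after shrink_folio_list() has split the large folio. Every type and helper below is a hypothetical stand-in for illustration only, not a kernel API.

#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins for kernel structures, for illustration only. */
struct toy_folio {
        unsigned long idx;
        bool mapped_in_locked_vma;      /* some VMA mapping it has VM_LOCKED */
};

enum toy_outcome { TOY_RECLAIMED, TOY_UNEVICTABLE };

/*
 * Models the unmap step: since reclaim does not pass the equivalent of
 * TTU_IGNORE_MLOCK, hitting a VM_LOCKED VMA during the rmap walk
 * "mlocks" the folio and bails out, so the folio is put back on the
 * unevictable list instead of being reclaimed.
 */
static enum toy_outcome toy_try_to_unmap(const struct toy_folio *f)
{
        if (f->mapped_in_locked_vma)
                return TOY_UNEVICTABLE;         /* mlock_vma_folio() + putback */
        return TOY_RECLAIMED;                   /* unmapped and freed */
}

int main(void)
{
        /* A large folio split into four base folios; only the last two
         * fall inside the mlocked part of the VMA. */
        struct toy_folio base[4] = {
                { .idx = 0, .mapped_in_locked_vma = false },
                { .idx = 1, .mapped_in_locked_vma = false },
                { .idx = 2, .mapped_in_locked_vma = true  },
                { .idx = 3, .mapped_in_locked_vma = true  },
        };

        for (int i = 0; i < 4; i++)
                printf("base folio %lu -> %s\n", base[i].idx,
                       toy_try_to_unmap(&base[i]) == TOY_RECLAIMED ?
                       "reclaimed" : "unevictable list");
        return 0;
}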


Regards
Yin, Fengwei

> 
>>> One question about an implementation detail:
>>>   If a large folio crossing a VMA boundary cannot be split, how do we
>>>   deal with that case? Retry in the syscall until the split succeeds?
>>>   Or return an error (and which errno should we choose) to user space?
>>
>> I would be tempted to allocate memory & copy to the new mlocked VMA.
>> The old folio will go on the deferred_list and be split later, or its
>> valid parts will be written to swap and then it can be freed.
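Matthew's fallback can be pictured with a rough user-space analogy rather than kernel code: give the range that must be locked its own backing, copy the data over, and pin only the copy, leaving the original backing (the straddling folio) to be split or written out later. All names below are made up for illustration; this is not the kernel implementation.

#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>

/*
 * Toy analogue of "allocate memory & copy into the mlocked VMA": instead
 * of splitting the folio that straddles the VMA boundary, allocate fresh
 * backing for the locked range, copy the data, and lock only the copy.
 */
void *copy_into_locked_region(const void *old, size_t len)
{
        void *fresh = mmap(NULL, len, PROT_READ | PROT_WRITE,
                           MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (fresh == MAP_FAILED)
                return NULL;

        memcpy(fresh, old, len);        /* populate the new backing */
        if (mlock(fresh, len)) {        /* only the copy is pinned */
                munmap(fresh, len);
                return NULL;
        }
        /*
         * The old backing ("the old folio") is left alone here; in the
         * kernel it would go on the deferred_list to be split later, or
         * have its valid parts written to swap and then be freed.
         */
        return fresh;
}

int main(void)
{
        char src[4096] = "data that happened to live in the straddling folio";
        void *locked = copy_into_locked_region(src, sizeof(src));

        return locked ? 0 : 1;
}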



Thread overview: 31+ messages
2023-07-07 16:52 Yin Fengwei
2023-07-07 16:52 ` [RFC PATCH 1/3] mm: add function folio_in_range() Yin Fengwei
2023-07-08  5:47   ` Yu Zhao
2023-07-08  6:44     ` Yin, Fengwei
2023-07-07 16:52 ` [RFC PATCH 2/3] mm: handle large folio when large folio in VM_LOCKED VMA range Yin Fengwei
2023-07-08  5:11   ` Yu Zhao
2023-07-08  5:33     ` Yin, Fengwei
2023-07-08  5:56       ` Yu Zhao
2023-07-07 16:52 ` [RFC PATCH 3/3] mm: mlock: update mlock_pte_range to handle large folio Yin Fengwei
2023-07-07 17:26 ` [RFC PATCH 0/3] support large folio for mlock Matthew Wilcox
2023-07-07 18:54   ` David Hildenbrand
2023-07-07 19:06     ` Matthew Wilcox
2023-07-07 19:15       ` David Hildenbrand
2023-07-07 19:26         ` Matthew Wilcox
2023-07-10 10:36           ` Ryan Roberts
2023-07-08  3:52       ` Yin, Fengwei
2023-07-08  4:02         ` Matthew Wilcox
2023-07-08  4:35           ` Yu Zhao
2023-07-08  4:40             ` Yin, Fengwei [this message]
2023-07-08  4:36           ` Yin, Fengwei
2023-07-09 13:25           ` Yin, Fengwei
2023-07-10  9:32             ` David Hildenbrand
2023-07-10  9:43               ` Yin, Fengwei
2023-07-10  9:57                 ` David Hildenbrand
2023-07-10 10:19                   ` Yin, Fengwei
2023-07-08  3:34     ` Yin, Fengwei
2023-07-08  3:31   ` Yin, Fengwei
2023-07-08  4:45 ` Yu Zhao
2023-07-08  5:01   ` Yin, Fengwei
2023-07-08  5:06     ` Yu Zhao
2023-07-08  5:35       ` Yin, Fengwei
