Re: [RFC PATCH 0/3] support large folio for mlock

From: "Yin, Fengwei" <fengwei.yin@intel.com>
To: David Hildenbrand <david@redhat.com>,
	"Yin, Fengwei" <fengwei.yin@intel.com>,
	Matthew Wilcox <willy@infradead.org>
Cc: <linux-mm@kvack.org>, <linux-kernel@vger.kernel.org>,
	<yuzhao@google.com>, <ryan.roberts@arm.com>,
	<shy828301@gmail.com>, <akpm@linux-foundation.org>
Subject: Re: [RFC PATCH 0/3] support large folio for mlock
Date: Mon, 10 Jul 2023 18:19:37 +0800
Message-ID: <af1ee9a7-2c6f-0450-e44e-59e5eeb50d6b@intel.com>
In-Reply-To: <967ccf33-0982-6042-e4ce-b0c859b4c3b1@redhat.com>



On 7/10/2023 5:57 PM, David Hildenbrand wrote:
> On 10.07.23 11:43, Yin, Fengwei wrote:
>> Hi David,
>>
>> On 7/10/2023 5:32 PM, David Hildenbrand wrote:
>>> On 09.07.23 15:25, Yin, Fengwei wrote:
>>>>
>>>>
>>>> On 7/8/2023 12:02 PM, Matthew Wilcox wrote:
>>>>> I would be tempted to allocate memory & copy to the new mlocked VMA.
>>>>> The old folio will go on the deferred_list and be split later, or its
>>>>> valid parts will be written to swap and then it can be freed.
>>>> If the large folio split fails because of GUP pages, can we do the
>>>> copy here?
>>>>
>>>> Let's say the GUP page is the target of a DMA operation and the DMA
>>>> operation is ongoing. If we allocate a new page and copy the GUP page's
>>>> content to the new page, the data in the new page can be corrupted.
>>>
>>> No, we may only replace anon pages that are flagged as maybe shared (!PageAnonExclusive). We must not replace pages that are exclusive (PageAnonExclusive) unless we first try marking them maybe shared. Clearing the exclusive flag will fail if the page may be pinned.
>> Thanks a lot for the clarification.
>>
>> So my understanding is that if large folio splitting fails, it's not always
>> true that we can allocate new folios, copy the original large folio's content
>> to the new folios, remove the original large folio from the VMA and map the
>> new folios into the VMA (it's only true if the original large folio is marked
>> as maybe shared).
>>
> 
> While it might work in many cases, there are some corner cases where it won't work.
> 
> So to summarize
> 
> (1) THP are transparent and should not result in arbitrary syscall
>     failures.
> (2) Splitting a THP might fail at random points in time either due to
>     GUP pins or due to speculative page references (including
>     speculative GUP pins).
> (3) Replacing an exclusive anon page that may be pinned will result in
>     memory corruption.
> 
> So we can try to split any THP that crosses VMA borders on VMA modifications (split due to munmap, mremap, madvise, mprotect, mlock, ...), but it's not guaranteed to work due to (2). And we can try to replace such pages, but it's not guaranteed to be allowed due to (3).
> 
> And as it's all transparent, we cannot fail the syscall, per (1).
Very clear to me now.
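
To check my understanding of the replace rule, here is a minimal userspace sketch. It is a toy model, not kernel code: struct toy_page, toy_try_share() and toy_can_replace() are hypothetical names, and the two booleans only stand in for PageAnonExclusive and "may be pinned".

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: every name in this block is hypothetical. */
struct toy_page {
	bool anon_exclusive;	/* stands in for PageAnonExclusive */
	bool maybe_pinned;	/* stands in for "page may have a GUP pin" */
};

/*
 * Model of "try marking an exclusive page as maybe shared".
 * Clearing the exclusive flag fails if the page may be pinned.
 */
static bool toy_try_share(struct toy_page *p)
{
	assert(p->anon_exclusive);	/* only meaningful for exclusive pages */
	if (p->maybe_pinned)
		return false;
	p->anon_exclusive = false;
	return true;
}

/*
 * A page may be replaced by a copy only if it is maybe shared, or if
 * it was exclusive and could be marked maybe shared first.
 */
static bool toy_can_replace(struct toy_page *p)
{
	if (!p->anon_exclusive)
		return true;
	return toy_try_share(p);
}
```

In this model an exclusive page that may be pinned is never replaced, so the caller has to fall back to something else (for example deferring the split) instead of failing the syscall.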

> 
> For the other cases that Willy and I discussed (split on VMA modifications after fork()), we can at least always replace the anon page.
> 
> <details>
> 
> What always works, is putting the THP on the deferred split queue to see if we can split it later. The deferred split queue is a bit suboptimal right now, because it requires the (sub)page mapcounts to detect whether the folio is partially mapped vs. fully mapped. If we want to get rid of that, we have to come up with something reasonable.
> 
> I was wondering if we could have an optimized deferred split queue that only conditionally splits: do an rmap walk and detect if (a) each page of the folio is still mapped and (b) the folio does not cross a VMA. If both are met, one could skip the deferred split. That needs a bit of thought, but we're already doing an rmap walk when splitting, so scanning which parts are actually mapped does not sound too weird.
> 
> </details>
> 
Thanks a lot for the extra information, which helps me understand more of
the background. I really appreciate it.
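
As I read the conditional deferred split idea, the decision could look like the sketch below. This is only a userspace illustration under my own assumptions: struct toy_folio, TOY_MAX_PAGES and toy_should_split() are hypothetical, and the mapped[] and vma_id[] arrays stand in for whatever an rmap walk would report.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_PAGES 16	/* hypothetical upper bound for the toy model */

/* Toy model only: the arrays stand in for the result of an rmap walk. */
struct toy_folio {
	size_t nr_pages;
	bool mapped[TOY_MAX_PAGES];	/* is page i still mapped? */
	int vma_id[TOY_MAX_PAGES];	/* VMA mapping page i, valid if mapped */
};

/*
 * Skip the deferred split only if (a) every page of the folio is still
 * mapped and (b) all pages are mapped by the same VMA.
 */
static bool toy_should_split(const struct toy_folio *f)
{
	assert(f->nr_pages > 0 && f->nr_pages <= TOY_MAX_PAGES);

	for (size_t i = 0; i < f->nr_pages; i++) {
		if (!f->mapped[i])
			return true;	/* (a) violated: partially mapped */
		if (f->vma_id[i] != f->vma_id[0])
			return true;	/* (b) violated: crosses a VMA */
	}
	return false;
}
```

The sketch assumes a single mapping per page, which is a simplification; a real rmap walk may find several.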


Regards
Yin, Fengwei
