From: Ryan Roberts <ryan.roberts@arm.com>
To: Baolin Wang <baolin.wang@linux.alibaba.com>,
akpm@linux-foundation.org, hughd@google.com
Cc: willy@infradead.org, david@redhat.com,
wangkefeng.wang@huawei.com, 21cnbao@gmail.com,
ying.huang@intel.com, shy828301@gmail.com, ziy@nvidia.com,
linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH 1/5] mm: memory: extend finish_fault() to support large folio
Date: Wed, 24 Apr 2024 09:07:02 +0100 [thread overview]
Message-ID: <6c418d70-a75d-4019-a0f5-56a61002d37a@arm.com> (raw)
In-Reply-To: <c48ae381-f073-4b20-84ae-bd5e9e56ce29@linux.alibaba.com>
On 24/04/2024 04:23, Baolin Wang wrote:
>
>
> On 2024/4/23 19:03, Ryan Roberts wrote:
>> On 22/04/2024 08:02, Baolin Wang wrote:
>>> Add large folio mapping establishment support for finish_fault() as a
>>> preparation,
>>> to support multi-size THP allocation of anonymous shared pages in the following
>>> patches.
>>>
>>> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
>>> ---
>>> mm/memory.c | 25 ++++++++++++++++++-------
>>> 1 file changed, 18 insertions(+), 7 deletions(-)
>>>
>>> diff --git a/mm/memory.c b/mm/memory.c
>>> index b6fa5146b260..094a76730776 100644
>>> --- a/mm/memory.c
>>> +++ b/mm/memory.c
>>> @@ -4766,7 +4766,10 @@ vm_fault_t finish_fault(struct vm_fault *vmf)
>>> {
>>> struct vm_area_struct *vma = vmf->vma;
>>> struct page *page;
>>> + struct folio *folio;
>>> vm_fault_t ret;
>>> + int nr_pages, i;
>>> + unsigned long addr;
>>> /* Did we COW the page? */
>>> if ((vmf->flags & FAULT_FLAG_WRITE) && !(vma->vm_flags & VM_SHARED))
>>> @@ -4797,22 +4800,30 @@ vm_fault_t finish_fault(struct vm_fault *vmf)
>>> return VM_FAULT_OOM;
>>> }
>>> + folio = page_folio(page);
>>> + nr_pages = folio_nr_pages(folio);
>>> + addr = ALIGN_DOWN(vmf->address, nr_pages * PAGE_SIZE);
>>
>> I'm not sure this is safe. IIUC, finish_fault() is called for any file-backed
>> mapping. So you could have a situation where part of a (regular) file is mapped
>> in the process, faults and hits in the pagecache. But the folio returned by the
>> pagecache is bigger than the portion that the process has mapped. So you now end
>> up mapping beyond the VMA limits? In the pagecache case, you also can't assume
>> that the folio is naturally aligned in virtual address space.
>
> Good point. Yes, I think you are right, I need consider the VMA limits, and I
> should refer to the calculations of the start pte and end pte in do_fault_around().
You might also need to be careful not to increase reported RSS. I have a vague
recollection that David once mentioned a problem with fault-around because it
causes the reported RSS to increase for the process and this could lead to
different decisions in other places. IIRC Redhat had an advisory somewhere with
suggested workaround being to disable fault-around. For the anon-shared memory
case, it shouldn't be a problem because the user has opted into allocating
bigger blocks, but there may be a need to ensure we don't also start eagerly
mapping regular files beyond what fault-around is configured for.
>
>>> vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd,
>>> - vmf->address, &vmf->ptl);
>>> + addr, &vmf->ptl);
>>> if (!vmf->pte)
>>> return VM_FAULT_NOPAGE;
>>> /* Re-check under ptl */
>>> - if (likely(!vmf_pte_changed(vmf))) {
>>> - struct folio *folio = page_folio(page);
>>> -
>>> - set_pte_range(vmf, folio, page, 1, vmf->address);
>>> - ret = 0;
>>> - } else {
>>> + if (nr_pages == 1 && vmf_pte_changed(vmf)) {
>>> update_mmu_tlb(vma, vmf->address, vmf->pte);
>>> ret = VM_FAULT_NOPAGE;
>>> + goto unlock;
>>> + } else if (nr_pages > 1 && !pte_range_none(vmf->pte, nr_pages)) {
>>
>> I think you have grabbed this from do_anonymous_page()? But I'm not sure it
>> works in the same way here as it does there. For the anon case, if userfaultfd
>> is armed, alloc_anon_folio() will only ever allocate order-0. So we end up in
>
> IMO, the userfaultfd validation should do in the vma->vm_ops->fault() callback,
> to make sure the nr_pages is always 1 if userfaultfd is armed.
OK. Are you saying there is already logic to do that today? Great!
>
>> the vmf_pte_changed() path, which will allow overwriting a uffd entry. But here,
>> there is nothing stopping nr_pages being greater than 1 when there could be a
>> uffd entry present, and you will fail due to the pte_range_none() check. (see
>> pte_marker_handle_uffd_wp()).
>
> So if we do the userfaultfd validation in ->fault() callback, then here we can
> use the same logic as with anonymous case.
next prev parent reply other threads:[~2024-04-24 8:07 UTC|newest]
Thread overview: 46+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-04-22 7:02 [RFC PATCH 0/5] add mTHP support for anonymous share pages Baolin Wang
2024-04-22 7:02 ` [RFC PATCH 1/5] mm: memory: extend finish_fault() to support large folio Baolin Wang
2024-04-23 8:39 ` Lance Yang
2024-04-25 7:04 ` Baolin Wang
2024-04-23 11:03 ` Ryan Roberts
2024-04-24 3:23 ` Baolin Wang
2024-04-24 8:07 ` Ryan Roberts [this message]
2024-04-24 9:26 ` Baolin Wang
2024-04-24 9:57 ` Ryan Roberts
2024-04-22 7:02 ` [RFC PATCH 2/5] mm: shmem: add an 'order' parameter for shmem_alloc_hugefolio() Baolin Wang
2024-04-24 6:28 ` Kefeng Wang
2024-04-24 6:55 ` Baolin Wang
2024-04-22 7:02 ` [RFC PATCH 3/5] mm: shmem: add THP validation for PMD-mapped THP related statistics Baolin Wang
2024-04-23 1:13 ` Barry Song
2024-04-22 7:02 ` [RFC PATCH 4/5] mm: shmem: add mTHP support for anonymous share pages Baolin Wang
2024-04-22 7:02 ` [RFC PATCH 5/5] mm: shmem: add anonymous share mTHP counters Baolin Wang
2024-04-23 1:17 ` Barry Song
2024-04-23 1:46 ` Baolin Wang
2024-04-23 11:39 ` Ryan Roberts
2024-04-24 3:48 ` Baolin Wang
2024-04-23 9:45 ` Lance Yang
2024-04-23 11:22 ` Lance Yang
2024-04-24 3:49 ` Baolin Wang
2024-04-23 11:37 ` David Hildenbrand
2024-04-24 6:10 ` Baolin Wang
2024-04-24 7:11 ` David Hildenbrand
2024-04-24 8:15 ` Ryan Roberts
2024-04-24 9:31 ` Baolin Wang
2024-04-23 10:41 ` [RFC PATCH 0/5] add mTHP support for anonymous share pages Ryan Roberts
2024-04-23 11:05 ` David Hildenbrand
2024-04-24 6:55 ` Baolin Wang
2024-04-24 8:26 ` Ryan Roberts
2024-04-24 9:55 ` Baolin Wang
2024-04-24 10:01 ` Ryan Roberts
2024-04-24 13:49 ` Baolin Wang
2024-04-24 14:20 ` Ryan Roberts
2024-04-25 6:20 ` Baolin Wang
2024-04-25 8:17 ` Ryan Roberts
2024-04-25 8:26 ` David Hildenbrand
2024-04-25 8:46 ` Ryan Roberts
2024-04-25 8:57 ` David Hildenbrand
2024-04-25 9:05 ` Baolin Wang
2024-04-25 9:20 ` David Hildenbrand
2024-04-25 9:50 ` Baolin Wang
2024-04-25 10:17 ` Ryan Roberts
2024-04-25 10:19 ` David Hildenbrand
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=6c418d70-a75d-4019-a0f5-56a61002d37a@arm.com \
--to=ryan.roberts@arm.com \
--cc=21cnbao@gmail.com \
--cc=akpm@linux-foundation.org \
--cc=baolin.wang@linux.alibaba.com \
--cc=david@redhat.com \
--cc=hughd@google.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=shy828301@gmail.com \
--cc=wangkefeng.wang@huawei.com \
--cc=willy@infradead.org \
--cc=ying.huang@intel.com \
--cc=ziy@nvidia.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox