linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Kefeng Wang <wangkefeng.wang@huawei.com>
To: Baolin Wang <baolin.wang@linux.alibaba.com>,
	Ryan Roberts <ryan.roberts@arm.com>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Will Deacon <will@kernel.org>, Ard Biesheuvel <ardb@kernel.org>,
	Marc Zyngier <maz@kernel.org>, James Morse <james.morse@arm.com>,
	Andrey Ryabinin <ryabinin.a.a@gmail.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Matthew Wilcox <willy@infradead.org>,
	Mark Rutland <mark.rutland@arm.com>,
	David Hildenbrand <david@redhat.com>,
	John Hubbard <jhubbard@nvidia.com>, Zi Yan <ziy@nvidia.com>,
	Barry Song <21cnbao@gmail.com>,
	Alistair Popple <apopple@nvidia.com>,
	Yang Shi <shy828301@gmail.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	"H. Peter Anvin" <hpa@zytor.com>,
	"Yin, Fengwei" <fengwei.yin@intel.com>
Cc: <linux-arm-kernel@lists.infradead.org>, <x86@kernel.org>,
	<linuxppc-dev@lists.ozlabs.org>, <linux-mm@kvack.org>,
	<linux-kernel@vger.kernel.org>
Subject: Re: [PATCH v6 18/18] arm64/mm: Automatically fold contpte mappings
Date: Tue, 25 Jun 2024 20:23:20 +0800	[thread overview]
Message-ID: <99aa61b6-afc9-445f-8f50-1e017450efd1@huawei.com> (raw)
In-Reply-To: <b75aa60d-e058-4b5c-877a-9c0cd295e96f@linux.alibaba.com>



On 2024/6/25 15:23, Baolin Wang wrote:
> 
> 
> On 2024/6/25 11:16, Kefeng Wang wrote:
>>
>>
>> On 2024/6/24 23:56, Ryan Roberts wrote:
>>> + Baolin Wang and Yin Fengwei, who maybe able to help with this.
>>>
>>>
>>> Hi Kefeng,
>>>
>>> Thanks for the report!
>>>
>>>
>>> On 24/06/2024 15:30, Kefeng Wang wrote:
>>>> Hi Ryan,
>>>>
>>>> A big regression on page-fault3("Separate file shared mapping page
>>>> fault") testcase from will-it-scale on arm64, no issue on x86,
>>>>
>>>> ./page_fault3_processes -t 128 -s 5
>>>
>>> I see that this program is mkstmp'ing a file at 
>>> "/tmp/willitscale.XXXXXX". Based
>>> on your description, I'm inferring that /tmp is backed by ext4 with 
>>> your large
>>> folio patches enabled?
>>
>> Yes, mount /tmp by ext4, sorry to forget to mention that.
>>
>>>
>>>>
>>>> 1) large folio disabled on ext4:
>>>>     92378735
>>>> 2) large folio  enabled on ext4 +  CONTPTE enabled
>>>>     16164943
>>>> 3) large folio  enabled on ext4 +  CONTPTE disabled
>>>>     80364074
>>>> 4) large folio  enabled on ext4 +  CONTPTE enabled + large folio 
>>>> mapping enabled
>>>> in finish_fault()[2]
>>>>     299656874
>>>>
>>>> We found *contpte_convert* consume lots of CPU(76%) in case 2),
>>>
>>> contpte_convert() is expensive and to be avoided; In this case I 
>>> expect it is
>>> repainting the PTEs with the PTE_CONT bit added in, and to do that it 
>>> needs to
>>> invalidate the tlb for the virtual range. The code is there to mop up 
>>> user space
>>> patterns where each page in a range is temporarily made RO, then 
>>> later changed
>>> back. In this case, we want to re-fold the contpte range once all 
>>> pages have
>>> been serviced in RO mode.
>>>
>>> Of course this path is only intended as a fallback, and the more 
>>> optimium
>>> approach is to set_ptes() the whole folio in one go where possible - 
>>> kind of
>>> what you are doing below.
>>>
>>>> and disappeared
>>>> by following change[2], it is easy to understood the different 
>>>> between case 2)
>>>> and case 4) since case 2) always map one page
>>>> size, but always try to fold contpte mappings, which spend a lot of
>>>> time. Case 4) is a workaround, any other better suggestion?
>>>
>>> See below.
>>>
>>>>
>>>> Thanks.
>>>>
>>>> [1] https://github.com/antonblanchard/will-it-scale
>>>> [2] enable large folio mapping in finish_fault()
>>>>
>>>> diff --git a/mm/memory.c b/mm/memory.c
>>>> index 00728ea95583..5623a8ce3a1e 100644
>>>> --- a/mm/memory.c
>>>> +++ b/mm/memory.c
>>>> @@ -4880,7 +4880,7 @@ vm_fault_t finish_fault(struct vm_fault *vmf)
>>>>           * approach also applies to non-anonymous-shmem faults to 
>>>> avoid
>>>>           * inflating the RSS of the process.
>>>>           */
>>>> -       if (!vma_is_anon_shmem(vma) || 
>>>> unlikely(userfaultfd_armed(vma))) {
>>>> +       if (unlikely(userfaultfd_armed(vma))) {
>>>
>>> The change to make finish_fault() handle multiple pages in one go are 
>>> new; added
>>> by Baolin Wang at [1]. That extra conditional that you have removed 
>>> is there to
>>> prevent RSS reporting bloat. See discussion that starts at [2].
>>>
>>> Anyway, it was my vague understanding that the fault around mechanism
>>> (do_fault_around()) would ensure that (by default) 64K worth of pages 
>>> get mapped
>>> together in a single set_ptes() call, via filemap_map_pages() ->
>>> filemap_map_folio_range(). Looking at the code, I guess fault around 
>>> only
>>> applies to read faults. This test is doing a write fault.
>>>
>>> I guess we need to do a change a bit like what you have done, but 
>>> also taking
>>> into account fault_around configuration?
> 
> For the writable mmap() of tmpfs, we will use mTHP interface to control 
> the size of folio to allocate, as discussed in previous meeting [1], so 
> I don't think fault_around configuration will be helpful for tmpfs.

Yes, tmpfs is different from ext4.

> 
> For other filesystems, like ext4, I did not found the logic to determin 
> what size of folio to allocate in writable mmap() path (Kefeng, please 
> correct me if I missed something). If there is a control like mTHP, we 
> can rely on that instead of 'fault_around'?

For ext4 or most filesystems, the folio is allocated from filemap_fault(),
we don't have explicit interface like mTHP to control the folio size.


> 
> [1] 
> https://lore.kernel.org/all/f1783ff0-65bd-4b2b-8952-52b6822a0835@redhat.com/
> 
>> Yes, the current changes is not enough, I hint some issue and still 
>> debugging, so our direction is trying to map large folio for 
>> do_shared_fault(), right?
> 
> I think this is the right direction to do. I add this 
> '!vma_is_anon_shmem(vma)' conditon to gradually implement support for 
> large folio mapping buidling, especially for writable mmap() support in 
> tmpfs.
> 
>>> [1]
>>> https://lore.kernel.org/all/3a190892355989d42f59cf9f2f98b94694b0d24d.1718090413.git.baolin.wang@linux.alibaba.com/
>>> [2] 
>>> https://lore.kernel.org/linux-mm/13939ade-a99a-4075-8a26-9be7576b7e03@arm.com/


  parent reply	other threads:[~2024-06-25 12:23 UTC|newest]

Thread overview: 64+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-02-15 10:31 [PATCH v6 00/18] Transparent Contiguous PTEs for User Mappings Ryan Roberts
2024-02-15 10:31 ` [PATCH v6 01/18] mm: Clarify the spec for set_ptes() Ryan Roberts
2024-02-15 10:31 ` [PATCH v6 02/18] mm: thp: Batch-collapse PMD with set_ptes() Ryan Roberts
2024-02-15 10:31 ` [PATCH v6 03/18] mm: Introduce pte_advance_pfn() and use for pte_next_pfn() Ryan Roberts
2024-02-15 10:40   ` David Hildenbrand
2024-02-15 10:31 ` [PATCH v6 04/18] arm64/mm: Convert pte_next_pfn() to pte_advance_pfn() Ryan Roberts
2024-02-15 10:42   ` David Hildenbrand
2024-02-15 11:17   ` Mark Rutland
2024-02-15 18:27   ` Catalin Marinas
2024-02-15 10:31 ` [PATCH v6 05/18] x86/mm: " Ryan Roberts
2024-02-15 10:43   ` David Hildenbrand
2024-02-15 10:31 ` [PATCH v6 06/18] mm: Tidy up pte_next_pfn() definition Ryan Roberts
2024-02-15 10:43   ` David Hildenbrand
2024-02-15 10:31 ` [PATCH v6 07/18] arm64/mm: Convert READ_ONCE(*ptep) to ptep_get(ptep) Ryan Roberts
2024-02-15 11:18   ` Mark Rutland
2024-02-15 18:34   ` Catalin Marinas
2024-02-15 10:31 ` [PATCH v6 08/18] arm64/mm: Convert set_pte_at() to set_ptes(..., 1) Ryan Roberts
2024-02-15 11:19   ` Mark Rutland
2024-02-15 18:34   ` Catalin Marinas
2024-02-15 10:31 ` [PATCH v6 09/18] arm64/mm: Convert ptep_clear() to ptep_get_and_clear() Ryan Roberts
2024-02-15 11:20   ` Mark Rutland
2024-02-15 18:35   ` Catalin Marinas
2024-02-15 10:31 ` [PATCH v6 10/18] arm64/mm: New ptep layer to manage contig bit Ryan Roberts
2024-02-15 11:23   ` Mark Rutland
2024-02-15 19:21   ` Catalin Marinas
2024-02-15 10:31 ` [PATCH v6 11/18] arm64/mm: Split __flush_tlb_range() to elide trailing DSB Ryan Roberts
2024-02-15 11:24   ` Mark Rutland
2024-02-15 19:22   ` Catalin Marinas
2024-02-15 10:31 ` [PATCH v6 12/18] arm64/mm: Wire up PTE_CONT for user mappings Ryan Roberts
2024-02-15 11:27   ` Mark Rutland
2024-02-16 12:25   ` Catalin Marinas
2024-02-16 12:53     ` Ryan Roberts
2024-02-16 16:56       ` Catalin Marinas
2024-02-16 19:54         ` John Hubbard
2024-02-20 19:50           ` Ryan Roberts
2024-02-19 15:18       ` Catalin Marinas
2024-02-20 19:58         ` Ryan Roberts
2024-02-15 10:32 ` [PATCH v6 13/18] arm64/mm: Implement new wrprotect_ptes() batch API Ryan Roberts
2024-02-15 11:28   ` Mark Rutland
2024-02-16 12:30   ` Catalin Marinas
2024-02-15 10:32 ` [PATCH v6 14/18] arm64/mm: Implement new [get_and_]clear_full_ptes() batch APIs Ryan Roberts
2024-02-15 11:28   ` Mark Rutland
2024-02-16 12:30   ` Catalin Marinas
2024-02-15 10:32 ` [PATCH v6 15/18] mm: Add pte_batch_hint() to reduce scanning in folio_pte_batch() Ryan Roberts
2024-02-15 10:32 ` [PATCH v6 16/18] arm64/mm: Implement pte_batch_hint() Ryan Roberts
2024-02-16 12:34   ` Catalin Marinas
2024-02-15 10:32 ` [PATCH v6 17/18] arm64/mm: __always_inline to improve fork() perf Ryan Roberts
2024-02-16 12:34   ` Catalin Marinas
2024-02-15 10:32 ` [PATCH v6 18/18] arm64/mm: Automatically fold contpte mappings Ryan Roberts
2024-02-15 11:30   ` Mark Rutland
2024-02-16 12:35   ` Catalin Marinas
2024-06-24 14:30   ` Kefeng Wang
2024-06-24 15:56     ` Ryan Roberts
2024-06-25  3:16       ` Kefeng Wang
2024-06-25  7:23         ` Baolin Wang
2024-06-25 11:40           ` Ryan Roberts
2024-06-25 12:37             ` Baolin Wang
2024-06-25 12:41               ` Ryan Roberts
2024-06-25 13:06                 ` Matthew Wilcox
2024-06-25 13:41                   ` Ryan Roberts
2024-06-25 14:06                     ` Matthew Wilcox
2024-06-25 14:45                       ` Ryan Roberts
2024-06-25 12:23           ` Kefeng Wang [this message]
2024-02-15 11:36 ` [PATCH v6 00/18] Transparent Contiguous PTEs for User Mappings Mark Rutland

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=99aa61b6-afc9-445f-8f50-1e017450efd1@huawei.com \
    --to=wangkefeng.wang@huawei.com \
    --cc=21cnbao@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=apopple@nvidia.com \
    --cc=ardb@kernel.org \
    --cc=baolin.wang@linux.alibaba.com \
    --cc=bp@alien8.de \
    --cc=catalin.marinas@arm.com \
    --cc=dave.hansen@linux.intel.com \
    --cc=david@redhat.com \
    --cc=fengwei.yin@intel.com \
    --cc=hpa@zytor.com \
    --cc=james.morse@arm.com \
    --cc=jhubbard@nvidia.com \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=mark.rutland@arm.com \
    --cc=maz@kernel.org \
    --cc=mingo@redhat.com \
    --cc=ryabinin.a.a@gmail.com \
    --cc=ryan.roberts@arm.com \
    --cc=shy828301@gmail.com \
    --cc=tglx@linutronix.de \
    --cc=will@kernel.org \
    --cc=willy@infradead.org \
    --cc=x86@kernel.org \
    --cc=ziy@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox