Re: [PATCH v1 03/10] mm: Introduce try_vma_alloc_movable_folio()

From: Ryan Roberts <ryan.roberts@arm.com>
To: Yu Zhao <yuzhao@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	"Matthew Wilcox (Oracle)" <willy@infradead.org>,
	"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	Yin Fengwei <fengwei.yin@intel.com>,
	David Hildenbrand <david@redhat.com>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Will Deacon <will@kernel.org>,
	Geert Uytterhoeven <geert@linux-m68k.org>,
	Christian Borntraeger <borntraeger@linux.ibm.com>,
	Sven Schnelle <svens@linux.ibm.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	"H. Peter Anvin" <hpa@zytor.com>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	linux-alpha@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org, linux-ia64@vger.kernel.org,
	linux-m68k@lists.linux-m68k.org, linux-s390@vger.kernel.org
Subject: Re: [PATCH v1 03/10] mm: Introduce try_vma_alloc_movable_folio()
Date: Tue, 27 Jun 2023 08:56:57 +0100
Message-ID: <ba282a84-1a0d-4ffd-0b22-ac9510a820ef@arm.com>
In-Reply-To: <CAOUHufZeFTjzO6nSFz7Y=5rBGPzY+_eeN3f8W+g0u6AqosdmuQ@mail.gmail.com>

On 27/06/2023 06:29, Yu Zhao wrote:
> On Mon, Jun 26, 2023 at 8:34 PM Yu Zhao <yuzhao@google.com> wrote:
>>
>> On Mon, Jun 26, 2023 at 11:14 AM Ryan Roberts <ryan.roberts@arm.com> wrote:
>>>
>>> Opportunistically attempt to allocate high-order folios in highmem,
>>> optionally zeroed. Retry with lower orders all the way to order-0, until
>>> success. Although, of note, order-1 allocations are skipped since a
>>> large folio must be at least order-2 to work with the THP machinery. The
>>> user must check what they got with folio_order().
>>>
>>> This will be used to opportunistically allocate large folios for
>>> anonymous memory with a sensible fallback under memory pressure.
>>>
>>> For attempts to allocate non-0 orders, we set __GFP_NORETRY to prevent
>>> high latency due to reclaim, instead preferring to just try for a lower
>>> order. The same approach is used by the readahead code when allocating
>>> large folios.
>>>
>>> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
>>> ---
>>>  mm/memory.c | 33 +++++++++++++++++++++++++++++++++
>>>  1 file changed, 33 insertions(+)
>>>
>>> diff --git a/mm/memory.c b/mm/memory.c
>>> index 367bbbb29d91..53896d46e686 100644
>>> --- a/mm/memory.c
>>> +++ b/mm/memory.c
>>> @@ -3001,6 +3001,39 @@ static vm_fault_t fault_dirty_shared_page(struct vm_fault *vmf)
>>>         return 0;
>>>  }
>>>
>>> +static inline struct folio *vma_alloc_movable_folio(struct vm_area_struct *vma,
>>> +                               unsigned long vaddr, int order, bool zeroed)
>>> +{
>>> +       gfp_t gfp = order > 0 ? __GFP_NORETRY | __GFP_NOWARN : 0;
>>> +
>>> +       if (zeroed)
>>> +               return vma_alloc_zeroed_movable_folio(vma, vaddr, gfp, order);
>>> +       else
>>> +               return vma_alloc_folio(GFP_HIGHUSER_MOVABLE | gfp, order, vma,
>>> +                                                               vaddr, false);
>>> +}
>>> +
>>> +/*
>>> + * Opportunistically attempt to allocate high-order folios, retrying with lower
>>> + * orders all the way to order-0, until success. order-1 allocations are skipped
>>> + * since a folio must be at least order-2 to work with the THP machinery. The
>>> + * user must check what they got with folio_order(). vaddr can be any virtual
>>> + * address that will be mapped by the allocated folio.
>>> + */
>>> +static struct folio *try_vma_alloc_movable_folio(struct vm_area_struct *vma,
>>> +                               unsigned long vaddr, int order, bool zeroed)
>>> +{
>>> +       struct folio *folio;
>>> +
>>> +       for (; order > 1; order--) {
>>> +               folio = vma_alloc_movable_folio(vma, vaddr, order, zeroed);
>>> +               if (folio)
>>> +                       return folio;
>>> +       }
>>> +
>>> +       return vma_alloc_movable_folio(vma, vaddr, 0, zeroed);
>>> +}
>>
>> I'd drop this patch. Instead, in do_anonymous_page():
>>
>>   if (IS_ENABLED(CONFIG_ARCH_WANTS_PTE_ORDER))
>>     folio = vma_alloc_zeroed_movable_folio(vma, addr,
>>                                 CONFIG_ARCH_WANTS_PTE_ORDER);
>>
>>   if (!folio)
>>     folio = vma_alloc_zeroed_movable_folio(vma, addr, 0);
> 
> I meant a runtime function arch_wants_pte_order() (Its default
> implementation would return 0.)

You are implying a number of things here, which I'll try to make explicit:

I think you are implying that we shouldn't retry allocation with intermediate
orders; but only try the order requested by the arch (arch_wants_pte_order())
and 0. Correct? For arm64 at least, I would like the VMA's THP hint to be a
factor in determining the preferred order (see patches 8 and 9). So I would add
a vma parameter to arch_wants_pte_order() to allow for this.

Where the THP hint is present, the arch will request 2M (if the page size is
16K or 64K). If that allocation fails, there is still value in allocating a 64K
folio (which is order 2 in the 16K case). Without the logic that retries with
intermediate orders, we would not get this.
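
To put numbers on that, here is a minimal userspace sketch (not kernel code) of
the order arithmetic and of the fallback loop from the patch. The names
size_to_order(), alloc_order_with_fallback() and only_small_orders() are
hypothetical, and try_alloc() merely stands in for the real allocator:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Order of a folio of `size` bytes with `page_size` bytes per page.
 * Assumes both are powers of two and size >= page_size.
 */
static int size_to_order(size_t size, size_t page_size)
{
	int order = 0;

	assert(page_size && size >= page_size);
	while ((page_size << order) < size)
		order++;
	return order;
}

/*
 * Model of the loop in try_vma_alloc_movable_folio(): try every order
 * from `order` down to 2, skip order-1, then fall back to order-0.
 * try_alloc() is a hypothetical stand-in that returns nonzero on success.
 * Returns the order that was allocated, or -1 if even order-0 failed.
 */
static int alloc_order_with_fallback(int order, int (*try_alloc)(int order))
{
	for (; order > 1; order--) {
		if (try_alloc(order))
			return order;
	}
	return try_alloc(0) ? 0 : -1;
}

/* Example policy modelling fragmentation: only orders <= 2 succeed. */
static int only_small_orders(int order)
{
	return order <= 2;
}
```

With 16K pages, a 2M request is order 7 and a 64K folio is order 2, so under
the example policy a request starting at order 7 ends up with an order-2 folio
rather than dropping straight to order 0.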

We can't just blindly allocate a folio of arch_wants_pte_order() size because it
might overlap with existing populated PTEs, or cross the bounds of the VMA (or a
number of other things - see calc_anon_folio_order_alloc() in patch 10). Are you
implying that if there is any kind of issue like this, then we should go
directly to order 0? I can kind of see the argument from a minimizing
fragmentation perspective, but for best possible performance I think we are
better off "packing the bin" with intermediate orders.
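
As a rough illustration of the VMA-bounds part only, here is a userspace sketch
that picks the largest order whose block still fits inside the VMA. This is not
calc_anon_folio_order_alloc() from patch 10: struct range, fit_order(), the 4K
PAGE_SHIFT and the choice of naturally aligned blocks are all assumptions made
for the example, and the check for already-populated PTEs is left out:

```c
#include <assert.h>

#define PAGE_SHIFT 12	/* assumption: 4K pages, for illustration only */

/* [start, end) in bytes; a hypothetical stand-in for a VMA's bounds. */
struct range {
	unsigned long start;
	unsigned long end;
};

/*
 * Return the largest order <= max_order (never order-1) for which the
 * naturally aligned block containing addr lies entirely within vma.
 * Falls back to order-0 when no larger block fits.
 */
static int fit_order(const struct range *vma, unsigned long addr,
		     int max_order)
{
	int order;

	assert(addr >= vma->start && addr < vma->end);
	for (order = max_order; order > 1; order--) {
		unsigned long size = 1UL << (PAGE_SHIFT + order);
		unsigned long base = addr & ~(size - 1);

		if (base >= vma->start && base + size <= vma->end)
			return order;
	}
	return 0;
}
```

A fault near the edge of a VMA then still gets an order-2 or order-3 folio
where one fits, instead of always dropping to a single page.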

You're also implying that a runtime arch_wants_pte_order() function is better
than the Kconfig stuff I did in patch 8. On reflection, I agree with you here. I
think you mentioned that AMD supports coalescing 8 pages on some CPUs - so you
would probably want runtime logic to determine if you are on an appropriate AMD
CPU as part of the decision in that function?
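
For discussion, a runtime hook that takes the VMA into account might look
roughly like the userspace model below. Nothing here is an existing kernel
interface: struct vma_model, struct cpu_model, their fields and
arch_wants_pte_order_model() are hypothetical names invented for this sketch:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the part of a VMA that matters here. */
struct vma_model {
	bool thp_hint;		/* e.g. the VMA carries a THP hint */
};

/* Hypothetical description of what the running CPU can do. */
struct cpu_model {
	bool coalesces_ptes;	/* detected at runtime, not via Kconfig */
	int coalesce_order;	/* e.g. 3 if the CPU coalesces 8 pages */
	int thp_hint_order;	/* order to request when the hint is set */
};

/*
 * Model of a runtime arch_wants_pte_order(vma). The default returns 0,
 * a CPU that coalesces PTEs returns its coalescing order, and a VMA with
 * the THP hint gets the larger hint order where the arch provides one.
 */
static int arch_wants_pte_order_model(const struct cpu_model *cpu,
				      const struct vma_model *vma)
{
	if (vma && vma->thp_hint && cpu->thp_hint_order > 0)
		return cpu->thp_hint_order;
	if (cpu->coalesces_ptes)
		return cpu->coalesce_order;
	return 0;
}
```

The point of the sketch is only that both inputs, the CPU's capabilities and
the VMA's hint, are known at runtime, which a Kconfig constant cannot express.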

The real reason for the existence of try_vma_alloc_movable_folio() is that I'm
reusing it on the other fault paths (which are no longer part of this series).
But I guess that's not a good reason to keep this until we get to those patches.
