linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Kalesh Singh <kaleshsingh@google.com>
To: Yang Shi <shy828301@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>,
	kernel-team@android.com, android-mm@google.com,
	 Andrew Morton <akpm@linux-foundation.org>,
	Yang Shi <yang@os.amperecomputing.com>,
	 Rik van Riel <riel@surriel.com>,
	Ryan Roberts <ryan.roberts@arm.com>,
	 Suren Baghdasaryan <surenb@google.com>,
	Minchan Kim <minchan@kernel.org>, Hans Boehm <hboehm@google.com>,
	 Lokesh Gidra <lokeshgidra@google.com>,
	stable@vger.kernel.org,
	 "Liam R. Howlett" <Liam.Howlett@oracle.com>,
	Lorenzo Stoakes <lorenzo.stoakes@oracle.com>,
	 Jann Horn <jannh@google.com>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] mm: Respect mmap hint address when aligning for THP
Date: Mon, 18 Nov 2024 14:04:49 -0800	[thread overview]
Message-ID: <CAC_TJvcNNcgO3z0DfJoSf3KSUXsLkL66NsAvft2vcJSCK2AgEw@mail.gmail.com> (raw)
In-Reply-To: <CAHbLzkqFfduLoWAiP3xwjMg5=JqEsKFMHAxh_ypmvGMBqAMbFg@mail.gmail.com>

On Mon, Nov 18, 2024 at 1:44 PM Yang Shi <shy828301@gmail.com> wrote:
>
> On Mon, Nov 18, 2024 at 9:52 AM Kalesh Singh <kaleshsingh@google.com> wrote:
> >
> > On Mon, Nov 18, 2024 at 9:05 AM Yang Shi <shy828301@gmail.com> wrote:
> > >
> > > On Sun, Nov 17, 2024 at 3:12 AM Vlastimil Babka <vbabka@suse.cz> wrote:
> > > >
> > > > On 11/15/24 23:15, Yang Shi wrote:
> > > > > On Fri, Nov 15, 2024 at 1:52 PM Kalesh Singh <kaleshsingh@google.com> wrote:
> > > > >>
> > > > >> Commit efa7df3e3bb5 ("mm: align larger anonymous mappings on THP
> > > > >> boundaries") updated __get_unmapped_area() to align the start address
> > > > >> for the VMA to a PMD boundary if CONFIG_TRANSPARENT_HUGEPAGE=y.
> > > > >>
> > > > >> It does this by effectively looking up a region that is of size,
> > > > >> request_size + PMD_SIZE, and aligning up the start to a PMD boundary.
> > > > >>
> > > > >> Commit 4ef9ad19e176 ("mm: huge_memory: don't force huge page alignment
> > > > >> on 32 bit") opted out of this for 32bit due to regressions in mmap base
> > > > >> randomization.
> > > > >>
> > > > >> Commit d4148aeab412 ("mm, mmap: limit THP alignment of anonymous
> > > > >> mappings to PMD-aligned sizes") restricted this to only mmap sizes that
> > > > >> are multiples of the PMD_SIZE due to reported regressions in some
> > > > >> performance benchmarks -- which seemed mostly due to the reduced spatial
> > > > >> locality of related mappings due to the forced PMD-alignment.
> > > > >>
> > > > >> Another unintended side effect has emerged: When a user specifies an mmap
> > > > >> hint address, the THP alignment logic modifies the behavior, potentially
> > > > >> ignoring the hint even if a sufficiently large gap exists at the requested
> > > > >> hint location.
> > > > >>
> > > > >> Example Scenario:
> > > > >>
> > > > >> Consider the following simplified virtual address (VA) space:
> > > > >>
> > > > >>     ...
> > > > >>
> > > > >>     0x200000-0x400000 --- VMA A
> > > > >>     0x400000-0x600000 --- Hole
> > > > >>     0x600000-0x800000 --- VMA B
> > > > >>
> > > > >>     ...
> > > > >>
> > > > >> A call to mmap() with hint=0x400000 and len=0x200000 behaves differently:
> > > > >>
> > > > >>   - Before THP alignment: The requested region (size 0x200000) fits into
> > > > >>     the gap at 0x400000, so the hint is respected.
> > > > >>
> > > > >>   - After alignment: The logic searches for a region of size
> > > > >>     0x400000 (len + PMD_SIZE) starting at 0x400000.
> > > > >>     This search fails due to the mapping at 0x600000 (VMA B), and the hint
> > > > >>     is ignored, falling back to arch_get_unmapped_area[_topdown]().
> > > >
> >
> > Hi all, Thanks for the reviews.
> >
> > > > Hmm looks like the search is not done in the optimal way regardless of
> > > > whether or not it ignores a hint - it should be able to find the hole, no?
> >
> > It's not able to find the hole in the example case because the size we
> > are looking for is now
> > (requested size + padding len) which is larger than the hole we have
> > at the hint address.
> >
> > > >
> > > > >> In general the hint is effectively ignored, if there is any
> > > > >> existing mapping in the below range:
> > > > >>
> > > > >>      [mmap_hint + mmap_size, mmap_hint + mmap_size + PMD_SIZE)
> > > > >>
> > > > >> This changes the semantics of mmap hint; from ""Respect the hint if a
> > > > >> sufficiently large gap exists at the requested location" to "Respect the
> > > > >> hint only if an additional PMD-sized gap exists beyond the requested size".
> > > > >>
> > > > >> This has performance implications for allocators that allocate their heap
> > > > >> using mmap but try to keep it "as contiguous as possible" by using the
> > > > >> end of the exisiting heap as the address hint. With the new behavior
> > > > >> it's more likely to get a much less contiguous heap, adding extra
> > > > >> fragmentation and performance overhead.
> > > > >>
> > > > >> To restore the expected behavior; don't use thp_get_unmapped_area_vmflags()
> > > > >> when the user provided a hint address.
> > > >
> > > > Agreed, the hint should take precendence.
> > > >
> > > > > Thanks for fixing it. I agree we should respect the hint address. But
> > > > > this patch actually just fixed anonymous mapping and the file mappings
> > > > > which don't support thp_get_unmapped_area(). So I think you should
> > > > > move the hint check to __thp_get_unmapped_area().
> > > > >
> > > > > And Vlastimil's fix d4148aeab412 ("mm, mmap: limit THP alignment of
> > > > > anonymous mappings to PMD-aligned sizes") should be moved to there too
> > > > > IMHO.
> > > >
> > > > This was brought up, but I didn't want to do it as part of the stable fix as
> > > > that would change even situations that Rik's change didn't.
> > > > If the mmap hint change is another stable hotfix, I wouldn't conflate it
> > > > either. But we can try it for further development. But careful about just
> > > > moving the code as-is, the file-based mappings are different than anonymous
> > > > memory and I believe file offsets matter:
> > > >
> > > > https://lore.kernel.org/all/9d7c73f6-1e1a-458b-93c6-3b44959022e0@suse.cz/
> > > >
> > > > https://lore.kernel.org/all/5f7a49e8-0416-4648-a704-a7a67e8cd894@suse.cz/
> > >
> >
> > I see, so I think we should keep the check here to address the
> > immediate regression for anonymous mappings.
> >
> > I believe what we would need to address this longer term is to have
> > vma_iter_lowest() [1] vma_iter_highest() [2] take into account the
> > alignment when doing the search. That way we don't need to inflate the
> > search size to facilitate the manual alignment after the fact. I
> > haven't looked too too deeply into this, maybe Liam has some ideas
> > about that?
> >
> > [1] https://github.com/torvalds/linux/blob/v6.12-rc7/mm/vma.h#L420
> >
> > [2] https://github.com/torvalds/linux/blob/v6.12-rc7/mm/vma.h#L426
> >
> > > Did some research about the history of the code, I found this commit:
> > >
> > > 97d3d0f9a1cf ("mm/huge_memory.c: thp: fix conflict of above-47bit hint
> > > address and PMD alignment"), it tried to fix "the function would not
> > > try to allocate PMD-aligned area if *any* hint address specified."
> > > It was for file mapping back then since anonymous mapping THP
> > > alignment was not supported yet.
> > >
> > > But it seems like this patch somehow tried to do something reverse. It
> > > may not be correct either.
> > >
> >
> > IIUC Kirill's patch is doing the right thing (mostly), i.e. it will
> > return the hint address (without any rounding to PMD alignment). The
> > case it doesn't handle is what I mentioned above, where we aren't able
> > to find the hole at the hint address in the first place because the
> > hole is smaller than (size + padding len)
>
> Yes. But my point is your fix (just simply skip PMD alignment when
> hint is specified) actually broke what Kirill fixed IIUC.

Hi Yang,

It's true the PMD alignment is skipped in that case for the anon
mappings. Though I believe that's still what we want to have here
initially as we shouldn't regress the hint behaviour.

I've posted the v2 here:
https://lore.kernel.org/lkml/20241118214650.3667577-1-kaleshsingh@google.com/

>
> >
> > > With Vlastimis's fix, we just try to make the address THP aligned for
> > > anonymous mapping when the size is PMD aligned. So we don't need to
> > > take into account the padding for anonymous mapping anymore.
> > >
> >
> > We still need to have padding length because PMD alignment of the size
> > doesn't mean that the start address returned by the search will be PMD
> > aligned. Inherently those are only PAGE aligned.
>
> Yes, you are right, I overlooked this. I think we can do this in two
> passes. Use len w/o padding in the first pass. If the returned address
> equals the hint or it is already PMD aligned, just return it.
> Otherwise it means there is no hole with suitable size and alignment.
> In the second pass, we redo it with padding. Just off the top of my
> head, others may have better ideas.
>

You are right, it's one way we can do it. Though, I am concerned that
the 2 passes will add overhead on mmap() performance. One idea I have
is to move the alignment handling lower down to
vma_iter_highest()/lowest(). Interested to hear your thoughts on that.
We can do this in a follow up patch, which should fix file mappings as
well.

Thanks,
Kalesh

> >
> > Thanks,
> > Kalesh
> >
> > > So IIUC we should do something like:
> > >
> > > @@ -1085,7 +1085,11 @@ static unsigned long
> > > __thp_get_unmapped_area(struct file *filp,
> > >         if (off_end <= off_align || (off_end - off_align) < size)
> > >                 return 0;
> > >
> > > -       len_pad = len + size;
> > > +       if (filp)
> > > +               len_pad = len + size;
> > > +       else
> > > +               len_pad = len;
> > > +
> > >         if (len_pad < len || (off + len_pad) < off)
> > >                 return 0;
> > >
> > > >
> > > > >> Cc: Andrew Morton <akpm@linux-foundation.org>
> > > > >> Cc: Vlastimil Babka <vbabka@suse.cz>
> > > > >> Cc: Yang Shi <yang@os.amperecomputing.com>
> > > > >> Cc: Rik van Riel <riel@surriel.com>
> > > > >> Cc: Ryan Roberts <ryan.roberts@arm.com>
> > > > >> Cc: Suren Baghdasaryan <surenb@google.com>
> > > > >> Cc: Minchan Kim <minchan@kernel.org>
> > > > >> Cc: Hans Boehm <hboehm@google.com>
> > > > >> Cc: Lokesh Gidra <lokeshgidra@google.com>
> > > > >> Cc: <stable@vger.kernel.org>
> > > > >> Fixes: efa7df3e3bb5 ("mm: align larger anonymous mappings on THP boundaries")
> > > > >> Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
> > > >
> > > > Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
> > > >
> > > > >> ---
> > > > >>  mm/mmap.c | 1 +
> > > > >>  1 file changed, 1 insertion(+)
> > > > >>
> > > > >> diff --git a/mm/mmap.c b/mm/mmap.c
> > > > >> index 79d541f1502b..2f01f1a8e304 100644
> > > > >> --- a/mm/mmap.c
> > > > >> +++ b/mm/mmap.c
> > > > >> @@ -901,6 +901,7 @@ __get_unmapped_area(struct file *file, unsigned long addr, unsigned long len,
> > > > >>         if (get_area) {
> > > > >>                 addr = get_area(file, addr, len, pgoff, flags);
> > > > >>         } else if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE)
> > > > >> +                  && !addr /* no hint */
> > > > >>                    && IS_ALIGNED(len, PMD_SIZE)) {
> > > > >>                 /* Ensures that larger anonymous mappings are THP aligned. */
> > > > >>                 addr = thp_get_unmapped_area_vmflags(file, addr, len,
> > > > >>
> > > > >> base-commit: 2d5404caa8c7bb5c4e0435f94b28834ae5456623
> > > > >> --
> > > > >> 2.47.0.338.g60cca15819-goog
> > > > >>
> > > >


      reply	other threads:[~2024-11-18 22:05 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-11-15 21:52 Kalesh Singh
2024-11-15 22:06 ` Rik van Riel
2024-11-15 22:15 ` Yang Shi
2024-11-15 22:44   ` Kalesh Singh
2024-11-17 11:12   ` Vlastimil Babka
2024-11-18 17:05     ` Yang Shi
2024-11-18 17:52       ` Kalesh Singh
2024-11-18 21:44         ` Yang Shi
2024-11-18 22:04           ` Kalesh Singh [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAC_TJvcNNcgO3z0DfJoSf3KSUXsLkL66NsAvft2vcJSCK2AgEw@mail.gmail.com \
    --to=kaleshsingh@google.com \
    --cc=Liam.Howlett@oracle.com \
    --cc=akpm@linux-foundation.org \
    --cc=android-mm@google.com \
    --cc=hboehm@google.com \
    --cc=jannh@google.com \
    --cc=kernel-team@android.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=lokeshgidra@google.com \
    --cc=lorenzo.stoakes@oracle.com \
    --cc=minchan@kernel.org \
    --cc=riel@surriel.com \
    --cc=ryan.roberts@arm.com \
    --cc=shy828301@gmail.com \
    --cc=stable@vger.kernel.org \
    --cc=surenb@google.com \
    --cc=vbabka@suse.cz \
    --cc=yang@os.amperecomputing.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox