linux-mm.kvack.org archive mirror
From: Yang Shi <shy828301@gmail.com>
To: Ryan Roberts <ryan.roberts@arm.com>
Cc: Matthew Wilcox <willy@infradead.org>,
	Yang Shi <yang@os.amperecomputing.com>,
	riel@surriel.com,  cl@linux.com, akpm@linux-foundation.org,
	linux-kernel@vger.kernel.org,  linux-mm@kvack.org
Subject: Re: [RESEND PATCH] mm: align larger anonymous mappings on THP boundaries
Date: Tue, 23 Jan 2024 09:14:27 -0800
Message-ID: <CAHbLzkrtcsU=pW13AyAMvF72A03fUV5iFcM0HwQoEemeajtqxg@mail.gmail.com>
In-Reply-To: <bad7ec4a-1507-4ec4-996a-ea29d07d47a0@arm.com>

On Tue, Jan 23, 2024 at 1:41 AM Ryan Roberts <ryan.roberts@arm.com> wrote:
>
> On 22/01/2024 19:43, Yang Shi wrote:
> > On Mon, Jan 22, 2024 at 3:37 AM Ryan Roberts <ryan.roberts@arm.com> wrote:
> >>
> >> On 20/01/2024 16:39, Matthew Wilcox wrote:
> >>> On Sat, Jan 20, 2024 at 12:04:27PM +0000, Ryan Roberts wrote:
> >>>> However, after this patch, each allocation is in its own VMA, and there is a 2M
> >>>> gap between each VMA. This causes 2 problems: 1) mmap becomes MUCH slower
> >>>> because there are so many VMAs to check to find a new 1G gap. 2) It fails once
> >>>> it hits the VMA limit (/proc/sys/vm/max_map_count). Hitting this limit then
> >>>> causes a subsequent calloc() to fail, which causes the test to fail.
> >>>>
> >>>> Looking at the code, I think the problem is that arm64 selects
> >>>> ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT. But __thp_get_unmapped_area() allocates
> >>>> len+2M then always aligns to the bottom of the discovered gap. That causes the
> >>>> 2M hole. As far as I can see, x86 allocates bottom up, so you don't get a hole.
> >>>
> >>> As a quick hack, perhaps
> >>> #ifdef ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT
> >>> take-the-top-half
> >>> #else
> >>> current-take-bottom-half-code
> >>> #endif
> >>>
> >>> ?
> >
> > Thanks for the suggestion. It makes sense to me. The alignment needs
> > to take this into account.
> >
> >>
> >> There is a general problem though that there is a trade-off between abutting
> >> VMAs, and aligning them to PMD boundaries. This patch has decided that in
> >> general the latter is preferable. The case I'm hitting is special though, in
> >> that both requirements could be achieved but currently are not.
> >>
> >> The below fixes it, but I feel like there should be some bitwise magic that
> >> would give the correct answer without the conditional - but my head is gone and
> >> I can't see it. Any thoughts?
> >
> > Thanks Ryan for the patch. TBH I don't see any bitwise magic that
> > avoids the conditional either.
> >
> >>
> >> Beyond this, though, there is also a latent bug where the offset provided to
> >> mmap() is carried all the way through to the get_unmapped_area()
> >> implementation, even for MAP_ANONYMOUS - I'm pretty sure we should be
> >> force-zeroing it for MAP_ANONYMOUS? Certainly before this change, for arches
> >> that use the default get_unmapped_area(), any non-zero offset would not have
> >> been used. But this change starts using it, which is incorrect. That said, there
> >> are some arches that override the default get_unmapped_area() and do use the
> >> offset. So I'm not sure if this is a bug or a feature that user space can pass
> >> an arbitrary value to the implementation for anon memory??
> >
> > Thanks for noticing this. If I read the code correctly, the pgoff is
> > used by some arches to work around VIPT caches, and it looks like
> > that applies to shared mappings only (I just checked arm and mips).
> > I believe everybody assumes 0 is used for anonymous mappings. The
> > offset should have nothing to do with finding a suitable unmapped
> > virtual area. The pgoff does make sense for file THP, though, because
> > of the alignment requirements. I think it should be zeroed for
> > anonymous mappings, like:
> >
> > diff --git a/mm/mmap.c b/mm/mmap.c
> > index 2ff79b1d1564..a9ed353ce627 100644
> > --- a/mm/mmap.c
> > +++ b/mm/mmap.c
> > @@ -1830,6 +1830,7 @@ get_unmapped_area(struct file *file, unsigned long addr, unsigned long len,
> >                 pgoff = 0;
> >                 get_area = shmem_get_unmapped_area;
> >         } else if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE)) {
> > +               pgoff = 0;
> >                 /* Ensures that larger anonymous mappings are THP aligned. */
> >                 get_area = thp_get_unmapped_area;
> >         }
>
> I think it would be cleaner to just zero pgoff if file==NULL, then it covers the
> shared case, the THP case, and the non-THP case properly. I'll prepare a
> separate patch for this.

IIUC this is not OK for the arches that have to work around VIPT
caches, since MAP_ANONYMOUS | MAP_SHARED with a NULL file pointer is a
common way to create a tmpfs mapping. For example, arm's
arch_get_unmapped_area() has:

if (aliasing)
        do_align = filp || (flags & MAP_SHARED);

The pgoff is needed if do_align is true, so we should zero pgoff only
if !file && !MAP_SHARED, as my patch does. We can move the zeroing to a
better place.
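
To make the condition concrete, here is a minimal userspace sketch of
the predicate described above. It is not kernel code: the helper name
should_zero_pgoff() and the SK_* flag constants are hypothetical
stand-ins chosen for illustration (the values mirror the usual Linux
MAP_* values but nothing depends on that).

```c
#include <stdbool.h>

/* Illustrative stand-ins for the mmap() flags; assumptions, not uapi. */
enum {
	SK_MAP_SHARED    = 0x01,
	SK_MAP_PRIVATE   = 0x02,
	SK_MAP_ANONYMOUS = 0x20,
};

/*
 * Hypothetical helper: returns true when the page offset can be forced
 * to zero before searching for an unmapped area.
 *
 * has_file models "file != NULL". A shared anonymous mapping has no
 * file at this point but may still need pgoff for cache-colour
 * alignment on aliasing VIPT caches, so it is excluded.
 */
static bool should_zero_pgoff(bool has_file, unsigned long flags)
{
	return !has_file && !(flags & SK_MAP_SHARED);
}
```

Under these assumptions only the private anonymous case returns true.
Zeroing on !file alone would also return true for
MAP_ANONYMOUS | MAP_SHARED, which is the case arm's do_align check
above cares about.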

>
>
> >
> >>
> >> Finally, the second test failure I reported (ksm_tests) is actually caused by a
> >> bug in the test code, but provoked by this change. So I'll send out a fix for
> >> the test code separately.
> >>
> >>
> >> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> >> index 4f542444a91f..68ac54117c77 100644
> >> --- a/mm/huge_memory.c
> >> +++ b/mm/huge_memory.c
> >> @@ -632,7 +632,7 @@ static unsigned long __thp_get_unmapped_area(struct file *filp,
> >>  {
> >>         loff_t off_end = off + len;
> >>         loff_t off_align = round_up(off, size);
> >> -       unsigned long len_pad, ret;
> >> +       unsigned long len_pad, ret, off_sub;
> >>
> >>         if (off_end <= off_align || (off_end - off_align) < size)
> >>                 return 0;
> >> @@ -658,7 +658,13 @@ static unsigned long __thp_get_unmapped_area(struct file *filp,
> >>         if (ret == addr)
> >>                 return addr;
> >>
> >> -       ret += (off - ret) & (size - 1);
> >> +       off_sub = (off - ret) & (size - 1);
> >> +
> >> +       if (current->mm->get_unmapped_area == arch_get_unmapped_area_topdown &&
> >> +           !off_sub)
> >> +               return ret + size;
> >> +
> >> +       ret += off_sub;
> >>         return ret;
> >>  }
> >
> > I didn't spot any problems. Would you please send a formal patch?
>
> Yeah, I'll aim to post today.

Thanks!
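
For reference, the adjustment in the quoted __thp_get_unmapped_area()
hunk can be modelled in userspace as below. This is only a sketch of
the arithmetic: thp_align_sketch() and SKETCH_PMD_SIZE are hypothetical
names, the topdown flag stands in for the
"current->mm->get_unmapped_area == arch_get_unmapped_area_topdown"
check, and ret is assumed to be the start of a free range of len + size
bytes returned by the underlying search.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PMD_SIZE (UINT64_C(2) << 20)	/* 2M, as in the thread */

/*
 * ret:     start of the padded (len + size) range found by the search
 * off:     byte offset of the mapping (0 for anonymous memory)
 * size:    alignment wanted, a power of two
 * topdown: true when the address space is searched from the top
 */
static uint64_t thp_align_sketch(uint64_t ret, uint64_t off,
				 uint64_t size, bool topdown)
{
	/* Distance from ret up to the next address congruent to off. */
	uint64_t off_sub = (off - ret) & (size - 1);

	/*
	 * Already aligned: in a top-down layout the padding sits above
	 * the mapping, so take the top of the range to avoid a hole.
	 */
	if (topdown && !off_sub)
		return ret + size;

	return ret + off_sub;
}
```

As a worked case, take a 4M request with the previous VMA starting at
0x7f0000200000. A top-down search for 6M returns 0x7effffc00000, which
is already 2M aligned. Without the extra branch the mapping ends at
0x7f0000000000 and leaves a 2M hole below the previous VMA. With it the
mapping starts at 0x7effffe00000 and ends exactly at 0x7f0000200000.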

>
>

