Re: [PATCH v3 0/3] A Solution to Re-enable hugetlb vmemmap optimize

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

From: Catalin Marinas <catalin.marinas@arm.com>
To: Yu Zhao <yuzhao@google.com>
Cc: Nanyong Sun <sunnanyong@huawei.com>,
	will@kernel.org, mike.kravetz@oracle.com, muchun.song@linux.dev,
	akpm@linux-foundation.org, anshuman.khandual@arm.com,
	willy@infradead.org, wangkefeng.wang@huawei.com,
	linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH v3 0/3] A Solution to Re-enable hugetlb vmemmap optimize
Date: Fri, 5 Jul 2024 16:49:08 +0100	[thread overview]
Message-ID: <ZogV9Iag4mxe6enx@arm.com> (raw)
In-Reply-To: <CAOUHufYo=SQmpaYA3ThrdHcY9fQfFmycriSvOX1iuC4Y=Gj7Xg@mail.gmail.com>

On Thu, Jun 27, 2024 at 03:19:55PM -0600, Yu Zhao wrote:
> On Wed, Feb 7, 2024 at 5:44 AM Catalin Marinas <catalin.marinas@arm.com> wrote:
> > On Sat, Jan 27, 2024 at 01:04:15PM +0800, Nanyong Sun wrote:
> > > On 2024/1/26 2:06, Catalin Marinas wrote:
> > > > On Sat, Jan 13, 2024 at 05:44:33PM +0800, Nanyong Sun wrote:
> > > > > HVO was previously disabled on arm64 [1] due to the lack of necessary
> > > > > BBM(break-before-make) logic when changing page tables.
> > > > > This set of patches fix this by adding necessary BBM sequence when
> > > > > changing page table, and supporting vmemmap page fault handling to
> > > > > fixup kernel address translation fault if vmemmap is concurrently accessed.
> > [...]
> > > > How often is this code path called? I wonder whether a stop_machine()
> > > > approach would be simpler.
> > > 
> > > As long as allocating or releasing hugetlb is called.  We cannot
> > > limit users to only allocate or release hugetlb when booting or
> > > not running any workload on all other cpus, so if use
> > > stop_machine(), it will be triggered 8 times every 2M and 4096
> > > times every 1G, which is probably too expensive.
> >
> > I'm hoping this can be batched somehow and not do a stop_machine() (or
> > 8) for every 2MB huge page.
> 
> Theoretically, all hugeTLB vmemmap operations from a single user
> request can be done in one batch. This would require the preallocation
> of the new copy of vmemmap so that the old copy can be replaced with
> one BBM.

Do we ever re-create pmd block entries back for the vmmemap range that
was split or do they remain pmd table + pte entries? If the latter, I
guess we could do a stop_machine() only for a pmd, it should be self
limiting after a while. I don't want user-space to DoS the system by
triggering stop_machine() when mapping/unmapping hugetlbfs pages.

If I did the maths right, for a 2MB hugetlb page, we have about 8
vmemmap pages (32K). Once we split a 2MB vmemap range, whatever else
needs to be touched in this range won't require a stop_machine().

> > Just to make sure I understand - is the goal to be able to free struct
> > pages corresponding to hugetlbfs pages?
> 
> Correct, if you are referring to the pages holding struct page[].
> 
> > Can we not leave the vmemmap in
> > place and just release that memory to the page allocator?
> 
> We cannot, since the goal is to reuse those pages for something else,
> i.e., reduce the metadata overhead for hugeTLB.

What I meant is that we can leave the vmemmap alias in place and just
reuse those pages via the linear map etc. The kernel should touch those
struct pages to corrupt the data. The only problem would be if we
physically unplug those pages but I don't think that's the case here.

-- 
Catalin

next prev parent reply	other threads:[~2024-07-05 15:49 UTC|newest]

Thread overview: 43+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-01-13  9:44 Nanyong Sun
2024-01-13  9:44 ` [PATCH v3 1/3] mm: HVO: introduce helper function to update and flush pgtable Nanyong Sun
2024-01-13  9:44 ` [PATCH v3 2/3] arm64: mm: HVO: support BBM of vmemmap pgtable safely Nanyong Sun
2024-01-15  2:38   ` Muchun Song
2024-02-07 12:21   ` Mark Rutland
2024-02-08  9:30     ` Nanyong Sun
2024-01-13  9:44 ` [PATCH v3 3/3] arm64: mm: Re-enable OPTIMIZE_HUGETLB_VMEMMAP Nanyong Sun
2024-01-25 18:06 ` [PATCH v3 0/3] A Solution to Re-enable hugetlb vmemmap optimize Catalin Marinas
2024-01-27  5:04   ` Nanyong Sun
2024-02-07 11:12     ` Will Deacon
2024-02-07 11:21       ` Matthew Wilcox
2024-02-07 12:11         ` Will Deacon
2024-02-07 12:24           ` Mark Rutland
2024-02-07 14:17           ` Matthew Wilcox
2024-02-08  2:24             ` Jane Chu
2024-02-08 15:49               ` Matthew Wilcox
2024-02-08 19:21                 ` Jane Chu
2024-02-11 11:59                 ` Muchun Song
2024-06-05 20:50                   ` Yu Zhao
2024-06-06  8:30                     ` David Hildenbrand
2024-06-07 16:55                       ` Frank van der Linden
2024-02-07 12:20         ` Catalin Marinas
2024-02-08  9:44           ` Nanyong Sun
2024-02-08 13:17             ` Will Deacon
2024-03-13 23:32               ` David Rientjes
2024-03-25 15:24                 ` Nanyong Sun
2024-03-26 12:54                   ` Will Deacon
2024-06-24  5:39                   ` Yu Zhao
2024-06-27 14:33                     ` Nanyong Sun
2024-06-27 21:03                       ` Yu Zhao
2024-07-04 11:47                         ` Nanyong Sun
2024-07-04 19:45                           ` Yu Zhao
2024-02-07 12:44     ` Catalin Marinas
2024-06-27 21:19       ` Yu Zhao
2024-07-05 15:49         ` Catalin Marinas [this message]
2024-07-05 17:41           ` Yu Zhao
2024-07-10 16:51             ` Catalin Marinas
2024-07-10 17:12               ` Yu Zhao
2024-07-10 22:29                 ` Catalin Marinas
2024-07-10 23:07                   ` Yu Zhao
2024-07-11  8:31                     ` Yu Zhao
2024-07-11 11:39                       ` Catalin Marinas
2024-07-11 17:38                         ` Yu Zhao

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZogV9Iag4mxe6enx@arm.com \
    --to=catalin.marinas@arm.com \
    --cc=akpm@linux-foundation.org \
    --cc=anshuman.khandual@arm.com \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mike.kravetz@oracle.com \
    --cc=muchun.song@linux.dev \
    --cc=sunnanyong@huawei.com \
    --cc=wangkefeng.wang@huawei.com \
    --cc=will@kernel.org \
    --cc=willy@infradead.org \
    --cc=yuzhao@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox