Re: [PATCH v3 0/3] A Solution to Re-enable hugetlb vmemmap optimize

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

From: Catalin Marinas <catalin.marinas@arm.com>
To: Yu Zhao <yuzhao@google.com>
Cc: Nanyong Sun <sunnanyong@huawei.com>,
	will@kernel.org, mike.kravetz@oracle.com, muchun.song@linux.dev,
	akpm@linux-foundation.org, anshuman.khandual@arm.com,
	willy@infradead.org, wangkefeng.wang@huawei.com,
	linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH v3 0/3] A Solution to Re-enable hugetlb vmemmap optimize
Date: Wed, 10 Jul 2024 17:51:24 +0100	[thread overview]
Message-ID: <Zo68DP6siXfb6ZBR@arm.com> (raw)
In-Reply-To: <CAOUHufYwoTTsRBF_wWZU_jWzb8e6FF=vN8UKtVHBeXLBkwHWzA@mail.gmail.com>

On Fri, Jul 05, 2024 at 11:41:34AM -0600, Yu Zhao wrote:
> On Fri, Jul 5, 2024 at 9:49 AM Catalin Marinas <catalin.marinas@arm.com> wrote:
> > If I did the maths right, for a 2MB hugetlb page, we have about 8
> > vmemmap pages (32K). Once we split a 2MB vmemap range,
> 
> Correct.
> 
> > whatever else
> > needs to be touched in this range won't require a stop_machine().
> 
> There might be some misunderstandings here.
> 
> To do HVO:
> 1. we split a PMD into 512 PTEs;
> 2. for every 8 PTEs:
>   2a. we allocate an order-0 page for PTE #0;
>   2b. we remap PTE #0 *RW* to this page;
>   2c. we remap PTEs #1-7 *RO* to this page;
>   2d. we free the original order-3 page.

Thanks. I now remember why we reverted such support in 060a2c92d1b6
("arm64: mm: hugetlb: Disable HUGETLB_PAGE_OPTIMIZE_VMEMMAP"). The main
problem is that point 2c also changes the output address of the PTE
(and the content of the page slightly). The architecture requires a
break-before-make in such scenario, though it would have been nice if it
was more specific on what could go wrong.

We can do point 1 safely if we have FEAT_BBM level 2. For point 2, I
assume these 8 vmemmap pages may be accessed and that's why we can't do
a break-before-make safely. I was wondering whether we could make the
PTEs RO first and then change the output address but we have another
rule that the content of the page should be the same. I don't think
entries 1-7 are identical to entry 0 (though we could ask the architects
for clarification here). Also, can we guarantee that nothing writes to
entry 0 while we would do such remapping? We know entries 1-7 won't be
written as we mapped them as RO but entry 0 contains the head page.
Maybe it's ok to map it RO temporarily until the newly allocated hugetlb
page is returned.

If we could get the above work, it would be a lot simpler than thinking
of stop_machine() or other locks to wait for such remapping.

> To do de-HVO:
> 1. for every 8 PTEs:
>   1a. we allocate 7 order-0 pages.
>   1b. we remap PTEs #1-7 *RW* to those pages, respectively.

Similar problem in 1.b, changing the output address. Here we could force
the content to be the same and remap PTEs 1-7 RO first to the new page,
turn them RW afterwards and it's all compliant with the architecture
(even without FEAT_BBM).

> > What I meant is that we can leave the vmemmap alias in place and just
> > reuse those pages via the linear map etc. The kernel should touch those
> > struct pages to corrupt the data. The only problem would be if we
> > physically unplug those pages but I don't think that's the case here.
> 
> Set the repercussions of memory corruption aside, we still can't do
> this because PTEs #1-7 need to map meaningful data, hence step 2c
> above.

Yeah, I missed this one.

-- 
Catalin

next prev parent reply	other threads:[~2024-07-10 16:51 UTC|newest]

Thread overview: 43+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-01-13  9:44 Nanyong Sun
2024-01-13  9:44 ` [PATCH v3 1/3] mm: HVO: introduce helper function to update and flush pgtable Nanyong Sun
2024-01-13  9:44 ` [PATCH v3 2/3] arm64: mm: HVO: support BBM of vmemmap pgtable safely Nanyong Sun
2024-01-15  2:38   ` Muchun Song
2024-02-07 12:21   ` Mark Rutland
2024-02-08  9:30     ` Nanyong Sun
2024-01-13  9:44 ` [PATCH v3 3/3] arm64: mm: Re-enable OPTIMIZE_HUGETLB_VMEMMAP Nanyong Sun
2024-01-25 18:06 ` [PATCH v3 0/3] A Solution to Re-enable hugetlb vmemmap optimize Catalin Marinas
2024-01-27  5:04   ` Nanyong Sun
2024-02-07 11:12     ` Will Deacon
2024-02-07 11:21       ` Matthew Wilcox
2024-02-07 12:11         ` Will Deacon
2024-02-07 12:24           ` Mark Rutland
2024-02-07 14:17           ` Matthew Wilcox
2024-02-08  2:24             ` Jane Chu
2024-02-08 15:49               ` Matthew Wilcox
2024-02-08 19:21                 ` Jane Chu
2024-02-11 11:59                 ` Muchun Song
2024-06-05 20:50                   ` Yu Zhao
2024-06-06  8:30                     ` David Hildenbrand
2024-06-07 16:55                       ` Frank van der Linden
2024-02-07 12:20         ` Catalin Marinas
2024-02-08  9:44           ` Nanyong Sun
2024-02-08 13:17             ` Will Deacon
2024-03-13 23:32               ` David Rientjes
2024-03-25 15:24                 ` Nanyong Sun
2024-03-26 12:54                   ` Will Deacon
2024-06-24  5:39                   ` Yu Zhao
2024-06-27 14:33                     ` Nanyong Sun
2024-06-27 21:03                       ` Yu Zhao
2024-07-04 11:47                         ` Nanyong Sun
2024-07-04 19:45                           ` Yu Zhao
2024-02-07 12:44     ` Catalin Marinas
2024-06-27 21:19       ` Yu Zhao
2024-07-05 15:49         ` Catalin Marinas
2024-07-05 17:41           ` Yu Zhao
2024-07-10 16:51             ` Catalin Marinas [this message]
2024-07-10 17:12               ` Yu Zhao
2024-07-10 22:29                 ` Catalin Marinas
2024-07-10 23:07                   ` Yu Zhao
2024-07-11  8:31                     ` Yu Zhao
2024-07-11 11:39                       ` Catalin Marinas
2024-07-11 17:38                         ` Yu Zhao

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Zo68DP6siXfb6ZBR@arm.com \
    --to=catalin.marinas@arm.com \
    --cc=akpm@linux-foundation.org \
    --cc=anshuman.khandual@arm.com \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mike.kravetz@oracle.com \
    --cc=muchun.song@linux.dev \
    --cc=sunnanyong@huawei.com \
    --cc=wangkefeng.wang@huawei.com \
    --cc=will@kernel.org \
    --cc=willy@infradead.org \
    --cc=yuzhao@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox