Re: [PATCH v1] mm/hugetlb_vmemmap: remap head page to newly allocated page

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

From: Mike Kravetz <mike.kravetz@oracle.com>
To: Muchun Song <songmuchun@bytedance.com>
Cc: Joao Martins <joao.m.martins@oracle.com>,
	linux-mm@kvack.org, Naoya Horiguchi <naoya.horiguchi@linux.dev>,
	Andrew Morton <akpm@linux-foundation.org>
Subject: Re: [PATCH v1] mm/hugetlb_vmemmap: remap head page to newly allocated page
Date: Thu, 4 Aug 2022 09:56:39 -0700	[thread overview]
Message-ID: <Yuv6R9YMAQXc+KrG@monkey> (raw)
In-Reply-To: <Yut9rBWto31i3z6W@FVFYT0MHHV2J.usts.net>

On 08/04/22 16:05, Muchun Song wrote:
> On Wed, Aug 03, 2022 at 03:42:15PM -0700, Mike Kravetz wrote:
> > On 08/03/22 18:44, Muchun Song wrote:
> > > On Wed, Aug 03, 2022 at 10:52:13AM +0100, Joao Martins wrote:
> > > > On 8/3/22 05:11, Muchun Song wrote:
> > > > > On Tue, Aug 02, 2022 at 07:03:09PM +0100, Joao Martins wrote:
> > > I am not sure we are on the same page (it seems that we are not after I saw your
> > > below patch). So let me make it become more clarified.
> > 
> > Thanks Muchun!
> > I told Joao that you would be the expert in this area and was correct.
> >
> 
> Thanks for your trust.
>  
> > > 
> > > CPU0:                                            CPU1:
> > > 
> > > vmemmap_remap_free(start, end, reuse)
> > >   // alloc a new page used to be the head vmemmap page
> > >   page = alloc_pages_node();
> > > 
> > >   memcpy(page_address(page), reuse, PAGE_SIZE);
> > >                                                  // Now the @reuse address is mapped to the original
> > >                                                  // page frame. So the change will be reflected on the
> > >                                                  // original page frame.
> > >                                                  get_page(reuse);
> > 
> > Here is a thought.
> > 
> > This code gets called early after allocating a new hugetlb page.  This new
> > compound page has a ref count of 1 on the head page and 0 on all the tail
> > pages.  If the ref count was 0 on the head page, get_page() would not succeed.
> > 
> > I can not think of a reason why we NEED to have a ref count of 1 on the
> > head page.  It does make it more convenient to simply call put_page() on
> > the newly initialized hugetlb page and have the page be added to the huegtlb
> > free lists, but this could easily be modified.
> > 
> > Matthew Willcox recently pointed me at this:
> > https://lore.kernel.org/linux-mm/20220531150611.1303156-1-willy@infradead.org/T/#m98fb9f9bd476155b4951339da51a0887b2377476
> > 
> > That would only work for hugetlb pages allocated from buddy.  For gigantic
> > pages, we manually 'freeze' (zero ref count) of tail pages and check for
> > an unexpected increased ref count.  We could do the same with the head page.
> > 
> > Would having zero ref count on the head page eliminate this race?
> 
> I think most races which try to grab an extra refcount could be avoided
> in this case.
> 
> However, I can figure out a race related to memory failure, that is to poison
> a page in memory_failure(), which set PageHWPoison without checking if the
> refcount is zero. If we can fix this easily, I think this patch is a good
> direction.

Adding Naoya.

Yes, I recall that discussion in the thread,
https://lore.kernel.org/linux-mm/3c542543-0965-ef60-4627-1a4116077a5b@huawei.com/

I don't have any ideas about how to avoid that issue.

Naoya notes in that thread, the issue is poisioning a generic compound page
that may turn into a hugetlb page.  There may be a way to work around this.

However, IMO we may be trying too hard to cover ALL memory error handling races
with hugetlb.  We know that memory errors can happen at ANY time, and a page
could be marked poision at any time.  I suspect there are many paths where bad
things could happen if a memory error happens at 'just the right time'.  For
example, I believe a page could be poisioned at the very end of the page
allocation path and returned to the caller requesting the page.  Certainly,
not every caller of page allocation checks for and handles a poisioned page.
In general, we know that memory error handling will not be 100% perfect
and can not be handled in ALL code paths.  In some cases if a memory error
happens at 'just the right time', bad things will happen.  It would be almost
impossible to handle a memory error at any time in all code paths. Someone
please correct me if this is not accepted/known situation.

It seems there has been much effort lately to try and catch all hugetlb races
with memory error handling.  That is OK, but I think we need to accept that
there may be races with code paths that are not worth the effort to try and
cover.  Just my opinion of course.

For this specific proposal by Joao, I think we can handle most of the races
if all sub-pages of the hugetlb (including head) have a ref count of zero.
I actually like moving in this direction as it means we could remove some
existing hugetlb code checking for increased ref counts.

I can start working on code to move in this direction.  We will need to
wait for the alloc_frozen_pages() interface and will need to make changes
to other allocators for gigantic pages.  Until then we can manually zero the
ref counts before calling the vmemmap freeing routines.  With this in place,
I would say that we do something like Joao proposes to free more contiguous
vmemmap.

Thoughts?

In any case, I am going to move forward with setting ref count to zero of
all 'hugetlb pages' before they officially become hugetlb pages.  As mentioned,
this should allow for removal of some code checking for inflated ref counts.
-- 
Mike Kravetz

next prev parent reply	other threads:[~2022-08-04 16:56 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-08-02 18:03 Joao Martins
2022-08-03  4:11 ` Muchun Song
2022-08-03  9:52   ` Joao Martins
2022-08-03 10:44     ` Muchun Song
2022-08-03 12:22       ` Joao Martins
2022-08-04  7:17         ` Muchun Song
2022-08-04  9:23           ` Joao Martins
2022-08-04 11:53             ` Muchun Song
2022-08-03 22:42       ` Mike Kravetz
2022-08-04  8:05         ` Muchun Song
2022-08-04 16:56           ` Mike Kravetz [this message]
2022-08-05  2:54             ` Muchun Song

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Yuv6R9YMAQXc+KrG@monkey \
    --to=mike.kravetz@oracle.com \
    --cc=akpm@linux-foundation.org \
    --cc=joao.m.martins@oracle.com \
    --cc=linux-mm@kvack.org \
    --cc=naoya.horiguchi@linux.dev \
    --cc=songmuchun@bytedance.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox