From: David Rientjes <rientjes@google.com>
To: Mike Kravetz <mike.kravetz@oracle.com>
Cc: James Houghton <jthoughton@google.com>,
	 Naoya Horiguchi <naoya.horiguchi@nec.com>,
	 Miaohe Lin <linmiaohe@huawei.com>,
	lsf-pc@lists.linux-foundation.org,  linux-mm@kvack.org,
	Peter Xu <peterx@redhat.com>,  Michal Hocko <mhocko@suse.com>,
	Matthew Wilcox <willy@infradead.org>,
	 David Hildenbrand <david@redhat.com>,
	 Axel Rasmussen <axelrasmussen@google.com>,
	Jiaqi Yan <jiaqiyan@google.com>
Subject: Re: [LSF/MM/BPF TOPIC] HGM for hugetlbfs
Date: Tue, 6 Jun 2023 15:40:10 -0700 (PDT)
Message-ID: <7e0ce268-f374-8e83-2b32-7c53f025fec5@google.com>
In-Reply-To: <20230602172723.GA3941@monkey>

On Fri, 2 Jun 2023, Mike Kravetz wrote:

> The benefit of HGM in the case of memory errors is fairly obvious.  As
> mentioned above, when a memory error is encountered on a hugetlb page,
> that entire hugetlb page becomes inaccessible to the application.  Losing
> 1G or even 2M of data is often catastrophic for an application.  There
> is often no way to recover.  It just makes sense that recovering from
> the loss of 4K of data would generally be easier and more likely to be
> possible.  Today, when Oracle DB encounters a hard memory error on a
> hugetlb page it will shut down.  Plans are currently in place to repair
> and recover from such errors if possible.  Isolating the area of data
> loss to a single 4K page significantly increases the likelihood of
> repair and recovery.
> 
> Today, when a memory error is encountered on a hugetlb page, an
> application is 'notified' of the error by a SIGBUS, along with the
> virtual address of the hugetlb page and its size.  This makes sense as
> hugetlb pages are mapped by a single page table entry, so you get all
> or nothing.  As mentioned by James above, this is catastrophic for VMs
> as the hypervisor has just been told that 2M or 1G is now inaccessible.
> With HGM, we can isolate such errors to 4K.  (A sketch of this
> notification from the application's point of view follows the quote
> below.)
> 
> Backing VMs with hugetlb pages is a real use case today.  We are seeing
> memory errors on such hugetlb pages with the result being VM failures.
> One of the advantages of backing VMs with THPs is that they are split in
> the case of memory errors.  HGM would allow similar functionality.

Thanks for this context, Mike, it's very useful.

I think everybody is aligned on the desire to map memory at smaller 
granularities for multiple use cases and it's fairly clear that these use 
cases are critically important to multiple stakeholders.

I think the open question is whether this functionality is supported in 
hugetlbfs (as with HGM) or whether there is a hard requirement that we 
must use THP for this support.

I don't think that hugetlbfs is feature frozen, but if there is a strong 
bias toward not merging additional complexity into the subsystem, that 
would be useful to know.  I personally think the critical use cases 
described above justify the added complexity of HGM in hugetlb, and that 
we shouldn't be blocked by the long-standing (15+ year) desire to mesh 
hugetlb into the core MM subsystem before we can stop the pain 
associated with memory poisoning and live migration.

Are there strong objections to extending hugetlb for this support?


Thread overview: 29+ messages
2023-03-06 19:19 Mike Kravetz
2023-03-14 15:37 ` James Houghton
2023-04-12  1:44   ` David Rientjes
2023-05-24 20:26 ` James Houghton
2023-05-26  3:00   ` David Rientjes
     [not found]     ` <20230602172723.GA3941@monkey>
2023-06-06 22:40       ` David Rientjes [this message]
2023-06-07  7:38         ` David Hildenbrand
2023-06-07  7:51           ` Yosry Ahmed
2023-06-07  8:13             ` David Hildenbrand
2023-06-07 22:06               ` Mike Kravetz
2023-06-08  0:02                 ` David Rientjes
2023-06-08  6:34                   ` David Hildenbrand
2023-06-08 18:50                     ` Yang Shi
2023-06-08 21:23                       ` Mike Kravetz
2023-06-09  1:57                         ` Zi Yan
2023-06-09 15:17                           ` Pasha Tatashin
2023-06-09 19:04                             ` Ankur Arora
2023-06-09 19:57                           ` Matthew Wilcox
2023-06-08 20:10                     ` Matthew Wilcox
2023-06-09  2:59                       ` David Rientjes
2023-06-13 14:59                       ` Jason Gunthorpe
2023-06-13 15:15                         ` David Hildenbrand
2023-06-13 15:45                           ` Peter Xu
2023-06-08 21:54                 ` [Lsf-pc] " Dan Williams
2023-06-08 22:35                   ` Mike Kravetz
2023-06-09  3:36                     ` Dan Williams
2023-06-09 20:20                       ` James Houghton
2023-06-13 15:17                         ` Jason Gunthorpe
2023-06-07 14:40           ` Matthew Wilcox
