Re: [LSF/MM/BPF TOPIC] HGM for hugetlbfs


From: David Rientjes <rientjes@google.com>
To: James Houghton <jthoughton@google.com>,
	 Naoya Horiguchi <naoya.horiguchi@nec.com>,
	 Miaohe Lin <linmiaohe@huawei.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>,
	lsf-pc@lists.linux-foundation.org,  linux-mm@kvack.org,
	Peter Xu <peterx@redhat.com>,  Michal Hocko <mhocko@suse.com>,
	Matthew Wilcox <willy@infradead.org>,
	 David Hildenbrand <david@redhat.com>,
	 Axel Rasmussen <axelrasmussen@google.com>,
	Jiaqi Yan <jiaqiyan@google.com>
Subject: Re: [LSF/MM/BPF TOPIC] HGM for hugetlbfs
Date: Thu, 25 May 2023 20:00:59 -0700 (PDT)
Message-ID: <a49e7ec8-735d-5a81-1744-cb887389a559@google.com>
In-Reply-To: <CADrL8HW87GWWTrBT1i722UnxLTG5Rh_5Y9XvCa1hWhY9C4Bh2Q@mail.gmail.com>

On Wed, 24 May 2023, James Houghton wrote:

> Hi everyone,
> 
> If you came to the HGM session at LSF/MM/BPF, thank you!

Thank you, James, for putting together such a detailed discussion and 
soliciting some great feedback.

> I want to
> address some of the feedback I got and restate the importance of HGM,
> especially as it relates to handling memory poison.
> 

Thanks for bringing this up; I think it's a very important use case.
Adding in Naoya Horiguchi and Miaohe Lin as well.

> ## Memory poison is a problem
> 
> HGM allows us to unmap poison at 4K instead of unmapping the entire
> hugetlb page. For applications that use HugeTLB, losing the entire
> hugepage can be catastrophic. For example, if a hypervisor is using 1G
> pages for guest memory, the VM will lose 1G of its physical address
> space, which is catastrophic (even 2M will most likely kill the VM).
> If we can limit the poisoning to only 4K, the VM will most likely be
> able to recover. This improved recoverability applies to other HugeTLB
> users as well, like databases.
> 

Mike, do you have feedback on how useful this would be, especially for use 
cases beyond what cloud providers would find helpful?

> ## Adding a new filesystem has risks, and unification will take years
> 
> Most of the feedback I got from the HGM session was to simply avoid
> adding new code to HugeTLB, and instead to make a new device or
> filesystem. Creating a new device or filesystem could work, but it
> leaves existing HugeTLB users with no answer for memory poison. Users
> would need to switch to the new device/filesystem if they want better
> hwpoison handling, and it will probably take years for the new
> device/filesystem to support all the features that HugeTLB supports
> today (so beyond PUD+ mappings, we would need page table sharing, page
> struct freeing, and even private mappings/CoW).
> 
> If we make a new filesystem and are unable to completely implement the
> HugeTLB uapi exactly with that filesystem, we will be stuck unable to
> remove HugeTLB.  We would strongly like to avoid coexisting HugeTLB
> implementations (similar to cgroup v1 and cgroup v2) if at all
> possible.
> 
> Instead of making a new filesystem, we could add HugeTLB-like features
> to tmpfs, such as support for gigantic page allocations (from bootmem or
> CMA, like HugeTLB), for example. This path would work to mostly unify
> HugeTLB with tmpfs, but existing HugeTLB users will still have to wait
> for many years before poison can be handled more efficiently. (And
> some users care about things like hugetlb_cgroup!)
> 
> ## HGM doesn’t hinder future unification
> 
> HGM doesn’t add any new special cases into mm code; it takes advantage
> of the existing special cases that already exist to support HugeTLB.
> HGM also isn’t adding a completely novel feature that can’t be
> replicated by THPs: PTE-mapping of THPs is already supported.
> 

I think this is important: there are deficiencies that HGM can fully
address (like the aforementioned smaller-granularity page poisoning, as
well as optimized live migration) without posing an obstacle to future
unification, if that unification turns out to be possible.

If HGM is not the way forward, it would be great to get alignment on what
needs to be done so that we can handle memory poison at a smaller
granularity for users of 1GB pages *and* support optimized live migration
for VMs backed by 1GB pages without requiring a full unification of the
HugeTLB subsystem with the rest of core MM.

While that unification has been discussed for several years, it would be a
shame if it became a hard blocker to addressing these real deficiencies
that are actively causing pain.

> HGM solves a problem that HugeTLB users have right now: unnecessarily
> large portions of memory are poisoned. Unless we fix HugeTLB itself,
> we will have to spend years effectively rewriting HugeTLB and telling
> users to switch to the new system that gets built.
> 
> Given all this, I think we should continue to move forward with HGM
> unless there is another feasible way to solve poisoning for existing
> HugeTLB users. Also, I encourage everyone to read the series itself
> (it's not all that complicated!).
> 
> - James
>
