Re: [Lsf-pc] [LSF/MM/BPF TOPIC] HGM for hugetlbfs

linux-mm.kvack.org archive mirror

From: Jason Gunthorpe <jgg@ziepe.ca>
To: James Houghton <jthoughton@google.com>
Cc: Dan Williams <dan.j.williams@intel.com>,
	Mike Kravetz <mike.kravetz@oracle.com>,
	David Hildenbrand <david@redhat.com>,
	Miaohe Lin <linmiaohe@huawei.com>,
	Naoya Horiguchi <naoya.horiguchi@nec.com>,
	Peter Xu <peterx@redhat.com>, Yosry Ahmed <yosryahmed@google.com>,
	linux-mm@kvack.org, Michal Hocko <mhocko@suse.com>,
	Matthew Wilcox <willy@infradead.org>,
	David Rientjes <rientjes@google.com>,
	Axel Rasmussen <axelrasmussen@google.com>,
	lsf-pc@lists.linux-foundation.org,
	Jiaqi Yan <jiaqiyan@google.com>,
	jane.chu@oracle.com
Subject: Re: [Lsf-pc] [LSF/MM/BPF TOPIC] HGM for hugetlbfs
Date: Tue, 13 Jun 2023 12:17:23 -0300
Message-ID: <ZIiIg7i+7r47h17S@ziepe.ca>
In-Reply-To: <CADrL8HXy3OnAqd4Y6FHZLMkxXsCE54UH=PQVCTUvNhX9yWCacw@mail.gmail.com>

On Fri, Jun 09, 2023 at 01:20:19PM -0700, James Houghton wrote:

> So, we could:
> 1. Do what HGM does and have the kernel unmap the 4K page in the
> userspace page tables.
> 2. On-the-fly change the VMA for our hugepage to not be HugeTLB
> anymore, and re-map all the good 4K pages.
> 3. Tell userspace that it must change its mapping from HugeTLB to
> something else, and move the good 4K pages into the new mapping.
 
> (2) feels like more complexity than (1). If a user created a
> MAP_HUGETLB mapping and now it isn't HugeTLB, that feels wrong.
> 
> (3) today isn't possible, but with Jiaqi's improvement to hugetlbfs
> read() it becomes possible. We'll need to have an extra 1G of memory
> while we are doing this copying/recovery, and it isn't transparent at
> all.

It is transparent to the VM; the VM just sees a longer EPT fault
response time if it touches that range.

> (3) is additionally painful when considering live migration. We have
> to keep the 4K page unmapped after the migration (to keep it poisoned
> from the guest's perspective), but the page is no longer *actually*
> poisoned on the host. To get the memory we need to back our
> fake-poisoned pages with tmpfs, we would need to free our 1G page.
> Getting that page back later isn't trivial.

Why does this change with #1?

As David says, you can't transparently "fix" the page, so when you
migrate a VM with unavailable pages, it must migrate those unavailable
pages too, regardless of whether the kernel or userspace made them
unavailable.

So, regardless, you end up with a VM that has holes in its address
map.

I guess if the hole is created from a PTE mapping of a 1G hugetlbfs
page, it is easier to "heal" back to a full 1G mapping, but this
healing could also be done by copying.

It seems to me the main value of the kernel-side approach is that it
eliminates the copies and shortens the time the 1G page is
unavailable to the guest.

> So (1) still seems like the most natural solution, so the question
> becomes: how exactly do we implement 4K unmapping? And that brings us
> back to the main question about how HGM should be implemented in
> general.

IMHO, if you can do it in userspace with a copy, you can solve your
urgent customer need and then have more time for the big kernel
rework required to optimize it with kernel support.
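The userspace copy approach (option 3 above) can be sketched roughly as follows. This is a minimal Python simulation under stated assumptions, not VMM code: the 1G hugetlbfs page is modeled as a plain bytearray scaled down to 8 subpages, and `copy_good_subpages` and the poisoned-index set are hypothetical names. In a real VMM, the source would be the hugetlbfs file (read through the hugetlbfs read() path that Jiaqi's improvement makes usable), the destination would be a 4K-backed mapping such as tmpfs, and the recorded holes would stay unmapped so the guest still sees the poison.

```python
SUBPAGE_SIZE = 4096          # 4 KiB base page
NR_SUBPAGES = 8              # scaled down; a real 1 GiB page has 262144 subpages
HUGEPAGE_SIZE = SUBPAGE_SIZE * NR_SUBPAGES


def copy_good_subpages(src, poisoned):
    """Hypothetical helper: copy every subpage of src except the poisoned ones.

    Returns (dst, holes). dst is a new buffer the size of src in which the
    poisoned subpages are left zero-filled (never read from src), and holes
    is the sorted list of subpage indices that must remain unmapped.
    """
    dst = bytearray(len(src))
    holes = []
    for i in range(len(src) // SUBPAGE_SIZE):
        if i in poisoned:
            holes.append(i)  # skip: touching poisoned memory would fault
            continue
        start = i * SUBPAGE_SIZE
        dst[start:start + SUBPAGE_SIZE] = src[start:start + SUBPAGE_SIZE]
    return dst, holes


# Usage: fill each subpage with (index + 1), then pretend subpage 3 is poisoned.
src = bytearray()
for i in range(NR_SUBPAGES):
    src += bytes([i + 1]) * SUBPAGE_SIZE

dst, holes = copy_good_subpages(src, poisoned={3})
print(holes)                      # [3]
print(dst[3 * SUBPAGE_SIZE])      # 0: the hole is never filled
print((1 << 30) // SUBPAGE_SIZE)  # 262144 subpages in a 1 GiB page
```

The destination buffer is as large as the hugepage itself, which is the "extra 1G of memory" cost mentioned above; the kernel-side approach avoids both that allocation and the copy.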

Jason


Thread overview: 29+ messages
2023-03-06 19:19 Mike Kravetz
2023-03-14 15:37 ` James Houghton
2023-04-12  1:44   ` David Rientjes
2023-05-24 20:26 ` James Houghton
2023-05-26  3:00   ` David Rientjes
     [not found]     ` <20230602172723.GA3941@monkey>
2023-06-06 22:40       ` David Rientjes
2023-06-07  7:38         ` David Hildenbrand
2023-06-07  7:51           ` Yosry Ahmed
2023-06-07  8:13             ` David Hildenbrand
2023-06-07 22:06               ` Mike Kravetz
2023-06-08  0:02                 ` David Rientjes
2023-06-08  6:34                   ` David Hildenbrand
2023-06-08 18:50                     ` Yang Shi
2023-06-08 21:23                       ` Mike Kravetz
2023-06-09  1:57                         ` Zi Yan
2023-06-09 15:17                           ` Pasha Tatashin
2023-06-09 19:04                             ` Ankur Arora
2023-06-09 19:57                           ` Matthew Wilcox
2023-06-08 20:10                     ` Matthew Wilcox
2023-06-09  2:59                       ` David Rientjes
2023-06-13 14:59                       ` Jason Gunthorpe
2023-06-13 15:15                         ` David Hildenbrand
2023-06-13 15:45                           ` Peter Xu
2023-06-08 21:54                 ` [Lsf-pc] " Dan Williams
2023-06-08 22:35                   ` Mike Kravetz
2023-06-09  3:36                     ` Dan Williams
2023-06-09 20:20                       ` James Houghton
2023-06-13 15:17                         ` Jason Gunthorpe [this message]
2023-06-07 14:40           ` Matthew Wilcox
