From: James Houghton <jthoughton@google.com>
To: David Hildenbrand <david@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>,
Mike Kravetz <mike.kravetz@oracle.com>,
David Rientjes <rientjes@google.com>,
linux-mm@kvack.org, John Hubbard <jhubbard@nvidia.com>,
Matthew Wilcox <willy@infradead.org>,
Peter Xu <peterx@redhat.com>, Vlastimil Babka <vbabka@suse.cz>,
Zi Yan <ziy@nvidia.com>
Subject: Re: [Invitation] Linux MM Alignment Session on HugeTLB Core MM Convergence on Wednesday
Date: Thu, 15 Jun 2023 10:24:24 -0700
Message-ID: <CADrL8HVgFLb5NWGSpEg3GPMsOFv_U+upHmOYtgZnjmi6=p+zeA@mail.gmail.com>
In-Reply-To: <141b7088-684b-32dc-efe4-03713d38ae28@redhat.com>
On Thu, Jun 15, 2023 at 1:30 AM David Hildenbrand <david@redhat.com> wrote:
>
> On 15.06.23 10:04, Michal Hocko wrote:
> > On Wed 14-06-23 16:04:58, Mike Kravetz wrote:
> >> On 06/12/23 18:59, David Rientjes wrote:
> >>> This week's topic will be a technical brainstorming session on HugeTLB
> >>> convergence with the core MM. This has been discussed most recently in
> >>> this thread:
> >>> https://lore.kernel.org/linux-mm/ZIOEDTUBrBg6tepk@casper.infradead.org/T/
> >>
> >> Thank you David for putting this session together! And, thanks to everyone
> >> who participated.
> >>
> >> Following up on linux-mm with most active participants on Cc (sorry if I
> >> missed someone). If it makes more sense to continue the above thread,
> >> please move there.
> >>
> >> Even though everyone knows that hugetlb is special cased throughout the
> >> core mm, it came to a head with the proposed introduction of HGM. TBH,
> >> few people in the core mm community paid much attention to HGM when first
> >> introduced. An LSF/MM session was then dedicated to the discussion of
> >> HGM with the outcome being the suggestion to create a new filesystem/driver
> >> (hugetlb2 if you will) that would satisfy the use cases requiring HGM.
> >> One thing that was not emphasized at LSF/MM is that there are existing
> >> hugetlb users experiencing major issues that could be addressed with HGM:
> >> specifically the issues of memory errors and live migration. That was
> >> the starting point for recent discussion in the above thread.
> >>
> >> I may be wrong, but it appeared the direction of that thread was to
> >> first try and unify some of the hugetlb and core mm code. Eliminate
> >> some of the special casing. If hugetlb was less of a special case, then
> >> perhaps HGM would be more acceptable. That is the impression I (perhaps
> >> incorrectly) had going into today's session.
> >
> > My impression from the discussion yesterday was that the level of
> > unification would need to be really large and time-consuming in order to
> > be useful for the HGM patchset to be in a more maintainable form. The
> > final outcome is quite hard to predict at this stage.
> >
> >> During today's session, we often discussed what would/could be introduced
> >> in a hugetlb v2. The idea is that this would be the ideal place for HGM.
> >> However, people also made the comparisons to cgroup v1 - v2. Such a
> >> redesign provides the needed 'clean slate' to do things right, but it
> >> does little for existing users who would be unwilling to quickly move off
> >> existing hugetlb.
> >>
> >> We did spend a good chunk of time on hugetlb/core mm unification and
> >> removing special casing. In some (most) of these cases, the benefit of
> >> removing special cases from core mm would result in adding more code to
> >> hugetlb. For example: proper typing so that hugetlb does not treat
> >> all page table entries as PTEs. Again, I may be wrong but I think
> >> people were OK with adding more code (and even complexity) to hugetlb
> >> if it eliminated special casing in the core mm. But, there did not
> >> seem to be a clear consensus, especially with the thought that we may
> >> need to double hugetlb code to get types right.
> >
> > This is primarily your call as a maintainer. If you ask me, hugetlb is
> > overcomplicated in its current form already. Regressions are not really
> > rare when code is added, which is a signal we are hitting maintenance
> > cost walls. This doesn't mean further development is impossible of
> > course but it is increasingly more costly AFAICS.
> >
> >> Unless I missed something, there was no clear direction at the end of this
> >> session. I was hoping that we could come up with a plan to address the
> >> issues facing today's hugetlb users. IMO, there seem to be two options:
> >> 1) Start work on hugetlb v2 with the intention that customers will need
> >> to move to this to address their issues.
> >> 2) Incorporate functionality like HGM into existing hugetlb.
> >
>
> I fully agree with all that Michal said.
>
> I'm just going to add that I don't see why anyone would look into a
> hugetlbv2 if we're going to use the motivation of "help existing users"
> to make hugetlb ever-more complicated and special. "existing users" here
> even meaning "people use hugetlb for backing VMs. Now they want to get
> postcopy working with less latency." -- which I consider partially a new
> use case.
>
> So working on adding HGM and concurrently starting a hugetlbv2? I don't
> think that will happen if we decide on adding HGM and proceeding with
> that reasoning about existing users.
>
> As expressed yesterday, I don't see a fast and clean way to make hugetlb
> significantly less special (thanks Willy for the list of odd cases).
>
> Sure, we can talk about adding pte_t safety, but I don't really see a
> way forward to unify page table walking code that way -- there are still
> the (PT) locking, PMD sharing, PTE-cont special cases ... but sure, if
> anybody wants to work on that, why not.
>
> Having said that, like Michal, I acknowledge that it is Mike's call
> regarding the hugetlb code. I, for my part, will push back on any added
> core-mm complexity that adds more special casing for hugetlb. Maybe
> there are easy ways to integrate it nicely and that is not really a concern.

HGM is mostly contained in the already-existing HugeTLB special cases.
HGM doesn't really *add* special cases; it just makes the existing
HugeTLB special cases more complicated.

There are a few small ways that HGM touches non-hugetlb code:
1. Mapcount (to make hugetlb use the THP scheme) [1], newer version here [2]
2. madvise (to add MADV_SPLIT and update MADV_COLLAPSE) [3] and [4]
3. A small non-hugetlb change to page_vma_mapped_walk (provide pte_order) [5]
4. A small special case in try_to_unmap_one and try_to_migrate_one (to
   check the head page for page flags) [6]
5. smaps stats [7]

[1]: https://lore.kernel.org/linux-mm/20230218002819.1486479-6-jthoughton@google.com/
[2]: https://lore.kernel.org/linux-mm/20230306230004.1387007-1-jthoughton@google.com/
[3]: https://lore.kernel.org/linux-mm/20230218002819.1486479-10-jthoughton@google.com/
[4]: https://lore.kernel.org/linux-mm/20230218002819.1486479-35-jthoughton@google.com/
[5]: https://lore.kernel.org/linux-mm/20230218002819.1486479-27-jthoughton@google.com/
[6]: https://lore.kernel.org/linux-mm/20230218002819.1486479-29-jthoughton@google.com/
[7]: https://lore.kernel.org/linux-mm/20230218002819.1486479-39-jthoughton@google.com/
>
> Note that while we've been discussing how HGM would already interfere
> with core-mm, we've not even started discussing how actual
> MADV_SPLIT/MADV_COLLAPSE/page poisoning ... would affect core-mm and
> require special-casing for hugetlb.
>
> I, for my part, will explore a bit the mapcount topic (as time permits)
> and see if we can come up at least with a unified mapcount approach
> (e.g., sub-page mapcount?). But I suspect even figuring that out will
> take quite a while already ...
Thanks! Simply using the current THP mapcount scheme with HGM isn't
great (but IIUC this isn't blocking HGM). By using this scheme,
HugeTLB loses the vmemmap optimization / page struct freeing when HGM
is in use, and, of course, this scheme gets slow with very large
folios.