From: David Hildenbrand <david@redhat.com>
To: Michal Hocko <mhocko@suse.com>, Mike Kravetz <mike.kravetz@oracle.com>
Cc: David Rientjes <rientjes@google.com>,
	linux-mm@kvack.org, James Houghton <jthoughton@google.com>,
	John Hubbard <jhubbard@nvidia.com>,
	Matthew Wilcox <willy@infradead.org>,
	Peter Xu <peterx@redhat.com>, Vlastimil Babka <vbabka@suse.cz>,
	Zi Yan <ziy@nvidia.com>
Subject: Re: [Invitation] Linux MM Alignment Session on HugeTLB Core MM Convergence on Wednesday
Date: Thu, 15 Jun 2023 10:29:54 +0200	[thread overview]
Message-ID: <141b7088-684b-32dc-efe4-03713d38ae28@redhat.com> (raw)
In-Reply-To: <ZIrGKKpFTKpxCUN1@dhcp22.suse.cz>

On 15.06.23 10:04, Michal Hocko wrote:
> On Wed 14-06-23 16:04:58, Mike Kravetz wrote:
>> On 06/12/23 18:59, David Rientjes wrote:
>>> This week's topic will be a technical brainstorming session on HugeTLB
>>> convergence with the core MM.  This has been discussed most recently in
>>> this thread:
>>> https://lore.kernel.org/linux-mm/ZIOEDTUBrBg6tepk@casper.infradead.org/T/
>>
>> Thank you David for putting this session together!  And, thanks to everyone
>> who participated.
>>
>> Following up on linux-mm with most active participants on Cc (sorry if I
>> missed someone).   If it makes more sense to continue the above thread,
>> please move there.
>>
>> Even though everyone knows that hugetlb is special cased throughout the
>> core mm, it came to a head with the proposed introduction of HGM.  TBH,
>> few people in the core mm community paid much attention to HGM when first
>> introduced.  A LSF/MM session was then dedicated to the discussion of
>> HGM with the outcome being the suggestion to create a new filesystem/driver
>> (hugetlb2 if you will) that would satisfy the use cases requiring HGM.
>> One thing that was not emphasized at LSF/MM is that there are existing
>> hugetlb users experiencing major issues that could be addressed with HGM:
>> specifically the issues of memory errors and live migration.  That was
>> the starting point for recent discussion in the above thread.
>>
>> I may be wrong, but it appeared the direction of that thread was to
>> first try and unify some of the hugetlb and core mm code.  Eliminate
>> some of the special casing.  If hugetlb was less of a special case, then
>> perhaps HGM would be more acceptable.  That is the impression I (perhaps
>> incorrectly) had going into today's session.
> 
> My impression from the discussion yesterday was that the level of
> unification would need to be really large and time consuming in order to
> be useful for the HGM patchset to be in a more maintainable form. The
> final outcome is quite hard to predict at this stage.
>   
>> During today's session, we often discussed what would/could be introduced
>> in a hugetlb v2.  The idea is that this would be the ideal place for HGM.
>> However, people also made the comparisons to cgroup v1 - v2.  Such a
>> redesign provides the needed 'clean slate' to do things right, but it
>> does little for existing users who would be unwilling to quickly move off
>> existing hugetlb.
>>
>> We did spend a good chunk of time on hugetlb/core mm unification and
>> removing special casing.  In some (most) of these cases, the benefit of
>> removing special cases from core mm would result in adding more code to
>> hugetlb.  For example: proper typing so that hugetlb does not treat
>> all page table entries as PTEs.  Again, I may be wrong but I think
>> people were OK with adding more code (and even complexity) to hugetlb
>> if it eliminated special casing in the core mm.  But, there did not
>> seem to be a clear consensus, especially with the thought that we may
>> need to double hugetlb code to get the types right.
> 
> This is primarily your call as a maintainer. If you ask me, hugetlb is
> over-complicated in its current form already. Regressions are not
> seldom when code is added, which is a signal that we are hitting
> maintenance cost walls. This doesn't mean further development is
> impossible of course, but it is increasingly more costly AFAICS.
> 
>> Unless I missed something, there was no clear direction at the end of this
>> session.  I was hoping that we could come up with a plan to address the
>> issues facing today's hugetlb users.  IMO, there seems to be two options:
>> 1) Start work on hugetlb v2 with the intention that customers will need
>>     to move to this to address their issues.
>> 2) Incorporate functionality like HGM into existing hugetlb.
> 

I fully agree with all that Michal said.

I'm just going to add that I don't see why anyone would look into a 
hugetlbv2 if we're going to use the motivation of "help existing users" 
to make hugetlb ever-more complicated and special. "Existing users" here 
even means "people using hugetlb for backing VMs who now want to get 
postcopy working with less latency" -- which I consider partially a new 
use case.

So working on adding HGM and concurrently starting a hugetlbv2? I don't 
think that will happen if we decide on adding HGM and proceeding with 
that reasoning about existing users.

As expressed yesterday, I don't see a fast and clean way to make hugetlb 
significantly less special (thanks Willy for the list of odd cases).

Sure, we can talk about adding pte_t safety, but I don't really see a 
way forward to unify page table walking code that way -- there are still 
the (PT) locking, PMD sharing, PTE-cont special cases ... but sure, if 
anybody wants to work on that, why not.

That said, like Michal, I acknowledge that it is Mike's call 
regarding the hugetlb code. I, for my part, will push back on any added 
core-mm complexity that adds more special casing for hugetlb. Maybe 
there are easy ways to integrate it nicely and that is not really a concern.

Note that while we've been discussing how HGM would already interfere 
with core-mm, we've not even started discussing how actual 
MADV_SPLIT/MADV_COLLAPSE/page poisoning ... would affect core-mm and 
require special-casing for hugetlb.

I, for my part, will explore the mapcount topic a bit (as time permits) 
and see if we can come up with at least a unified mapcount approach 
(e.g., sub-page mapcount?). But I suspect even figuring that out will 
take quite a while ...

-- 
Cheers,

David / dhildenb



