Re: [RFC 0/6] mm: improve page allocator scalability via splitting zones

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

From: "Huang, Ying" <ying.huang@intel.com>
To: David Hildenbrand <david@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>,
	 linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	 Arjan Van De Ven <arjan@linux.intel.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	 Mel Gorman <mgorman@techsingularity.net>,
	 Vlastimil Babka <vbabka@suse.cz>,
	Johannes Weiner <jweiner@redhat.com>,
	 Dave Hansen <dave.hansen@linux.intel.com>,
	 Pavel Tatashin <pasha.tatashin@soleen.com>,
	 Matthew Wilcox <willy@infradead.org>
Subject: Re: [RFC 0/6] mm: improve page allocator scalability via splitting zones
Date: Thu, 18 May 2023 16:06:43 +0800	[thread overview]
Message-ID: <875y8q83n0.fsf@yhuang6-desk2.ccr.corp.intel.com> (raw)
In-Reply-To: <eae68813-4240-4de1-6177-0a44e00bd04d@redhat.com> (David Hildenbrand's message of "Wed, 17 May 2023 10:09:31 +0200")

David Hildenbrand <david@redhat.com> writes:

>>> If we could avoid instantiating more zones and rather improve existing
>>> mechanisms (PCP), that would be much more preferred IMHO. I'm sure
>>> it's not easy, but that shouldn't stop us from trying ;)
>> I do think improving PCP or adding another level of cache will help
>> performance and scalability.
>> And, I think that it has value too to improve the performance of
>> zone
>> itself.  Because there will be always some cases that the zone lock
>> itself is contended.
>> That is, PCP and zone works at different level, and both deserve to
>> be
>> improved.  Do you agree?
>
> Spoiler: my humble opinion
>
>
> Well, the zone is kind-of your "global" memory provider, and PCPs
> cache a fraction of that to avoid exactly having to mess with that
> global datastructure and lock contention.
>
> One benefit I can see of such a "global" memory provider with caches
> on top is is that it is nicely integrated: for example, the concept of 
> memory pressure exists for the zone as a whole. All memory is of the
> same kind and managed in a single entity, but free memory is cached
> for performance.
>
> As soon as you manage the memory in multiple zones of the same kind,
> you lose that "global" view of your memory that is of the same kind,
> but managed in different bucks. You might end up with a lot of memory 
> pressure in a single such zone, but still have plenty in another zone.
>
> As one example, hot(un)plug of memory is easy: there is only a single
> zone. No need to make smart decisions or deal with having memory we're 
> hotunplugging be stranded in multiple zones.

I understand that there are some unresolved issues for splitting zone.
I will think more about them and the possible solutions.

>> 
>>> I did not look into the details of this proposal, but seeing the
>>> change in include/linux/page-flags-layout.h scares me.
>> It's possible for us to use 1 more bit in page->flags.  Do you think
>> that will cause severe issue?  Or you think some other stuff isn't
>> acceptable?
>
> The issue is, everybody wants to consume more bits in page->flags, so
> if we can get away without it that would be much better :)

Yes.

> The more bits you want to consume, the more people will ask for making
> this a compile-time option and eventually compile it out on distro 
> kernels (e.g., with many NUMA nodes). So we end up with more code and
> complexity and eventually not get the benefits where we really want
> them.

That's possible.  Although I think we will still use more page flags
when necessary.

>> 
>>> Further, I'm not so sure how that change really interacts with
>>> hot(un)plug of memory ... on a quick glimpse I feel like this series
>>> hacks the code such that such that the split works based on the boot
>>> memory size ...
>> Em..., the zone stuff is kind of static now.  It's hard to add a
>> zone at
>> run-time.  So, in this series, we determine the number of zones per zone
>> type based on boot memory size.  This may be improved in the future via
>> pre-allocate some empty zone instances during boot and hot-add some
>> memory to these zones.
>
> Just to give you some idea: with virtio-mem, hyper-v, daxctl, and
> upcoming cxl dynamic memory pooling (some day I'm sure ;) ) you might 
> see quite a small boot memory (e.g., 4 GiB) but a significant amount
> of memory getting hotplugged incrementally (e.g., up to 1 TiB) --
> well, and hotunplugged. With multiple zone instances you really have
> to be careful and might have to re-balance between the multiple zones
> to keep the scalability, to not create imbalances between the zones
> ...

Thanks for your information!

> Something like PCP auto-tuning would be able to handle that mostly
> automatically, as there is only a single memory pool.

I agree that optimizing PCP will help performance regardless of
splitting zone or not.

>> 
>>> I agree with Michal that looking into auto-tuning PCP would be
>>> preferred. If that can't be done, adding another layer might end up
>>> cleaner and eventually cover more use cases.
>> I do agree that it's valuable to make PCP etc. cover more use cases.
>> I
>> just think that this should not prevent us from optimizing zone itself
>> to cover remaining use cases.
>
> I really don't like the concept of replicating zones of the same kind
> for the same NUMA node. But that's just my personal opinion
> maintaining some memory hot(un)plug code :)
>
> Having that said, some kind of a sub-zone concept (additional layer)
> as outlined by Michal IIUC, for example, indexed by core
> id/has/whatsoever could eventually be worth exploring. Yes, such a
> design raises various questions ... :)

Yes.  That's another possible solution for the page allocation
scalability problem.

Best Regards,
Huang, Ying

next prev parent reply	other threads:[~2023-05-18  8:07 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-05-11  6:56 Huang Ying
2023-05-11  6:56 ` [RFC 1/6] mm: distinguish zone type and zone instance explicitly Huang Ying
2023-05-11  6:56 ` [RFC 2/6] mm: add struct zone_type_struct to describe zone type Huang Ying
2023-05-11  6:56 ` [RFC 3/6] mm: support multiple zone instances per zone type in memory online Huang Ying
2023-05-11  6:56 ` [RFC 4/6] mm: avoid show invalid zone in /proc/zoneinfo Huang Ying
2023-05-11  6:56 ` [RFC 5/6] mm: create multiple zone instances for one zone type based on memory size Huang Ying
2023-05-11  6:56 ` [RFC 6/6] mm: prefer different zone list on different logical CPU Huang Ying
2023-05-11 10:30 ` [RFC 0/6] mm: improve page allocator scalability via splitting zones Jonathan Cameron
2023-05-11 13:07   ` Arjan van de Ven
2023-05-11 14:23 ` Dave Hansen
2023-05-12  3:08   ` Huang, Ying
2023-05-11 15:05 ` Michal Hocko
2023-05-12  2:55   ` Huang, Ying
2023-05-15 11:14     ` Michal Hocko
2023-05-16  9:38       ` Huang, Ying
2023-05-16 10:30         ` David Hildenbrand
2023-05-17  1:34           ` Huang, Ying
2023-05-17  8:09             ` David Hildenbrand
2023-05-18  8:06               ` Huang, Ying [this message]
2023-05-24 12:30           ` Michal Hocko
2023-05-29  1:13             ` Huang, Ying

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=875y8q83n0.fsf@yhuang6-desk2.ccr.corp.intel.com \
    --to=ying.huang@intel.com \
    --cc=akpm@linux-foundation.org \
    --cc=arjan@linux.intel.com \
    --cc=dave.hansen@linux.intel.com \
    --cc=david@redhat.com \
    --cc=jweiner@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mgorman@techsingularity.net \
    --cc=mhocko@suse.com \
    --cc=pasha.tatashin@soleen.com \
    --cc=vbabka@suse.cz \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox