linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Barry Song <21cnbao@gmail.com>
To: yuzhao@google.com
Cc: corbet@lwn.net, linux-mm@kvack.org, lsf-pc@lists.linux-foundation.org
Subject: Re: [Chapter One] THP zones: the use cases of policy zones
Date: Tue,  5 Mar 2024 21:41:16 +1300	[thread overview]
Message-ID: <20240305084116.25103-1-21cnbao@gmail.com> (raw)
In-Reply-To: <20240229183436.4110845-2-yuzhao@google.com>

> There are three types of zones:
> 1. The first four zones partition the physical address space of CPU
>    memory.
> 2. The device zone provides interoperability between CPU and device
>    memory.
> 3. The movable zone commonly represents a memory allocation policy.
> 
> Though originally designed for memory hot removal, the movable zone is
> instead widely used for other purposes, e.g., CMA and kdump kernel, on
> platforms that do not support hot removal, e.g., Android and ChromeOS.
> Nowadays, it is legitimately a zone independent of any physical
> characteristics. In spite of being somewhat regarded as a hack,
> largely due to the lack of a generic design concept for its true major
> use cases (on billions of client devices), the movable zone naturally
> resembles a policy (virtual) zone overlayed on the first four
> (physical) zones.
> 
> This proposal formally generalizes this concept as policy zones so
> that additional policies can be implemented and enforced by subsequent
> zones after the movable zone. An inherited requirement of policy zones
> (and the first four zones) is that subsequent zones must be able to
> fall back to previous zones and therefore must add new properties to
> the previous zones rather than remove existing ones from them. Also,
> all properties must be known at the allocation time, rather than the
> runtime, e.g., memory object size and mobility are valid properties
> but hotness and lifetime are not.
> 
> ZONE_MOVABLE becomes the first policy zone, followed by two new policy
> zones:
> 1. ZONE_NOSPLIT, which contains pages that are movable (inherited from
>    ZONE_MOVABLE) and restricted to a minimum order to be
>    anti-fragmentation. The latter means that they cannot be split down
>    below that order, while they are free or in use.
> 2. ZONE_NOMERGE, which contains pages that are movable and restricted
>    to an exact order. The latter means that not only is split
>    prohibited (inherited from ZONE_NOSPLIT) but also merge (see the
>    reason in Chapter Three), while they are free or in use.
> 
> Since these two zones only can serve THP allocations (__GFP_MOVABLE |
> __GFP_COMP), they are called THP zones. Reclaim works seamlessly and
> compaction is not needed for these two zones.
> 
> Compared with the hugeTLB pool approach, THP zones tap into core MM
> features including:
> 1. THP allocations can fall back to the lower zones, which can have
>    higher latency but still succeed.
> 2. THPs can be either shattered (see Chapter Two) if partially
>    unmapped or reclaimed if becoming cold.
> 3. THP orders can be much smaller than the PMD/PUD orders, e.g., 64KB
>    contiguous PTEs on arm64 [1], which are more suitable for client
>    workloads.
> 
> Policy zones can be dynamically resized by offlining pages in one of
> them and onlining those pages in another of them. Note that this is
> only done among policy zones, not between a policy zone and a physical
> zone, since resizing is a (software) policy, not a physical
> characteristic.
> 
> Implementing the same idea in the pageblock granularity has also been
> explored but rejected at Google. Pageblocks have a finer granularity
> and therefore can be more flexible than zones. The tradeoff is that
> this alternative implementation was more complex and failed to bring a
> better ROI. However, the rejection was mainly due to its inability to
> be smoothly extended to 1GB THPs [2], which is a planned use case of
> TAO.

We did implement similar idea in the pageblock granularity on OPPO's
phones by extending two special migratetypes[1]:

* QUAD_TO_TRIP - this is mainly for 4-order mTHP allocation which can use
ARM64's CONT-PTE; but can rarely be splitted into 3 order to dull the pain
of 3-order allocation if and only if 3-order allocation has failed in both
normal buddy and the below TRIP_TO_QUAD.
  
* TRIP_TO_QUAD - this is mainly for 4-order mTHP allocation which can use
ARM64's CONT-PTE; but can sometimes be splitted into 3 order to dull the
pain of 3-order allocation if and only if 3-order allocation has failed in
normal buddy.

neither of above will be merged into 5 order or above; neither of above
will be splitted into 2 order or lower.

in compaction, we will skip both of above. I am seeing one disadvantage
of this approach is that I have to add a separate LRU list in each
zone to place those mTHP folios. if mTHP and small folios are put
in the same LRU list, the reclamation efficiency is extremely bad.

A separate zone, on the other hand, can avoid a separate LRU list
for mTHP as the new zone has its own LRU list.

[1] https://github.com/OnePlusOSS/android_kernel_oneplus_sm8650/blob/oneplus/sm8650_u_14.0.0_oneplus12/mm/page_alloc.c

> 
> [1] https://lore.kernel.org/20240215103205.2607016-1-ryan.roberts@arm.com/
> [2] https://lore.kernel.org/20200928175428.4110504-1-zi.yan@sent.com/
> 
> Signed-off-by: Yu Zhao <yuzhao@google.com>

Thanks
Barry



  parent reply	other threads:[~2024-03-05  8:41 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-02-29 18:34 [LSF/MM/BPF TOPIC] TAO: THP Allocator Optimizations Yu Zhao
2024-02-29 18:34 ` [Chapter One] THP zones: the use cases of policy zones Yu Zhao
2024-02-29 20:28   ` Matthew Wilcox
2024-03-06  3:51     ` Yu Zhao
2024-03-06  4:33       ` Matthew Wilcox
2024-02-29 23:31   ` Yang Shi
2024-03-03  2:47     ` Yu Zhao
2024-03-04 15:19   ` Matthew Wilcox
2024-03-05 17:22     ` Matthew Wilcox
2024-03-05  8:41   ` Barry Song [this message]
2024-03-05 10:07     ` Vlastimil Babka
2024-03-05 21:04       ` Barry Song
2024-03-06  3:05         ` Yu Zhao
2024-05-24  8:38   ` Barry Song
2024-11-01  2:35   ` Charan Teja Kalla
2024-11-01 16:55     ` Yu Zhao
2024-02-29 18:34 ` [Chapter Two] THP shattering: the reverse of collapsing Yu Zhao
2024-02-29 21:55   ` Zi Yan
2024-03-03  1:17     ` Yu Zhao
2024-03-03  1:21       ` Zi Yan
2024-06-11  8:32   ` Barry Song
2024-02-29 18:34 ` [Chapter Three] THP HVO: bring the hugeTLB feature to THP Yu Zhao
2024-02-29 22:54   ` Yang Shi
2024-03-01 15:42     ` David Hildenbrand
2024-03-03  1:46     ` Yu Zhao
2024-02-29 18:34 ` [Epilogue] Profile-Guided Heap Optimization and THP fungibility Yu Zhao
2024-03-05  8:37 ` [LSF/MM/BPF TOPIC] TAO: THP Allocator Optimizations Barry Song
2024-03-06 15:51 ` Johannes Weiner
2024-03-06 16:40   ` Zi Yan
2024-03-13 22:09   ` Kaiyang Zhao
2024-05-15 21:17 ` Yu Zhao
2024-05-15 21:52   ` Yu Zhao

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20240305084116.25103-1-21cnbao@gmail.com \
    --to=21cnbao@gmail.com \
    --cc=corbet@lwn.net \
    --cc=linux-mm@kvack.org \
    --cc=lsf-pc@lists.linux-foundation.org \
    --cc=yuzhao@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox