linux-mm.kvack.org archive mirror
From: Zi Yan <ziy@nvidia.com>
To: Kaiyang Zhao <kaiyang2@cs.cmu.edu>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	hannes@cmpxchg.org, dskarlat@cs.cmu.edu
Subject: Re: [RFC PATCH 0/7] mm: providing ample physical memory contiguity by confining unmovable allocations
Date: Tue, 19 Mar 2024 22:47:50 -0400
Message-ID: <9D567EE7-45F3-4252-974F-6AF90CE6D635@nvidia.com>
In-Reply-To: <20240320024218.203491-1-kaiyang2@cs.cmu.edu>

On 19 Mar 2024, at 22:42, kaiyang2@cs.cmu.edu wrote:

> From: Kaiyang Zhao <kaiyang2@cs.cmu.edu>
>
> Memory capacity has increased dramatically over the last decades.
> Meanwhile, TLB capacity has stagnated, causing a significant virtual
> address translation overhead. As a collaboration between Carnegie Mellon
> University and Meta, we investigated the issue at Meta’s datacenters and
> found that about 20% of CPU cycles are spent doing page walks [1], and
> similar results are also reported by Google [2].
>
> To tackle the overhead, we need widespread use of huge pages. And huge
> pages, when they can actually be created, work wonders: they provide up
> to 18% higher performance for Meta’s production workloads in our
> experiments [1].
>
> However, we observed that huge pages through THP are unreliable because
> sufficient physical contiguity may not exist and compaction to recover
> from memory fragmentation frequently fails. To ensure workloads get a
> reasonable number of huge pages, Meta could not rely on THP and had to
> use reserved huge pages. Proposals to add 1GB THP support [5] are even
> more dependent on ample availability of physical contiguity.
>
> A major reason for the lack of physical contiguity is the mixing of
> unmovable and movable allocations, causing compaction to fail. Quoting
> from [3], “in a broad sample of Meta servers, we find that unmovable
> allocations make up less than 7% of total memory on average, yet occupy
> 34% of the 2M blocks in the system. We also found that this effect isn't
> correlated with high uptimes, and that servers can get heavily
> fragmented within the first hour of running a workload.”
>
> Our proposed solution is to confine the unmovable allocations to a
> separate region in physical memory. We experimented with using a CMA
> region for the movable allocations, but in this version we use
> ZONE_MOVABLE for movable and all other zones for unmovable allocations.
> Movable allocations can temporarily reside in the unmovable zones, but
> will be proactively moved out by compaction.
>
> To resize ZONE_MOVABLE, we still rely on memory hotplug interfaces. We
> export the number of pages scanned on behalf of movable or unmovable
> allocations during reclaim to approximate the memory pressure in two
> parts of physical memory, and a userspace tool can monitor the metrics
> and make resizing decisions. Previously we augmented the PSI interface
> to break down memory pressure into movable and unmovable allocation
> types, but that approach enlarges the scheduler cacheline footprint.
> From our preliminary observations, the per-allocation-type scanned
> counters alone, with a little tuning, are sufficient to tell whether
> there is enough memory for unmovable allocations and to make resizing
> decisions.
>
> This patch extends the idea of migratetype isolation at pageblock
> granularity posted earlier [3] by Johannes Weiner to an
> as-large-as-needed region to better support huge pages of bigger sizes
> and hardware TLB coalescing. We’re looking for feedback on the overall
> direction, particularly in relation to the recent THP allocator
> optimization proposal [4].
>
> The patches are based on 6.4 and are also available on github at
> https://github.com/magickaiyang/kernel-contiguous/tree/per_alloc_type_reclaim_counters_oct052023

Your reference links (1 to 4) are missing.
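
For readers following the quoted description of the userspace resizing
tool, here is a minimal sketch of the decision step only. It is not
taken from the patches: the counter names (pgscan_movable,
pgscan_unmovable), the "name value" text layout, and both thresholds
are assumptions made for illustration, and the actual resize through
the memory hotplug interfaces is left out.

```python
from typing import Dict

# Hypothetical counter names; the cover letter does not say what the
# exported per-allocation-type scanned counters are called.
MOVABLE_KEY = "pgscan_movable"
UNMOVABLE_KEY = "pgscan_unmovable"


def parse_counters(text: str) -> Dict[str, int]:
    """Parse 'name value' lines (assumed layout) into a dict."""
    counters: Dict[str, int] = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def resize_decision(prev: Dict[str, int], curr: Dict[str, int],
                    ratio: float = 4.0, min_scanned: int = 1000) -> str:
    """Compare scan deltas between two snapshots.

    Returns 'shrink_movable' when reclaim scanning on behalf of
    unmovable allocations dominates, 'grow_movable' in the opposite
    case, and 'hold' otherwise. ratio and min_scanned are illustrative
    tuning knobs, not values from the patch series.
    """
    d_mov = curr.get(MOVABLE_KEY, 0) - prev.get(MOVABLE_KEY, 0)
    d_unmov = curr.get(UNMOVABLE_KEY, 0) - prev.get(UNMOVABLE_KEY, 0)
    if d_unmov >= min_scanned and d_unmov > ratio * d_mov:
        return "shrink_movable"
    if d_mov >= min_scanned and d_mov > ratio * d_unmov:
        return "grow_movable"
    return "hold"
```

A monitor would take a snapshot at a fixed interval, call
resize_decision() on consecutive snapshots, and act only on a
non-"hold" result. Using deltas rather than absolute values keeps the
decision tied to recent pressure instead of cumulative history.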

--
Best Regards,
Yan, Zi

Thread overview: 10+ messages
2024-03-20  2:42 kaiyang2
2024-03-20  2:42 ` [RFC PATCH 1/7] sysfs interface for the boundary of movable zone kaiyang2
2024-03-20  2:42 ` [RFC PATCH 2/7] Disallows high-order movable allocations in other zones if ZONE_MOVABLE is populated kaiyang2
2024-03-20  2:42 ` [RFC PATCH 3/7] compaction accepts a destination zone kaiyang2
2024-03-20  2:42 ` [RFC PATCH 4/7] vmstat counter for pages migrated across zones kaiyang2
2024-03-20  2:42 ` [RFC PATCH 5/7] proactively move pages out of unmovable zones in kcompactd kaiyang2
2024-03-20  2:42 ` [RFC PATCH 6/7] pass gfp mask of the allocation that waked kswapd to track number of pages scanned on behalf of each alloc type kaiyang2
2024-03-20  2:42 ` [RFC PATCH 7/7] exports the number of pages scanned on behalf of movable/unmovable allocations kaiyang2
2024-03-20  2:47 ` Zi Yan [this message]
2024-03-20  2:57 ` [RFC PATCH 0/7] mm: providing ample physical memory contiguity by confining unmovable allocations kaiyang2
