linux-mm.kvack.org archive mirror
From: Yang Shi <shy828301@gmail.com>
To: Wei Xu <weixugc@google.com>
Cc: lsf-pc@lists.linux-foundation.org, Linux MM <linux-mm@kvack.org>,
	 Dan Williams <dan.j.williams@intel.com>,
	Dave Hansen <dave.hansen@intel.com>,
	 Tim Chen <tim.c.chen@linux.intel.com>,
	David Rientjes <rientjes@google.com>,
	 Greg Thelen <gthelen@google.com>, Paul Turner <pjt@google.com>,
	Shakeel Butt <shakeelb@google.com>,
	 ying.huang@intel.com
Subject: Re: [LSF/MM/BPF TOPIC] Userspace managed memory tiering
Date: Mon, 21 Jun 2021 11:58:12 -0700	[thread overview]
Message-ID: <CAHbLzkqmG5Ljk+k=PEm0pdHz4UewUun8Omps_Rt5aTCERYrv5w@mail.gmail.com> (raw)
In-Reply-To: <CAAPL-u8Xz=BkTzgyf1o4yh3T2usD=yRfBOUWdLez2AAqooox3A@mail.gmail.com>

On Fri, Jun 18, 2021 at 10:50 AM Wei Xu <weixugc@google.com> wrote:
>
> In this proposal, I'd like to discuss userspace-managed memory tiering
> and the kernel support that it needs.
>
> New memory technologies and interconnect standards make it possible to
> have memory with different performance and cost on the same machine
> (e.g. DRAM + PMEM, DRAM + cost-optimized memory attached via CXL.mem).
> We can expect heterogeneous memory systems that have performance
> implications far beyond classical NUMA to become increasingly common
> in the future.  One important use case of such tiered memory
> systems is to improve data center and cloud efficiency through
> better performance/TCO.
>
> Because different classes of applications (e.g. latency sensitive vs
> latency tolerant, high priority vs low priority) have different
> requirements, richer and more flexible memory tiering policies will
> be needed to achieve the desired performance target on a tiered
> memory system, which would be more effectively managed by a userspace
> agent, not by the kernel.  Moreover, we (Google) are explicitly trying
> to avoid adding a ton of heuristics to enlighten the kernel about the
> policy that we want on multi-tenant machines when userspace offers
> more flexibility.
>
> To manage memory tiering in userspace, we need kernel support in
> three key areas:
>
> - resource abstraction and control of tiered memory;
> - API to monitor page accesses for making memory tiering decisions;
> - API to migrate pages (demotion/promotion).
>
> Userspace memory tiering can work on just NUMA memory nodes, provided
> that memory resources from different tiers are abstracted into
> separate NUMA nodes.  The userspace agent can create a tiering
> topology among these nodes based on their distances.
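As an illustration, such a topology can be derived from the SLIT distances the kernel already exposes under /sys/devices/system/node/nodeN/distance. A minimal sketch of the grouping step, with hypothetical distance values:

```python
def tiers_from_distances(distances):
    """Group NUMA nodes into tiers by their distance from node 0.

    `distances` is row 0 of the SLIT matrix, i.e. the parsed contents
    of /sys/devices/system/node/node0/distance, one entry per node.
    Nodes at the same distance land in the same tier; tiers are
    ordered nearest (fastest) first.
    """
    by_dist = {}
    for node, dist in enumerate(distances):
        by_dist.setdefault(dist, []).append(node)
    return [by_dist[d] for d in sorted(by_dist)]

# Hypothetical 4-node machine: node 0 is local DRAM, node 1 is remote
# DRAM, nodes 2-3 are slower memory (e.g. PMEM) behind each socket.
print(tiers_from_distances([10, 20, 40, 40]))  # [[0], [1], [2, 3]]
```

A real agent would of course read the distances from sysfs per node rather than hard-coding them, and could refine the grouping with bandwidth/latency data from HMAT where available.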
>
> An explicit memory tiering abstraction in the kernel is preferred,
> though, because it can not only allow the kernel to react in cases
> where it is challenging for userspace (e.g. reclaim-based demotion
> when the system is under DRAM pressure due to usage surge), but also
> enable tiering controls such as per-cgroup memory tier limits.
> This requirement is mostly aligned with the existing proposals [1]
> and [2].
>
> The userspace agent manages all migratable user memory on the system,
> and this can be transparent to applications.
> To demote cold pages and promote hot pages, the userspace agent needs
> page access information.  Because it is a system-wide tiering for user
> memory, the access information for both mapped and unmapped user pages
> is needed, and so are the physical page addresses.  A combination of
> page-table accessed-bit scanning and struct page scanning would
> likely be needed.  Such page access monitoring must also be
> efficient because the scans can be frequent. To return the page-level access
> information to the userspace, one proposal is to use tracepoint
> events. The userspace agent can then use BPF programs to collect such
> data and also apply customized filters when necessary.
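For the PFN-based side of this, the existing idle page tracking interface (/sys/kernel/mm/page_idle/bitmap) is one precedent: the agent writes all-ones for a PFN range, waits a sampling interval, and reads the words back; bits still set identify pages that were not accessed. A sketch of decoding one 64-bit word of that bitmap (the file is one u64 per 64 PFNs):

```python
def idle_pfns(word, base_pfn):
    """Decode one 64-bit word read from /sys/kernel/mm/page_idle/bitmap.

    Bit i still being set means page base_pfn + i was not accessed
    since the bit was last written as 1; the kernel clears the bit
    when it sees access evidence such as a set page-table accessed bit.
    """
    return [base_pfn + i for i in range(64) if (word >> i) & 1]

# Word number k in the file covers PFNs [k*64, (k+1)*64).  E.g. a word
# value of 0b101 covering base PFN 128 means PFNs 128 and 130 are
# still idle (cold); the other 62 pages were touched in the interval.
print(idle_pfns(0b101, 128))  # [128, 130]
```

This interface covers mapped and unmapped user pages alike, which is why it is a reasonable reference point for the kind of scanning described above; its per-read cost is also part of the efficiency concern.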

Just FYI, there has been a project for a userspace daemon; please refer
to https://github.com/fengguang/memory-optimizer

We (Alibaba, when I was there) did some preliminary tests and
benchmarks with it. The accuracy was pretty good, but the cost was
relatively high. I agree with you that efficiency is the key. BPF may
be a good approach to reduce the cost.

I'm not sure what the current status of this project is. You may reach
Huang Ying to get more information.

>
> The userspace agent can also make use of hardware PMU events, for
> which the existing kernel support should be sufficient.
>
> The third area is API support for migrating pages. The existing
> move_pages() syscall is a candidate, though it is virtual-address
> based and cannot migrate unmapped pages.  Would a physical-address
> based variant (e.g. move_pfns()) be an acceptable proposal?
>
> [1] https://lore.kernel.org/lkml/9cd0dcde-f257-1b94-17d0-f2e24a3ce979@intel.com/
> [2] https://lore.kernel.org/patchwork/cover/1408180/
>
> Thanks,
> Wei
>



Thread overview: 7+ messages
2021-06-18 17:50 Wei Xu
2021-06-18 19:13 ` Zi Yan
2021-06-18 19:23   ` Wei Xu
2021-06-18 21:07 ` David Rientjes
2021-06-19 23:43   ` Jason Gunthorpe
2021-06-21 18:58 ` Yang Shi [this message]
2021-06-22  3:00   ` Huang, Ying
