Date: Fri, 18 Jun 2021 14:07:08 -0700 (PDT)
From: David Rientjes
To: Wei Xu
cc: lsf-pc@lists.linux-foundation.org, Linux MM, Dan Williams, Dave Hansen, Tim Chen, Greg
    Thelen, Paul Turner, Shakeel Butt
Subject: Re: [LSF/MM/BPF TOPIC] Userspace managed memory tiering

On Fri, 18 Jun 2021, Wei Xu wrote:

> In this proposal, I'd like to discuss userspace-managed memory tiering
> and the kernel support that it needs.
>

Thanks Wei. Yes, this would be very useful to discuss at LSFMMBPF.

It would also be very helpful to hear from other interested parties here
on the mailing list ahead of time. It would be great to know the
motivations and priorities of others interested in memory tiering for
the use cases that Wei enumerated so that we can do some early
brainstorming.

Thanks!

> New memory technologies and interconnect standards make it possible to
> have memory with different performance and cost on the same machine
> (e.g. DRAM + PMEM, DRAM + cost-optimized memory attached via CXL.mem).
> We can expect heterogeneous memory systems that have performance
> implications far beyond classical NUMA to become increasingly common
> in the future. One of the important use cases of such tiered memory
> systems is to improve the data center and cloud efficiency with
> better performance/TCO.
>
> Because different classes of applications (e.g.
> latency sensitive vs latency tolerant, high priority vs low priority)
> have different requirements, richer and more flexible memory tiering
> policies will be needed to achieve the desired performance target on a
> tiered memory system, which would be more effectively managed by a
> userspace agent, not by the kernel. Moreover, we (Google) are
> explicitly trying to avoid adding a ton of heuristics to enlighten the
> kernel about the policy that we want on multi-tenant machines when the
> userspace offers more flexibility.
>
> To manage memory tiering in userspace, we need kernel support in
> three key areas:
>
> - resource abstraction and control of tiered memory;
> - API to monitor page accesses for making memory tiering decisions;
> - API to migrate pages (demotion/promotion).
>
> Userspace memory tiering can work on just NUMA memory nodes, provided
> that memory resources from different tiers are abstracted into
> separate NUMA nodes. The userspace agent can create a tiering
> topology among these nodes based on their distances.
>
> An explicit memory tiering abstraction in the kernel is preferred,
> though, because it can not only allow the kernel to react in cases
> where it is challenging for userspace (e.g. reclaim-based demotion
> when the system is under DRAM pressure due to usage surge), but also
> enable tiering controls such as per-cgroup memory tier limits.
> This requirement is mostly aligned with the existing proposals [1]
> and [2].
>
> The userspace agent manages all migratable user memory on the system
> and this can be transparent from the point of view of applications.
> To demote cold pages and promote hot pages, the userspace agent needs
> page access information. Because it is a system-wide tiering for user
> memory, the access information for both mapped and unmapped user pages
> is needed, and so are the physical page addresses. A combination of
> page table accessed-bit scanning and struct page scanning should be
> needed.
> Such page access monitoring should be efficient as well
> because the scans can be frequent. To return the page-level access
> information to the userspace, one proposal is to use tracepoint
> events. The userspace agent can then use BPF programs to collect such
> data and also apply customized filters when necessary.
>
> The userspace agent can also make use of hardware PMU events, for
> which the existing kernel support should be sufficient.
>
> The third area is the API support for migrating pages. The existing
> move_pages() syscall can be a candidate, though it is virtual-address
> based and cannot migrate unmapped pages. Is a physical-address based
> variant (e.g. move_pfns()) an acceptable proposal?
>
> [1] https://lore.kernel.org/lkml/9cd0dcde-f257-1b94-17d0-f2e24a3ce979@intel.com/
> [2] https://lore.kernel.org/patchwork/cover/1408180/
>
> Thanks,
> Wei
>
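
To make the "tiering topology among these nodes based on their distances"
point above a bit more concrete, here is a minimal sketch of what a
userspace agent might do. It is only an illustration under stated
assumptions, not something proposed in this thread: it assumes a two-tier
split where nodes with CPUs are the top tier and CPU-less nodes (e.g. PMEM
or CXL.mem) are the lower tier, and it picks the nearest lower-tier node as
the demotion target. The function name build_demotion_map() and the example
matrix are hypothetical; a real agent would presumably read the rows from
/sys/devices/system/node/nodeN/distance.

```python
# Hypothetical sketch: derive a two-tier demotion map from a NUMA distance matrix.
# Assumption (not from the thread): nodes with CPUs are the top tier and
# CPU-less nodes are the lower tier.

def build_demotion_map(distance, cpu_nodes):
    """Map each CPU node to its nearest CPU-less node.

    distance[i][j] is the distance from node i to node j, in the same form
    as a row of /sys/devices/system/node/nodeN/distance.
    cpu_nodes is the set of node IDs that have CPUs (supplied by the caller).
    """
    lower_tier = [n for n in range(len(distance)) if n not in cpu_nodes]
    demotion = {}
    if not lower_tier:
        # No CPU-less nodes, so there is nothing to demote to.
        return demotion
    for src in sorted(cpu_nodes):
        # min() returns the first minimal element, so ties go to the lowest node ID.
        demotion[src] = min(lower_tier, key=lambda dst: distance[src][dst])
    return demotion


# Made-up example: nodes 0 and 1 are DRAM with CPUs, nodes 2 and 3 are CPU-less.
example_distance = [
    [10, 21, 17, 28],
    [21, 10, 28, 17],
    [17, 28, 10, 28],
    [28, 17, 28, 10],
]
example_map = build_demotion_map(example_distance, cpu_nodes={0, 1})
print(example_map)  # {0: 2, 1: 3}
```

A policy like this is exactly the kind of thing that could differ per
deployment, which is part of the argument for keeping it in userspace.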