From: "Huang, Ying"
To: Raghavendra K T
Subject: Re: [LSF/MM/BPF TOPIC] Overhauling hot page detection and promotion based on PTE A bit scanning
Date: Sun, 26 Jan 2025 10:27:55 +0800
Message-ID: <87o6zub978.fsf@DESKTOP-5N7EMDA>
In-Reply-To: <20250123105721.424117-1-raghavendra.kt@amd.com> (Raghavendra K. T.'s message of "Thu, 23 Jan 2025 10:57:21 +0000")
Hi, Raghavendra,

Raghavendra K T writes:

> Bharata and I would like to propose the following topic for LSFMM.
>
> Topic: Overhauling hot page detection and promotion based on PTE A bit scanning.
>
> In the Linux kernel, hot page information can potentially be obtained from
> multiple sources:
>
> a. PROT_NONE faults (NUMA balancing)
> b. PTE Access bit (LRU scanning)
> c. Hardware provided page hotness info (like AMD IBS)
>
> This information is further used to migrate (or promote) pages from slow memory
> tier to top tier to increase performance.
>
> In the current hot page promotion mechanism, all the activities including the
> process address space scanning, NUMA hint fault handling and page migration are
> performed in the process context, i.e., scanning overhead is borne by the
> applications.
>
> I had recently posted a patch [1] to improve this in the context of slow-tier
> page promotion. Here, scanning is done by a global kernel thread which routinely
> scans all the processes' address spaces and checks for accesses by reading the
> PTE A bit. The hot pages thus identified are maintained in a list and subsequently
> are promoted to a default top-tier node. Thus, the approach pushes overhead of
> scanning, NUMA hint faults and migrations off from process context.

This has been discussed before.  See, for example, the following thread:

https://lore.kernel.org/all/20200417100633.GU20730@hirez.programming.kicks-ass.net/T/

The drawbacks of asynchronous scanning include:

- The CPU cycles used are not charged properly.

- There may be no idle CPU cycles to use.

- The scanning CPU may not be close enough to the workload CPUs.

It's better to involve Mel and Peter in the discussion of this.

> The topic was presented in the MM alignment session hosted by David Rientjes [2].
> The topic also finds a mention in S J Park's LSFMM proposal [3].
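The kernel-thread approach quoted above (periodically read and clear the PTE A bit, then queue pages that keep showing up as accessed for promotion) can be sketched as a small single-threaded userspace model. This is an illustration under stated assumptions, not the posted patch: `struct sim_page`, `scan_pass()` and the "hot after `threshold` consecutive passes" rule are all hypothetical. The only hardware fact relied on is that x86 keeps the Accessed bit at bit 5 of the PTE; in the kernel the test-and-clear step is done atomically by helpers such as `ptep_test_and_clear_young()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* On x86 the hardware sets the Accessed bit, bit 5 of the PTE, on access. */
#define PTE_ACCESSED ((uint64_t)1 << 5)

/* Hypothetical scanner bookkeeping for one page (not a kernel structure). */
struct sim_page {
	uint64_t pte;        /* simulated page table entry */
	unsigned int streak; /* consecutive passes that found the A bit set */
	bool queued;         /* already on the promotion list */
};

/*
 * Return whether the page was accessed since the last pass and clear the
 * A bit so the next access sets it again. The kernel does this atomically;
 * this single-threaded model does not need to.
 */
static bool test_and_clear_accessed(struct sim_page *p)
{
	bool young = (p->pte & PTE_ACCESSED) != 0;

	p->pte &= ~PTE_ACCESSED;
	return young;
}

/*
 * One pass of the (hypothetical) scanning thread over 'n' pages. A page whose
 * A bit was set in 'threshold' consecutive passes is appended once to 'hot',
 * which already holds 'hot_len' entries and can hold at most 'hot_cap'.
 * Returns the new length of 'hot'.
 */
static size_t scan_pass(struct sim_page *pages, size_t n, unsigned int threshold,
			size_t *hot, size_t hot_len, size_t hot_cap)
{
	assert(threshold > 0);
	for (size_t i = 0; i < n; i++) {
		struct sim_page *p = &pages[i];

		if (!test_and_clear_accessed(p)) {
			p->streak = 0;	/* idle during this interval: reset */
			continue;
		}
		p->streak++;
		if (!p->queued && p->streak >= threshold && hot_len < hot_cap) {
			hot[hot_len++] = i;	/* candidate for promotion */
			p->queued = true;
		}
	}
	return hot_len;
}
```

Every pass touches every PTE whether or not the page is hot, which is the work the proposal moves out of process context and into a kernel thread; the reply's concerns about CPU-cycle accounting and scanner placement apply to exactly this loop.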
>
> Here is the list of potential discussion points:
> 1. Other improvements and enhancements to PTE A bit scanning approach. Use of
> multiple kernel threads, throttling improvements, promotion policies, per-process
> opt-in via prctl, virtual vs physical address based scanning, tuning hot page
> detection algorithm etc.

One drawback of physical address based scanning is that it's hard to apply
workload-specific policies.  For example, suppose a low-priority workload has
many relatively hot pages, while a high-priority workload has many relatively
warm (not so hot) pages.  We need to promote the warm pages of the high-priority
workload, but physical address based scanning may report the hot pages of the
low-priority workload instead.  Right?

> 2. Possibility of maintaining single source of truth for page hotness that would
> maintain hot page information from multiple sources and let other sub-systems
> use that info.
>
> 3. Discuss how hardware provided hotness info (like AMD IBS) can further aid
> promotion. Bharata had posted an RFC [4] on this a while back.
>
> 4. Overlap with DAMON and potential reuse.
>
> Links:
>
> [1] https://lore.kernel.org/all/20241201153818.2633616-1-raghavendra.kt@amd.com/
> [2] https://lore.kernel.org/linux-mm/20241226012833.rmmbkws4wdhzdht6@ed.ac.uk/T/
> [3] https://lore.kernel.org/lkml/Z4XUoWlU-UgRik18@gourry-fedora-PF4VCD3F/T/
> [4] https://lore.kernel.org/lkml/20230208073533.715-2-bharata@amd.com/

---
Best Regards,
Huang, Ying
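The priority example in the reply (a low-priority workload with hot pages vs. a high-priority workload with warm pages) can be made concrete with a toy selection routine for a limited number of fast-tier slots. Everything here is hypothetical: `struct candidate`, `owner_prio` and `pick_for_promotion()` are illustrative names, and `owner_prio` simply stands for whatever workload-specific information a policy would need, which, per the point above, is harder to apply when scanning by physical address. Ranking by hotness alone picks the low-priority pages; ranking by owner priority first picks the high-priority warm pages.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical promotion candidate: scanner hotness plus owner priority. */
struct candidate {
	int page_id;
	unsigned int hotness;	/* e.g. accesses seen by the scanner */
	int owner_prio;		/* higher value = more important workload */
};

/* Order by descending hotness only (what a hotness-only report yields). */
static int cmp_hotness(const void *a, const void *b)
{
	const struct candidate *x = a, *y = b;

	return (x->hotness < y->hotness) - (x->hotness > y->hotness);
}

/* Order by descending owner priority, then by descending hotness. */
static int cmp_prio_then_hotness(const void *a, const void *b)
{
	const struct candidate *x = a, *y = b;

	if (x->owner_prio != y->owner_prio)
		return (x->owner_prio < y->owner_prio) -
		       (x->owner_prio > y->owner_prio);
	return cmp_hotness(a, b);
}

/*
 * Sort 'c' in place with 'cmp' and copy the ids of the first 'slots'
 * candidates to 'out'. Returns the number of pages selected.
 */
static size_t pick_for_promotion(struct candidate *c, size_t n, size_t slots,
				 int (*cmp)(const void *, const void *), int *out)
{
	size_t k = slots < n ? slots : n;

	assert(cmp != NULL);
	qsort(c, n, sizeof(*c), cmp);
	for (size_t i = 0; i < k; i++)
		out[i] = c[i].page_id;
	return k;
}
```

Strict priority-first ordering is only a simple way to show the contrast: it would also promote cold pages of a high-priority workload, so a real policy would more likely combine a hotness floor or a weighting with the priority.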