From: Davidlohr Bueso
To: "Huang, Ying"
Cc: Raghavendra K T, linux-mm@kvack.org, akpm@linux-foundation.org,
	lsf-pc@lists.linux-foundation.org, bharata@amd.com, gourry@gourry.net,
	nehagholkar@meta.com, abhishekd@meta.com, nphamcs@gmail.com,
	hannes@cmpxchg.org, feng.tang@intel.com, kbusch@meta.com,
	Hasan.Maruf@amd.com, sj@kernel.org, david@redhat.com,
	willy@infradead.org, k.shutemov@gmail.com, mgorman@techsingularity.net,
	vbabka@suse.cz, hughd@google.com, rientjes@google.com,
	shy828301@gmail.com, liam.howlett@oracle.com, peterz@infradead.org,
	mingo@redhat.com, nadav.amit@gmail.com, shivankg@amd.com,
	ziy@nvidia.com, jhubbard@nvidia.com, AneeshKumar.KizhakeVeetil@arm.com,
	linux-kernel@vger.kernel.org, jon.grimm@amd.com, santosh.shukla@amd.com,
	Michael.Day@amd.com, riel@surriel.com, weixugc@google.com,
	leesuyeon0506@gmail.com, honggyu.kim@sk.com, leillc@google.com,
	kmanaouil.dev@gmail.com, rppt@kernel.org, dave.hansen@intel.com,
	dongjoo.linux.dev@gmail.com
Subject: Re: [LSF/MM/BPF TOPIC] Overhauling hot page detection and promotion based on PTE A bit scanning
Date: Fri, 7 Feb 2025 11:06:51 -0800
Message-ID: <20250207190651.hpmkzl4f2zynqiun@offworld>
In-Reply-To: <87o6zub978.fsf@DESKTOP-5N7EMDA>
References: <20250123105721.424117-1-raghavendra.kt@amd.com> <87o6zub978.fsf@DESKTOP-5N7EMDA>

On Sun, 26 Jan 2025, Huang, Ying wrote:

>Hi, Raghavendra,
>
>Raghavendra K T writes:
>
>> Bharata and I would like to propose the following topic for LSFMM.
>>
>> Topic: Overhauling hot page detection and promotion based on PTE A bit scanning.
>>
>> In the Linux kernel, hot page information can potentially be obtained from
>> multiple sources:
>>
>> a. PROT_NONE faults (NUMA balancing)
>> b. PTE Access bit (LRU scanning)
>> c. Hardware provided page hotness info (like AMD IBS)
>>
>> This information is further used to migrate (or promote) pages from slow memory
>> tier to top tier to increase performance.
>>
>> In the current hot page promotion mechanism, all the activities including the
>> process address space scanning, NUMA hint fault handling and page migration are
>> performed in the process context. i.e., scanning overhead is borne by the
>> applications.
>>
>> I had recently posted a patch [1] to improve this in the context of slow-tier
>> page promotion. Here, Scanning is done by a global kernel thread which routinely
>> scans all the processes' address spaces and checks for accesses by reading the
>> PTE A bit. The hot pages thus identified are maintained in list and subsequently
>> are promoted to a default top-tier node.
>> Thus, the approach pushes overhead of
>> scanning, NUMA hint faults and migrations off from process context.

It seems that, overall, having a global view of hot memory is where folks are
leaning. In the past we have discussed an external thread that harvests
information from different sources and does the corresponding migration. I think
your work is a step in this direction (and shows promising numbers), but I'm not
sure it should be doing the scanning part, as opposed to just receiving the
information and migrating (according to some policy based on a wider system view
of what is hot; i.e., what the CHMU says is hot might not be so hot to the rest
of the system, or, as is pointed out below, the policy may be workload based,
e.g. on priorities).

>
>This has been discussed before too. For example, in the following thread
>
>https://lore.kernel.org/all/20200417100633.GU20730@hirez.programming.kicks-ass.net/T/
>
>The drawbacks of asynchronous scanning including
>
>- The CPU cycles used are not charged properly
>
>- There may be no idle CPU cycles to use
>
>- The scanning CPU may be not near the workload CPUs enough

One approach we experimented with was doing only the page migration
asynchronously, leaving the scanning to the task context, which also knows the
destination NUMA node. Results showed that page fault latencies were reduced
without affecting benchmark performance. Of course, busy systems are an issue,
as the window between servicing the fault and actually making the page available
to the user in fast memory is enlarged.

>It's better to involve Mel and Peter in the discussion for this.
>
>> The topic was presented in the MM alignment session hosted by David Rientjes [2].
>> The topic also finds a mention in S J Park's LSFMM proposal [3].
>>
>> Here is the list of potential discussion points:
>> 1. Other improvements and enhancements to PTE A bit scanning approach.
>> Use of multiple kernel threads, throttling improvements, promotion policies,
>> per-process opt-in via prctl, virtual vs physical address based scanning,
>> tuning hot page detection algorithm etc.
>
>One drawback of physical address based scanning is that it's hard to
>apply some workload specific policy. For example, if a low priority
>workload has many relatively hot pages, while a high priority workload
>has many relatively warm (not so hot) pages. We need to promote the warm
>pages in the high priority workload, while physical address based
>scanning may report the hot pages in the low priority workload. Right?
>
>> 2. Possibility of maintaining single source of truth for page hotness that would
>> maintain hot page information from multiple sources and let other sub-systems
>> use that info.
>>
>> 3. Discuss how hardware provided hotness info (like AMD IBS) can further aid
>> promotion. Bharata had posted an RFC [4] on this a while back.
>>
>> 4. Overlap with DAMON and potential reuse.
>>
>> Links:
>>
>> [1] https://lore.kernel.org/all/20241201153818.2633616-1-raghavendra.kt@amd.com/
>> [2] https://lore.kernel.org/linux-mm/20241226012833.rmmbkws4wdhzdht6@ed.ac.uk/T/
>> [3] https://lore.kernel.org/lkml/Z4XUoWlU-UgRik18@gourry-fedora-PF4VCD3F/T/
>> [4] https://lore.kernel.org/lkml/20230208073533.715-2-bharata@amd.com/
>>
>
>---
>Best Regards,
>Huang, Ying
>
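
To make the priority example above concrete, here is a minimal userspace sketch
(Python, not kernel code) of a single hotness table fed by several sources,
where a page's score is scaled by the priority of the workload that owns it.
The Report fields, the source names and the multiplicative weighting are all
assumptions made for illustration; nothing in this thread specifies them.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Report:
    """One hotness observation. All fields are hypothetical."""
    pfn: int         # page identifier
    source: str      # e.g. "pte_a_bit", "numa_fault", "hw"
    accesses: int    # access count reported by that source
    priority: float  # weight of the workload owning the page


def pick_promotions(reports, budget):
    """Merge reports from all sources and return up to `budget` pages,
    ranked by priority-weighted access count (an assumed policy)."""
    score = defaultdict(float)
    for r in reports:
        score[r.pfn] += r.accesses * r.priority
    # Highest score first; ties broken by pfn for a stable order.
    ranked = sorted(score, key=lambda pfn: (-score[pfn], pfn))
    return ranked[:budget]


reports = [
    Report(pfn=1, source="pte_a_bit", accesses=100, priority=0.1),  # low prio, hot
    Report(pfn=2, source="pte_a_bit", accesses=30, priority=1.0),   # high prio, warm
    Report(pfn=2, source="hw", accesses=5, priority=1.0),           # second source
]
print(pick_promotions(reports, budget=1))
```

With these inputs the low priority page scores 10 and the high priority page
scores 35, so a budget of one page selects the warm page of the high priority
workload, which is the outcome asked for above. A purely physical address based
ranking with no workload weights would have picked the other page.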