Date: Thu, 6 Feb 2025 15:30:06 +0000
From: Jonathan Cameron
To: Johannes Weiner
CC: Bharata B Rao, Raghavendra K T
Subject: Re: [LSF/MM/BPF TOPIC] Unifying sources of page temperature information - what info is actually wanted?
Message-ID: <20250206153006.00003e0f@huawei.com>
In-Reply-To: <20250205160529.GB1183495@cmpxchg.org>
References: <20250123105721.424117-1-raghavendra.kt@amd.com> <20250131122803.000031aa@huawei.com> <20250131130901.00000dd1@huawei.com> <20250205160529.GB1183495@cmpxchg.org>

On Wed, 5 Feb 2025 11:05:29 -0500 Johannes Weiner wrote:

> On Wed, Feb 05, 2025 at 11:54:05AM +0530, Bharata B Rao wrote:
> > On 31-Jan-25 6:39 PM, Jonathan Cameron wrote:
> > > On Fri, 31 Jan 2025 12:28:03 +0000
> > > Jonathan Cameron wrote:
> > > 
> > >>> Here is the list of potential discussion points:
> > >> ...
> > >> 
> > >>> 2. Possibility of maintaining single source of truth for page hotness that would
> > >>> maintain hot page information from multiple sources and let other sub-systems
> > >>> use that info.
> > >> Hi,
> > >> 
> > >> I was thinking of proposing a separate topic on a single source of hotness,
> > >> but this question covers it so I'll add some thoughts here instead.
> > >> I think we are very early, but sharing some experience and thoughts in a
> > >> session may be useful.
> > > 
> > > Thinking more on this over lunch, I think it is worth calling this out as a
> > > potential session topic in its own right rather than trying to find
> > > time within other sessions. Hence the title change.
> > > 
> > > I think a session would start with a brief listing of the temperature sources
> > > we have and those on the horizon to motivate what we are unifying, then
> > > discussion to focus on need for such a unification + requirements
> > > (maybe with a straw man).
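As one possible straw man for the "single source of truth", here is a minimal C sketch of a store that every temperature source reports into and every consumer queries. Everything in it is hypothetical and not an existing kernel interface: the names (`hot_record()`, `hot_query()`, `enum hot_source`), the tiny fixed-size table keyed by PFN, and the choice to keep per-source hit counts rather than a single merged score.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical source identifiers; names are illustrative only. */
enum hot_source {
	HOT_SRC_PROT_NONE,	/* NUMA hinting faults */
	HOT_SRC_FMA,		/* folio_mark_accessed() */
	HOT_SRC_PTE_ABIT,	/* PTE Accessed bit scans */
	HOT_SRC_IBS,		/* platform hints, e.g. AMD IBS */
	HOT_SRC_CXL_HMU,	/* device hints, e.g. CXL HMU */
	HOT_SRC_NR
};

#define HOT_TABLE_SIZE 64	/* deliberately tiny, for illustration */

/* One entry per tracked physical frame. */
struct hot_entry {
	uint64_t pfn;
	uint64_t last_seen_ns;
	uint32_t hits[HOT_SRC_NR];	/* per-source counts, kept separate */
	int used;
};

static struct hot_entry hot_table[HOT_TABLE_SIZE];

/* Find (or, if @create, allocate) the entry for @pfn by linear probing. */
static struct hot_entry *hot_lookup(uint64_t pfn, int create)
{
	size_t start = (size_t)(pfn % HOT_TABLE_SIZE);

	for (size_t i = 0; i < HOT_TABLE_SIZE; i++) {
		struct hot_entry *e = &hot_table[(start + i) % HOT_TABLE_SIZE];

		if (e->used && e->pfn == pfn)
			return e;
		if (!e->used) {
			/* No deletions, so an empty slot ends the probe. */
			if (!create)
				return NULL;
			e->used = 1;
			e->pfn = pfn;
			return e;
		}
	}
	return NULL;	/* table full */
}

/* Producers: every source reports through the same entry point. */
int hot_record(enum hot_source src, uint64_t pfn, uint64_t now_ns)
{
	struct hot_entry *e;

	assert(src < HOT_SRC_NR);
	e = hot_lookup(pfn, 1);
	if (!e)
		return -1;	/* a real design would need eviction/aging */
	e->hits[src]++;
	if (now_ns > e->last_seen_ns)
		e->last_seen_ns = now_ns;
	return 0;
}

/* Consumers: total hits for @pfn across the sources set in @src_mask. */
uint32_t hot_query(uint64_t pfn, unsigned int src_mask)
{
	struct hot_entry *e = hot_lookup(pfn, 0);
	uint32_t total = 0;

	if (!e)
		return 0;
	for (int s = 0; s < HOT_SRC_NR; s++)
		if (src_mask & (1u << s))
			total += e->hits[s];
	return total;
}
```

Keeping counts per source lets each consumer choose which sources it trusts, since the sources differ in what they can observe. Eviction, aging and locking are all left out of this sketch.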
> > 
> > Here is a compilation of available temperature sources and how the
> > hot/access data is consumed by different subsystems:
> 
> This is super useful, thanks for collecting this.

Absolutely agree!

> 
> > PA-Physical address available
> > VA-Virtual address available
> > AA-Access time available
> > NA-accessing Node info available
> > 
> > I have left the slot blank for those which I am not sure about.
> > ==================================================
> > Temperature            PA      VA      AA      NA
> > source
> > ==================================================
> > PROT_NONE faults       Y       Y       Y       Y
> > --------------------------------------------------
> > folio_mark_accessed()  Y       Y       Y
> > --------------------------------------------------
> 
> For fma(), the VA info is available in unmap, but usually it isn't -
> or doesn't meaningfully exist, as in the case of unmapped buffered IO.
> 
> I'd say it's an N.
> 
> > PTE A bit              Y       Y       N       N
> > --------------------------------------------------
> > Platform hints         Y       Y       Y       Y
> > (AMD IBS)
> > --------------------------------------------------
> > Device hints           Y
> > (CXL HMU)
> > ==================================================

For the use cases where we have relatively few 'pages', the cost of a
reverse-map lookup doesn't look to be a problem. The trick is to do it
only after we've done what we can in PA space to cut down the pages of
interest. So maybe mark VA as (Y) for device hints, to reflect that it
is indirect. Whether it makes sense to do that before or after some
common layer is an interesting question. The PA-to-VA mapping might be
out of date anyway by the time we see the data.

> 
> For the following table, it might be useful to add *when* the source
> produces this information. Sampling frequency is a likely challenge:
> consumers have different requirements, and overhead should be limited
> to the minimum required to serve enabled consumers.
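The reverse-map point above (cut candidates down in PA space first, resolve VA only for what survives) can be sketched in C as follows. This is an illustration under stated assumptions: `struct hot_sample`, the `HS_*` validity bits (mirroring the PA/VA/AA/NA columns of the first table) and `fake_rmap_lookup()` are all hypothetical, and the stub only stands in for a real rmap walk; it does not implement one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Which columns of the table a sample carries: PA / VA / AA / NA. */
#define HS_PA (1u << 0)
#define HS_VA (1u << 1)
#define HS_AA (1u << 2)
#define HS_NA (1u << 3)

struct hot_sample {
	uint64_t pfn;		/* PA, as a page frame number */
	uint64_t vaddr;		/* VA, meaningful only if HS_VA is set */
	uint64_t time_ns;	/* AA */
	int node;		/* NA: accessing node */
	unsigned int valid;	/* HS_* bits */
	uint32_t count;		/* accesses aggregated for this pfn */
};

/*
 * Hypothetical stand-in for a reverse-map walk (PA -> VA). It just
 * pretends every frame is mapped at a fixed offset. Returns 0 on success.
 */
static int fake_rmap_lookup(uint64_t pfn, uint64_t *vaddr)
{
	*vaddr = 0x7f0000000000ull + (pfn << 12);
	return 0;
}

/*
 * Stage 1 drops samples below @threshold using PA data alone.
 * Stage 2 pays for the expensive reverse-map lookup only for survivors.
 * Survivors are compacted to the front of @s; the return value is how
 * many remain. @rmap_calls reports how many lookups were made.
 */
static size_t filter_then_resolve(struct hot_sample *s, size_t n,
				  uint32_t threshold, size_t *rmap_calls)
{
	size_t kept = 0;

	assert(rmap_calls != NULL);
	*rmap_calls = 0;
	for (size_t i = 0; i < n; i++) {
		if (s[i].count < threshold)
			continue;	/* cheap cut, PA space only */
		if (!(s[i].valid & HS_VA)) {
			(*rmap_calls)++;
			/* The VA is a snapshot and may already be stale. */
			if (fake_rmap_lookup(s[i].pfn, &s[i].vaddr) == 0)
				s[i].valid |= HS_VA;
		}
		s[kept++] = s[i];
	}
	return kept;
}
```

With four samples of which two pass the threshold, only two lookups are made; the saving grows with how aggressive the PA-space cut can be.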
> 
> Here is an (incomplete) attempt - sorry about the long lines:
> 
> > And here is an attempt to compile how different subsystems
> > use the above data:
> > ==============================================================
> > Source                 Subsystem       Consumption              Activation/Frequency
> > ==============================================================
> > PROT_NONE faults       NUMAB           NUMAB=1 locality based   While task is running,
> > via process pgtable                    balancing                rate varies on observed
> > walk                                   NUMAB=2 hot page         locality and sysctl knobs.
> >                                        promotion
> > ==============================================================
> > folio_mark_accessed()  FS/filemap/GUP  LRU list activation      On cache access and unmap
> > ==============================================================
> > PTE A bit via          Reclaim:LRU     LRU list activation,     During memory pressure
> > rmap walk                              deactivation/demotion
> > ==============================================================
> > PTE A bit via          Reclaim:MGLRU   LRU list activation,     - During memory pressure
> > rmap walk and process                  deactivation/demotion    - Continuous sampling (configurable)
> > pgtable walk                                                      for workingset reporting
> > ==============================================================
> > PTE A bit via          DAMON           LRU activation,          Continuous sampling (configurable)?
> > rmap walk                              hot page promotion,      (I believe SJ is looking into
> >                                        demotion etc             auto-tuning this).
> > ==============================================================
> > Platform hints         NUMAB           NUMAB=1 Locality based
> > (AMD IBS)                              balancing and
> >                                        NUMAB=2 hot page
> >                                        promotion
> > ==============================================================

Based on the CXL one...

> > Device hints           NUMAB           NUMAB=2 hot page         Continuous sampling, frequency controllable.
> >                                        promotion                Subsampling programmable.
> > ==============================================================
> > The last two are listed as possibilities.
> > 
> > Feel free to correct/clarify and add more.

The above covers what the use cases require.
Maybe we need to do something similar for the controls needed in the
other direction (frequency is already covered).

Filtering:
* Process ID
* Address range (PA / VA)
* Access type (read vs. write), which may matter for migration cost.

Also, frequency is perhaps more nuanced:
- How often to give data (timeliness)
- How much data to give (bandwidth)
- When don't I care (threshold)
- How precise do I want it to be (subsampling etc.)

The layering is clearly going to be complex, so maybe working out what
information each use case needs would be helpful? The following is
probably too simplistic.

==================================================================
Use case       Nature of data
==================================================================
NUMAB=1        Enough hot pages with a remote source.
Balancing
==================================================================
NUMAB=2        Enough hot pages in slow memory
Tiering
Promotion
==================================================================
NUMAB=2        Enough cold pages in fast memory
Tiering
Demotion
==================================================================
LRU list       Specific pages of interest accessed
activation
==================================================================
LRU list       Enough cold pages?
deactivation
==================================================================

Jonathan

> > 
> > Regards,
> > Bharata.
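A minimal C sketch of how the filtering controls and the four parts of "frequency" listed above might look as a per-consumer request. It also shows one way to meet the point that a source's overhead should be limited to what enabled consumers need: run the source at the shortest interval and finest subsampling that any enabled consumer asks for, and not at all if none is enabled. All structure and function names here are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum access_type { ACC_READ = 1, ACC_WRITE = 2 };

/* What one consumer asks for (hypothetical field names). */
struct hot_request {
	bool enabled;
	/* Filtering */
	int pid;			/* 0 = any process */
	uint64_t pa_start, pa_end;	/* [start, end); end == 0 = any */
	unsigned int access_mask;	/* ACC_READ | ACC_WRITE */
	/* Frequency, split into its parts */
	uint64_t interval_ms;		/* timeliness: how often to deliver */
	size_t max_records;		/* bandwidth: records per delivery */
	uint32_t min_count;		/* threshold: ignore colder pages */
	uint32_t subsample;		/* precision: sample 1 in N events */
};

struct hot_event {
	int pid;
	uint64_t pa;
	enum access_type type;
	uint32_t count;
};

/* Does @ev pass @req's filters and threshold? */
static bool hot_request_match(const struct hot_request *req,
			      const struct hot_event *ev)
{
	if (req->pid && req->pid != ev->pid)
		return false;
	if (req->pa_end && (ev->pa < req->pa_start || ev->pa >= req->pa_end))
		return false;
	if (!(req->access_mask & ev->type))
		return false;
	return ev->count >= req->min_count;
}

/* Copy up to max_records matching events into @out; returns how many. */
static size_t hot_deliver(const struct hot_request *req,
			  const struct hot_event *evs, size_t n,
			  struct hot_event *out)
{
	size_t k = 0;

	assert(out != NULL || n == 0);
	for (size_t i = 0; i < n && k < req->max_records; i++)
		if (hot_request_match(req, &evs[i]))
			out[k++] = evs[i];
	return k;
}

/* Settings the source must run at to serve every enabled consumer. */
struct hot_source_cfg {
	bool active;
	uint64_t interval_ms;	/* shortest interval any consumer wants */
	uint32_t subsample;	/* finest subsampling any consumer wants */
};

static struct hot_source_cfg
hot_source_config(const struct hot_request *reqs, size_t n)
{
	struct hot_source_cfg cfg = { .active = false };

	for (size_t i = 0; i < n; i++) {
		if (!reqs[i].enabled)
			continue;
		if (!cfg.active || reqs[i].interval_ms < cfg.interval_ms)
			cfg.interval_ms = reqs[i].interval_ms;
		if (!cfg.active || reqs[i].subsample < cfg.subsample)
			cfg.subsample = reqs[i].subsample;
		cfg.active = true;
	}
	return cfg;
}
```

With this aggregation, a consumer that asked for a longer interval or coarser subsampling than the source actually runs at would need extra per-consumer thinning on delivery; that step is not shown. Only the filters, threshold and bandwidth cap are applied per consumer here.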