From: Jonathan Cameron
To: Raghavendra K T
Date: Fri, 31 Jan 2025 12:28:03 +0000
Subject: Re: [LSF/MM/BPF TOPIC] Overhauling hot page detection and promotion based on PTE A bit scanning
Message-ID: <20250131122803.000031aa@huawei.com>
In-Reply-To: <20250123105721.424117-1-raghavendra.kt@amd.com>
References: <20250123105721.424117-1-raghavendra.kt@amd.com>
Sender: owner-linux-mm@kvack.org

> Here is the list of potential discussion points:
...
> 2. Possibility of maintaining single source of truth for page hotness that would
> maintain hot page information from multiple sources and let other sub-systems
> use that info.

Hi,

I was thinking of proposing a separate topic on a single source of hotness,
but this question covers it so I'll add some thoughts here instead.
I think we are very early, but sharing some experience and thoughts in a
session may be useful.

What do the other subsystems that want to use a single source of page
hotness want to be able to find out? (subject to filters like memory range,
process etc)

A) How hot is page X?
- Is this useful, or too much data? What would use it?
  * Application optimization maybe. Very handy as an oracle for developing
    algorithms to do the rest of the options here!
- Provides both the cold and hot end of the scale, but maybe measurement
  techniques vary and cannot be easily combined. Hard in general to combine
  multiple sources of truth if aiming for an absolute number.

B) Which pages are super hot?
- Probably those that make the most difference if they are in a slower
  memory tier.

C) Which pages are hot enough to consider moving?
- This may be good enough to get the key data into the fast memory over time.
- Can combine sources of info, as being able to compare precise numbers
  doesn't matter.

D) Which pages are fairly cold?
- Likewise maybe good enough over time.

E) Which pages are very cold?
- Ideal case for tiering. Swap these with the super hot ones.
- Maybe extra signal for swap / zswap etc

F) Did these hot pages remain hot (and same for cold)?
- This is needed to know when to back off doing things because we have
  unstable hotness (two phase applications are a pain for this). Sampling
  a few pages may be fine.

Messy corners:

Temporal aspects.
- If only providing lists of hottest / coldest in the last second, it is
  very hard to find those that are of a stable temperature. We end up
  moving very hot data (which is disruptive) and it doesn't stay hot.
- Can reduce that effect by using long sampling windows on some measurement
  approaches (though on hardware trackers that can trash accuracy due to
  resource exhaustion and other subtle effects).
- Bistable / phase based applications are a pain, but perhaps it is up to
  higher levels to back off.

My main interest is migrating in tiered systems, but it is good to look at
what else would use a common layer.

Mostly I want to know something that is useful to move, and assume
convergence over the long term with the best things to move, so to me the
ideal layer has the following interface (strawman so shoot holes in it!):

1) Give me up to X hotish pages from a slow tier (greater than a specific
   measure of temperature)
2) Give me X coldish pages from a faster tier.
3) I expect to ask again in X seconds so please have some info ready for me!
4) (a path to get an idea of 'unhelpful moves' from earlier iterations -
   this is bleeding the tiering application into a shared interface though).

If we have multiple subsystems using the data we will need to resolve their
conflicting demands to generate good enough data with appropriate overhead.

I'd also like a virtualized solution for the case of hardware PA trackers
(what I have with CXL Hotness Monitoring Units) and the classic memory pool /
stranding avoidance case where the VM is the right entity to make migration
decisions. Making that interface convey what the kernel is going to use
would be an efficient option. I'd like to hide how the sausage was made
from the VM.

Jonathan
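
To make items 1) and 2) of the strawman interface above a little more concrete, here is a minimal userspace C sketch. Everything in it is an assumption made for illustration: the type and function names (struct page_temp, get_hot_from_slow(), get_cold_from_fast()) are hypothetical and do not exist in the kernel, temperature is treated as a single relative number, and the tracked pages are just a flat array. Items 3) and 4) (the "ask again in X seconds" hint and feedback on unhelpful moves) are not modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical types for illustration only; not a kernel API. */
enum tier { TIER_FAST, TIER_SLOW };

struct page_temp {
	uint64_t pfn;		/* page frame number */
	enum tier tier;		/* memory tier the page currently lives in */
	unsigned int temp;	/* relative temperature, larger means hotter */
};

/*
 * Insert rec into out[0..count), which is kept sorted with the best
 * candidate first (hottest if hottest_first is set, coldest otherwise).
 * At most max entries are kept. Returns the new entry count.
 */
static size_t insert_ranked(struct page_temp *out, size_t count, size_t max,
			    const struct page_temp *rec, int hottest_first)
{
	size_t i;

	if (max == 0)
		return 0;
	if (count == max) {
		const struct page_temp *worst = &out[count - 1];
		int better = hottest_first ? rec->temp > worst->temp
					   : rec->temp < worst->temp;
		if (!better)
			return count;
		count--;	/* drop the current worst entry to make room */
	}
	/* Shift worse entries down by one slot and place rec in the gap. */
	for (i = count; i > 0; i--) {
		int better = hottest_first ? rec->temp > out[i - 1].temp
					   : rec->temp < out[i - 1].temp;
		if (!better)
			break;
		out[i] = out[i - 1];
	}
	out[i] = *rec;
	return count + 1;
}

/* Item 1: up to max slow-tier pages with temp > min_temp, hottest first. */
size_t get_hot_from_slow(const struct page_temp *recs, size_t n,
			 unsigned int min_temp,
			 struct page_temp *out, size_t max)
{
	size_t count = 0, i;

	assert(out != NULL || max == 0);
	for (i = 0; i < n; i++)
		if (recs[i].tier == TIER_SLOW && recs[i].temp > min_temp)
			count = insert_ranked(out, count, max, &recs[i], 1);
	return count;
}

/* Item 2: up to max fast-tier pages, coldest first. */
size_t get_cold_from_fast(const struct page_temp *recs, size_t n,
			  struct page_temp *out, size_t max)
{
	size_t count = 0, i;

	assert(out != NULL || max == 0);
	for (i = 0; i < n; i++)
		if (recs[i].tier == TIER_FAST)
			count = insert_ranked(out, count, max, &recs[i], 0);
	return count;
}
```

A tiering caller would presumably pair the two result lists and swap entries, which is the "swap very cold with super hot" case from E). Only the relative ordering of temperatures is used here, in line with the point under C) that comparing precise numbers across sources may not matter.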