From: SeongJae Park <sj@kernel.org>
To: damon@lists.linux.dev
Cc: SeongJae Park <sj@kernel.org>, linux-mm@kvack.org, linux-kernel@vger.kernel.org, kernel-team@meta.com
Subject: A guide to DAMON tuning and results interpretation for hot pages
Date: Fri, 8 Nov 2024 15:25:36 -0800
Message-Id: <20241108232536.73843-1-sj@kernel.org>

Hello DAMON community,

One of the common issues that I have received from DAMON users is that DAMON's
monitoring results show far fewer hot regions than expected.
Specifically, the users find only regions of zero or low 'nr_accesses' values
in the DAMON-generated access pattern snapshots.

In some cases, it turned out the problem can be alleviated by tuning DAMON
parameters or changing the way the results are interpreted. I'd like to share
the details and possible future improvements here.

Note that I'm not saying this is the users' tuning fault. I admit that the
real root cause of the issue is the poor interface and lack of guides, which
make correct tuning difficult, and the suboptimality of DAMON's mechanisms.
We will continue working on improving these in the long term. I share some of
the plans and status at the end of this email.

TL; DR
------

Users see only low or zero 'nr_accesses' regions mainly because they set the
'aggregation interval' too short compared to the workload's memory access
intensiveness. Please increase the aggregation interval, or treat zero
'nr_accesses' regions of short 'age' as hot regions.

Now let's walk through more details.

The below sections assume you're familiar with DAMON's monitoring mechanisms
including 'Access Frequency Monitoring', 'Region Based Sampling', 'Adaptive
Regions Adjustment', and 'Age Tracking'. You should particularly be familiar
with terms including 'sampling interval', 'aggregation interval',
'nr_accesses', and 'age'. If you're not familiar with those, you can refer to
the document
(https://www.kernel.org/doc/html/next/mm/damon/design.html#monitoring).

Problem
-------

Some users reported that they expected DAMON would be useful for finding both
cold memory regions and hot memory regions. They found that it indeed works
for finding cold memory regions, but they had difficulty finding hot memory
regions. Some of the reported cases are detailed below.

Case 1: Proactive Reclamation

A user tried to use DAMOS for proactively reclaiming cold memory regions.
They hence specified the maximum access rate (nr_accesses) and the minimum age
of target regions for the DAMOS scheme as zero and 2 minutes, respectively.
That asks DAMON to find regions that were not accessed for two minutes or more
and reclaim those as soon as they are found. If they used the DAMON user-space
tool, damo, they would have used a command like below.

    # damo start --damos_action pageout --damos_access_rate 0% 0% --damos_age 2m max

The result was as the user expected. The user reported to me that they were
able to save memory without performance degradation. My test setup also showed
good results from similar DAMOS schemes, even though my setup was using an even
more aggressive approach (minimum age of 5 seconds).

Case 2: Heatmap Visualization

A user recorded the data access pattern of their workloads using the DAMON
user-space tool, damo, and drew a heatmap using the 'damo report heatmap'
command. The workload was active and therefore the user expected the heatmap
to show some heat. But the output showed nearly zero heat. It was nearly
always only dark.

On my test setup, some workloads indeed showed very dark heatmaps. But on
multiple workloads, meaningful access patterns were identifiable using the
heatmaps
(https://damonitor.github.io/test/result/visual/next/rec.heatmap.1.png.html).

Case 3: Prioritizing Hot Pages

A user tried to use DAMOS for finding hot memory regions and applying a
prioritization action like backing the regions with THP, migrating the pages
from a CXL node to a DRAM node, or moving the pages from the inactive LRU list
to the active LRU list. They hence specified a high minimum access rate
(nr_accesses) for the DAMOS scheme. For example, a command like below was
used:

    # damo start --damos_action hugepage --damos_access_rate 50% max $(pidof workload)

However, they found DAMOS was doing nearly nothing. They reduced the minimum
access rate, and the situation got better, but DAMOS was still not finding the
expected amount of hot memory regions.

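
To make the 50% threshold of case 3 more concrete, below is a minimal Python
sketch of how an access rate threshold could translate to 'nr_accesses'. It
assumes the access rate is the ratio of 'nr_accesses' to the number of access
checks per 'aggregation interval' (one check per 'sampling interval'), and it
assumes a 5 millisecond 'sampling interval' with a 100 millisecond
'aggregation interval'. The function names are made up for illustration and
are not part of damo.

```python
import math


def max_nr_accesses(sampling_interval_us, aggregation_interval_us):
    """Upper bound of 'nr_accesses' in one snapshot.

    Assumes one access check per 'sampling interval', so a region can be
    found accessed at most this many times per 'aggregation interval'.
    """
    return aggregation_interval_us // sampling_interval_us


def min_nr_accesses_for_rate(rate_percent, sampling_interval_us,
                             aggregation_interval_us):
    """Smallest 'nr_accesses' that satisfies a minimum access rate."""
    limit = max_nr_accesses(sampling_interval_us, aggregation_interval_us)
    return math.ceil(limit * rate_percent / 100)


# Assumed intervals: 5 ms sampling, 100 ms aggregation.
print(max_nr_accesses(5_000, 100_000))               # 20
print(min_nr_accesses_for_rate(50, 5_000, 100_000))  # 10
```

Under these assumptions, a region has to be found accessed in at least 10 of
the 20 checks made within each 100 milliseconds to pass the 50% threshold.
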
Cases 2 and 3 mean that the users expected to see regions of high 'nr_accesses'
values in the DAMON-generated access pattern snapshots, but the snapshots
showed only zero or low 'nr_accesses' values. On my test setup
(https://damonitor.github.io/posts/damon_evaluation/), a similar approach
showed good results, though.

One common thing that I found from the reports was that they were using the
default value for 'aggregation interval', which is 100 milliseconds. My test
setup is also using the default values.

Meaning of 'aggregation interval'
---------------------------------

For every 'aggregation interval', DAMON generates one complete access pattern
snapshot. That means 'aggregation interval' can be thought of as the amount of
time whose information a single snapshot can contain. The value should hence
be decided depending on how much information the user wants each snapshot to
have. If it is too short, each snapshot contains nearly no information, so it
is not useful. If it is too long, each snapshot contains too much information
that is coarsely accumulated, so it is again not useful.

The meaningful length of time depends on the data access intensiveness of the
workload. The intensiveness should be measured with two factors: frequency and
footprint.

The frequency is how frequently the workload is making accesses. If the
workload makes zero or only a low number of accesses per given 'aggregation
interval', the snapshot will of course show zero or a low number of accesses,
due to the 'Access Frequency Monitoring' mechanism.

The footprint is the total amount of memory that the workload has accessed at
least once. Due to the 'Region Based Sampling' mechanism, it should be
meaningfully large compared to the total size of the monitoring target
regions. For example, if DAMON is requested to monitor 1 TiB of memory and the
workload is accessing a 1 MiB region of it, DAMON's sampling based approach
will have difficulty finding the 1 MiB region.
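
A rough back-of-the-envelope sketch of this example follows. It is only a
simplified model, not DAMON's implementation: it assumes the 1 TiB target is
split into 1,000 equally sized regions, that each access check samples one
uniformly random address of a region, that the 1 MiB area is always being
accessed, and that region boundaries never change. The region count, the
check counts, and the function name are illustrative assumptions.

```python
def hit_probability(target_bytes, hot_bytes, nr_regions, checks_per_snapshot):
    """Chance that a snapshot shows non-zero 'nr_accesses' for the hot area.

    Simplified model: equally sized regions, the hot area fits in one
    region and is always being accessed, and each check samples one
    uniformly random address in that region.
    """
    region_bytes = target_bytes / nr_regions
    per_check = min(1.0, hot_bytes / region_bytes)
    return 1.0 - (1.0 - per_check) ** checks_per_snapshot


TIB = 1 << 40
MIB = 1 << 20

# 1 TiB target, 1 MiB hot area, 1,000 regions (assumed).
# 20 checks: 100 ms aggregation; 400 checks: 2 s aggregation (5 ms sampling).
for checks in (20, 400):
    print(checks, round(hit_probability(TIB, MIB, 1000, checks), 3))
```

In this model, a snapshot covering 20 checks catches the 1 MiB area only about
2% of the time, while one covering 400 checks catches it about 32% of the
time.
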
The 'Adaptive Regions Adjustment' mechanism will make DAMON find the 1 MiB
region eventually. But it will take time. The workload could stop accessing
the 1 MiB region and start accessing another 1 MiB region before the 'Adaptive
Regions Adjustment' mechanism finds it.

Note that the frequency and footprint of accesses from workloads depend on not
only the source code but also many other factors. The system's total memory
bandwidth, the extent of load, and noisy neighbor workloads are a few examples.

So, if the desired maximum 'nr_accesses' on each snapshot is fixed,
'aggregation interval' should be increased as the access intensiveness
decreases, and vice versa.

Root Cause 1: Suboptimal Aggregation Interval Tuning
----------------------------------------------------

I suspect that in many of the reported problematic cases, the 'aggregation
interval' was too short. The default value is good enough for my test setup,
but it would need tuning on different systems. Especially on large systems
having limited bandwidth, a 100 millisecond 'aggregation interval' could be
too short for the workload to make a meaningful amount of accesses, in terms
of both frequency and footprint, within the interval.

I actually suggested that some of the reporters increase 'aggregation
interval', and it helped alleviate the issue to some degree.

I cannot know exactly who is using DAMON with what parameters. But at least
one open-sourced usage from HMSDK is setting 'aggregation interval' to two
seconds
(https://github.com/skhynix/hmsdk/blob/hmsdk-v3.0/tools/gen_migpol.py#L14).

Root Cause 2: Ignorance of Recency ('age')
------------------------------------------

The 'nr_accesses' represents the access frequency, and hence it is natural to
assume hot memory regions would have high 'nr_accesses'.

DAMON provides not only frequency. It also informs users how long the
'nr_accesses' of the regions has been maintained, namely 'age', using the 'Age
Tracking' mechanism.
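
As a side note, 'age' is counted in units of 'aggregation interval', so the
same 'age' number stands for a different amount of wall-clock time under
different interval settings. A minimal Python sketch of the conversion
follows. The function names are made up for illustration, and the two interval
values are the 100 millisecond default and the two second HMSDK setting
mentioned above.

```python
def age_to_seconds(age, aggregation_interval_us):
    """Wall-clock time that an 'age' counter value stands for."""
    return age * aggregation_interval_us / 1_000_000


def seconds_to_age(seconds, aggregation_interval_us):
    """Number of aggregation intervals that cover the given time."""
    return int(seconds * 1_000_000 // aggregation_interval_us)


# The 2 minute minimum age of case 1 under two interval settings.
print(seconds_to_age(120, 100_000))    # 1200 intervals of 100 ms
print(seconds_to_age(120, 2_000_000))  # 60 intervals of 2 s
```

So raw 'age' values from snapshots taken under different 'aggregation
interval' settings should be converted to time before being compared.
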
If 'nr_accesses' is non-zero, 'age' can be used to calculate the actual access
hotness of the region (if we keep a fire on for a longer time, the temperature
will be higher). If 'nr_accesses' is zero, 'age' can represent a sort of
recency information (when the region was last accessed).

The recency information can be useful for finding cold pages like case 1
(proactive reclaim). The opposite of finding cold pages is finding hot pages,
so the recency information can also be used for finding relatively hot pages.
In other words, if for any reason DAMON cannot generate a snapshot having
enough variety of non-zero 'nr_accesses' values for the given purpose, users
could further differentiate hot regions among zero 'nr_accesses' regions using
the 'age' information. It would not be ideal, but it is still reasonable, as
LRU-based approaches can be.

So I suspect some of the problems come from not using 'age' for zero
'nr_accesses' regions. Actually, the reports of case 3 (prioritizing hot
pages), which was successful only on my test setup, were commonly using a
non-zero minimum 'nr_accesses' for DAMOS schemes, so they were ignoring the
'age' of zero 'nr_accesses' regions. Meanwhile, case 1 (proactive reclaim) was
using 'age' information for zero 'nr_accesses' regions, and no negative
results have been reported so far.

This might seem like it does not address case 2 (heatmap visualization),
because the heatmap shows the 'nr_accesses' change of regions over time. But
if the record is collected for a long time, regions that showed non-zero
'nr_accesses' for a short period may look like a very small dot on the
low-resolution picture, which is not easy to see with human eyes. The users
might be able to get different results using 'age' information on the
collected snapshots, like the recency or temperature based histograms
(https://github.com/damonitor/damo/blob/v2.5.4/USAGE.md#access-report-styles).

Tuning Guide
------------

Based on the above root cause theories, I suggest trying the tuning guides
below.
If you see that DAMON is not working well at finding hot pages,

1. Ensure your workload is making meaningfully intensive data accesses.
2. Gradually increase aggregation interval and see if it makes a difference.
3. Try using 'age' information even if 'nr_accesses' is zero.
4. If nothing works, report the problem to sj@kernel.org,
   damon@lists.linux.dev and/or linux-mm@kvack.org.

If increasing aggregation interval alleviates your problem, you can further
consider increasing 'sampling interval'. As long as increasing it doesn't harm
the quality of the access pattern snapshots, keeping 'sampling interval' low
only increases DAMON's CPU usage.

For using 'age' information of zero 'nr_accesses' regions, different approaches
could be used for the profiling use case and the DAMOS use case. For the
profiling use case, users can try reading recency or access temperature based
histograms
(https://github.com/damonitor/damo/blob/v2.5.4/USAGE.md#access-report-styles)
of snapshots from a record, or of live-captured snapshots. If the use case is
for DAMOS, applying the 'age' information to the DAMOS target access pattern
would be straightforward. Using DAMOS Quotas together can be useful, since it
provides its own under-quota prioritization logic that utilizes 'age'
information for zero 'nr_accesses' regions, and further provides auto-tuning
of the quota for a given target metric/value.

Future Plans
------------

Again, I admit that the real root cause of the issue is the poor interface and
lack of guides, which make correct tuning difficult, and the suboptimality of
DAMON's mechanisms. We will continue working on improving these in the long
term.

For easy use of 'age' information for zero 'nr_accesses' regions, DAMON is
already providing its Quotas feature with auto-tuning. We will continue making
the auto-tuning more advanced, and adding new features for ease of use.

One obvious hidden root cause is the absence of a guideline. I will collect
more input and write an official document for this.
I actually thought about writing the official document first, but writing this
mail already took over two weeks. So I am posting this mail first.

For interval setting, we are planning a sort of auto-tuning. Like DAMOS quota
auto-tuning, users will be able to set more intuitive knobs (e.g., how much
diversity of regions they want) instead of directly setting the intervals.
Then, DAMON will automatically tune the intervals and other low level knobs.
This is at a very early stage, so no specific design has been made so far, and
it will take a long time. Don't expect delivery of this in the near future.
Use the tuning guide for the short term and/or ask for prioritization of this
project if needed.

In my humble opinion, the 'Adaptive Regions Adjustment' mechanism is not a
root cause of the issue. Nonetheless, it also has much room for improvement
that can make DAMON more lightweight and accurate. And any DAMON accuracy
improvements would alleviate the hot page detection issue. We plan to do this
together, and this is again a long term project that has no specific design
yet. Nonetheless, we recently shared two simple short term features for this
(https://lore.kernel.org/20241027204910.155254-1-sj@kernel.org). If you are
interested in implementing the short term features, please step up.

I'm eagerly looking forward to any input on these humble and shallow thoughts!


Thanks,
SJ
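
For reference, the idea of treating zero 'nr_accesses' regions of short 'age'
as relatively hot can be sketched in Python as below. This is only an
illustration of the interpretation described in this mail, not DAMOS's
prioritization logic or damo's code. The Region class, its fields, and the
ordering rule are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Region:
    start: int
    end: int
    nr_accesses: int
    age: int  # in 'aggregation interval' units


def hotness_key(region):
    """Sort key: larger means hotter.

    Regions with non-zero 'nr_accesses' come first, hotter as the
    frequency and then the 'age' grow. Zero 'nr_accesses' regions
    follow, hotter as the 'age' (time since the last observed access)
    gets shorter.
    """
    if region.nr_accesses > 0:
        return (1, region.nr_accesses, region.age)
    return (0, 0, -region.age)


snapshot = [
    Region(0x1000, 0x2000, 0, 1200),  # idle for a long time: cold
    Region(0x2000, 0x3000, 0, 3),     # idle only briefly: relatively hot
    Region(0x3000, 0x4000, 2, 10),    # observed accesses: hottest
]
for r in sorted(snapshot, key=hotness_key, reverse=True):
    print(hex(r.start), r.nr_accesses, r.age)
```

With such an ordering, a snapshot that has almost no non-zero 'nr_accesses'
regions can still tell relatively hot regions from cold ones.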