From: SeongJae Park
Cc: SeongJae Park, akpm@linux-foundation.org, Jonathan.Cameron@Huawei.com,
    amit@kernel.org, benh@kernel.crashing.org, corbet@lwn.net,
    david@redhat.com, dwmw@amazon.com, elver@google.com, foersleo@amazon.de,
    gthelen@google.com, markubo@amazon.de, rientjes@google.com,
    shakeelb@google.com, baolin.wang@linux.alibaba.com,
    guoqing.jiang@linux.dev, xhao@linux.alibaba.com, hanyihao@vivo.com,
    changbin.du@gmail.com, kuba@kernel.org, rongwei.wang@linux.alibaba.com,
    rikard.falkeborn@gmail.com, geert@linux-m68k.org, kilobyte@angband.pl,
    linux-damon@amazon.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: [RFC PLAN] Some humble ideas for DAMON future works
Date: Wed, 19 Jan 2022 13:31:10 +0000
Message-Id: <20220119133110.24901-1-sj@kernel.org>

Hello,

After the DAMON code was merged (kudos to the community for the great help),
a few people asked me about my plans for future DAMON work, and whether DAMON
will be usable for their use cases.
I do have some humble plans, though they
are only at a rough brainstorming level at the moment.  So I'd like to share
them here before going forward and starting to code, so that I can get some
feedback and fail fast.

User-space Policy or In-kernel Policy? Both.
============================================

When discussing kernel-involved system efficiency optimizations, I see two
kinds of people who have slightly different opinions.  The first party prefers
to implement only simple but efficient mechanisms in the kernel and export
them to user space, so that users can make smart user space policies.
Meanwhile, the second party prefers that the kernel just works.  I agree with
both parties.

I think the first opinion makes sense as there is some valuable information
that only user space can know.  I think only such approaches could achieve the
ultimate efficiency in such cases.

I also agree with the second party, though, because there could be some people
who don't have special information that only their applications know, or the
resources to do the additional work.  Simple in-kernel policies will still be
beneficial for some users even though they are sub-optimal compared to highly
tuned user space policies, if they provide some efficiency gain and no
regressions in most cases.

I'd like to help both.  For that reason, I made DAMON an in-kernel mechanism
for both user-space and kernel-space policies.  It provides a highly tunable
general user space interface to help the first party.  It also provides
in-kernel policies for specific common use cases, namely DAMON-based proactive
reclamation.  Those are built on top of DAMON using its kernel-space API, with
conservative default parameters that are assumed to incur no regression but
some benefit in most cases.  I will continue pursuing both ways.

Imaginable DAMON-based Policies
===============================

I'd like to start by listing some imaginable data access-aware operation
policies that I hope will eventually be made.  The list will hopefully shed
light on how DAMON should evolve to efficiently support the policies.

DAMON-based Proactive LRU-pages (de)Activation
----------------------------------------------

The reclamation mechanism, which selects reclaim targets using the
active/inactive LRU lists, sometimes doesn't work well.  According to my
previous work, providing access pattern-based hints can significantly improve
the performance under memory pressure[1,2].

Proactive reclamation is known to be useful for many memory intensive systems,
and now we have a DAMON-based implementation of it[3].  However, proactive
reclamation wouldn't be so welcome on some systems having a high cost of I/O.
Also, even if the system runs proactive reclamation, memory pressure can still
occasionally be triggered.

My idea for helping this situation is manipulating the order of pages in the
LRU lists using DAMON-provided monitoring results.  That is, making DAMON
proactively find hot/cold memory regions and move pages of the hot regions to
the head of the active list, while moving pages of the cold regions to the
tail of the inactive list.  This will help eventual reclamation under memory
pressure evict cold pages first, and so incur fewer additional page faults.

[1] https://www.usenix.org/conference/hotstorage19/presentation/park
[2] https://linuxplumbersconf.org/event/4/contributions/548/
[3] https://docs.kernel.org/admin-guide/mm/damon/reclaim.html

DAMON-based THP Coalesce/Split
------------------------------

THP is known to significantly improve performance, but also to increase the
memory footprint[1].
We can minimize the memory overhead while preserving the
performance benefit by asking DAMON to provide MADV_HUGEPAGE-like hints for
hot memory regions of >= 2MiB size, and MADV_NOHUGEPAGE-like hints for cold
memory regions.  Our experimental user space policy implementation[2] of this
idea removes 76.15% of THP memory waste while preserving 51.25% of THP speedup
in total.

[1] https://www.usenix.org/conference/osdi16/technical-sessions/presentation/kwon
[2] https://damonitor.github.io/doc/html/v34/vm/damon/eval.html

DAMON-based Tiered Memory (Pro|De)motion
----------------------------------------

In tiered memory systems utilizing DRAM and PMEM[1], we can promote hot pages
to DRAM and demote cold pages to PMEM using DAMON.  A patch allowing
development of access-aware demotion user space policies has already been
submitted[2] by Baolin.

[1] https://www.intel.com/content/www/us/en/products/details/memory-storage/optane-memory.html
[2] https://lore.kernel.org/linux-mm/cover.1640171137.git.baolin.wang@linux.alibaba.com/

DAMON-based Proactive Compaction
--------------------------------

Compaction uses the migration scanner to find migration source pages.  Hot
pages would be more likely to be unmovable compared to cold pages, so it would
be better to try migration of cold pages first.  DAMON could be used here.
That is, proactively monitoring accesses via DAMON and starting compaction so
that the migration scanner scans cold memory ranges first.  I should admit,
though, that I'm not familiar with the compaction code and have no PoC data
for this, only the groundless idea.

How We Can Implement These
--------------------------

Implementing most of the above mentioned policies wouldn't be too difficult
because we have DAMON-based Operation Schemes (DAMOS).  That is, we will need
to implement some more DAMOS actions for each policy.  Some existing kernel
functions can be reused.
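
As a rough illustration of what "one more DAMOS action per policy" could look
like, the Python sketch below models a scheme as a target access pattern plus
an action, and applies a list of schemes to monitored regions.  It is only a
user-space model of the idea, not kernel code.  The pattern fields (sz,
nr_accesses, age) follow the sysfs layout proposed later in this mail; the
Region and Scheme classes, the thresholds, and the action names
(thp_coalesce_hint, thp_split_hint) are assumptions made up for this example.

```python
from dataclasses import dataclass

MiB = 1 << 20


@dataclass
class Region:
    """A monitored address range with its observed access pattern."""
    start: int        # start address in bytes
    end: int          # end address in bytes (exclusive)
    nr_accesses: int  # accesses seen in the last aggregation interval
    age: int          # intervals for which the pattern has persisted

    @property
    def size(self) -> int:
        return self.end - self.start


@dataclass
class Scheme:
    """Hypothetical DAMOS-like scheme: access pattern bounds plus an action."""
    action: str
    min_sz: int = 0
    max_sz: int = 2**64 - 1
    min_nr_accesses: int = 0
    max_nr_accesses: int = 2**32 - 1
    min_age: int = 0
    max_age: int = 2**32 - 1

    def matches(self, r: Region) -> bool:
        return (self.min_sz <= r.size <= self.max_sz
                and self.min_nr_accesses <= r.nr_accesses
                <= self.max_nr_accesses
                and self.min_age <= r.age <= self.max_age)


def apply_schemes(regions, schemes):
    """Return (region, action) pairs; the first matching scheme wins."""
    decisions = []
    for r in regions:
        for s in schemes:
            if s.matches(r):
                decisions.append((r, s.action))
                break
    return decisions


# Example thresholds are arbitrary and only for illustration.
schemes = [
    # Hot regions of at least 2 MiB get a MADV_HUGEPAGE-like hint.
    Scheme(action="thp_coalesce_hint", min_sz=2 * MiB, min_nr_accesses=5),
    # Regions that stayed unaccessed for a while get a
    # MADV_NOHUGEPAGE-like hint.
    Scheme(action="thp_split_hint", max_nr_accesses=0, min_age=10),
]
regions = [
    Region(0, 4 * MiB, nr_accesses=12, age=3),         # hot and large
    Region(4 * MiB, 5 * MiB, nr_accesses=9, age=1),    # hot but small
    Region(5 * MiB, 9 * MiB, nr_accesses=0, age=40),   # cold
]
decisions = apply_schemes(regions, schemes)
for region, action in decisions:
    print(f"[{region.start:#x}, {region.end:#x}): {action}")
```

In this model, supporting another policy means adding another action name and
the code that carries it out; the pattern matching part is shared by all of
them.
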
Such actions would include LRU (de)activation, THP
coalesce/split hints, memory (pro|de)motion, and cold-pages-first scanning
compaction.  Then, supporting those actions in the user space interface will
allow implementing user space policies.  If we find reasonably good default
DAMOS parameters and some kernel side control mechanism, we can further make
those into kernel policies in the form of, say, builtin modules.

How DAMON Should Be Evolved For Supporting Those
================================================

Let's discuss what kind of changes in DAMON will be needed to efficiently
support the above mentioned policies.

Simultaneously Monitoring Different Types of Address Spaces
-----------------------------------------------------------

It would be better to run all the above mentioned policies simultaneously on a
single system.  As some policies such as LRU-pages (de)activation are better
run on the physical address space while some policies such as THP
coalesce/split would need to run on virtual address spaces, DAMON should
support concurrently monitoring different address spaces.  We can always do
this by creating one DAMON context for each address space and running those.
However, as the address spaces will conflict, the contexts will interfere with
each other.  The current idea for avoiding this is allowing multiple DAMON
contexts to run on a single thread, forcing them to have the same monitoring
contexts.

Online Parameters Updates
-------------------------

Someone would also want to dynamically turn on/off and/or tune each policy.
This is impossible with current DAMON, because it prohibits updating any
parameter while it is running.  We disallow online parameter updates mainly
because we want to avoid doing additional synchronization between the running
kdamond and the parameters updater.
The idea for supporting the use case while
avoiding the additional synchronization is allowing users to pause DAMON and
update parameters while it is paused.

A Better DAMON interface
------------------------

DAMON is currently exposing its major functionality to the user space via
debugfs.  After all, DAMON is not only for debugging.  Also, this makes the
interface depend on debugfs unnecessarily, and makes it be considered
unreliable.  Also, the interface is quite inflexible for future interface
extension.  I admit it was not a good choice.

It would be better to implement another reliable and easily extensible
interface, and deprecate the debugfs interface.  The idea is exposing the
interface via sysfs using hierarchical Kobjects under mm_kobject.  For
example, the usage would be something like below:

    # cd /sys/kernel/mm/damon
    # echo 1 > nr_kdamonds
    # echo 1 > kdamond_1/contexts/nr_contexts
    # echo va > kdamond_1/contexts/context_1/target_type
    # echo 1 > kdamond_1/contexts/context_1/targets/nr_targets
    # echo $(pidof ) > \
            kdamond_1/contexts/context_1/targets/target_1/pid
    # echo Y > monitor_on

The underlying files hierarchy could be something like below.

    /sys/kernel/mm/damon/
    │ monitor_on
    │ kdamonds
    │ │ nr_kdamonds
    │ │ kdamond_1/
    │ │ │ kdamond_pid
    │ │ │ contexts
    │ │ │ │ nr_contexts
    │ │ │ │ context_1/
    │ │ │ │ │ target_type (va | pa)
    │ │ │ │ │ attrs/
    │ │ │ │ │ │ intervals/sampling,aggr,update
    │ │ │ │ │ │ nr_regions/min,max
    │ │ │ │ │ targets/
    │ │ │ │ │ │ nr_targets
    │ │ │ │ │ │ target_1/
    │ │ │ │ │ │ │ pid
    │ │ │ │ │ │ │ init_regions/
    │ │ │ │ │ │ │ │ region1/
    │ │ │ │ │ │ │ │ │ start,end
    │ │ │ │ │ │ │ │ ...
    │ │ │ │ │ │ ...
    │ │ │ │ │ schemes/
    │ │ │ │ │ │ nr_schemes
    │ │ │ │ │ │ scheme_1/
    │ │ │ │ │ │ │ action
    │ │ │ │ │ │ │ target_access_pattern/
    │ │ │ │ │ │ │ │ sz/min,max
    │ │ │ │ │ │ │ │ nr_accesses/min,max
    │ │ │ │ │ │ │ │ age/min,max
    │ │ │ │ │ │ │ quotas/
    │ │ │ │ │ │ │ │ ms,bytes,reset_interval
    │ │ │ │ │ │ │ │ prioritization_weights/
    │ │ │ │ │ │ │ │   sz,nr_accesses,age
    │ │ │ │ │ │ │ watermarks/
    │ │ │ │ │ │ │   metric,check_interval,high,mid,low
    │ │ │ │ │ │ │ stats/
    │ │ │ │ │ │ │ │ quota_exceeds
    │ │ │ │ │ │ │ │ tried/nr,sz
    │ │ │ │ │ │ │ │ applied/nr,sz
    │ │ │ │ │ │ │ ...
    │ │ │ │ ...
    │ │ ...

More DAMON Future Works
=======================

In addition to the above mentioned things, there is much more work to do.
It would be better to extend DAMON to support more use cases and address
spaces, including page granularity, idleness only, read/write only, page cache
only, and cgroups monitoring.

It would also be valuable to improve the accuracy of monitoring, using some
adaptive tuning of the monitoring attributes or some new fancy idea[1].

DAMOS could also be improved by utilizing its own autotuning feature, for
example, by monitoring PSI and other metrics related to the given action.

[1] https://linuxplumbersconf.org/event/11/contributions/984/

Thank you For Reading This
==========================

So, I shared my current rough and immature plans off the top of my head here.
I hope this helps you understand what I'm thinking about for the future of
DAMON.  Please note again that these are only at a brainstorming level and
some are only groundless ideas.  Some might be just insane ideas.  Hence,
everything is open to change or failure.  If you have any comments, please
feel free to let me know.

Thanks,
SJ