Date: Tue, 11 Jun 2024 12:25:24 -0700
From: Shakeel Butt
To: Yue Zhao, Andrew Morton
Cc: linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, linux-mm@kvack.org,
    Tejun Heo, Zefan Li, Johannes Weiner, Jonathan Corbet, Michal Hocko,
    Roman Gushchin, Shakeel Butt, Muchun Song, David Hildenbrand, Chris Li,
    Matthew Wilcox, Kefeng Wang, Yosry Ahmed, Yue Zhao, Hugh Dickins,
    Dan Schatzberg
Subject: Re: [PATCH v6 0/2] Add swappiness argument to memory.reclaim
References: <20240103164841.2800183-1-schatzberg.dan@gmail.com>
In-Reply-To: <20240103164841.2800183-1-schatzberg.dan@gmail.com>

Hi folks,

This series has been in mm-unstable for several months. Are there any
remaining concerns? If not, can we please move it to the mm-stable branch so
that it is merged in the next Linux release?

On Wed, Jan 03, 2024 at 08:48:35AM GMT, Dan Schatzberg wrote:
> Changes since V5:
> * Made the scan_control behavior limited to proactive reclaim explicitly
> * created sc_swappiness helper to reduce chance of mis-use
>
> Changes since V4:
> * Fixed some initialization bugs by reverting back to a pointer for swappiness
> * Added some more caveats to the behavior of swappiness in documentation
>
> Changes since V3:
> * Added #define for MIN_SWAPPINESS and MAX_SWAPPINESS
> * Added explicit calls to mem_cgroup_swappiness
>
> Changes since V2:
> * No functional change
> * Used int consistently rather than a pointer
>
> Changes since V1:
> * Added documentation
>
> This patch proposes augmenting the memory.reclaim interface with a
> swappiness= argument that overrides the swappiness value for that instance
> of proactive reclaim.
>
> Userspace proactive reclaimers use the memory.reclaim interface to trigger
> reclaim. The memory.reclaim interface does not allow for any way to affect the
> balance of file vs anon during proactive reclaim. The only approach is to adjust
> the vm.swappiness setting.
> However, there are a few reasons we look to control
> the balance of file vs anon during proactive reclaim, separately from reactive
> reclaim:
>
> * Swapout should be limited to manage SSD write endurance. In near-OOM
>   situations we are fine with lots of swap-out to avoid OOMs. As these are
>   typically rare events, they have relatively little impact on write endurance.
>   However, proactive reclaim runs continuously and so its impact on SSD write
>   endurance is more significant. Therefore it is desirable to control swap-out
>   for proactive reclaim separately from reactive reclaim.
>
> * Some userspace OOM killers like systemd-oomd[1] support OOM killing on swap
>   exhaustion. This makes sense if the swap exhaustion is triggered due to
>   reactive reclaim but less so if it is triggered due to proactive reclaim (e.g.
>   one could see OOMs when free memory is ample but anon is just particularly
>   cold). Therefore, it's desirable to have proactive reclaim reduce or stop
>   swap-out before the threshold at which OOM killing occurs.
>
> In the case of Meta's Senpai proactive reclaimer, we adjust vm.swappiness before
> writes to memory.reclaim[2]. This has been in production for nearly two years
> and has addressed our needs to control proactive vs reactive reclaim behavior
> but is still not ideal for a number of reasons:
>
> * vm.swappiness is a global setting, so adjusting it can race/interfere with
>   other system administration that wishes to control vm.swappiness. In our
>   case, we need to disable Senpai before adjusting vm.swappiness.
>
> * vm.swappiness is stateful - so a crash or restart of Senpai can leave a
>   misconfigured setting. This requires some additional management to record the
>   "desired" setting and ensure Senpai always adjusts to it.
>
> With this patch, we avoid these downsides of adjusting vm.swappiness globally.
>
> Previously, this exact interface addition was proposed by Yosry[3].
> In response, Roman proposed instead an interface to specify precise
> file/anon/slab reclaim amounts[4]. More recently, Huan proposed this as
> well[5] and others similarly questioned if this was the proper interface.
>
> Previous proposals sought to use this to allow proactive reclaimers to
> effectively perform a custom reclaim algorithm by issuing proactive reclaim with
> different settings to control file vs anon reclaim (e.g. to only reclaim anon
> from some applications). Responses argued that adjusting swappiness is a poor
> interface for custom reclaim.
>
> In contrast, I argue in favor of a swappiness setting not as a way to implement
> custom reclaim algorithms but rather to bias the balance of anon vs file due to
> differences of proactive vs reactive reclaim. In this context, swappiness is the
> existing interface for controlling this balance and this patch simply allows for
> it to be configured differently for proactive vs reactive reclaim.
>
> Specifying explicit amounts of anon vs file pages to reclaim feels inappropriate
> for this purpose. Proactive reclaimers are unaware of the relative age of file
> vs anon for a cgroup, which makes it difficult to manage proactive reclaim of
> different memory pools. A proactive reclaimer would need some amount of anon
> reclaim attempts separate from the amount of file reclaim attempts, which seems
> brittle given that it's difficult to observe the impact.
>
> [1]https://www.freedesktop.org/software/systemd/man/latest/systemd-oomd.service.html
> [2]https://github.com/facebookincubator/oomd/blob/main/src/oomd/plugins/Senpai.cpp#L585-L598
> [3]https://lore.kernel.org/linux-mm/CAJD7tkbDpyoODveCsnaqBBMZEkDvshXJmNdbk51yKSNgD7aGdg@mail.gmail.com/
> [4]https://lore.kernel.org/linux-mm/YoPHtHXzpK51F%2F1Z@carbon/
> [5]https://lore.kernel.org/lkml/20231108065818.19932-1-link@vivo.com/
>
> Dan Schatzberg (2):
>   mm: add defines for min/max swappiness
>   mm: add swapiness= arg to memory.reclaim
>
>  Documentation/admin-guide/cgroup-v2.rst | 18 +++++---
>  include/linux/swap.h                    |  5 ++-
>  mm/memcontrol.c                         | 58 ++++++++++++++++++++-----
>  mm/vmscan.c                             | 39 ++++++++++++-----
>  4 files changed, 90 insertions(+), 30 deletions(-)
>
> --
> 2.39.3
>
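
For readers skimming the thread, here is a minimal sketch of how a userspace
proactive reclaimer might use the swappiness= argument described in the quoted
cover letter. It is not taken from the series. The request format
("<bytes> swappiness=<value>"), the 0..200 bounds, and the helper names
format_reclaim_request and proactive_reclaim are assumptions made for
illustration only.

```python
import os

# Assumed bounds, mirroring the range accepted by vm.swappiness on recent
# kernels; the series adds MIN_SWAPPINESS/MAX_SWAPPINESS defines for this.
MIN_SWAPPINESS = 0
MAX_SWAPPINESS = 200


def format_reclaim_request(nr_bytes, swappiness=None):
    """Build the string written to memory.reclaim (format is an assumption)."""
    if nr_bytes <= 0:
        raise ValueError("nr_bytes must be positive")
    if swappiness is None:
        # No override: reclaim uses whatever swappiness already applies.
        return str(nr_bytes)
    if not MIN_SWAPPINESS <= swappiness <= MAX_SWAPPINESS:
        raise ValueError("swappiness out of range")
    return f"{nr_bytes} swappiness={swappiness}"


def proactive_reclaim(cgroup_dir, nr_bytes, swappiness=None):
    """Request proactive reclaim from one cgroup with a per-request override.

    cgroup_dir is a hypothetical path such as /sys/fs/cgroup/workload.slice.
    vm.swappiness is never touched by this function.
    """
    path = os.path.join(cgroup_dir, "memory.reclaim")
    with open(path, "w") as f:
        f.write(format_reclaim_request(nr_bytes, swappiness))
```

Because the value travels with each write, nothing global is modified, so
there is no setting to race on and nothing to restore if the reclaimer crashes
or restarts between requests.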