Date: Thu, 21 Dec 2023 10:29:59 +0100
From: Michal Hocko
To: Dan Schatzberg
Cc: Johannes Weiner, Roman Gushchin, Yosry Ahmed, Huan Yang,
	linux-kernel@vger.kernel.org, cgroups@vger.kernel.org,
	linux-mm@kvack.org, Tejun Heo, Zefan Li, Jonathan Corbet,
	Shakeel Butt, Muchun Song, Andrew Morton, Kefeng Wang,
	SeongJae Park, "Vishal Moola (Oracle)", Nhat Pham, Yue Zhao
Subject: Re: [PATCH v5 2/2] mm: add swapiness= arg to memory.reclaim
References: <20231220152653.3273778-1-schatzberg.dan@gmail.com>
	<20231220152653.3273778-3-schatzberg.dan@gmail.com>
In-Reply-To: <20231220152653.3273778-3-schatzberg.dan@gmail.com>

On Wed 20-12-23 07:26:51, Dan Schatzberg wrote:
> Allow proactive reclaimers to submit an additional swappiness=
> argument to memory.reclaim. This overrides the global or per-memcg
> swappiness setting for that reclaim attempt.
>
> For example:
>
>   echo "2M swappiness=0" > /sys/fs/cgroup/memory.reclaim
>
> will perform reclaim on the rootcg with a swappiness setting of 0 (no
> swap) regardless of the vm.swappiness sysctl setting.
>
> Userspace proactive reclaimers use the memory.reclaim interface to
> trigger reclaim. The memory.reclaim interface does not allow for any way
> to affect the balance of file vs anon during proactive reclaim. The only
> approach is to adjust the vm.swappiness setting. However, there are a
> few reasons we look to control the balance of file vs anon during
> proactive reclaim, separately from reactive reclaim:
>
> * Swapout should be limited to manage SSD write endurance. In near-OOM
> situations we are fine with lots of swap-out to avoid OOMs. As these are
> typically rare events, they have relatively little impact on write
> endurance. However, proactive reclaim runs continuously and so its
> impact on SSD write endurance is more significant. Therefore it is
> desirable to control swap-out for proactive reclaim separately from
> reactive reclaim.
>
> * Some userspace OOM killers like systemd-oomd[1] support OOM killing on
> swap exhaustion. This makes sense if the swap exhaustion is triggered
> due to reactive reclaim but less so if it is triggered due to proactive
> reclaim (e.g. one could see OOMs when free memory is ample but anon is
> just particularly cold). Therefore, it's desirable to have proactive
> reclaim reduce or stop swap-out before the threshold at which OOM
> killing occurs.
>
> In the case of Meta's Senpai proactive reclaimer, we adjust
> vm.swappiness before writes to memory.reclaim[2]. This has been in
> production for nearly two years and has addressed our needs to control
> proactive vs reactive reclaim behavior but is still not ideal for a
> number of reasons:
>
> * vm.swappiness is a global setting, adjusting it can race/interfere
> with other system administration that wishes to control vm.swappiness.
> In our case, we need to disable Senpai before adjusting vm.swappiness.
>
> * vm.swappiness is stateful - so a crash or restart of Senpai can leave
> a misconfigured setting. This requires some additional management to
> record the "desired" setting and ensure Senpai always adjusts to it.
>
> With this patch, we avoid these downsides of adjusting vm.swappiness
> globally.

Thank you for extending the changelog with usecases!

> [1]https://www.freedesktop.org/software/systemd/man/latest/systemd-oomd.service.html
> [2]https://github.com/facebookincubator/oomd/blob/main/src/oomd/plugins/Senpai.cpp#L585-L598
>
> Signed-off-by: Dan Schatzberg
> ---
>  Documentation/admin-guide/cgroup-v2.rst | 18 ++++----
>  include/linux/swap.h                    |  3 +-
>  mm/memcontrol.c                         | 56 ++++++++++++++++++++-----
>  mm/vmscan.c                             | 13 +++++-
>  4 files changed, 69 insertions(+), 21 deletions(-)

LGTM
Acked-by: Michal Hocko

[...]
+static int sc_swappiness(struct scan_control *sc, struct mem_cgroup *memcg)
+{
+	if (sc->swappiness)
+		return *sc->swappiness;
+	return mem_cgroup_swappiness(memcg);
+}
 #else
 static bool cgroup_reclaim(struct scan_control *sc)
 {
@@ -245,6 +254,10 @@ static bool writeback_throttling_sane(struct scan_control *sc)
 {
 	return true;
 }
+static int sc_swappiness(struct scan_control *sc, struct mem_cgroup *memcg)
+{
+	return READ_ONCE(vm_swappiness);
+}
 #endif
 
 static void set_task_reclaim_state(struct task_struct *task,
@@ -2330,8 +2343,7 @@ static void get_scan_count(struct lruvec *lruvec, struct scan_control *sc,
 	struct pglist_data *pgdat = lruvec_pgdat(lruvec);
 	struct mem_cgroup *memcg = lruvec_memcg(lruvec);
 	unsigned long anon_cost, file_cost, total_cost;
-	int swappiness = sc->swappiness ?
-		*sc->swappiness : mem_cgroup_swappiness(memcg);
+	int swappiness = sc_swappiness(sc, memcg);
 	u64 fraction[ANON_AND_FILE];
 	u64 denominator = 0;	/* gcc */
 	enum scan_balance scan_balance;
@@ -2612,10 +2624,7 @@ static int get_swappiness(struct lruvec *lruvec, struct scan_control *sc)
 	    mem_cgroup_get_nr_swap_pages(memcg) < MIN_LRU_BATCH)
 		return 0;
 
-	if (sc->swappiness)
-		return *sc->swappiness;
-
-	return mem_cgroup_swappiness(memcg);
+	return sc_swappiness(sc, memcg);
 }
 
 static int get_nr_gens(struct lruvec *lruvec, int type)
-- 
Michal Hocko
SUSE Labs
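
For illustration only, the fallback that sc_swappiness() performs in the memcg-enabled case can be modeled in userspace roughly as below. This is a minimal sketch, not kernel code: the two structs are simplified stand-ins that keep only the fields the helper reads, and mem_cgroup_swappiness() is reduced to returning a stored value, which is an assumption made so the example is self-contained.

```c
#include <stddef.h>

/* Simplified stand-in for the kernel's struct mem_cgroup (assumption). */
struct mem_cgroup {
	int swappiness;		/* per-memcg default */
};

/* Simplified stand-in for struct scan_control (assumption). */
struct scan_control {
	const int *swappiness;	/* NULL unless this reclaim request overrides it */
};

/* Stand-in for mem_cgroup_swappiness(): just returns the stored default. */
static int mem_cgroup_swappiness(const struct mem_cgroup *memcg)
{
	return memcg->swappiness;
}

/*
 * Prefer the per-request override when one is set, otherwise fall back
 * to the memcg value. Callers never touch mem_cgroup_swappiness() directly.
 */
static int sc_swappiness(const struct scan_control *sc,
			 const struct mem_cgroup *memcg)
{
	if (sc->swappiness)
		return *sc->swappiness;
	return mem_cgroup_swappiness(memcg);
}
```

Since the override is tested as a pointer rather than as a value, an explicit swappiness=0 still counts as an override and is not confused with "no override given".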