From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 5C8DDC41535 for ; Fri, 22 Dec 2023 05:32:41 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id CC0156B0081; Fri, 22 Dec 2023 00:32:40 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id C49256B0082; Fri, 22 Dec 2023 00:32:40 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id AC2B56B0085; Fri, 22 Dec 2023 00:32:40 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0017.hostedemail.com [216.40.44.17]) by kanga.kvack.org (Postfix) with ESMTP id 967476B0081 for ; Fri, 22 Dec 2023 00:32:40 -0500 (EST) Received: from smtpin12.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay01.hostedemail.com (Postfix) with ESMTP id 5D1641C1788 for ; Fri, 22 Dec 2023 05:32:40 +0000 (UTC) X-FDA: 81593334480.12.DD55444 Received: from mail-wm1-f48.google.com (mail-wm1-f48.google.com [209.85.128.48]) by imf29.hostedemail.com (Postfix) with ESMTP id 8BF42120004 for ; Fri, 22 Dec 2023 05:32:38 +0000 (UTC) Authentication-Results: imf29.hostedemail.com; dkim=pass header.d=google.com header.s=20230601 header.b=Kk7Srnr8; spf=pass (imf29.hostedemail.com: domain of yuzhao@google.com designates 209.85.128.48 as permitted sender) smtp.mailfrom=yuzhao@google.com; dmarc=pass (policy=reject) header.from=google.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1703223158; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=aei49JaNshkuJJzrb21/FHbkROcvggHQaKMtSVNdNfI=; b=F+Wui19vicsuSm15OHLNWdA0VH1CZK+O6ooujiIwKpUfmz4dIQ5yNNuynJ15LqEWQnoB9/ QLeSofiwFWkE7eQgmn65ZG7PTb6sj+gSlD8DAloX5Wkec2fR/dx36/d58lSkpedoSGa6Gk zbK48B9xbAYE5hzTcWALwcBNX0Y7mEs= ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1703223158; a=rsa-sha256; cv=none; b=LOpi+3Oik2R+PVwdx81L4CkEKA2kzMXh5lfMzpqzPsu4ZnJkbEO2oyk9PyKfwGn1Gnn+Rj Hc96MWhFCD/8a1q14Cwc0URSdm2uW1K75qegvMEFfbwmibaiN4WIgcM93x0Y6+jBJrxRxq T3C8X3l8yKoDLSaeDnTJ8yxhdgOumfI= ARC-Authentication-Results: i=1; imf29.hostedemail.com; dkim=pass header.d=google.com header.s=20230601 header.b=Kk7Srnr8; spf=pass (imf29.hostedemail.com: domain of yuzhao@google.com designates 209.85.128.48 as permitted sender) smtp.mailfrom=yuzhao@google.com; dmarc=pass (policy=reject) header.from=google.com Received: by mail-wm1-f48.google.com with SMTP id 5b1f17b1804b1-40d3102d5d6so34865e9.0 for ; Thu, 21 Dec 2023 21:32:38 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1703223157; x=1703827957; darn=kvack.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=aei49JaNshkuJJzrb21/FHbkROcvggHQaKMtSVNdNfI=; b=Kk7Srnr8yiDaPx4cTfkEVxR6rXb3cty3KRa9ZXVgorUeHltW7OmuV4y1d9XiFS35lK EzzZvF3rH3ALJConuXnTk68qLVUne56q2p+9GYN/z9g3dnPiTd4K5WvghaBrH+7Hr91f AbEh6rejRuQV2RD+6hS4AcytrvvqZhCGHhms26QSeLB7T1MnOFdBR/MkiofdrzuykhnX 2j05XFzSVo9GtOvZIjV/SHovMPnjlYkStInlzkAoi4Wb136GNoJNAtczrbSijWkZsPti QnwaPR7/hd1U0/ZGbdRcOZVglnUpaRs241e4OfS0BROPMiAlcw+s2RVqN/zBsX33k6X+ Ojyg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1703223157; x=1703827957; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=aei49JaNshkuJJzrb21/FHbkROcvggHQaKMtSVNdNfI=; b=EIVl3zbn55M1HOQEz7hmP7WwWkTgaI0bBNqGFKpuu1BRqxtDMTosksEzHMHkYlt27H ZvztM/pigXeLfujuZYKuZgHxcxcakkJRxMEGGOaJtN0bEER+jA+rlSP82yB7H9+YxKmq tZy6tjT6lscgWXicNw92ho1uJogreZTTSALlsomcx2rVo6271te96EMFdT/bHE+2n15u YaXH4wEFb2nL4SdZ7Qm7zLI+BIT3Jd8ccgBxPh3qEgGsFfmXPTfEhrbaMW8PFo8fmmJI S2nF8Gu41Kf0RozxVmyJH36KZDrYNMUeFHvy2UQ0qhClaiivBR41PGB/yHbS1aCTPW5w bnwA== X-Gm-Message-State: AOJu0YyZYwmj8vEGq6orH1cFcOP4qqsTFNy72cIGEc4YZSsHcaaW6Jr4 2QtZL9vhiC0CE8mEE9oOUa7fXSovv47zXJrMuWhroyZVdO7R X-Google-Smtp-Source: AGHT+IFtz4p23N7qFQyJ8bmGETMjPxi+UsXnIB0l2LyxDlv5gRmgh40FQ/h5BJF9BFQu01W4sbr5+rn5xst7zCPNa6g= X-Received: by 2002:a05:600c:1c85:b0:40b:4355:a04b with SMTP id k5-20020a05600c1c8500b0040b4355a04bmr70112wms.6.1703223156969; Thu, 21 Dec 2023 21:32:36 -0800 (PST) MIME-Version: 1.0 References: <20231220152653.3273778-1-schatzberg.dan@gmail.com> <20231220152653.3273778-3-schatzberg.dan@gmail.com> In-Reply-To: <20231220152653.3273778-3-schatzberg.dan@gmail.com> From: Yu Zhao Date: Thu, 21 Dec 2023 22:31:59 -0700 Message-ID: Subject: Re: [PATCH v5 2/2] mm: add swapiness= arg to memory.reclaim To: Dan Schatzberg Cc: Johannes Weiner , Roman Gushchin , Yosry Ahmed , Huan Yang , linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, linux-mm@kvack.org, Tejun Heo , Zefan Li , Jonathan Corbet , Michal Hocko , Shakeel Butt , Muchun Song , Andrew Morton , Kefeng Wang , SeongJae Park , "Vishal Moola (Oracle)" , Nhat Pham , Yue Zhao Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Stat-Signature: 966n8umzs1cxxshrejdd85qkje3mnmq7 X-Rspamd-Server: rspam10 X-Rspamd-Queue-Id: 8BF42120004 X-Rspam-User: X-HE-Tag: 1703223158-739457 X-HE-Meta: U2FsdGVkX19NzlZXDx84oqeHGrhaz0kcFmwzmfUBcua/d0lti/27Yp8XzhOujUHpq2OKxwaGYw7RIEZgK5E1Aalp8MXYgkI/VB4XmEt/1n0/sWb1/6wNP9x9Ub47xHY+bUf8SsuWm73kUsba29RtoGmRJB94W91kbTz/xdPx3B98UH+LjsHFBMgqr+PAd7IrhteYKXYEkP+2xg7WadH1e1kPQjsJ4w+yPxj+am5SxhQUCrAPqpj1q0hidIftLiq7+fmUwBYViYPuN8BaNbMit+UXW10wOZuBnrFe9yYmJB1nWzqVVz+x/0a5GxObUuf9iD6Qke/rsCXt1WFE2oHRzLpyTwWOAXOKxoyVB6k7NRWVf1+NPDgVCyEw2t8OtqeEe0pwOl+R1EbJy9XWsJygZ20c001+inO8i3b5JTZyBp+H8qW4e/KUxU5dreLnR3bwk4ef/X+RSDYN3hDoQIJK1KOxOxTl0AH25Jl0VDFzikEitLlzVwNbZ+IKDmphvRqYIxFJmR4sCCRojhb67iFCi/OYyBL8t9fmP61ZBBSgneoONgIvZ+W81e/sgw/MnbSAa4fwXJbMi/ORuRVBduLQ557KlRnyOGjMZvAaNhCJIlRYcYdy5FFG6YYXFQg/C9Vz9e8FrHpQhrM2QK+hJnny6qmxaSt0AKo1Gft2GA+YTsYthwNodel0P1OrI4iTj/ZQRpwdqwRFBLmuL/lh1bQt9Z+d30on1KBuJYKnuYPPiDwAPBTFHKKxTIeDpAySd8qjnQb7iBlvBCSaJGLGBHvL36LMtmeUX/rYR1AXd51KLWxuMH+PSPYMlUk9Xk7Z2AOfWXnY5YpySz9b2GV5s0Mem9gS6w6xYOVTancIZv9P8cH3c8Hh0X8zoZwXIYWffE46M7UNq9KF4Lw/jPhOnYdSduVYXfbiwWWrDfj/0w6jdvnh3pyNVwCfCLJR7bNuXAZbVJifUo+UkTi7O3QFrhF UyEGnZCK tyekcRSlVodgQk8NeGjdtVp+G8q/9sQbCzXfe1tP1KYJR11cIcZhu9JCYzG/zKnKcslpTml32VuHsD+/InKXFl5S7SvxqjZgTGeuaZVrQx6MWpZXxvfsTw0vFBwnXeQvD85d6QH9iDOlIFGmc46JoEhok+Y1omv9AINaNVSjFZkQMWk9FZ/zIQvIsuYRuKWRRO4/SA0KOKRKpuJS69P1TywTcibCv6bw6ft7rg48KsaZJfTy/7r5qpvJYsHKWpV2hztO2gU2ePJFLxgm9MG4cm2wFFuC012eCtBCGDlBvsPliFbOPRe7iSkIily6maf1kHJnul5LhSrfZ798Sx+MG14mr0hiI7pEO4ZxEbJALRxU/T+Ze436p2PZ57PnedNKILHFdQDZyWRi9HHQ+Sqbc5NuHMkYyEZN1KukD62HwyJmLfXl2vucjmfe4xZ0s4w4gNUe21LQl70x7gxwksrvUQPJj0743exhYGJXo3m6M6BBSsZSw/FRnZnujY+5zQueM6y5Fkh3NW8rjtMnyt6tY0mNZHSBXKYMKy/ia X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Wed, Dec 20, 2023 at 8:27=E2=80=AFAM Dan Schatzberg wrote: > > Allow proactive reclaimers to submit an additional swappiness=3D > argument to memory.reclaim. This overrides the global or per-memcg > swappiness setting for that reclaim attempt. > > For example: > > echo "2M swappiness=3D0" > /sys/fs/cgroup/memory.reclaim > > will perform reclaim on the rootcg with a swappiness setting of 0 (no > swap) regardless of the vm.swappiness sysctl setting. > > Userspace proactive reclaimers use the memory.reclaim interface to > trigger reclaim. The memory.reclaim interface does not allow for any way > to effect the balance of file vs anon during proactive reclaim. The only > approach is to adjust the vm.swappiness setting. However, there are a > few reasons we look to control the balance of file vs anon during > proactive reclaim, separately from reactive reclaim: > > * Swapout should be limited to manage SSD write endurance. In near-OOM > situations we are fine with lots of swap-out to avoid OOMs. As these are > typically rare events, they have relatively little impact on write > endurance. However, proactive reclaim runs continuously and so its > impact on SSD write endurance is more significant. Therefore it is > desireable to control swap-out for proactive reclaim separately from > reactive reclaim > > * Some userspace OOM killers like systemd-oomd[1] support OOM killing on > swap exhaustion. This makes sense if the swap exhaustion is triggered > due to reactive reclaim but less so if it is triggered due to proactive > reclaim (e.g. one could see OOMs when free memory is ample but anon is > just particularly cold). Therefore, it's desireable to have proactive > reclaim reduce or stop swap-out before the threshold at which OOM > killing occurs. > > In the case of Meta's Senpai proactive reclaimer, we adjust > vm.swappiness before writes to memory.reclaim[2]. This has been in > production for nearly two years and has addressed our needs to control > proactive vs reactive reclaim behavior but is still not ideal for a > number of reasons: > > * vm.swappiness is a global setting, adjusting it can race/interfere > with other system administration that wishes to control vm.swappiness. > In our case, we need to disable Senpai before adjusting vm.swappiness. > > * vm.swappiness is stateful - so a crash or restart of Senpai can leave > a misconfigured setting. This requires some additional management to > record the "desired" setting and ensure Senpai always adjusts to it. > > With this patch, we avoid these downsides of adjusting vm.swappiness > globally. > > [1]https://www.freedesktop.org/software/systemd/man/latest/systemd-oomd.s= ervice.html > [2]https://github.com/facebookincubator/oomd/blob/main/src/oomd/plugins/S= enpai.cpp#L585-L598 > > Signed-off-by: Dan Schatzberg The cover letter says: "Previously, this exact interface addition was proposed by Yosry[3]." So I think it should be acknowledged with a Suggested-by, based on: "A Suggested-by: tag indicates that the patch idea is suggested by the person named and ensures credit to the person for the idea." from https://docs.kernel.org/process/submitting-patches.html#using-reported-by-t= ested-by-reviewed-by-suggested-by-and-fixes > Documentation/admin-guide/cgroup-v2.rst | 18 ++++---- > include/linux/swap.h | 3 +- > mm/memcontrol.c | 56 ++++++++++++++++++++----- > mm/vmscan.c | 13 +++++- > 4 files changed, 69 insertions(+), 21 deletions(-) ... > diff --git a/mm/vmscan.c b/mm/vmscan.c > index d91963e2d47f..aa5666842c49 100644 > --- a/mm/vmscan.c > +++ b/mm/vmscan.c > @@ -92,6 +92,9 @@ struct scan_control { > unsigned long anon_cost; > unsigned long file_cost; > > + /* Swappiness value for reclaim. NULL will fall back to per-memcg= /global value */ > + int *swappiness; Using a pointer to indicate whether the type it points to is overridden isn't really a good practice. A better alternative was suggested during the v2: "Perhaps the negative to avoid unnecessary dereferences." https://lore.kernel.org/linux-mm/dhhjw4h22q4ngwtxmhuyifv32zjd6z2relrcjgnxsw= 6zys3mod@o6dh5dy53ae3/ Since only proactive reclaim can override swappiness, meaning it only happens if sc->proactive is true, I think the best way to make it work without spending much effort is create a helper as Michal suggest but it should look like: sc_swappiness() { return sc->proactive ? sc->swappiness : mem_cgroup_swappiness(memcg); } In this patchset, sc->swappiness really means sc->proactive_swappiness. So it should be renamed accordingly.