From: Yosry Ahmed
Date: Tue, 11 Jun 2024 12:31:23 -0700
Subject: Re: [PATCH v6 0/2] Add swappiness argument to memory.reclaim
To: Shakeel Butt, Yu Zhao
Cc: Yue Zhao, Andrew Morton, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, linux-mm@kvack.org, Tejun Heo, Zefan Li, Johannes Weiner, Jonathan Corbet, Michal Hocko, Roman Gushchin, Shakeel Butt, Muchun Song, David Hildenbrand, Chris Li, Matthew Wilcox, Kefeng Wang, Hugh Dickins, Dan Schatzberg
References: <20240103164841.2800183-1-schatzberg.dan@gmail.com>

On Tue, Jun 11, 2024 at 12:25 PM Shakeel Butt wrote:
>
> Hi folks,
>
> This series has been in the mm-unstable for several months. Are there
> any remaining concerns here? Otherwise, can we please put this in the
> mm-stable branch to be merged in the next Linux release?

+Yu Zhao

I don't think Yu Zhao was correctly CC'd on this :)

>
> On Wed, Jan 03, 2024 at 08:48:35AM GMT, Dan Schatzberg wrote:
> > Changes since V5:
> >   * Made the scan_control behavior limited to proactive reclaim explicitly
> >   * Created sc_swappiness helper to reduce chance of misuse
> >
> > Changes since V4:
> >   * Fixed some initialization bugs by reverting back to a pointer for swappiness
> >   * Added some more caveats to the behavior of swappiness in documentation
> >
> > Changes since V3:
> >   * Added #define for MIN_SWAPPINESS and MAX_SWAPPINESS
> >   * Added explicit calls to mem_cgroup_swappiness
> >
> > Changes since V2:
> >   * No functional change
> >   * Used int consistently rather than a pointer
> >
> > Changes since V1:
> >   * Added documentation
> >
> > This patch proposes augmenting the memory.reclaim interface with a
> > swappiness= argument that overrides the swappiness value for that instance
> > of proactive reclaim.
> >
> > Userspace proactive reclaimers use the memory.reclaim interface to trigger
> > reclaim. The memory.reclaim interface does not allow for any way to affect the
> > balance of file vs anon during proactive reclaim. The only approach is to adjust
> > the vm.swappiness setting. However, there are a few reasons we look to control
> > the balance of file vs anon during proactive reclaim, separately from reactive
> > reclaim:
> >
> > * Swapout should be limited to manage SSD write endurance. In near-OOM
> >   situations we are fine with lots of swap-out to avoid OOMs. As these are
> >   typically rare events, they have relatively little impact on write endurance.
> >   However, proactive reclaim runs continuously and so its impact on SSD write
> >   endurance is more significant. Therefore it is desirable to control swap-out
> >   for proactive reclaim separately from reactive reclaim.
> >
> > * Some userspace OOM killers like systemd-oomd[1] support OOM killing on swap
> >   exhaustion.
> >   This makes sense if the swap exhaustion is triggered due to
> >   reactive reclaim but less so if it is triggered due to proactive reclaim (e.g.
> >   one could see OOMs when free memory is ample but anon is just particularly
> >   cold). Therefore, it's desirable to have proactive reclaim reduce or stop
> >   swap-out before the threshold at which OOM killing occurs.
> >
> > In the case of Meta's Senpai proactive reclaimer, we adjust vm.swappiness before
> > writes to memory.reclaim[2]. This has been in production for nearly two years
> > and has addressed our needs to control proactive vs reactive reclaim behavior
> > but is still not ideal for a number of reasons:
> >
> > * vm.swappiness is a global setting; adjusting it can race/interfere with other
> >   system administration that wishes to control vm.swappiness. In our case, we
> >   need to disable Senpai before adjusting vm.swappiness.
> >
> > * vm.swappiness is stateful - so a crash or restart of Senpai can leave a
> >   misconfigured setting. This requires some additional management to record the
> >   "desired" setting and ensure Senpai always adjusts to it.
> >
> > With this patch, we avoid these downsides of adjusting vm.swappiness globally.
> >
> > Previously, this exact interface addition was proposed by Yosry[3]. In response,
> > Roman proposed instead an interface to specify precise file/anon/slab reclaim
> > amounts[4]. More recently Huan proposed this as well[5] and others
> > similarly questioned if this was the proper interface.
> >
> > Previous proposals sought to use this to allow proactive reclaimers to
> > effectively perform a custom reclaim algorithm by issuing proactive reclaim with
> > different settings to control file vs anon reclaim (e.g. to only reclaim anon
> > from some applications). Responses argued that adjusting swappiness is a poor
> > interface for custom reclaim.
> >
> > In contrast, I argue in favor of a swappiness setting not as a way to implement
> > custom reclaim algorithms but rather to bias the balance of anon vs file due to
> > differences of proactive vs reactive reclaim. In this context, swappiness is the
> > existing interface for controlling this balance and this patch simply allows for
> > it to be configured differently for proactive vs reactive reclaim.
> >
> > Specifying explicit amounts of anon vs file pages to reclaim feels inappropriate
> > for this purpose. Proactive reclaimers are unaware of the relative age of file
> > vs anon for a cgroup, which makes it difficult to manage proactive reclaim of
> > different memory pools. A proactive reclaimer would need some amount of anon
> > reclaim attempts separate from the amount of file reclaim attempts, which seems
> > brittle given that it's difficult to observe the impact.
> >
> > [1]https://www.freedesktop.org/software/systemd/man/latest/systemd-oomd.service.html
> > [2]https://github.com/facebookincubator/oomd/blob/main/src/oomd/plugins/Senpai.cpp#L585-L598
> > [3]https://lore.kernel.org/linux-mm/CAJD7tkbDpyoODveCsnaqBBMZEkDvshXJmNdbk51yKSNgD7aGdg@mail.gmail.com/
> > [4]https://lore.kernel.org/linux-mm/YoPHtHXzpK51F%2F1Z@carbon/
> > [5]https://lore.kernel.org/lkml/20231108065818.19932-1-link@vivo.com/
> >
> > Dan Schatzberg (2):
> >   mm: add defines for min/max swappiness
> >   mm: add swapiness= arg to memory.reclaim
> >
> >  Documentation/admin-guide/cgroup-v2.rst | 18 +++++---
> >  include/linux/swap.h                    |  5 ++-
> >  mm/memcontrol.c                         | 58 ++++++++++++++++++++-----
> >  mm/vmscan.c                             | 39 ++++++++++++-----
> >  4 files changed, 90 insertions(+), 30 deletions(-)
> >
> > --
> > 2.39.3
> >
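
To illustrate the interface described in the quoted cover letter, here is a
minimal sketch of how a userspace proactive reclaimer might pass a per-request
swappiness instead of touching the global vm.swappiness. It is not code from
the series or from Senpai. The helper names are hypothetical, the
"<bytes> swappiness=<value>" request format is assumed from the cover
letter's description, and the 0..200 bounds are an assumption, since the thread
mentions MIN_SWAPPINESS and MAX_SWAPPINESS without giving their values.

```python
import os

# Assumed bounds: the series adds MIN_SWAPPINESS/MAX_SWAPPINESS defines,
# but this thread does not state their values.
MIN_SWAPPINESS = 0
MAX_SWAPPINESS = 200


def build_reclaim_request(nr_bytes, swappiness=None):
    """Format the string written to memory.reclaim (hypothetical helper).

    With swappiness=None the request carries only the byte count, so the
    kernel falls back to its usual swappiness for this reclaim.
    """
    if nr_bytes <= 0:
        raise ValueError("nr_bytes must be positive")
    if swappiness is None:
        return str(nr_bytes)
    if not MIN_SWAPPINESS <= swappiness <= MAX_SWAPPINESS:
        raise ValueError("swappiness out of range")
    # Assumed request format: byte count, then the swappiness key.
    return f"{nr_bytes} swappiness={swappiness}"


def proactive_reclaim(cgroup_dir, nr_bytes, swappiness=None):
    """Write one reclaim request to <cgroup_dir>/memory.reclaim."""
    path = os.path.join(cgroup_dir, "memory.reclaim")
    with open(path, "w") as f:
        f.write(build_reclaim_request(nr_bytes, swappiness))
```

For example, proactive_reclaim("/sys/fs/cgroup/workload.slice", 64 << 20,
swappiness=0) would ask for up to 64 MiB of reclaim with swappiness 0 for that
request only (the cgroup path is made up). Because the override travels with
each write, a crash or restart of the reclaimer leaves no global state behind,
which is the downside of the vm.swappiness approach that the cover letter
calls out.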