Date: Tue, 8 Mar 2022 12:23:59 -0500
From: Johannes Weiner
To: Michal Hocko
Cc: Dan Schatzberg, Shakeel Butt, David Rientjes, Andrew Morton, Yu Zhao, Dave Hansen, linux-mm@kvack.org, Yosry Ahmed, Wei Xu, Greg Thelen
Subject: Re: [RFC] Mechanism to induce memory reclaim

On Tue, Mar 08, 2022 at 05:05:11PM +0100, Michal Hocko wrote:
> On Tue 08-03-22 09:44:35, Dan Schatzberg wrote:
> > On Tue, Mar 08, 2022 at 01:53:19PM +0100, Michal Hocko wrote:
> > > On Mon 07-03-22 15:26:18, Johannes Weiner wrote:
> [...]
> > > > A mechanism to request a fixed number of pages to reclaim turned out
> > > > to work much, much better in practice. We've been using a simple
> > > > per-cgroup knob (like here: https://lkml.org/lkml/2020/9/9/1094).
> > >
> > > Could you share more details here please?
> > > How have you managed to find the reclaim target and how have you
> > > overcome challenges to react in time to have some head room for the
> > > actual reclaim?
> >
> > We have a userspace agent that just repeatedly triggers proactive
> > reclaim and monitors PSI metrics to maintain some constant but low
> > pressure. In the complete absence of pressure we will reclaim some
> > configurable percentage of the workload's memory. This reclaim amount
> > tapers down to zero as PSI approaches the target threshold.
> >
> > I don't follow your question regarding head-room. Could you elaborate?
>
> One of the concerns that were expressed in the past is how effectively
> a proactive userspace reclaimer can act on memory demand transitions. It
> takes some time to get refaults/PSI changes and then you should
> be acting rather swiftly.

This was a concern with the fixed limit, but not so much with one-off
requests for reclaim. Nothing prevents the workload from quickly
allocating all the memory it needs. The goal of proactive reclaim
isn't to punish or restrict the workload, but rather to continuously
probe it for cold pages, to measure the minimum amount of memory it
requires to run healthily.

> At least if you aim at some sort of smooth transition. Tuning this up
> to work reliably seems to be far from trivial. Not to mention that
> changes in the memory reclaim implementation could make the whole
> tuning rather fragile.

When reclaim becomes worse at finding the coldest memory, pressure
rises after fewer pages have been evicted, so we back off earlier. A
reclaim regression therefore doesn't necessarily translate to less
smooth operation or increased workload impact, but rather to an
increased memory footprint. This may be measurable, but it isn't
really an operational emergency - unless reclaim gets 50% worse,
which isn't very likely, and in that case we'd stop the kernel
upgrade until the bug is fixed ;)

It's pretty robust.
The tuning was done empirically, but the same configuration has since
held up across many different services: some with swap, some with
zswap, some with just cache, on different types of SSDs and different
kernel versions, and even through drastic reclaim changes such as
Joonsoo's workingset detection for anon pages.
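The userspace agent Dan describes (repeatedly trigger proactive reclaim, reclaim a configurable percentage of the workload's memory when there is no pressure, and taper that amount to zero as PSI approaches a target) can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the agent used in production: it assumes a cgroup v2 hierarchy with PSI enabled, reads `memory.current` and the `some avg10` figure from `memory.pressure`, and requests reclaim by writing a byte count to `memory.reclaim`, the per-cgroup knob that was merged upstream after this thread (Linux 5.19). The linear taper, the choice of `some` over `full`, the function names, and all default values are hypothetical.

```python
import time
from pathlib import Path
from typing import Optional


def parse_psi_avg10(text: str, kind: str = "some") -> float:
    """Return the avg10 stall percentage from PSI text such as memory.pressure.

    PSI files contain lines like:
        some avg10=0.05 avg60=0.02 avg300=0.00 total=12345
        full avg10=0.01 avg60=0.00 avg300=0.00 total=678
    """
    for line in text.splitlines():
        fields = line.split()
        if fields and fields[0] == kind:
            for field in fields[1:]:
                key, _, value = field.partition("=")
                if key == "avg10":
                    return float(value)
    raise ValueError(f"no {kind!r} line with avg10 in PSI text")


def reclaim_step(usage_bytes: int, psi: float, target_psi: float,
                 max_fraction: float) -> int:
    """Bytes to request in one round of proactive reclaim (hypothetical policy).

    With no pressure, request max_fraction of current usage; the request
    shrinks linearly (an assumed taper shape) and is zero once PSI reaches
    target_psi.
    """
    if target_psi <= 0:
        raise ValueError("target_psi must be positive")
    if psi >= target_psi:
        return 0
    headroom = 1.0 - psi / target_psi  # 1.0 with no pressure, 0.0 at target
    return int(usage_bytes * max_fraction * headroom)


def run_agent(cgroup: Path, target_psi: float = 0.1, max_fraction: float = 0.01,
              interval_s: float = 6.0, rounds: Optional[int] = None) -> None:
    """Repeatedly request reclaim on one cgroup, steered by its memory PSI.

    Hypothetical defaults: target 0.1% 'some' stall time (avg10), up to 1%
    of usage per round, one round every 6 seconds. rounds=None loops forever.
    """
    done = 0
    while rounds is None or done < rounds:
        usage = int((cgroup / "memory.current").read_text())
        psi = parse_psi_avg10((cgroup / "memory.pressure").read_text())
        amount = reclaim_step(usage, psi, target_psi, max_fraction)
        if amount > 0:
            try:
                (cgroup / "memory.reclaim").write_text(str(amount))
            except OSError:
                # The kernel returns EAGAIN when it reclaimed less than the
                # requested amount; treat that as a partial round.
                pass
        done += 1
        time.sleep(interval_s)
```

For example, `run_agent(Path("/sys/fs/cgroup/workload.slice"))` (a hypothetical cgroup path, needing write access to its `memory.reclaim`) would drive one workload. The shape also mirrors the robustness argument in the reply: if reclaim regresses and evicts warmer pages, PSI nears the target after fewer bytes have been reclaimed, `reclaim_step` returns smaller requests sooner, and the agent settles at a larger footprint instead of stalling the workload.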