Date: Tue, 8 Mar 2022 17:37:00 -0500
From: Dan Schatzberg
To: Johannes Weiner
Cc: David Rientjes, Andrew Morton, Michal Hocko, Yu Zhao, Dave Hansen,
    linux-mm@kvack.org, Yosry Ahmed, Wei Xu, Shakeel Butt, Greg Thelen
Subject: Re: [RFC] Mechanism to induce memory reclaim
References: <5df21376-7dd1-bf81-8414-32a73cea45dd@google.com>

On Tue, Mar 08, 2022 at 02:27:49PM -0500, Johannes Weiner wrote:
> On Tue, Mar 08, 2022 at 09:49:20AM -0500, Dan Schatzberg wrote:
> > On Mon, Mar 07, 2022 at 03:50:36PM -0500, Johannes Weiner wrote:
> > > On Sun, Mar 06, 2022 at 03:11:23PM -0800, David Rientjes wrote:
> > > > - swappiness factor
> > >
> > > This I'm not sure about.
> > >
> > > Mostly because I'm not sure about swappiness in general. It balances
> > > between anon and file, but both of them are aged according to the same
> > > LRU rules. The only reason to prefer one over the other seems to be
> > > when the cost of reloading one (refault vs swapin) isn't the same as
> > > the other. That's usually a hardware property, which in a perfect
> > > world we'd auto-tune inside the kernel based on observed IO
> > > performance. Not sure why you'd want this per reclaim request.
> >
> > I think this could be useful for budgeting write-endurance. You may
> > want to tune down a workload's swappiness on a per-reclaim basis in
> > order to control how much swap-out (and therefore disk writes) its
> > doing. Right now the only way to control this is by writing to
> > vm.swappiness before doing the explicit reclaim which can momentarily
> > effect other reclaim behavior on the machine.
>
> Yeah the global swappiness setting is not ideal for tuning behavior of
> individual workloads. On the other hand, flash life and write budget
> are global resources shared by all workloads on the system. Does it
> make sense longer term to take a workload-centric approach to that?

Indeed flash life is a global resource, but it may be desirable to
budget it on a per-workload basis.
Consider a workload with a lot of warm anonymous memory - proactive
reclaim of this workload may be able to consume the entire write
budget of the machine. This could result in a co-located workload
getting reduced reclaim due to insufficient write budget. We'd like
some form of isolation here so that the co-located workload receives
a fair share of the write budget, which is hard to do without some
additional control.

> There are also filesystem writes to think about. If the swappable set
> has already been swapped and cached, reclaiming it again doesn't
> require IO. Reclaiming dirty cache OTOH requires IO, and upping
> reclaim pressure on files will increase the writeback flush rates
> (which reduces cache effectiveness and increases aggregate writes).
>
> I wonder if it would make more sense to recognize the concept of write
> endurance more broadly in MM code than just swap. Where you specify a
> rate limit (globally? with per-cgroup shares?), and then, yes, the VM
> will back away from swap iff it writes too much. But also throttle
> writeback and push back on file reclaim and dirtying processes in
> accordance with that policy.

Absolutely. We should discuss the details, but broadly I agree that
per-cgroup swappiness control is not the only way to gain control over
mm-induced write endurance consumption.
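
To make the isolation concern above concrete, here is a rough
user-space sketch of one possible policy: a weighted max-min split of
a per-interval swap-out write budget. This is an illustration only,
not a proposed kernel interface. The function name, the byte units,
and the per-workload weights are all hypothetical and do not come
from the thread.

```python
def fair_share_write_budget(total_budget, demands, weights):
    """Weighted max-min split of a swap-out write budget (hypothetical model).

    total_budget: bytes of swap-out writes allowed in this interval
    demands:      {workload: bytes it would like to swap out}
    weights:      {workload: relative share of the budget}
    """
    grants = {name: 0 for name in demands}
    active = {name for name, want in demands.items() if want > 0}
    remaining = total_budget

    while active and remaining > 0:
        total_weight = sum(weights[name] for name in active)
        # Workloads whose leftover demand fits inside their weighted slice.
        satisfied = {
            name for name in active
            if (demands[name] - grants[name]) * total_weight
            <= remaining * weights[name]
        }
        if not satisfied:
            # Everyone wants more than their slice: split the rest by weight.
            for name in active:
                grants[name] += remaining * weights[name] // total_weight
            break
        # Fully grant the small demands, then redistribute what is left.
        for name in satisfied:
            remaining -= demands[name] - grants[name]
            grants[name] = demands[name]
        active -= satisfied

    return grants


# A workload with lots of warm anon memory next to a small co-located one.
grants = fair_share_write_budget(
    100,
    {"warm_anon": 1000, "colocated": 40},
    {"warm_anon": 1, "colocated": 1},
)
print(grants)  # {'warm_anon': 60, 'colocated': 40}
```

Without any per-workload control, the first workload could consume
all 100 units before the co-located one swaps anything out. With the
split, the co-located workload's 40 units are satisfied and only the
remainder goes to the larger consumer.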