Date: Wed, 9 Mar 2022 14:30:24 -0800 (PST)
From: David Rientjes
To: Johannes Weiner
cc: Andrew Morton, Michal Hocko, Yu Zhao, Dave Hansen, linux-mm@kvack.org,
    Yosry Ahmed, Wei Xu, Shakeel Butt, Greg Thelen
Subject: Re: [RFC] Mechanism to induce memory reclaim
Message-ID: <3c1589b-d558-72a2-8166-c289643d732d@google.com>
References: <5df21376-7dd1-bf81-8414-32a73cea45dd@google.com>

On Mon, 7 Mar 2022, Johannes Weiner wrote:

> > IOW, this would be a /sys/devices/system/node/nodeN/reclaim mechanism for
> > each NUMA node N on the system. (It would be similar to the existing
> > per-node sysfs "compact" mechanism used to trigger compaction from
> > userspace.)
>
> I generally think a proactive reclaim interface is a good idea.
>
> A per-cgroup control knob would make more sense to me, as cgroupfs
> takes care of delegation, namespacing etc. and so would permit
> self-directed proactive reclaim inside containers.
>

This is an interesting point and something that would need to be decided.
There are pros and cons to both approaches: a per-cgroup mechanism vs.
purely a per-node sysfs mechanism that can take a cgroup id.

The reason we'd like this in sysfs is that some users do not enable
CONFIG_MEMCG but would still benefit from proactive reclaim. Such users do
exist and do not rely on memcg (Chrome OS, for example), and from my
understanding this is normally done to speed up hibernation.

But I note your use of "per-cgroup" control knob and not specifically
"per-memcg". Were you considering a proactive reclaim mechanism for a
cgroup other than memcg? A new one?

I'm wondering if it would make sense for such a cgroup interface, if
eventually needed, to be added incrementally on top of a per-node sysfs
interface. (We know today that there is a need for proactive reclaim for
users who do not use memcg at all.)

> > Userspace would write the following to this file:
> > - nr_to_reclaim pages
>
> This makes sense, although (and you hinted at this below), I'm
> thinking it should be in bytes, especially if part of cgroupfs.
>

If we agree upon a sysfs interface, I assume there would be no objection
to expressing nr_to_reclaim in pages? I agree that if this is to be a
memcg knob, it should be expressed in bytes for consistency with other
knobs.

> > - swappiness factor
>
> This I'm not sure about.
>
> Mostly because I'm not sure about swappiness in general. It balances
> between anon and file, but both of them are aged according to the same
> LRU rules. The only reason to prefer one over the other seems to be
> when the cost of reloading one (refault vs swapin) isn't the same as
> the other. That's usually a hardware property, which in a perfect
> world we'd auto-tune inside the kernel based on observed IO
> performance. Not sure why you'd want this per reclaim request.
>
> > - flags to specify context, if any[**]
> >
> > [**] this is offered for extensibility to specify the context in which
> > reclaim is being done (clean file pages only, demotion for memory
> > tiering vs eviction, etc), otherwise 0
>
> This one is curious. I don't understand the use cases for either of
> these examples, and I can't think of other flags a user may pass on a
> per-invocation basis. Would you care to elaborate some?
>

If we combine the above two concerns, maybe only a flags argument is
sufficient, where you can specify only anon or only file (and neither
means both)? What is controllable by swappiness could be controlled by two
different writes to the interface, one for (possibly) anon and one for
(possibly) file.

There was discussion about treating the two different types of memory
differently as a function of reload cost, cost of doing I/O for discard,
and how much swap space we want proactive reclaim to take, as well as the
fact that the only current alternative is to play with the global
vm.swappiness.

Michal asked if this would include slab reclaim or shrinkers. I think the
answer is "possibly yes," but there is no initial use case for this (flags
would be extensible to permit adding it incrementally).
In fact, if you were to pass a cgroup id of 0 to induce global proactive
reclaim, you could mimic the control we have with vm.drop_caches today,
but without having to reclaim all of a memory type.
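
To make the "two writes instead of a swappiness factor" idea concrete,
here is a rough userspace sketch. Everything in it is an assumption made
for illustration: the reclaim file is only proposed in this thread and
does not exist, and the field order (cgroup id, nr_to_reclaim, flags), the
flag names and the flag values are all invented. It only formats the
strings that would be written and never touches sysfs.

```python
# Hypothetical sketch: neither the per-node "reclaim" file nor these flag
# values exist in any kernel; all names here are invented for illustration.
from enum import IntFlag


class ReclaimFlags(IntFlag):
    ANON = 1 << 0  # hypothetical: reclaim anonymous memory only
    FILE = 1 << 1  # hypothetical: reclaim file-backed memory only


def reclaim_path(node: int) -> str:
    """Path of the per-node file proposed in the RFC (not a real interface)."""
    return f"/sys/devices/system/node/node{node}/reclaim"


def format_request(cgroup_id: int, nr_to_reclaim: int, flags: int = 0) -> str:
    """Build the text for one write; the field order is an assumption.

    cgroup_id 0 stands for global reclaim, flags 0 for both memory types.
    """
    if cgroup_id < 0 or nr_to_reclaim <= 0:
        raise ValueError("cgroup_id must be >= 0 and nr_to_reclaim > 0")
    return f"{cgroup_id} {nr_to_reclaim} {int(flags)}\n"


def split_by_type(cgroup_id: int, nr_anon: int, nr_file: int) -> list:
    """Express an anon/file balance as one anon write and one file write."""
    requests = []
    if nr_anon:
        requests.append(format_request(cgroup_id, nr_anon, ReclaimFlags.ANON))
    if nr_file:
        requests.append(format_request(cgroup_id, nr_file, ReclaimFlags.FILE))
    return requests


if __name__ == "__main__":
    # Global (cgroup id 0) reclaim on node 0, biased towards file pages.
    for req in split_by_type(cgroup_id=0, nr_anon=1024, nr_file=4096):
        print(f"would write {req!r} to {reclaim_path(0)}")
```

The caller picks the anon/file split per request, which is the part a
per-request swappiness factor would otherwise have covered.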