From: Uladzislau Rezki <urezki@gmail.com>
Date: Thu, 14 Nov 2024 17:57:42 +0100
To: Vlastimil Babka
Cc: Suren Baghdasaryan, "Liam R. Howlett", Christoph Lameter, David Rientjes,
	Pekka Enberg, Joonsoo Kim, Roman Gushchin, Hyeonggon Yoo <42.hyeyoo@gmail.com>,
	"Paul E. McKenney", Lorenzo Stoakes, Matthew Wilcox, Boqun Feng,
	Uladzislau Rezki, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	rcu@vger.kernel.org, maple-tree@lists.infradead.org
Subject: Re: [PATCH RFC 2/6] mm/slub: add sheaf support for batching kfree_rcu() operations
References: <20241112-slub-percpu-caches-v1-0-ddc0bdc27e05@suse.cz>
	<20241112-slub-percpu-caches-v1-2-ddc0bdc27e05@suse.cz>
In-Reply-To: <20241112-slub-percpu-caches-v1-2-ddc0bdc27e05@suse.cz>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
On Tue, Nov 12, 2024 at 05:38:46PM +0100, Vlastimil Babka wrote:
> Extend the sheaf infrastructure for more efficient kfree_rcu() handling.
> For caches where sheafs are initialized, on each cpu maintain a rcu_free
> sheaf in addition to the main and spare sheaves.
>
> kfree_rcu() operations will try to put objects on this sheaf. Once full,
> the sheaf is detached and submitted to call_rcu() with a handler that
> will try to put it in the barn, or flush it to slab pages using bulk
> free, when the barn is full. Then a new empty sheaf must be obtained to
> put more objects there.
>
> It's possible that no free sheafs are available to use for a new
> rcu_free sheaf, and the allocation in kfree_rcu() context can only use
> GFP_NOWAIT and thus may fail. In that case, fall back to the existing
> kfree_rcu() machinery.
>
> Because some intended users will need to perform additional cleanups
> after the grace period and thus have custom call_rcu() callbacks today,
> add the possibility to specify a kfree_rcu()-specific destructor.
> Because of the fallback possibility, the destructor now needs to be
> invoked also from within RCU, so add __kvfree_rcu() that RCU can use
> instead of kvfree().
>
> Expected advantages:
> - batching the kfree_rcu() operations, that could eventually replace the
>   batching done in RCU itself
> - sheafs can be reused via the barn instead of being flushed to slabs,
>   which is more effective
>   - this includes cases where only some cpus are allowed to process rcu
>     callbacks (Android)
>
> Possible disadvantage:
> - objects might be waiting for more than their grace period (it is
>   determined by the last object freed into the sheaf), increasing memory
>   usage - but that might be true for the batching done by RCU as well?
>
> RFC LIMITATIONS:
> - only tree rcu is converted, not tiny
> - the rcu fallback might resort to kfree_bulk(), not kvfree(). Instead
>   of adding a variant of kfree_bulk() with destructors, is there an easy
>   way to disable the kfree_bulk() path in the fallback case?
>
> Signed-off-by: Vlastimil Babka
> ---
>  include/linux/slab.h |  15 +++++
>  kernel/rcu/tree.c    |   8 ++-
>  mm/slab.h            |  25 +++++++
>  mm/slab_common.c     |   3 +
>  mm/slub.c            | 182 +++++++++++++++++++++++++++++++++++++++++++++++++--
>  5 files changed, 227 insertions(+), 6 deletions(-)
>
> diff --git a/include/linux/slab.h b/include/linux/slab.h
> index b13fb1c1f03c14a5b45bc6a64a2096883aef9f83..23904321992ad2eeb9389d0883cf4d5d5d71d896 100644
> --- a/include/linux/slab.h
> +++ b/include/linux/slab.h
> @@ -343,6 +343,21 @@ struct kmem_cache_args {
>  	 * %0 means no sheaves will be created
>  	 */
>  	unsigned int sheaf_capacity;
> +	/**
> +	 * @sheaf_rcu_dtor: A destructor for objects freed by kfree_rcu()
> +	 *
> +	 * Only valid when non-zero @sheaf_capacity is specified. When freeing
> +	 * objects by kfree_rcu() in a cache with sheaves, the objects are put
> +	 * to a special percpu sheaf. When that sheaf is full, it's passed to
> +	 * call_rcu() and after a grace period the sheaf can be reused for new
> +	 * allocations.
> +	 * In case a cleanup is necessary after the grace period
> +	 * and before reusal, a pointer to such function can be given as
> +	 * @sheaf_rcu_dtor and will be called on each object in the rcu sheaf
> +	 * after the grace period passes and before the sheaf's reuse.
> +	 *
> +	 * %NULL means no destructor is called.
> +	 */
> +	void (*sheaf_rcu_dtor)(void *obj);
>  };
>  
>  struct kmem_cache *__kmem_cache_create_args(const char *name,
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index b1f883fcd9185a5e22c10102d1024c40688f57fb..42c994fdf9960bfed8d8bd697de90af72c1f4f58 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -65,6 +65,7 @@
>  #include
>  #include
>  #include "../time/tick-internal.h"
> +#include "../../mm/slab.h"
>  
>  #include "tree.h"
>  #include "rcu.h"
> @@ -3420,7 +3421,7 @@ kvfree_rcu_list(struct rcu_head *head)
>  		trace_rcu_invoke_kvfree_callback(rcu_state.name, head, offset);
>  
>  		if (!WARN_ON_ONCE(!__is_kvfree_rcu_offset(offset)))
> -			kvfree(ptr);
> +			__kvfree_rcu(ptr);
>  
>  	rcu_lock_release(&rcu_callback_map);
>  	cond_resched_tasks_rcu_qs();
> @@ -3797,6 +3798,9 @@ void kvfree_call_rcu(struct rcu_head *head, void *ptr)
>  	if (!head)
>  		might_sleep();
>  
> +	if (kfree_rcu_sheaf(ptr))
> +		return;
> +

This change cuts across all the effort that has been done to improve
kvfree_rcu() :) For example: performance; app launch improvements for
Android devices; memory consumption optimizations to minimize LMK
triggering; batching to speed up offloading; etc. So we have done a lot
of work there.

We were thinking about moving all the functionality from "kernel/rcu"
to "mm/". As a first step I can do that, i.e. move kvfree_rcu() as is.
After that we can switch to the second step.

Does that sound good to you?

--
Uladzislau Rezki