From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id C0666C47DAF for ; Mon, 22 Jan 2024 17:38:41 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 31EB88D0006; Mon, 22 Jan 2024 12:38:41 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 2A7AA8D0001; Mon, 22 Jan 2024 12:38:41 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 147C18D0006; Mon, 22 Jan 2024 12:38:41 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0013.hostedemail.com [216.40.44.13]) by kanga.kvack.org (Postfix) with ESMTP id ED6CB8D0001 for ; Mon, 22 Jan 2024 12:38:40 -0500 (EST) Received: from smtpin05.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay06.hostedemail.com (Postfix) with ESMTP id AFDBEA1AB0 for ; Mon, 22 Jan 2024 17:38:40 +0000 (UTC) X-FDA: 81707656800.05.D58AA10 Received: from mail-pl1-f172.google.com (mail-pl1-f172.google.com [209.85.214.172]) by imf11.hostedemail.com (Postfix) with ESMTP id C9B3A40029 for ; Mon, 22 Jan 2024 17:38:38 +0000 (UTC) Authentication-Results: imf11.hostedemail.com; dkim=pass header.d=google.com header.s=20230601 header.b="erY0/Yku"; dmarc=pass (policy=reject) header.from=google.com; spf=pass (imf11.hostedemail.com: domain of shakeelb@google.com designates 209.85.214.172 as permitted sender) smtp.mailfrom=shakeelb@google.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1705945118; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=5UuQsVtmQQmr5HjqyMtyAk8R8Y84z4aC4k61fXm7dcc=; b=0TRHgcz8fA0Z/B1MdEJjNA7dm6t0CwpNrjlFR+ktMCJb7nX3VgMK0me528La6aygX24U8p hO5ZdZ3uEh/PPKEys6Xa0/mHdOOx+PyAwDrrVDx87aUt+9GC3eB2lrfSCtg72vQMdyItpz YRACCHDYnyJm0/4hy2FD5cHAa8j2Nh4= ARC-Authentication-Results: i=1; imf11.hostedemail.com; dkim=pass header.d=google.com header.s=20230601 header.b="erY0/Yku"; dmarc=pass (policy=reject) header.from=google.com; spf=pass (imf11.hostedemail.com: domain of shakeelb@google.com designates 209.85.214.172 as permitted sender) smtp.mailfrom=shakeelb@google.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1705945118; a=rsa-sha256; cv=none; b=3zDP/RP5TFzD9+JppXlHK+wm1ip6f9+cGfh0IYsZyjnGT1D4VmU6kXRWEagWA1RON3OWaD X1y9ERFvS/US7QuPNEumqhsnENNqbSh9sSSvbV+OGrZe2efDlPFEoDK4Zz63vDvo38/9uL jpeeq7ULtvJCIWXDVog026po2gvgdug= Received: by mail-pl1-f172.google.com with SMTP id d9443c01a7336-1d72d240c69so253835ad.1 for ; Mon, 22 Jan 2024 09:38:38 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1705945117; x=1706549917; darn=kvack.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=5UuQsVtmQQmr5HjqyMtyAk8R8Y84z4aC4k61fXm7dcc=; b=erY0/Yku4PUEf+mYTz+R3qH055auPVExzf3fOpGqzqwRFDY3N/GvEfBvsZCqKEMKG7 Slw4H20POogEiIZ9eL8BytH0Ilzs0injYvdCN0+VtNIN397pT7ZD5iRYKit/6hsjj3IB sy5VoPMai+2uNq5bhoRUX0SPH0TOVT7iMb83Zl4Fk+vHQCZleW8z8Tmiun4Mjfv+25RH E4zOnSu9u9jfpcqvbB4KMDYc6fxKpv3UvEx6bopnewxQ6WCG/Ykk7mOLel27Nb17VTEB q96jVhAVuymuxX45CkgjOu23b/+tmGXJc/JvZD2458oAKzI0dleNhbvzVvYA18hMeFmW DhPA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1705945117; x=1706549917; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=5UuQsVtmQQmr5HjqyMtyAk8R8Y84z4aC4k61fXm7dcc=; b=WfHVQkO5dYGbHrImhRYXfG1YVkB0rGCHtjYgQIZ4R8TyKPkcCHwMvI2a3BBzLPJSHE FOz9kAHyt7+WDsjShr+GJicdbgMs74wEVwVxFXevPOtDTU42xsxKyCRXIwlJJlEGXCUK e5DmN6XY/j+D+fbbKgYWf+rw3R1nmUkQW+Ywgwq/ENW0HsNSKyEeJ56PCMRkGqR1U92k 882WDNYIeLvSa23tUO77MrbOm8+Ig+fXWFPFHWLmbaBt2QpV2sfBJRwPiquo/NZpJmyJ sPCpeNWoqf2p2cU/fujbWuFdktS76EXK8wA8L89Jw4Qq1cOQ+zhZZWXr3txH8R0uCQGN FIIA== X-Gm-Message-State: AOJu0YxMwpoPbzGrP425cBEPqGrAorYucbTlruGy34EiGsMwE+K7naT0 rrB+mS1YqB+GWtR5FKiK5QsW6mEA0g4o7QCt+pzzJrp/zB9CNQkvaEjiNLgnito4Zso2VnliePn 9Q85WWok1RucoVrms6bZXUC/9NAiU+pQuF9KW X-Google-Smtp-Source: AGHT+IFYv5tHX1WQPl3UKkucHmsgclS/9mQvJ8BtkspQnHW8INm/bfJkBEp88NMt7Nyb4tbGmP3GUFNzcPfeUEN4+yY= X-Received: by 2002:a17:902:e747:b0:1d6:f66b:1057 with SMTP id p7-20020a170902e74700b001d6f66b1057mr2985plf.28.1705945117286; Mon, 22 Jan 2024 09:38:37 -0800 (PST) MIME-Version: 1.0 References: <6667b799702e1815bd4e4f7744eddbc0bd042bb7.camel@kernel.org> <20240117193915.urwueineol7p4hg7@treble> In-Reply-To: From: Shakeel Butt Date: Mon, 22 Jan 2024 09:38:23 -0800 Message-ID: Subject: Re: [PATCH RFC 1/4] fs/locks: Fix file lock cache accounting, again To: Linus Torvalds Cc: Roman Gushchin , Josh Poimboeuf , Vlastimil Babka , Jeff Layton , Chuck Lever , Johannes Weiner , Michal Hocko , linux-kernel@vger.kernel.org, Jens Axboe , Tejun Heo , Vasily Averin , Michal Koutny , Waiman Long , Muchun Song , Jiri Kosina , cgroups@vger.kernel.org, linux-mm@kvack.org Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Rspam-User: X-Stat-Signature: kxpkmrkzqi5wb8zc1uh4jxojpt58aj98 X-Rspamd-Server: rspam07 X-Rspamd-Queue-Id: C9B3A40029 X-HE-Tag: 1705945118-965338 X-HE-Meta: U2FsdGVkX18KblCGyPz4UltirglapvnSWkrnyaPo/IY599htMrhqryvxl04LtZqUL0Yomg5w9tMZ8tSnLWLljlFR1elYYcfCHDecDTPVfKaXgYqOz24rNM6TVBbFPnbuytdnynd1AkPhcls4p3V3kQH1huA7dfUobVyrofHsC7OujMsnCSvyTTVw19MP2aziv0GxhFSfqQAgGQYI+C3EfdBnRFPjO9Byvg5nxEehlfvWOlHYPLRAajRWvUq9OsuJcOP6oweDy3YXTHEm7Sq3HmEZhs1GbGowLJWnC/x4eZyhYPfqAq6pfQWu5/AWDG1qN4Q7Rsoz1eT8ASD5a3YkS6Y0ibWV35ZppB1wK/jB//rdkvNWZ92Rw91jea2hoXsWievisMHEJaXYyDy+DHRdAQmLzts13qL03RSbqM7MjK19EtYh5WgErXiKZQGH/waFmqlt0uhuyAqBedBLejZQS8m+i37x7hyGpt1P1HQFK5qFt8b1fF+x99CfNr1wq/1f89mS58p/hajPYdryjW9euFQr1/AgQOpuZitkUVUlM8Yul3jHj0Px3SMnJDAZpnt7ffM9FfA45ZnFuQU6LE3NsUcV/9QSlNByeNOUsMkflY4o39R+h016OpFnAu+cr6VmAow86QOvQje3YeCTayvjrMihtohnjabRiQjb1i2YmM4nbPD7/eq1USVMG+e0enZWVW67uuRrtbH1InmPUQ7LD2wq8f07ZEbGhP5lvra3OrMSkI71XyhtYUMTvEhIQWkWrYJsa3DMZRbqXoyzofhBe1XyAUNfsq4uPBTEfotB0KZ+2R0nKVOsAhAmBavvnWf9LedJwNF6FltaaveJ4ZSeJG1IWDoQ37QSQ4Dfhy31azG+l5E8l1cNnSEAddnFHiwlzHoVoUzhTGSRYDMVnTVbiAmDwlGTHqY7n31pHeTkQ+i4fXrfnZPdRDFJH41S7iAqxAkVHpAjoXfH2HMUqiK 2i/6b3yp V1KXkP5+FEvo2gtpubOmLsKeqL7PBLNL34Od83lONcWhQEnFWcVfhYqPb+dENGjZSH+YmEC03xAtWZ9B5xEGgCldXTfWoJlyaQd7O2mvUXNOblMtkukjMhIPxbY2B5+cwDqA47tVIE+sSCUk0/5H6L5dqHjNnALpRi7zE+jQ81avAw7UC9SPiraQaJbWX6UF+CUairjjtC9SlPF6YZ8QSeD/u4pT0ffWMuJG3uNBPV5vtt0mC13AP+vNHWqCuiVS0Gyr5SGo6I70aOCRS4mDpr4nIK0eQO9n+CGT2ZAhPvuUtlMEKRrQxpzWu4t+uZ6LmWAGcm0b0l3QTGSAA/DtOR6gZpbsYplpdEfHmegZU5tHOe6P+eJ0YgR168t0sumU5sWvuaKuAhFtTBC0= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: Hi Linus, On Sun, Jan 21, 2024 at 9:16=E2=80=AFPM Linus Torvalds wrote: > > On Wed, 17 Jan 2024 at 14:56, Shakeel Butt wrote: > > > > > > So I don't see how we can make it really cheap (say, less than 5% ove= rhead) > > > without caching pre-accounted objects. > > > > Maybe this is what we want. Now we are down to just SLUB, maybe such > > caching of pre-accounted objects can be in SLUB layer and we can > > decide to keep this caching per-kmem-cache opt-in or always on. > > So it turns out that we have another case of SLAB_ACCOUNT being quite > a big expense, and it's actually the normal - but failed - open() or > execve() case. > > See the thread at > > https://lore.kernel.org/all/CAHk-=3Dwhw936qzDLBQdUz-He5WK_0fRSWwKAjtb= VsMGfX70Nf_Q@mail.gmail.com/ > > and to see the effect in profiles, you can use this EXTREMELY stupid > test program: > > #include > > int main(int argc, char **argv) > { > for (int i =3D 0; i < 10000000; i++) > open("nonexistent", O_RDONLY); > } > > where the point of course is that the "nonexistent" pathname doesn't > actually exist (so don't create a file called that for the test). > > What happens is that open() allocates a 'struct file *' early from the > filp kmem_cache, which has SLAB_ACCOUNT set. So we'll do accounting > for it, failt the pathname open, and free it again, which uncharges > the accounting. > > Now, in this case, I actually have a suggestion: could we please just > make SLAB_ACCOUNT be something that we do *after* the allocation, kind > of the same way the zeroing works? > > IOW, I'd love to get rid of slab_pre_alloc_hook() entirely, and make > slab_post_alloc_hook() do all the "charge the memcg if required". > > Obviously that means that now a failure to charge the memcg would have > to then de-allocate things, but that's an uncommon path and would be > marked unlikely and not be in the hot path at all. > > Now, the reason I would prefer that is that the *second* step would be to > > (a) expose a "kmem_cache_charge()" function that takes a > *non*-accounted slab allocation, and turns it into an accounted one > (and obviously this is why you want to do everything in the post-alloc > hook: just try to share this code) > > (b) remote the SLAB_ACCOUNT from the filp_cachep, making all file > allocations start out unaccounted. > > (c) when we have *actually* looked up the pathname and open the file > successfully, at *that* point we'd do a > > error =3D kmem_cache_charge(filp_cachep, file); > > in do_dentry_open() to turn the unaccounted file pointer into an > accounted one (and if that fails, we do the cleanup and free it, of > course, exactly like we do when file_get_write_access() fails) > > which means that now the failure case doesn't unnecessarily charge the > allocation that never ends up being finalized. > > NOTE! I think this would clean up mm/slub.c too, simply because it > would get rid of that memcg_slab_pre_alloc_hook() entirely, and get > rid of the need to carry the "struct obj_cgroup **objcgp" pointer > along until the post-alloc hook: everything would be done post-alloc. > > The actual kmem_cache_free() code already deals with "this slab hasn't > been accounted" because it obviously has to deal with allocations that > were done without __GFP_ACCOUNT anyway. So there's no change needed on > the freeing path, it already has to handle this all gracefully. > > I may be missing something, but it would seem to have very little > downside, and fix a case that actually is visible right now. > Thanks for the idea. Actually I had a discussion with Vlastimil at LPC '22 on a similar idea but for a different problem i.e. allocation in irq context or more specifically charging sockets for incoming TCP connections. If you see inet_csk_accept() we solved this for TCP memory by charging at accept() syscall but the kernel memory holding socket is still uncharged. So, I think having the framework you suggested would help more use-cases. I will take a stab at this in the next couple of days (sorry stuck in some other stuff atm). thanks, Shakeel