From: Vlastimil Babka <vbabka@suse.cz>
To: Linus Torvalds <torvalds@linux-foundation.org>,
Shakeel Butt <shakeelb@google.com>
Cc: Roman Gushchin <roman.gushchin@linux.dev>,
Josh Poimboeuf <jpoimboe@kernel.org>,
Jeff Layton <jlayton@kernel.org>,
Chuck Lever <chuck.lever@oracle.com>,
Johannes Weiner <hannes@cmpxchg.org>,
Michal Hocko <mhocko@kernel.org>,
linux-kernel@vger.kernel.org, Jens Axboe <axboe@kernel.dk>,
Tejun Heo <tj@kernel.org>,
Vasily Averin <vasily.averin@linux.dev>,
Michal Koutny <mkoutny@suse.com>,
Waiman Long <longman@redhat.com>,
Muchun Song <muchun.song@linux.dev>,
Jiri Kosina <jikos@kernel.org>,
cgroups@vger.kernel.org, linux-mm@kvack.org,
Howard McLauchlan <hmclauchlan@fb.com>, bpf <bpf@vger.kernel.org>,
Jens Axboe <axboe@kernel.dk>
Subject: Re: [PATCH RFC 1/4] fs/locks: Fix file lock cache accounting, again
Date: Fri, 26 Jan 2024 10:50:44 +0100 [thread overview]
Message-ID: <6d5bb852-8703-4abf-a52b-90816bccbd7f@suse.cz> (raw)
In-Reply-To: <CAHk-=whYOOdM7jWy5jdrAm8LxcgCMFyk2bt8fYYvZzM4U-zAQA@mail.gmail.com>
On 1/22/24 06:10, Linus Torvalds wrote:
> On Wed, 17 Jan 2024 at 14:56, Shakeel Butt <shakeelb@google.com> wrote:
>> >
>> > So I don't see how we can make it really cheap (say, less than 5% overhead)
>> > without caching pre-accounted objects.
>>
>> Maybe this is what we want. Now we are down to just SLUB, maybe such
>> caching of pre-accounted objects can be in SLUB layer and we can
>> decide to keep this caching per-kmem-cache opt-in or always on.
>
> So it turns out that we have another case of SLAB_ACCOUNT being quite
> a big expense, and it's actually the normal - but failed - open() or
> execve() case.
>
> See the thread at
>
> https://lore.kernel.org/all/CAHk-=whw936qzDLBQdUz-He5WK_0fRSWwKAjtbVsMGfX70Nf_Q@mail.gmail.com/
>
> and to see the effect in profiles, you can use this EXTREMELY stupid
> test program:
>
> #include <fcntl.h>
>
> int main(int argc, char **argv)
> {
> for (int i = 0; i < 10000000; i++)
> open("nonexistent", O_RDONLY);
> }
This reminded me I can see should_failslab() in the profiles (1.43% plus the
overhead in its caller) even if it does nothing at all, and it's completely
unconditional since commit 4f6923fbb352 ("mm: make should_failslab always
available for fault injection").
We discussed it briefly when Jens tried to change it in [1] to depend on
CONFIG_FAILSLAB again. But now I think it should be even possible to leave
it always available, but behind a static key. BPF or whoever else uses these
error injection hooks would have to track how many users of them are active
and manage the static key accordingly. Then it could be always available,
but have no overhead when there's no active user? Other similars hooks could
benefit from such an approach too?
[1]
https://lore.kernel.org/all/e01e5e40-692a-519c-4cba-e3331f173c82@kernel.dk/#t
> where the point of course is that the "nonexistent" pathname doesn't
> actually exist (so don't create a file called that for the test).
>
> What happens is that open() allocates a 'struct file *' early from the
> filp kmem_cache, which has SLAB_ACCOUNT set. So we'll do accounting
> for it, failt the pathname open, and free it again, which uncharges
> the accounting.
>
> Now, in this case, I actually have a suggestion: could we please just
> make SLAB_ACCOUNT be something that we do *after* the allocation, kind
> of the same way the zeroing works?
>
> IOW, I'd love to get rid of slab_pre_alloc_hook() entirely, and make
> slab_post_alloc_hook() do all the "charge the memcg if required".
>
> Obviously that means that now a failure to charge the memcg would have
> to then de-allocate things, but that's an uncommon path and would be
> marked unlikely and not be in the hot path at all.
>
> Now, the reason I would prefer that is that the *second* step would be to
>
> (a) expose a "kmem_cache_charge()" function that takes a
> *non*-accounted slab allocation, and turns it into an accounted one
> (and obviously this is why you want to do everything in the post-alloc
> hook: just try to share this code)
>
> (b) remote the SLAB_ACCOUNT from the filp_cachep, making all file
> allocations start out unaccounted.
>
> (c) when we have *actually* looked up the pathname and open the file
> successfully, at *that* point we'd do a
>
> error = kmem_cache_charge(filp_cachep, file);
>
> in do_dentry_open() to turn the unaccounted file pointer into an
> accounted one (and if that fails, we do the cleanup and free it, of
> course, exactly like we do when file_get_write_access() fails)
>
> which means that now the failure case doesn't unnecessarily charge the
> allocation that never ends up being finalized.
>
> NOTE! I think this would clean up mm/slub.c too, simply because it
> would get rid of that memcg_slab_pre_alloc_hook() entirely, and get
> rid of the need to carry the "struct obj_cgroup **objcgp" pointer
> along until the post-alloc hook: everything would be done post-alloc.
>
> The actual kmem_cache_free() code already deals with "this slab hasn't
> been accounted" because it obviously has to deal with allocations that
> were done without __GFP_ACCOUNT anyway. So there's no change needed on
> the freeing path, it already has to handle this all gracefully.
>
> I may be missing something, but it would seem to have very little
> downside, and fix a case that actually is visible right now.
>
> Linus
next prev parent reply other threads:[~2024-01-26 9:50 UTC|newest]
Thread overview: 19+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-01-17 16:14 [PATCH RFC 0/4] " Josh Poimboeuf
2024-01-17 16:14 ` [PATCH RFC 1/4] fs/locks: " Josh Poimboeuf
2024-01-17 19:00 ` Jeff Layton
2024-01-17 19:39 ` Josh Poimboeuf
2024-01-17 20:20 ` Linus Torvalds
2024-01-17 21:02 ` Shakeel Butt
2024-01-17 22:20 ` Roman Gushchin
2024-01-17 22:56 ` Shakeel Butt
2024-01-22 5:10 ` Linus Torvalds
2024-01-22 17:38 ` Shakeel Butt
2024-01-26 9:50 ` Vlastimil Babka [this message]
2024-01-30 11:04 ` Vlastimil Babka
2024-01-19 7:47 ` Shakeel Butt
2024-01-17 21:19 ` Vlastimil Babka
2024-01-17 21:50 ` Roman Gushchin
2024-01-18 9:49 ` Michal Hocko
2024-01-17 16:14 ` [PATCH RFC 2/4] fs/locks: Add CONFIG_FLOCK_ACCOUNTING Josh Poimboeuf
2024-01-17 16:14 ` [PATCH RFC 3/4] mitigations: Expand 'mitigations=off' to include optional software mitigations Josh Poimboeuf
[not found] ` <3e803d5aee5dd1f4c738f0de1e839e6cfcb9dc41.1705507931.git.jpoimboe@kernel.org>
2024-01-18 9:04 ` [PATCH RFC 4/4] mitigations: Add flock cache accounting to 'mitigations=off' Michal Koutný
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=6d5bb852-8703-4abf-a52b-90816bccbd7f@suse.cz \
--to=vbabka@suse.cz \
--cc=axboe@kernel.dk \
--cc=bpf@vger.kernel.org \
--cc=cgroups@vger.kernel.org \
--cc=chuck.lever@oracle.com \
--cc=hannes@cmpxchg.org \
--cc=hmclauchlan@fb.com \
--cc=jikos@kernel.org \
--cc=jlayton@kernel.org \
--cc=jpoimboe@kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=longman@redhat.com \
--cc=mhocko@kernel.org \
--cc=mkoutny@suse.com \
--cc=muchun.song@linux.dev \
--cc=roman.gushchin@linux.dev \
--cc=shakeelb@google.com \
--cc=tj@kernel.org \
--cc=torvalds@linux-foundation.org \
--cc=vasily.averin@linux.dev \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox