From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 90068C5475B for ; Fri, 1 Mar 2024 18:53:53 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 010FF94000D; Fri, 1 Mar 2024 13:53:53 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id EDBB2940007; Fri, 1 Mar 2024 13:53:52 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id D2EB594000D; Fri, 1 Mar 2024 13:53:52 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0015.hostedemail.com [216.40.44.15]) by kanga.kvack.org (Postfix) with ESMTP id BC88A940007 for ; Fri, 1 Mar 2024 13:53:52 -0500 (EST) Received: from smtpin14.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay04.hostedemail.com (Postfix) with ESMTP id B0E2A1A062C for ; Fri, 1 Mar 2024 18:53:51 +0000 (UTC) X-FDA: 81849369462.14.4EBFDE0 Received: from out-184.mta1.migadu.com (out-184.mta1.migadu.com [95.215.58.184]) by imf06.hostedemail.com (Postfix) with ESMTP id DD1E0180024 for ; Fri, 1 Mar 2024 18:53:48 +0000 (UTC) Authentication-Results: imf06.hostedemail.com; dkim=pass header.d=linux.dev header.s=key1 header.b=gn7rELY1; dmarc=pass (policy=none) header.from=linux.dev; spf=pass (imf06.hostedemail.com: domain of roman.gushchin@linux.dev designates 95.215.58.184 as permitted sender) smtp.mailfrom=roman.gushchin@linux.dev ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1709319229; a=rsa-sha256; cv=none; b=yOy2K1Y+/tfIqJNdH7J7buPYEfRszvpwQPavZZJcY4JH6RhLfIJ4lIJLVd463RNo0R/GTE +yvKkKN/Rv0TzjrLMNRV3VuDqe4jY4B3FUbg8fa1Dhn4EYUuI83XIJzhkQAmjV+YYgAXeO b4H1aNVJ1XaNug3QiERUjckCs9FNB7M= ARC-Authentication-Results: i=1; imf06.hostedemail.com; dkim=pass header.d=linux.dev header.s=key1 header.b=gn7rELY1; dmarc=pass (policy=none) header.from=linux.dev; spf=pass (imf06.hostedemail.com: domain of roman.gushchin@linux.dev designates 95.215.58.184 as permitted sender) smtp.mailfrom=roman.gushchin@linux.dev ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1709319229; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=IWBU7wfenw5GY8snfrhpUtl/VpqnsNlazcQskb2ijoE=; b=kERnhilmfjFAfX69t1Pm+D8jiH3HlFm1IhRR6tfMHOr4ZGvGLY+rduHu3dIf/m+j1wtNvt ck5NvWm9UUgIN+uctQ1vukkUkPhpzpNh+O75nN5XcWkV1Adb1Inhld/2QWtx3TI9W6npsa 2vR//gmyecb2qF3l4DpCa327B0SjoUM= Date: Fri, 1 Mar 2024 10:53:30 -0800 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1; t=1709319226; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=IWBU7wfenw5GY8snfrhpUtl/VpqnsNlazcQskb2ijoE=; b=gn7rELY1u3fct2iRcbLSiaFrreySSkXP5ifRGFhjrkx+a6/PEGsm66m4/fEeeq1Sngh7QT xWzEkkTe12Sb7X25cK2QB2o8B5n+ye/j01hRxqXPVJqzXs7JRk52awHqGmMGKP2DkTyqgI fw7Z2bVSVnnZdJa7ObS6mxnFMPy11Rk= X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. From: Roman Gushchin To: Linus Torvalds Cc: Vlastimil Babka , Josh Poimboeuf , Jeff Layton , Chuck Lever , Kees Cook , Christoph Lameter , Pekka Enberg , David Rientjes , Joonsoo Kim , Andrew Morton , Hyeonggon Yoo <42.hyeyoo@gmail.com>, Johannes Weiner , Michal Hocko , Shakeel Butt , Muchun Song , Alexander Viro , Christian Brauner , Jan Kara , linux-mm@kvack.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, linux-fsdevel@vger.kernel.org Subject: Re: [PATCH RFC 4/4] UNFINISHED mm, fs: use kmem_cache_charge() in path_openat() Message-ID: References: <20240301-slab-memcg-v1-0-359328a46596@suse.cz> <20240301-slab-memcg-v1-4-359328a46596@suse.cz> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: X-Migadu-Flow: FLOW_OUT X-Rspam-User: X-Rspamd-Server: rspam06 X-Rspamd-Queue-Id: DD1E0180024 X-Stat-Signature: q78c55z5t3difu4nuuothtntz9z9pqak X-HE-Tag: 1709319228-915382 X-HE-Meta: U2FsdGVkX18wsIm00UKMaTv9cK0oZGSzmsUjrE05U7bubdcngR48aG36/238jMdXVn/DWFZYC8Bnp/lGHm7ZOI9YOjyw5Slnv3UKRDXSr0dC4d90FbxFZKkgdfcSWH3iaRuTVltUfLSaOZHoTa0yxKuFkjUeeIJ7fDpaW9GwT1Djf5W+b+nVIA6+Tm1SU/b5HiEwUZtYGupHChJ2Ghbu6lQVVkNp0Li5M9B3sgMr5tWjQ3ossk8CxRzN9Z03Wu457/XPL+a9v2rrT4Q1zP8yueBN9Gcy7kmPFbaJjmWoRJnDrpgze9rrvj8E/4qtmUkvwRnhAJ3cSzh8Jr61nHxidmZ2anoz+7BYgakOvA7NV5eZkZu1kFervJCmuimuW1X9JB/hR6NozjT8xmUSW4jDze/5bWwXlikakSJOd84yNRo8AB8atvX6MFT63JcwIoO6ZhvInx8urFN97/EX+35Ipfmtxg20SJfCu5Bv5KF5+ILOaXZgs7C2/ozeBOI8dncrms3p9J7yJOm5Bknpiv6cVJI2BH4YijhycPA7emV4WbHEbVZhsBZ8O9nBn4Dx6lz9h6eOfpkALnFIA5t9Cv8VPZwEbkxmFt3QyeR68py4lRmuK/keJLKFcRDCBaw/cfXE4XG4buXmoGHASheCDeMVEcSx+/KD1Gc5Z91w+rVe5Z13UyCTqhr1pcZAEa9SYZfq9cVVnhM7W9d8HecjSFaMFy7j3Q8P5+JewYTZ/ATmZ398kLOd/3zhCqrd8dWfD1aHjRYPFfUa75hJmfk+ro4YfvFRQhfea2drjlBsLpEUEd00xOuZGOd0QhHEa+9HFIXqk+UF9qo/y3iTFWQTJQ5x3wGIaJq9uM0WJLEt2ybsP26EtTZjUEtVFAF765ZLCMTBTeikFLRoVQeLLLnOKZY5Uq6MkSxMi0ewL90MdzXbrdREc5/xrOlyluSojwvD7w0YaTU3EzhzzdXNusq69FI fuQ== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Fri, Mar 01, 2024 at 09:51:18AM -0800, Linus Torvalds wrote: > On Fri, 1 Mar 2024 at 09:07, Vlastimil Babka wrote: > > > > This is just an example of using the kmem_cache_charge() API. I think > > it's placed in a place that's applicable for Linus's example [1] > > although he mentions do_dentry_open() - I have followed from strace() > > showing openat(2) to path_openat() doing the alloc_empty_file(). > > Thanks. This is not the right patch, but yes, patches 1-3 look very nice to me. > > > The idea is that filp_cachep stops being SLAB_ACCOUNT. Allocations that > > want to be accounted immediately can use GFP_KERNEL_ACCOUNT. I did that > > in alloc_empty_file_noaccount() (despite the contradictory name but the > > noaccount refers to something else, right?) as IIUC it's about > > kernel-internal opens. > > Yeah, the "noaccount" function is about not accounting it towards nr_files. > > That said, I don't think it necessarily needs to do the memory > accounting either - it's literally for cases where we're never going > to install the file descriptor in any user space. > > Your change to use GFP_KERNEL_ACCOUNT isn't exactly wrong, but I don't > think it's really the right thing either, because > > > Why is this unfinished: > > > > - there are other callers of alloc_empty_file() which I didn't adjust so > > they simply became memcg-unaccounted. I haven't investigated for which > > ones it would make also sense to separate the allocation and accounting. > > Maybe alloc_empty_file() would need to get a parameter to control > > this. > > Right. I think the natural and logical way to deal with this is to > just say "we account when we add the file to the fdtable". > > IOW, just have fd_install() do it. That's the really natural point, > and also makes it very logical why alloc_empty_file_noaccount() > wouldn't need to do the GFP_KERNEL_ACCOUNT. > > > - I don't know how to properly unwind the accounting failure case. It > > seems like a new case because when we succeed the open, there's no > > further error path at least in path_openat(). > > Yeah, let me think about this part. Becasue fd_install() is the right > point, but that too does not really allow for error handling. > > Yes, we could close things and fail it, but it really is much too late > at this point. > > What I *think* I'd want for this case is > > (a) allow the accounting to go over by a bit > > (b) make sure there's a cheap way to ask (before) about "did we go > over the limit" > > IOW, the accounting never needed to be byte-accurate to begin with, > and making it fail (cheaply and early) on the next file allocation is > fine. > > Just make it really cheap. Can we do that? > > For example, maybe don't bother with the whole "bytes and pages" > stuff. Just a simple "are we more than one page over?" kind of > question. Without the 'stock_lock' mess for sub-page bytes etc > > How would that look? Would it result in something that can be done > cheaply without locking and atomics and without excessive pointer > indirection through many levels of memcg data structures? I think it's possible and I'm currently looking into batching charge, objcg refcnt management and vmstats using per-task caching. It should speed up things for the majority of allocations. For allocations from an irq context and targeted allocations (where the target memcg != memcg of the current task) we'd probably need to keep the old scheme. I hope to post some patches relatively soon. I tried to optimize the current implementation but failed to get any significant gains. It seems that the overhead is very evenly spread across objcg pointer access, charge management, objcg refcnt management and vmstats. Thanks!