Date: Mon, 26 Feb 2024 21:17:41 -0800
From: "Paul E. McKenney"
To: Kent Overstreet
Cc: Matthew Wilcox, Linus Torvalds, Al Viro, Luis Chamberlain,
	lsf-pc@lists.linux-foundation.org, linux-fsdevel@vger.kernel.org,
	linux-mm, Daniel Gomez, Pankaj Raghav, Jens Axboe, Dave Chinner,
	Christoph Hellwig, Chris Mason, Johannes Weiner
Subject: Re: [LSF/MM/BPF TOPIC] Measuring limits and enhancing buffered IO
Reply-To: paulmck@kernel.org
References: <5c6ueuv5vlyir76yssuwmfmfuof3ukxz6h5hkyzfvsm2wkncrl@7wvkfpmvy2gp> <49354148-4dea-4c89-b591-76b21ed4a5d1@paulmck-laptop>

On Mon, Feb 26, 2024 at 08:08:17PM -0500, Kent Overstreet wrote:
> On Mon, Feb 26, 2024 at 04:55:29PM -0800, Paul E. McKenney wrote:
> > On Mon, Feb 26, 2024 at 07:29:04PM -0500, Kent Overstreet wrote:
> > > On Mon, Feb 26, 2024 at 04:05:37PM -0800, Paul E. McKenney wrote:
> > > > On Mon, Feb 26, 2024 at 06:29:43PM -0500, Kent Overstreet wrote:
> > > > > Well, we won't want it getting hammered on continuously - we should be
> > > > > able to tune reclaim so that doesn't happen.
> > > > >
> > > > > I think getting numbers on the amount of memory stranded waiting for RCU
> > > > > is probably first order of business - minor tweak to kfree_rcu() et al.
> > > > > for that; there are APIs they can query to maintain that counter.
> > > >
> > > > We can easily tell you the number of blocks of memory waiting to be freed.
> > > > But RCU does not know their size.  Yes, we could ferret this out on each
> > > > call to kfree_rcu(), but that might not be great for performance.
> > > > We could traverse the lists at runtime, but such traversal must be done
> > > > with interrupts disabled, which is also not great.
> > > >
> > > > > then, we can add a heuristic threshold somewhere, something like
> > > > >
> > > > > if (rcu_stranded * multiplier > reclaimable_memory)
> > > > > 	kick_rcu()
> > > >
> > > > If it is a heuristic anyway, it sounds best to base the heuristic on
> > > > the number of objects rather than their aggregate size.
> > >
> > > I don't think that'll really work given that object size can vary from
> > > < 100 bytes all the way up to 2MB hugepages. The shrinker API works that
> > > way and I positively hate it; it's really helpful for introspection and
> > > debuggability later to give good human understandable units to this
> > > stuff.
> >
> > You might well be right, but let's please try it before adding overhead to
> > kfree_rcu() and friends.  I bet it will prove to be good and sufficient.
> >
> > > And __ksize() is pretty cheap, and I think there might be room in struct
> > > slab to stick the object size there instead of getting it from the slab
> > > cache - and folio_size() is cheaper still.
> >
> > On __ksize():
> >
> >  * This should only be used internally to query the true size of allocations.
> >  * It is not meant to be a way to discover the usable size of an allocation
> >  * after the fact. Instead, use kmalloc_size_roundup().
> >
> > Except that kmalloc_size_roundup() doesn't look like it is meant for
> > this use case.  On __ksize() being used only internally, I would not be
> > at all averse to kfree_rcu() and friends moving to mm.
>
> __ksize() is the right helper to use for this; ksize() is "how much
> usable memory", __ksize() is "how much does this occupy".
>
> > The idea is for kfree_rcu() to invoke __ksize() when given slab memory
> > and folio_size() when given vmalloc() memory?
>
> __ksize() for slab memory, but folio_size() would be for page
> allocations - actually, I think compound_order() is more appropriate
> here, but that's willy's area. IOW, for free_pages_rcu(), which AFAIK we
> don't have yet but it looks like we're going to need.
>
> I'm scanning through vmalloc.c and I don't think we have a helper yet to
> query the allocation size - I can write one tomorrow, giving my brain a
> rest today :)

Again, let's give the straight count of blocks a try first.  I do see
that you feel that the added overhead is negligible, but zero added
overhead is even better.

							Thanx, Paul
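
For illustration only, here is a minimal userspace sketch of the "straight
count of blocks" heuristic discussed above. It is not kernel code and not
anyone's proposed patch: rcu_pending_blocks, note_block_queued(),
note_blocks_freed(), should_kick_rcu() and the max_pending_blocks tunable
are all hypothetical names chosen for this example. The point is that the
queueing path does one relaxed counter update and never looks up the size
of the object.

```c
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Hypothetical userspace model. None of these names exist in the kernel;
 * they only illustrate a count-based (not size-based) trigger.
 */
static atomic_long rcu_pending_blocks;	/* blocks queued but not yet freed */

/* Would be called wherever a block is queued for deferred freeing. */
static void note_block_queued(void)
{
	atomic_fetch_add_explicit(&rcu_pending_blocks, 1, memory_order_relaxed);
}

/* Would be called after a grace period, once n queued blocks are freed. */
static void note_blocks_freed(long n)
{
	atomic_fetch_sub_explicit(&rcu_pending_blocks, n, memory_order_relaxed);
}

/*
 * Count-based variant of the quoted heuristic: request a kick once more
 * than max_pending_blocks (an assumed tunable) are waiting to be freed.
 * No per-object size is consulted.
 */
static bool should_kick_rcu(long max_pending_blocks)
{
	long pending = atomic_load_explicit(&rcu_pending_blocks,
					    memory_order_relaxed);

	return pending > max_pending_blocks;
}
```

A size-based variant would replace the increment of 1 with the object's
size, which is exactly the extra per-call lookup whose cost is being
debated in this thread.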