From: Kent Overstreet
To: "Paul E. McKenney"
Cc: Matthew Wilcox, Linus Torvalds, Al Viro, Luis Chamberlain, lsf-pc@lists.linux-foundation.org, linux-fsdevel@vger.kernel.org, linux-mm, Daniel Gomez, Pankaj Raghav, Jens Axboe, Dave Chinner, Christoph Hellwig, Chris Mason, Johannes Weiner
Subject: Re: [LSF/MM/BPF TOPIC] Measuring limits and enhancing buffered IO
Date: Tue, 27 Feb 2024 10:52:51 -0500

On Tue, Feb 27, 2024 at 07:32:32AM -0800, Paul E. McKenney wrote:
> I could simply use the same general approach that I use within RCU
> itself, which currently has absolutely no idea how much memory (if any)
> that each callback will free.  Especially given that some callbacks
> free groups of memory blocks, while others free nothing.  ;-)
>
> Alternatively, we could gather statistics on the amount of memory freed
> by each callback and use that as an estimate.
>
> But we should instead step back and ask exactly what we are trying to
> accomplish here, which just might be what Dave Chinner was getting at.
>
> At a ridiculously high level, reclaim is looking for memory to free.
> Some read-only memory can often be dropped immediately on the grounds
> that its data can be read back in if needed.  Other memory can only be
> dropped after being written out, which involves a delay.  There are of
> course many other complications, but this will do for a start.
>
> So, where does RCU fit in?
>
> RCU fits in between the two.  With memory awaiting RCU, there is no need
> to write anything out, but there is a delay.  As such, memory waiting
> for an RCU grace period is similar to memory that is to be reclaimed
> after its I/O completes.
>
> One complication, and a complication that we are considering exploiting,
> is that, unlike reclaimable memory waiting for I/O, we could often
> (but not always) have some control over how quickly RCU's grace periods
> complete.  And we already do this programmatically by using the choice
> between synchronize_rcu() and synchronize_rcu_expedited().  The question
> is whether we should expedite normal RCU grace periods during reclaim,
> and if so, under what conditions.
>
> You identified one potential condition, namely the amount of memory
> waiting to be reclaimed.  One complication with this approach is that RCU
> has no idea how much memory each callback represents, and for call_rcu(),
> there is no way for it to find out.  For kfree_rcu(), there are ways,
> but as you know, I am questioning whether those ways are reasonable from
> a performance perspective.  But even if they are, we would be accepting
> more error from the memory waiting via call_rcu() than we would be
> accepting if we just counted blocks instead of bytes for kfree_rcu().

You're _way_ overcomplicating this.

The relevant thing to consider is the relative cost of __ksize() and
kfree_rcu(). __ksize() is already pretty cheap, and with SLAB gone and
space available in struct slab we can get it down to a single load.

> Let me reiterate that:  The estimation error that you are objecting to
> for kfree_rcu() is completely and utterly unavoidable for call_rcu().

Hardly. Callsites that free memory themselves after an RCU grace period
can do the accounting manually - if they're hot enough to matter, and
most aren't.

And with memory allocation profiling coming, which also tracks # of
allocations, we'll also have an easy way to spot those.
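
To make the bytes-versus-blocks point concrete, here is a minimal user-space sketch of byte accounting for memory waiting on a grace period. It is a toy model and not kernel code: every name in it (struct pending_obj, struct pending_list, model_alloc(), model_kfree_rcu(), model_should_expedite(), model_grace_period_end()) is hypothetical, the stored size field only stands in for what __ksize() would report, and the threshold check is just one assumed way reclaim might use the number.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical user-space model; none of these names exist in the kernel. */
struct pending_obj {
	struct pending_obj *next;
	size_t size;	/* stand-in for the value __ksize() would return */
};

struct pending_list {
	struct pending_obj *head;
	size_t nr_objs;		/* blocks waiting for a grace period */
	size_t nr_bytes;	/* bytes waiting for a grace period */
};

/* Allocate an object and record its size, as the allocator metadata would. */
static struct pending_obj *model_alloc(size_t size)
{
	struct pending_obj *obj;

	assert(size >= sizeof(*obj));
	obj = malloc(size);
	if (obj) {
		obj->next = NULL;
		obj->size = size;
	}
	return obj;
}

/* Model of kfree_rcu(): queue the object and account for its bytes. */
static void model_kfree_rcu(struct pending_list *pl, struct pending_obj *obj)
{
	assert(obj != NULL);
	obj->next = pl->head;
	pl->head = obj;
	pl->nr_objs++;
	pl->nr_bytes += obj->size;	/* the "single load" in this model */
}

/* Assumed reclaim-side policy: expedite once enough bytes are pending. */
static bool model_should_expedite(const struct pending_list *pl,
				  size_t threshold)
{
	return pl->nr_bytes >= threshold;
}

/* Model of a grace period ending: free everything, return bytes freed. */
static size_t model_grace_period_end(struct pending_list *pl)
{
	size_t freed = pl->nr_bytes;

	while (pl->head) {
		struct pending_obj *obj = pl->head;

		pl->head = obj->next;
		free(obj);
	}
	pl->nr_objs = 0;
	pl->nr_bytes = 0;
	return freed;
}
```

In this model, queueing one 64-byte object and one 4096-byte object gives nr_objs == 2 but nr_bytes == 4160. A block count alone cannot tell those two objects apart, which is the estimation error under discussion.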