From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <owner-linux-mm@kvack.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17])
	by smtp.lore.kernel.org (Postfix) with ESMTP id F04FFC48BF6
	for <linux-mm@archiver.kernel.org>; Mon, 26 Feb 2024 21:17:30 +0000 (UTC)
Received: by kanga.kvack.org (Postfix)
	id 62BCD4401C6; Mon, 26 Feb 2024 16:17:30 -0500 (EST)
Received: by kanga.kvack.org (Postfix, from userid 40)
	id 5B45244017F; Mon, 26 Feb 2024 16:17:30 -0500 (EST)
X-Delivered-To: int-list-linux-mm@kvack.org
Received: by kanga.kvack.org (Postfix, from userid 63042)
	id 455D54401C6; Mon, 26 Feb 2024 16:17:30 -0500 (EST)
X-Delivered-To: linux-mm@kvack.org
Received: from relay.hostedemail.com (smtprelay0011.hostedemail.com [216.40.44.11])
	by kanga.kvack.org (Postfix) with ESMTP id 2D4A044017F
	for <linux-mm@kvack.org>; Mon, 26 Feb 2024 16:17:30 -0500 (EST)
Received: from smtpin19.hostedemail.com (a10.router.float.18 [10.200.18.1])
	by unirelay10.hostedemail.com (Postfix) with ESMTP id E7BC5C0A88
	for <linux-mm@kvack.org>; Mon, 26 Feb 2024 21:17:29 +0000 (UTC)
X-FDA: 81835216218.19.280D20E
Received: from out-185.mta0.migadu.com (out-185.mta0.migadu.com [91.218.175.185])
	by imf09.hostedemail.com (Postfix) with ESMTP id DA35F140017
	for <linux-mm@kvack.org>; Mon, 26 Feb 2024 21:17:27 +0000 (UTC)
Authentication-Results: imf09.hostedemail.com;
	dkim=pass header.d=linux.dev header.s=key1 header.b=Bu4rM9z3;
	dmarc=pass (policy=none) header.from=linux.dev;
	spf=pass (imf09.hostedemail.com: domain of kent.overstreet@linux.dev designates 91.218.175.185 as permitted sender) smtp.mailfrom=kent.overstreet@linux.dev
ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1708982248; a=rsa-sha256;
	cv=none;
	b=pxa2NAiU0JYXhBqJJ6YM2LDjJSSLtPyml/smDAs9cDpFtu3quqHlCRScbQxwSfxeYdlAbp
	DlRA8uC3sMRzAXC4Sx4zEwsKjygdz5m/X2EEXb7XGXb+uJOsfBHnyZcCKwBFCPNUSi3FlY
	2h9lRGvSc6cnc5Yf4iM6jwQCgOmUsW0=
ARC-Authentication-Results: i=1;
	imf09.hostedemail.com;
	dkim=pass header.d=linux.dev header.s=key1 header.b=Bu4rM9z3;
	dmarc=pass (policy=none) header.from=linux.dev;
	spf=pass (imf09.hostedemail.com: domain of kent.overstreet@linux.dev designates 91.218.175.185 as permitted sender) smtp.mailfrom=kent.overstreet@linux.dev
ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com;
	s=arc-20220608; t=1708982248;
	h=from:from:sender:reply-to:subject:subject:date:date:
	 message-id:message-id:to:to:cc:cc:mime-version:mime-version:
	 content-type:content-type:content-transfer-encoding:
	 in-reply-to:in-reply-to:references:references:dkim-signature;
	bh=jLf3txpUuSUSMJ8XqtuKRPYwmCVpIzNopRtHeIncJA4=;
	b=rQJAJsTdFNxCpGT3L3QvzsIJ1lzDHM+Aj5yUxAELyXuZYGPNAaibuzDoRjBpLR70A/DGLE
	69Do9rjUMBAzTJy0ebrlgGCB7os3+Vqh0cTOcGN+E6DtKEPG77Zo4lbwSFLGpPbJN1uidb
	qiab4uju1vdGEYSYOrmdb5XHMVLnRqo=
Date: Mon, 26 Feb 2024 16:17:19 -0500
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1;
	t=1708982245;
	h=from:from:reply-to:subject:subject:date:date:message-id:message-id:
	 to:to:cc:cc:mime-version:mime-version:content-type:content-type:
	 in-reply-to:in-reply-to:references:references;
	bh=jLf3txpUuSUSMJ8XqtuKRPYwmCVpIzNopRtHeIncJA4=;
	b=Bu4rM9z3Wp3E+ZycnuTvhybTjYR4KFdnGFue8NWUvqNmDaD6dH45V2pfMRv8/IiI+Z3XrA
	yM14ZGch4szglv8fPpQ7wndHLu4JmC6PUFUUOhYMnnmbcpYY25m7B7U6c2yRaJ5O1F5Tqx
	FSQzNljdPRWVo2Y37IEQsH1gqT2Vd7I=
X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers.
From: Kent Overstreet <kent.overstreet@linux.dev>
To: Matthew Wilcox <willy@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>, 
	Al Viro <viro@kernel.org>, Luis Chamberlain <mcgrof@kernel.org>, 
	lsf-pc@lists.linux-foundation.org, linux-fsdevel@vger.kernel.org, linux-mm <linux-mm@kvack.org>, 
	Daniel Gomez <da.gomez@samsung.com>, Pankaj Raghav <p.raghav@samsung.com>, 
	Jens Axboe <axboe@kernel.dk>, Dave Chinner <david@fromorbit.com>, 
	Christoph Hellwig <hch@lst.de>, Chris Mason <clm@fb.com>, Johannes Weiner <hannes@cmpxchg.org>
Subject: Re: [LSF/MM/BPF TOPIC] Measuring limits and enhancing buffered IO
Message-ID: <znixgiqxzoksfwwzggmzsu6hwpqfszigjh5k6hx273qil7dx5t@5dxcovjdaypk>
References: <o4a6577t2z5xytjwmixqkl33h23vfnjypwbx7jaaldtldpvjf5@dzbzkhrzyobb>
 <Zds8T9O4AYAmdS9d@casper.infradead.org>
 <CAHk-=wgVPHPPjZPoV8E_q59L7i8zFjHo_5hHo_+qECYuy7FF6g@mail.gmail.com>
 <Zduto30LUEqIHg4h@casper.infradead.org>
 <CAHk-=wibYaWYqs5A30a7ywJdsW5LDT1LYysjcCmzjzkK=uh+tQ@mail.gmail.com>
 <bk45mgxpdbm5gfa6wl37nhecttnb5bxh6wo3slixsray77azu5@pi3bblfn3c5u>
 <CAHk-=wjnW96+oP0zhEd1zjPNqOHvrddKkwp0+CuS5HpZavfmMQ@mail.gmail.com>
 <Zdv8dujdOg0dD53k@duke.home>
 <CAHk-=wiEVcqTU1oQPSjaJvxj5NReg3GzkBO8zpL1tXFG1UVyvg@mail.gmail.com>
 <Zdz9p_Kn0puI1KEL@casper.infradead.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <Zdz9p_Kn0puI1KEL@casper.infradead.org>
X-Migadu-Flow: FLOW_OUT
X-Rspam-User: 
X-Rspamd-Server: rspam06
X-Rspamd-Queue-Id: DA35F140017
X-Stat-Signature: 639ip1zq6kpteqifn5dc5a89h4uf9rfj
X-HE-Tag: 1708982247-648378
X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4
Sender: owner-linux-mm@kvack.org
Precedence: bulk
X-Loop: owner-majordomo@kvack.org
List-ID: <linux-mm.kvack.org>
List-Subscribe: <mailto:majordomo@kvack.org>
List-Unsubscribe: <mailto:majordomo@kvack.org>

On Mon, Feb 26, 2024 at 09:07:51PM +0000, Matthew Wilcox wrote:
> On Mon, Feb 26, 2024 at 09:17:33AM -0800, Linus Torvalds wrote:
> > Willy - tangential side note: I looked closer at the issue that you
> > reported (indirectly) with the small reads during heavy write
> > activity.
> > 
> > Our _reading_ side is very optimized and has none of the write-side
> > oddities that I can see, and we just have
> > 
> >   filemap_read ->
> >     filemap_get_pages ->
> >         filemap_get_read_batch ->
> >           folio_try_get_rcu()
> > 
> > and there is no page locking or other locking involved (assuming the
> > page is cached and marked uptodate etc, of course).
> > 
> > So afaik, it really is just that *one* atomic access (and the matching
> > page ref decrement afterwards).
> 
> Yep, that was what the customer reported on their ancient kernel, and
> we at least didn't make that worse ...
> 
> > We could easily do all of this without getting any ref to the page at
> > all if we did the page cache release with RCU (and the user copy with
> > "copy_to_user_atomic()").  Honestly, anything else looks like a
> > complete disaster. For tiny reads, a temporary buffer sounds ok, but
> > really *only* for tiny reads where we could have that buffer on the
> > stack.
> > 
> > Are tiny reads (handwaving: 100 bytes or less) really worth optimizing
> > for to that degree?
> > 
> > In contrast, the RCU-delaying of the page cache might be a good idea
> > in general. We've had other situations where that would have been
> > nice. The main worry would be low-memory situations, I suspect.
> > 
> > The "tiny read" optimization smells like a benchmark thing to me. Even
> > with the cacheline possibly bouncing, the system call overhead for
> > tiny reads (particularly with all the mitigations) should be orders of
> > magnitude higher than two atomic accesses.
> 
> Ah, good point about the $%^&^*^ mitigations.  This was pre mitigations.
> I suspect that this customer would simply disable them; afaik the machine
> is an appliance and one interacts with it purely by sending transactions
> to it (it's not even an SQL system, much less a "run arbitrary javascript"
> kind of system).  But that makes it even more special case, inapplicable
> to the majority of workloads and closer to smelling like a benchmark.
> 
> I've thought about and rejected RCU delaying of the page cache in the
> past.  With the majority of memory in anon memory & file memory, it just
> feels too risky to have so much memory waiting to be reused.  We could
> also improve gup-fast if we could rely on RCU freeing of anon memory.
> Not sure what workloads might benefit from that, though.

The amount of memory allocated and freed through RCU can already be fairly
significant depending on workload, and I'd expect it to grow. What we
really need is a way for reclaim to kick RCU when needed, and probably a
percpu counter for "amount of memory stranded until the next RCU grace
period".
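
As a rough illustration of the counter idea above, here is a userspace C
model, not kernel code. It assumes a fixed NR_CPUS array in place of a real
percpu variable, and every name in it (rcu_stranded_add(),
reclaim_kick_rcu(), KICK_THRESHOLD and so on) is hypothetical. A real
implementation would presumably account at the call_rcu()/kfree_rcu() call
sites and ask the RCU core to expedite a grace period rather than zeroing
counters directly.

```c
#include <assert.h>
#include <stddef.h>

#define NR_CPUS 4                         /* hypothetical CPU count for this model */
#define KICK_THRESHOLD (1024UL * 1024UL)  /* hypothetical: kick RCU at 1 MiB stranded */

/* One slot per CPU; stands in for a real percpu counter. */
size_t rcu_stranded_bytes[NR_CPUS];

/* How many times reclaim decided to force a grace period. */
unsigned long grace_periods_forced;

/* Account an object that was just queued for RCU-deferred freeing. */
void rcu_stranded_add(int cpu, size_t bytes)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	rcu_stranded_bytes[cpu] += bytes;
}

/* Sum the per-CPU slots. In a concurrent system this is only approximate. */
size_t rcu_stranded_total(void)
{
	size_t total = 0;

	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		total += rcu_stranded_bytes[cpu];
	return total;
}

/* Model the end of a grace period: everything deferred so far is freed. */
void rcu_grace_period_done(void)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		rcu_stranded_bytes[cpu] = 0;
}

/*
 * Reclaim hook: if enough memory is stranded, kick RCU and return the
 * number of bytes that became reusable; otherwise do nothing and return 0.
 */
size_t reclaim_kick_rcu(void)
{
	size_t stranded = rcu_stranded_total();

	if (stranded < KICK_THRESHOLD)
		return 0;

	grace_periods_forced++;
	rcu_grace_period_done();
	return stranded;
}
```

The point of keeping the counter per CPU is that the add path stays cheap
and contention-free; only reclaim pays for the sum, and it can tolerate a
slightly stale total when deciding whether a kick is worthwhile.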