From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <owner-linux-mm@kvack.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 45F6EC54798
	for <linux-mm@archiver.kernel.org>; Sun, 25 Feb 2024 05:18:34 +0000 (UTC)
Received: by kanga.kvack.org (Postfix)
	id 9C49D6B00EE; Sun, 25 Feb 2024 00:18:33 -0500 (EST)
Received: by kanga.kvack.org (Postfix, from userid 40)
	id 974606B00EF; Sun, 25 Feb 2024 00:18:33 -0500 (EST)
X-Delivered-To: int-list-linux-mm@kvack.org
Received: by kanga.kvack.org (Postfix, from userid 63042)
	id 83C1C6B00F0; Sun, 25 Feb 2024 00:18:33 -0500 (EST)
X-Delivered-To: linux-mm@kvack.org
Received: from relay.hostedemail.com (smtprelay0010.hostedemail.com [216.40.44.10])
	by kanga.kvack.org (Postfix) with ESMTP id 719786B00EE
	for <linux-mm@kvack.org>; Sun, 25 Feb 2024 00:18:33 -0500 (EST)
Received: from smtpin18.hostedemail.com (a10.router.float.18 [10.200.18.1])
	by unirelay06.hostedemail.com (Postfix) with ESMTP id 17895A0C26
	for <linux-mm@kvack.org>; Sun, 25 Feb 2024 05:18:32 +0000 (UTC)
X-FDA: 81829170864.18.2B1F4CD
Received: from out-182.mta1.migadu.com (out-182.mta1.migadu.com [95.215.58.182])
	by imf04.hostedemail.com (Postfix) with ESMTP id 2AC0440004
	for <linux-mm@kvack.org>; Sun, 25 Feb 2024 05:18:28 +0000 (UTC)
Authentication-Results: imf04.hostedemail.com;
	dkim=pass header.d=linux.dev header.s=key1 header.b=JA7cRX5I;
	dmarc=pass (policy=none) header.from=linux.dev;
	spf=pass (imf04.hostedemail.com: domain of kent.overstreet@linux.dev designates 95.215.58.182 as permitted sender) smtp.mailfrom=kent.overstreet@linux.dev
ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com;
	s=arc-20220608; t=1708838309;
	h=from:from:sender:reply-to:subject:subject:date:date:
	 message-id:message-id:to:to:cc:cc:mime-version:mime-version:
	 content-type:content-type:content-transfer-encoding:
	 in-reply-to:in-reply-to:references:references:dkim-signature;
	bh=RMvvtI3oHlC3+eGlhXg1wyOhQeHTqMrpBTAyQcmrNl8=;
	b=WdMbeAabst3S3235kr0msFu+9g3GbZR2K08pNcNYo7Qg8Yd13Qi3KmgKyK9rDX5FYsZiYw
	FoiGuLRBhCV3iudrP/rL+g5YWAr0gETTLXswmAszJHTT9Xtmud/TX2QBw7MeoVffC6n/BF
	Zn/+FcK7QjApwWqAmM5ZMHLMaqv3a4k=
ARC-Authentication-Results: i=1;
	imf04.hostedemail.com;
	dkim=pass header.d=linux.dev header.s=key1 header.b=JA7cRX5I;
	dmarc=pass (policy=none) header.from=linux.dev;
	spf=pass (imf04.hostedemail.com: domain of kent.overstreet@linux.dev designates 95.215.58.182 as permitted sender) smtp.mailfrom=kent.overstreet@linux.dev
ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1708838309; a=rsa-sha256;
	cv=none;
	b=UBIkPunoCQ9DJt3ktyZ6wXbt1gBkfVzPS7tBJKT+6EHD6lT3UZUd4ncNFrQZlCpCRyjuDo
	/YpVkBuguswV0SR3b+nrKCXIOBceD40GQLqBNCPH5+yBgRRpFB7HnokdJPRSGAimDAzCKz
	oLnCKM13awDLdnakAXC+hze5ws3Fx4U=
Date: Sun, 25 Feb 2024 00:18:23 -0500
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1;
	t=1708838307;
	h=from:from:reply-to:subject:subject:date:date:message-id:message-id:
	 to:to:cc:cc:mime-version:mime-version:content-type:content-type:
	 in-reply-to:in-reply-to:references:references;
	bh=RMvvtI3oHlC3+eGlhXg1wyOhQeHTqMrpBTAyQcmrNl8=;
	b=JA7cRX5IMqTETECo+eZdzsrpyGWVpDI4GQsB6yJ8+/SnTuwJ86wx32xSJ0WFCx1xGWwZsO
	i5FAfW8OLJCSqIPasQAns2lOiZxlK0eZl0nB2yv0ZCayaE+nSaiwe+DQAEJ2aaHku/LYCb
	XzldhcQ3+Qh8IUE34SoAdjKxRicsGOA=
X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers.
From: Kent Overstreet <kent.overstreet@linux.dev>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matthew Wilcox <willy@infradead.org>, 
	Luis Chamberlain <mcgrof@kernel.org>, lsf-pc@lists.linux-foundation.org, linux-fsdevel@vger.kernel.org, 
	linux-mm <linux-mm@kvack.org>, Daniel Gomez <da.gomez@samsung.com>, 
	Pankaj Raghav <p.raghav@samsung.com>, Jens Axboe <axboe@kernel.dk>, Dave Chinner <david@fromorbit.com>, 
	Christoph Hellwig <hch@lst.de>, Chris Mason <clm@fb.com>, Johannes Weiner <hannes@cmpxchg.org>
Subject: Re: [LSF/MM/BPF TOPIC] Measuring limits and enhancing buffered IO
Message-ID: <o4a6577t2z5xytjwmixqkl33h23vfnjypwbx7jaaldtldpvjf5@dzbzkhrzyobb>
References: <Zdkxfspq3urnrM6I@bombadil.infradead.org>
 <Zdlsr88A6AAlJpcc@casper.infradead.org>
 <CAHk-=wjUkYLv23KtF=EyCrQcmf9NGwE8Yo1cuxdaLF8gqx5zWw@mail.gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <CAHk-=wjUkYLv23KtF=EyCrQcmf9NGwE8Yo1cuxdaLF8gqx5zWw@mail.gmail.com>
X-Migadu-Flow: FLOW_OUT
X-Rspam-User: 
X-Rspamd-Server: rspam12
X-Rspamd-Queue-Id: 2AC0440004
X-Stat-Signature: 5r78ryw8sawoqgk3rspw7qkpf7g9j7mg
X-HE-Tag: 1708838308-670366
X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4
Sender: owner-linux-mm@kvack.org
Precedence: bulk
X-Loop: owner-majordomo@kvack.org
List-ID: <linux-mm.kvack.org>
List-Subscribe: <mailto:majordomo@kvack.org>
List-Unsubscribe: <mailto:majordomo@kvack.org>

On Sat, Feb 24, 2024 at 09:31:44AM -0800, Linus Torvalds wrote:
> On Fri, 23 Feb 2024 at 20:12, Matthew Wilcox <willy@infradead.org> wrote:
> >
> > On Fri, Feb 23, 2024 at 03:59:58PM -0800, Luis Chamberlain wrote:
> > >  What are the limits to buffered IO
> > > and how do we test that? Who keeps track of it?
> >
> > TLDR: Why does the pagecache suck?
> 
> What? No.
> 
> Our page cache is so good that the question is literally "what are the
> limits of it", and "how we would measure them".
> 
> That's not a sign of suckage.
> 
> When you have to have completely unrealistic loads that nobody would
> actually care about in reality just to get a number for the limit,
> it's not a sign of problems.
> 
> Or rather, the "problem" is the person looking at a stupid load, and
> going "we need to improve this because I can write a benchmark for
> this".
> 
> Here's a clue: a hardware discussion forum I visit was arguing about
> memory latencies, and talking about how their measured overhead of
> DRAM latency was literally 85% on the CPU side, not the DRAM side.
> 
> Guess what? It's because the CPU in question had quite a bit of L3,
> and it was spread out, and the CPU doesn't even start the memory
> access before it has checked caches.
> 
> And here's a big honking clue: only a complete nincompoop and mentally
> deficient rodent would look at that and say "caches suck".
> 
> > >  ~86 GiB/s on pmem DIO on xfs with 64k block size, 1024 XFS agcount on x86_64
> > >      Vs
> > >  ~ 7,000 MiB/s with buffered IO
> >
> > Profile?  My guess is that you're bottlenecked on the xa_lock between
> > memory reclaim removing folios from the page cache and the various
> > threads adding folios to the page cache.
> 
> I doubt it's the locking.
> 
> In fact, for writeout in particular it's probably not even the page
> cache at all.
> 
> For writeout, we have a very traditional problem: we care about a
> million times more about latency than we care about throughput,
> because nobody ever actually cares all that much about performance of
> huge writes.

Before large folios, we had people very much bottlenecked by 4k page
overhead on sequential IO; my customer/sponsor was one of them.

Factor of 2 or 3, IIRC; it was _bad_. And when you looked at the
profiles and looked at the filemap.c code it wasn't hard to see why;
we'd walk a radix tree, do an atomic op (get the page), then do a 4k
usercopy... hence the work I did to break up
generic_file_buffered_read() and vectorize it, which was a huge
improvement.
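
To make the per-page cost concrete, here is a toy userspace sketch. It
is not the filemap.c code: the flat array stands in for the radix
tree, a plain counter stands in for the atomic refcount, memcpy stands
in for the usercopy, and every name in it (toy_cache, lookup_one,
lookup_batch, ...) is made up for illustration. It only shows the shape
of the change: one lookup per 4k copy versus one lookup per batch of
pages.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_PAGE_SIZE 4096u
#define TOY_NR_PAGES  64u
#define TOY_BATCH     16u

/* Toy stand-in for a cached page: a refcount plus 4k of data. */
struct toy_page {
	unsigned refcount;
	char data[TOY_PAGE_SIZE];
};

/* Toy stand-in for the page cache; "lookups" counts simulated tree walks. */
struct toy_cache {
	struct toy_page pages[TOY_NR_PAGES];
	unsigned long lookups;
};

/* One simulated tree walk and one reference per page. */
static struct toy_page *lookup_one(struct toy_cache *c, size_t idx)
{
	c->lookups++;
	c->pages[idx].refcount++;
	return &c->pages[idx];
}

/* One simulated tree walk for up to TOY_BATCH contiguous pages. */
static size_t lookup_batch(struct toy_cache *c, size_t idx, size_t nr,
			   struct toy_page **out)
{
	size_t i;

	if (nr > TOY_BATCH)
		nr = TOY_BATCH;
	if (nr > TOY_NR_PAGES - idx)
		nr = TOY_NR_PAGES - idx;

	c->lookups++;
	for (i = 0; i < nr; i++) {
		c->pages[idx + i].refcount++;
		out[i] = &c->pages[idx + i];
	}
	return nr;
}

static void toy_put_page(struct toy_page *p)
{
	assert(p->refcount > 0);
	p->refcount--;
}

/* Old shape: lookup, get, copy 4k, put; repeated for every page. */
static size_t read_per_page(struct toy_cache *c, char *dst, size_t nr_pages)
{
	size_t i;

	assert(nr_pages <= TOY_NR_PAGES);
	for (i = 0; i < nr_pages; i++) {
		struct toy_page *p = lookup_one(c, i);

		memcpy(dst + i * TOY_PAGE_SIZE, p->data, TOY_PAGE_SIZE);
		toy_put_page(p);
	}
	return nr_pages * TOY_PAGE_SIZE;
}

/* Vectorized shape: grab a batch of pages per lookup, then copy them all. */
static size_t read_batched(struct toy_cache *c, char *dst, size_t nr_pages)
{
	struct toy_page *batch[TOY_BATCH];
	size_t done = 0, i;

	assert(nr_pages <= TOY_NR_PAGES);
	while (done < nr_pages) {
		size_t got = lookup_batch(c, done, nr_pages - done, batch);

		for (i = 0; i < got; i++) {
			memcpy(dst + (done + i) * TOY_PAGE_SIZE,
			       batch[i]->data, TOY_PAGE_SIZE);
			toy_put_page(batch[i]);
		}
		done += got;
	}
	return nr_pages * TOY_PAGE_SIZE;
}
```

Reading all 64 toy pages costs 64 simulated lookups with
read_per_page() and 4 with read_batched(); the bytes copied are
identical. The real win depends on how expensive the walk and the
atomic are relative to the copy, which this sketch does not model.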

It's definitely less of a factor post large folios, and when we're
talking about workloads that don't fit in cache, but I've always wanted
to do a generic version of the vectorized write path that btrfs and
bcachefs have.