Date: Sun, 25 Feb 2024 00:18:23 -0500
From: Kent Overstreet
To: Linus Torvalds
Cc: Matthew Wilcox, Luis Chamberlain, lsf-pc@lists.linux-foundation.org,
    linux-fsdevel@vger.kernel.org, linux-mm, Daniel Gomez, Pankaj Raghav,
    Jens Axboe, Dave Chinner, Christoph Hellwig, Chris Mason, Johannes Weiner
Subject: Re: [LSF/MM/BPF TOPIC] Measuring limits and enhancing buffered IO

On Sat, Feb 24, 2024 at 09:31:44AM -0800, Linus Torvalds wrote:
> On Fri, 23 Feb 2024 at 20:12, Matthew Wilcox wrote:
> >
> > On Fri, Feb 23, 2024 at 03:59:58PM -0800, Luis Chamberlain wrote:
> > > What are the limits to buffered IO
> > > and how do we test that? Who keeps track of it?
> >
> > TLDR: Why does the pagecache suck?
>
> What? No.
>
> Our page cache is so good that the question is literally "what are the
> limits of it", and "how we would measure them".
>
> That's not a sign of suckage.
>
> When you have to have completely unrealistic loads that nobody would
> actually care about in reality just to get a number for the limit,
> it's not a sign of problems.
>
> Or rather, the "problem" is the person looking at a stupid load, and
> going "we need to improve this because I can write a benchmark for
> this".
>
> Here's a clue: a hardware discussion forum I visit was arguing about
> memory latencies, and talking about how their measured overhead of
> DRAM latency was literally 85% on the CPU side, not the DRAM side.
>
> Guess what? It's because the CPU in question had quite a bit of L3,
> and it was spread out, and the CPU doesn't even start the memory
> access before it has checked caches.
>
> And here's a big honking clue: only a complete nincompoop and mentally
> deficient rodent would look at that and say "caches suck".
>
> > > ~86 GiB/s on pmem DIO on xfs with 64k block size, 1024 XFS agcount on x86_64
> > > Vs
> > > ~ 7,000 MiB/s with buffered IO
> >
> > Profile? My guess is that you're bottlenecked on the xa_lock between
> > memory reclaim removing folios from the page cache and the various
> > threads adding folios to the page cache.
>
> I doubt it's the locking.
>
> In fact, for writeout in particular it's probably not even the page
> cache at all.
>
> For writeout, we have a very traditional problem: we care about a
> million times more about latency than we care about throughput,
> because nobody ever actually cares all that much about performance of
> huge writes.

Before large folios, we had people very much bottlenecked by 4k page
overhead on sequential IO; my customer/sponsor was one of them.

Factor of 2 or 3, IIRC; it was _bad_. And when you looked at the
profiles and looked at the filemap.c code it wasn't hard to see why:
we'd walk a radix tree, do an atomic op (get the page), then do a 4k
usercopy... hence the work I did to break up
generic_file_buffered_read() and vectorize it, which was a huge
improvement.

It's definitely less of a factor post large folios, and when we're
talking about workloads that don't fit in cache, but I always wanted to
do a generic version of the vectorized write path that btrfs and
bcachefs have.
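
To make the per-page overhead concrete, here is a userspace toy, not
kernel code, contrasting the two loop shapes described above: one index
walk per 4k page versus one walk that fills a small batch of pages
before copying. Everything in it is hypothetical and made up for
illustration (struct toy_cache, the toy_* functions, BATCH, TOY_PAGES);
a flat array stands in for the tree, plain counters stand in for the
walk and the atomic ref, and memcpy() stands in for the usercopy.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PAGE_SZ   4096u	/* toy page size, the 4k case */
#define BATCH     16u	/* hypothetical batch width, not a kernel constant */
#define TOY_PAGES 64u	/* size of the toy cache */

/* Toy stand-in for the page cache: a flat array plus cost counters. */
struct toy_cache {
	unsigned char (*pages)[PAGE_SZ];
	size_t nr_pages;
	size_t walks;	/* simulated index (tree) walks */
	size_t refs;	/* simulated atomic ref grabs */
};

/* One walk and one ref per page: the old per-page shape. */
static unsigned char *toy_lookup_one(struct toy_cache *c, size_t index)
{
	c->walks++;
	if (index >= c->nr_pages)
		return NULL;
	c->refs++;
	return c->pages[index];
}

/* One walk fills up to max pages; refs are still taken per page. */
static size_t toy_lookup_batch(struct toy_cache *c, size_t index,
			       unsigned char **out, size_t max)
{
	size_t n = 0;

	c->walks++;
	while (n < max && index + n < c->nr_pages) {
		out[n] = c->pages[index + n];
		c->refs++;
		n++;
	}
	return n;
}

static size_t toy_read_per_page(struct toy_cache *c, unsigned char *dst,
				size_t nr)
{
	size_t done = 0;

	for (size_t i = 0; i < nr; i++) {
		unsigned char *p = toy_lookup_one(c, i);

		if (!p)
			break;
		memcpy(dst + done, p, PAGE_SZ);	/* the "4k usercopy" */
		done += PAGE_SZ;
	}
	return done;
}

static size_t toy_read_batched(struct toy_cache *c, unsigned char *dst,
			       size_t nr)
{
	unsigned char *batch[BATCH];
	size_t done = 0, index = 0;

	while (index < nr) {
		size_t want = nr - index < BATCH ? nr - index : BATCH;
		size_t got = toy_lookup_batch(c, index, batch, want);

		if (!got)
			break;
		for (size_t i = 0; i < got; i++) {
			memcpy(dst + done, batch[i], PAGE_SZ);
			done += PAGE_SZ;
		}
		index += got;
	}
	return done;
}

/* Read nr pages (nr <= TOY_PAGES) and report how many walks it took. */
size_t toy_count_walks(int batched, size_t nr)
{
	static unsigned char store[TOY_PAGES][PAGE_SZ];
	static unsigned char dst[TOY_PAGES * PAGE_SZ];
	struct toy_cache c = { store, TOY_PAGES, 0, 0 };
	size_t copied;

	assert(nr <= TOY_PAGES);
	copied = batched ? toy_read_batched(&c, dst, nr)
			 : toy_read_per_page(&c, dst, nr);
	assert(copied == nr * PAGE_SZ);
	return c.walks;
}
```

With these toy numbers, toy_count_walks(0, 64) reports 64 walks and
toy_count_walks(1, 64) reports 4. Note that the batched loop in this
sketch only amortizes the walk; it still takes one ref per page, which
is the part that larger units (one ref covering many 4k pages) would
cut down.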