From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Sun, 25 Feb 2024 01:04:16 -0500
From: Kent Overstreet
To: Linus Torvalds
Cc: Matthew Wilcox, Luis Chamberlain, lsf-pc@lists.linux-foundation.org,
    linux-fsdevel@vger.kernel.org, linux-mm, Daniel Gomez, Pankaj Raghav,
    Jens Axboe, Dave Chinner, Christoph Hellwig, Chris Mason, Johannes Weiner
Subject: Re: [LSF/MM/BPF TOPIC] Measuring limits and enhancing buffered IO

On Sun, Feb 25, 2024 at 12:18:23AM -0500, Kent Overstreet wrote:
> On Sat, Feb 24, 2024 at 09:31:44AM -0800, Linus Torvalds wrote:
> Before large folios, we had people very much bottlenecked by 4k page
> overhead on sequential IO; my customer/sponsor was one of them.
>
> Factor of 2 or 3, IIRC; it was _bad_. And when you looked at the
> profiles and looked at the filemap.c code it wasn't hard to see why;
> we'd walk a radix tree, do an atomic op (get the page), then do a 4k
> usercopy... hence the work I did to break up
> generic_file_buffered_read() and vectorize it, which was a huge
> improvement.
>
> It's definitely less of a factor post large folios and when we're
> talking about workloads that don't fit in cache, but I always wanted to
> do a generic version of the vectorized write path that btrfs and bcachefs
> have.

to expound further, our buffered io performance really is crap vs.
direct in lots of real world scenarios, and what was going on in
generic_file_buffered_read() was just one instance of a larger theme -
walking data structures, taking locks/atomics/barriers, then doing work
on the page/folio with cacheline bounces, in a loop - lots of places
where batching/vectorizing would help a lot but it tends to be
insufficient.

i had patches that went further than the generic_file_buffered_read()
rework to vectorize add_to_page_cache_lru(), and that was another
significant improvement.

the pagecache lru operations were another hot spot... willy and I at one
point were spitballing getting rid of the linked list for a deque,
more for getting rid of the list_head in struct page/folio and replacing
it with a single size_t index, but it'd open up more vectorizing
possibilities.

i give willy crap about the .readahead interface... the way we make the
filesystem code walk the xarray to get the folios instead of just
passing it a vector is stupid.

folio_batch is stupid, it shouldn't be fixed size. there's no reason for
that to be a small fixed size array on the stack; the slub fastpath has
no atomic ops and doesn't disable preemption or interrupts - it's
_fast_. just use a darray and vectorize the whole operation.

but that wouldn't be the big gains; bigger would be hunting down all the
places that aren't vectorized and should be.

i haven't reviewed the recent .writepages work christoph et al. are
doing; if that's properly vectorized now, that'll help.
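
For illustration only, here is a rough userspace sketch of the pattern
described above: instead of paying for a lock round trip per item in a
loop, take the lock once, gather the whole range into a growable vector,
and do the real work on the vector afterwards. Everything in it is an
assumption made up for the example: struct item_vec (a darray-style
growable array), struct cache, and the get_range_*() helpers are
hypothetical names, not kernel or bcachefs APIs. A pthread mutex and plain
int counters stand in for the page cache locking and folio refcounts.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical growable array, heap allocated instead of fixed size. */
struct item_vec {
	int	**data;
	size_t	nr, size;
};

static int item_vec_push(struct item_vec *v, int *item)
{
	assert(v->nr <= v->size);

	if (v->nr == v->size) {
		/* Grow geometrically so pushes are amortized O(1). */
		size_t new_size = v->size ? v->size * 2 : 16;
		int **d = realloc(v->data, new_size * sizeof(*d));

		if (!d)
			return -1;
		v->data = d;
		v->size = new_size;
	}
	v->data[v->nr++] = item;
	return 0;
}

static void item_vec_exit(struct item_vec *v)
{
	free(v->data);
	v->data = NULL;
	v->nr = v->size = 0;
}

/* Hypothetical stand-in for a cache: refs[] plays the role of refcounts. */
struct cache {
	pthread_mutex_t	lock;
	int		*refs;
	size_t		nr;
	unsigned long	lock_acquisitions;	/* instrumentation only */
};

/* Unbatched: one lock/unlock pair per item in the range. */
static size_t get_range_one_by_one(struct cache *c, size_t start, size_t end)
{
	size_t got = 0;

	for (size_t i = start; i < end && i < c->nr; i++) {
		pthread_mutex_lock(&c->lock);
		c->lock_acquisitions++;
		c->refs[i]++;
		pthread_mutex_unlock(&c->lock);
		got++;
	}
	return got;
}

/*
 * Batched: one lock acquisition for the whole range; the caller then
 * works on the gathered vector without touching the lock again.
 * Allocating under the lock is fine for this userspace sketch; a real
 * kernel version would have to care about allocation context.
 */
static int get_range_batched(struct cache *c, size_t start, size_t end,
			     struct item_vec *out)
{
	int ret = 0;

	pthread_mutex_lock(&c->lock);
	c->lock_acquisitions++;
	for (size_t i = start; i < end && i < c->nr; i++) {
		ret = item_vec_push(out, &c->refs[i]);
		if (ret)
			break;
		c->refs[i]++;
	}
	pthread_mutex_unlock(&c->lock);
	return ret;
}
```

With a 64-item range, get_range_one_by_one() takes the lock 64 times
while get_range_batched() takes it once and hands back a vector of 64
pointers; the caller frees the vector with item_vec_exit() when done.
Because the vector grows on demand, the batch size is set by the request
rather than by a small compile-time constant.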