From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id BAB95C54798 for ; Tue, 27 Feb 2024 14:57:40 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 4D46B28001B; Tue, 27 Feb 2024 09:57:40 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 484D2280017; Tue, 27 Feb 2024 09:57:40 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 373C928001B; Tue, 27 Feb 2024 09:57:40 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0010.hostedemail.com [216.40.44.10]) by kanga.kvack.org (Postfix) with ESMTP id 2946E280017 for ; Tue, 27 Feb 2024 09:57:40 -0500 (EST) Received: from smtpin15.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay04.hostedemail.com (Postfix) with ESMTP id C8A4D1A0C0F for ; Tue, 27 Feb 2024 14:57:39 +0000 (UTC) X-FDA: 81837887838.15.C760299 Received: from out-175.mta0.migadu.com (out-175.mta0.migadu.com [91.218.175.175]) by imf22.hostedemail.com (Postfix) with ESMTP id 73AD0C000D for ; Tue, 27 Feb 2024 14:57:36 +0000 (UTC) Authentication-Results: imf22.hostedemail.com; dkim=pass header.d=linux.dev header.s=key1 header.b=q1J9vV39; spf=pass (imf22.hostedemail.com: domain of kent.overstreet@linux.dev designates 91.218.175.175 as permitted sender) smtp.mailfrom=kent.overstreet@linux.dev; dmarc=pass (policy=none) header.from=linux.dev ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1709045856; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=u7soQeQMolVADcku4EZWS0C8/aJsrG13W4BWj0N1xKY=; b=I/J5TTe58dZMBwUbU3RG2LDUiJpuP87q8+wmAff+ekkE5Kg3B5LH5Syl+xpMENH8saUETr 8Fy2455QPOZC8SIaFEnlBDqMRWDD+GWl+qoiUv7eD0BSs9DjNTNt5Z7U5dhlL1UpbXwgdH ak+2XClJ7YIRduOiocjpVvOJJ0qm5O4= ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1709045857; a=rsa-sha256; cv=none; b=VLh8Suw5tn2U5Nf/SJ9s2YtsSCwgz2xF34laaJ2+ojp7QNjhpg3WoKi+JsTZWAyhLlTwg+ xrxT+GvILS0LBC4m2SoPswTvB/t27rsMX0D/IuXEHaKMc44GwM6BtXCUh1IrKOEOkov9Q7 MJGDZyyOD8K+poB+mKhR8OoEhKnTMqk= ARC-Authentication-Results: i=1; imf22.hostedemail.com; dkim=pass header.d=linux.dev header.s=key1 header.b=q1J9vV39; spf=pass (imf22.hostedemail.com: domain of kent.overstreet@linux.dev designates 91.218.175.175 as permitted sender) smtp.mailfrom=kent.overstreet@linux.dev; dmarc=pass (policy=none) header.from=linux.dev Date: Tue, 27 Feb 2024 09:57:25 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1; t=1709045853; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=u7soQeQMolVADcku4EZWS0C8/aJsrG13W4BWj0N1xKY=; b=q1J9vV39pPY36rQVXE+3UVDqPULGJhnH+50vQQh1MB0jPtpuXig21WzDSD+byu8URD8p5u vC8pAjNZONuXIoiow771GQG0+kFSmcEQmL+K8E5Htacxqb5p6hOoee4wv7fNbAPu96FsPo 3R6iD6oSMxwzKjpmVaETZ8dq8s9sshI= X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. From: Kent Overstreet To: Luis Chamberlain Cc: lsf-pc@lists.linux-foundation.org, linux-fsdevel@vger.kernel.org, linux-mm , Daniel Gomez , Pankaj Raghav , Jens Axboe , Dave Chinner , Christoph Hellwig , Chris Mason , Johannes Weiner , Matthew Wilcox , Linus Torvalds Subject: Re: [LSF/MM/BPF TOPIC] Measuring limits and enhancing buffered IO Message-ID: References: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: X-Migadu-Flow: FLOW_OUT X-Rspamd-Queue-Id: 73AD0C000D X-Rspam-User: X-Rspamd-Server: rspam11 X-Stat-Signature: qkpsc67pwuih5968ue664wxtw3n8skrc X-HE-Tag: 1709045856-171461 X-HE-Meta: U2FsdGVkX18ckU1pGUVkzUqO+VXoE/sjWYEnEpYT2URuQpqzc07tDvGuLSmAbIflB0UAKSP6dXIqtaIcbYmux90QnULsZgwVNO0mFQRbNCoFNOigiQCWk1m/yN+I5rR/rFz64uQOC3Y3JdzVOwLJ6um8VNIAduqLgMqKoPx4fEFiPcoBynIcVNO2d/hyTR5qDqgmfKjcO9X2eilunffjTSjO5ctbTQABj8jls/95/hIdmmbIivvA50zYOW2v/DXdCQvUxWpx48HyKx8ztpfDefVp9hOe1hj5zq3rmDsP0JSHywE95MCe6pbVCqmVYuP8vSlcKjDN/rpyYHWR+YiVJdyq6K6ph49AjqzoBTgNKjvOC05SCDxnlOwdu1mb/1wKiWc+H9HFM6DSbm14zopKLRV8CW4uDUYk5nm8NUXKGNqSqJIuKAUBFoUmkp+SxENre+Wx0YgKlCjRWo2ikC6oEudTV8svSVB0+Wz+OTzPWqX23cTpwQupLvTfgBjvxqWvVUBxFcex0hoITdn2S/EdYbfj128iyo5GiEuH8kmy/NcErpsIeKJ5EgMVL0QLwtEc1ejXmbDodg3L8nFgudTEjfN48us6Fi6Vr/PspfE+SU1qgBze2F1KalTC6NfK8XeOWCq3jX/t3lRgnZwwF8qxE/U6Xrobb4A9sZkiiTTonB+YmqG5sAsuOHfo5NYgnI7aNQRUA2YGXDtcFeoq7/OSrpxUb544SOkgpaS/z/GRRpxX5LCDTpG70Hab+hRtbX9KGkogkR09nyU0o05xghDQKjSOlF1AP+wl7+ohEhd1hkc= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Tue, Feb 27, 2024 at 06:08:57AM -0800, Luis Chamberlain wrote: > On Tue, Feb 27, 2024 at 05:07:30AM -0500, Kent Overstreet wrote: > > On Fri, Feb 23, 2024 at 03:59:58PM -0800, Luis Chamberlain wrote: > > > Part of the testing we have done with LBS was to do some performance > > > tests on XFS to ensure things are not regressing. Building linux is a > > > fine decent test and we did some random cloud instance tests on that and > > > presented that at Plumbers, but it doesn't really cut it if we want to > > > push things to the limit though. What are the limits to buffered IO > > > and how do we test that? Who keeps track of it? > > > > > > The obvious recurring tension is that for really high performance folks > > > just recommend to use birect IO. But if you are stress testing changes > > > to a filesystem and want to push buffered IO to its limits it makes > > > sense to stick to buffered IO, otherwise how else do we test it? > > > > > > It is good to know limits to buffered IO too because some workloads > > > cannot use direct IO. For instance PostgreSQL doesn't have direct IO > > > support and even as late as the end of last year we learned that adding > > > direct IO to PostgreSQL would be difficult. Chris Mason has noted also > > > that direct IO can also force writes during reads (?)... Anyway, testing > > > the limits of buffered IO limits to ensure you are not creating > > > regressions when doing some page cache surgery seems like it might be > > > useful and a sensible thing to do .... The good news is we have not found > > > regressions with LBS but all the testing seems to beg the question, of what > > > are the limits of buffered IO anyway, and how does it scale? Do we know, do > > > we care? Do we keep track of it? How does it compare to direct IO for some > > > workloads? How big is the delta? How do we best test that? How do we > > > automate all that? Do we want to automatically test this to avoid regressions? > > > > > > The obvious issues with some workloads for buffered IO is having a > > > possible penality if you are not really re-using folios added to the > > > page cache. Jens Axboe reported a while ago issues with workloads with > > > random reads over a data set 10x the size of RAM and also proposed > > > RWF_UNCACHED as a way to help [0]. As Chinner put it, this seemed more > > > like direct IO with kernel pages and a memcpy(), and it requires > > > further serialization to be implemented that we already do for > > > direct IO for writes. There at least seems to be agreement that if we're > > > going to provide an enhancement or alternative that we should strive to not > > > make the same mistakes we've done with direct IO. The rationale for some > > > workloads to use buffered IO is it helps reduce some tail latencies, so > > > that's something to live up to. > > > > > > On that same thread Christoph also mentioned the possibility of a direct > > > IO variant which can leverage the cache. Is that something we want to > > > move forward with? > > > > > > Chris Mason also listed a few other desirables if we do: > > > > > > - Allowing concurrent writes (xfs DIO does this now) > > > > AFAIK every filesystem allows concurrent direct writes, not just xfs, > > it's _buffered_ writes that we care about here. > > The context above was a possible direct IO variant, that's why direct IO > was mentioned and that XFS at least had support. > > > I just pushed a patch to my CI for buffered writes without taking the > > inode lock - for bcachefs. It'll be straightforward, but a decent amount > > of work, to lift this to the VFS, if people are interested in > > collaborating. > > > > https://evilpiepirate.org/git/bcachefs.git/log/?h=bcachefs-buffered-write-locking > > Neat, this is sort of what I wanted to get a sense for, if this sort of > topic was worth discussing at LSFMM. > > > The approach is: for non extending, non appending writes, see if we can > > pin the entire range of the pagecache we're writing to; fall back to > > taking the inode lock if we can't. > > Perhaps a silly thought... but initial reaction is, would it make sense > for the page cache to make this easier for us, so we have this be > easier? It is not clear to me but my first reaction to seeing some of > these deltas was what if we had something like the space split up, as we > do with XFS agcounts, and so each group deals with its own ranges. I > considered this before profiling, and as with Matthew I figured it might > be lock contenton. It very likely is not for my test case, and as Linus > and Dave has clarified we are both penalized and also have a > singlthreaded writeback. If we had a group split we'd have locks per > group and perhaps a writeback a dedicated thread per group. Wtf are you talking about?