Date: Wed, 28 Feb 2024 09:42:01 +1100
From: Dave Chinner
To: Kent Overstreet
Cc: Luis Chamberlain, lsf-pc@lists.linux-foundation.org, linux-fsdevel@vger.kernel.org, linux-mm, Daniel Gomez, Pankaj Raghav, Jens Axboe, Christoph Hellwig, Chris Mason, Johannes Weiner, Matthew Wilcox, Linus Torvalds
Subject: Re: [LSF/MM/BPF TOPIC] Measuring limits and enhancing buffered IO

On Tue, Feb 27, 2024 at 05:21:20PM -0500, Kent Overstreet wrote:
> On Wed, Feb 28, 2024 at 09:13:05AM +1100, Dave Chinner wrote:
> > On Tue, Feb 27, 2024 at 05:07:30AM -0500, Kent Overstreet wrote:
> > > AFAIK every filesystem allows concurrent direct writes, not just xfs,
> > > it's _buffered_ writes that we care about here.
> >
> > We could do concurrent buffered writes in XFS - we would just use
> > the same locking strategy as direct IO and fall back on folio locks
> > for copy-in exclusion like ext4 does.
>
> ext4 code doesn't do that. it takes the inode lock in exclusive mode,
> just like everyone else.

Uhuh. ext4 does allow concurrent DIO writes.
It's just much more constrained than XFS. See ext4_dio_write_checks().

> > The real question is how much of userspace will that break, because
> > of implicit assumptions that the kernel has always serialised
> > buffered writes?
>
> What would break?

Good question. If you don't know the answer, then you've got the same
problem as I have. I.e., we don't know whether concurrent applications
that use buffered IO extensively (e.g. postgres) assume data coherency
because of the implicit serialisation that occurs during buffered IO
writes.

> > > If we do a short write because of a page fault (despite previously
> > > faulting in the userspace buffer), there is no way to completely prevent
> > > torn writes and atomicity breakage; we could at least try a trylock on
> > > the inode lock, I didn't do that here.
> >
> > As soon as we go for concurrent writes, we give up on any concept of
> > atomicity of buffered writes (esp. w.r.t reads), so this really
> > doesn't matter at all.
>
> We've already given up buffered write vs. read atomicity, have for a
> long time - buffered read path takes no locks.

We still have explicit buffered read() vs buffered write() atomicity in
XFS via buffered reads taking the inode lock shared (see
xfs_file_buffered_read()), because that's what POSIX says we should
have.

Essentially, we need to explicitly give POSIX the big finger and state
that there are no atomicity guarantees for write() calls of any size,
nor any guarantees of data coherency for overlapping concurrent
buffered IO operations. Those are guarantees we haven't completely
given up yet w.r.t. buffered IO, and enabling concurrent buffered
writes will expose their loss to users. So we need explicit policies
for this, documented clearly in all the places that application
developers might look for behavioural hints.

-Dave.
-- 
Dave Chinner
david@fromorbit.com