From: Amir Goldstein
Date: Wed, 28 Feb 2024 09:48:46 +0200
Subject: Re: [Lsf-pc] [LSF/MM/BPF TOPIC] Measuring limits and enhancing buffered IO
To: Dave Chinner
Cc: Kent Overstreet, Pankaj Raghav, Jens Axboe, Chris Mason, Matthew Wilcox, Daniel Gomez, linux-mm, Luis Chamberlain, Johannes Weiner, linux-fsdevel@vger.kernel.org, lsf-pc@lists.linux-foundation.org, Linus Torvalds, Christoph Hellwig, Josef Bacik, Jan Kara

On Wed, Feb 28, 2024 at 12:42 AM Dave Chinner via Lsf-pc wrote:
>
> On Tue, Feb 27, 2024 at 05:21:20PM -0500, Kent Overstreet wrote:
> > On Wed, Feb 28, 2024 at 09:13:05AM +1100, Dave Chinner wrote:
> > > On Tue, Feb 27, 2024 at 05:07:30AM -0500, Kent Overstreet wrote:
> > > > AFAIK every filesystem allows concurrent direct writes, not just xfs,
> > > > it's _buffered_ writes that we care about here.
> > >
> > > We could do concurrent buffered writes in XFS - we would just use
> > > the same locking strategy as direct IO and fall back on folio locks
> > > for copy-in exclusion like ext4 does.
> >
> > ext4 code doesn't do that. it takes the inode lock in exclusive mode,
> > just like everyone else.
>
> Uhuh. ext4 does allow concurrent DIO writes.
> It's just much more constrained than XFS. See ext4_dio_write_checks().
>
> > > The real question is how much of userspace will that break, because
> > > of implicit assumptions that the kernel has always serialised
> > > buffered writes?
> >
> > What would break?
>
> Good question. If you don't know the answer, then you've got the
> same problem as I have. i.e. we don't know if concurrent
> applications that use buffered IO extensively (eg. postgres) assume
> data coherency because of the implicit serialisation occurring
> during buffered IO writes?
>
> > > > If we do a short write because of a page fault (despite previously
> > > > faulting in the userspace buffer), there is no way to completely prevent
> > > > torn writes an atomicity breakage; we could at least try a trylock on
> > > > the inode lock, I didn't do that here.
> > >
> > > As soon as we go for concurrent writes, we give up on any concept of
> > > atomicity of buffered writes (esp. w.r.t reads), so this really
> > > doesn't matter at all.
> >
> > We've already given up buffered write vs. read atomicity, have for a
> > long time - buffered read path takes no locks.
>
> We still have explicit buffered read() vs buffered write() atomicity
> in XFS via buffered reads taking the inode lock shared (see
> xfs_file_buffered_read()) because that's what POSIX says we should
> have.
>
> Essentially, we need to explicitly give POSIX the big finger and
> state that there are no atomicity guarantees given for write() calls
> of any size, nor are there any guarantees for data coherency for
> any overlapping concurrent buffered IO operations.
>

I have disabled read vs. write atomicity (out-of-tree) to make xfs
behave like the other filesystems ever since Jan added the
invalidate_lock, and I believe that the Meta kernel did so way before that.

> Those are things we haven't completely given up yet w.r.t. buffered
> IO, and enabling concurrent buffered writes will expose to users.
> So we need to have explicit policies for this and document them
> clearly in all the places that application developers might look
> for behavioural hints.

That's doable - I can try to do that.
What is your take regarding opt-in/opt-out of legacy behavior?
At the time, I proposed a POSIX_FADV_TORN_RW API [1] to opt out
of the legacy POSIX behavior, but I guess that an xfs mount option
would make more sense for consistent and clear semantics across
the fs - it is easier if all buffered IO to an inode behaves the same way.

Thanks,
Amir.

[1] https://lore.kernel.org/linux-xfs/CAOQ4uxguwnx4AxXqp_zjg39ZUaTGJEM2wNUPnNdtiqV2Q9woqA@mail.gmail.com/
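
As an illustration of the buffered read() vs. write() atomicity discussed above (this is not code from the thread), a minimal userspace probe might look like the sketch below. It assumes a Unix system with Python 3; the names BLOCK, is_torn, writer and count_torn_reads are hypothetical. One thread repeatedly overwrites the same file range with a single repeated byte while the main thread reads that range back; a read that returns a mix of byte values observed a partially applied write. Whether any torn reads show up depends on the filesystem, the kernel and timing, so a count of zero does not prove that atomicity is guaranteed.

```python
import os
import tempfile
import threading

BLOCK = 64 * 1024  # size of each write and read; an arbitrary choice for this sketch


def is_torn(buf: bytes) -> bool:
    """Return True if buf mixes bytes from more than one uniform-pattern write."""
    return len(set(buf)) > 1


def writer(fd: int, rounds: int) -> None:
    # Each pwrite() overwrites the same range with one repeated byte value,
    # so any untorn read of that range must contain a single byte value.
    for i in range(rounds):
        os.pwrite(fd, bytes([i % 256]) * BLOCK, 0)


def count_torn_reads(path: str, rounds: int = 500) -> int:
    """Count reads that observed a partially applied concurrent buffered write."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        os.pwrite(fd, b"\0" * BLOCK, 0)  # make sure the range exists before reading
        t = threading.Thread(target=writer, args=(fd, rounds))
        t.start()
        torn = 0
        while t.is_alive():
            if is_torn(os.pread(fd, BLOCK, 0)):
                torn += 1
        t.join()
        return torn
    finally:
        os.close(fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print("torn reads:", count_torn_reads(os.path.join(d, "probe")))
```

Pointing the path at files on different filesystems (for example xfs and ext4) is one informal way to compare behaviour, subject to the caveats above.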