Date: Sun, 25 Feb 2024 20:58:28 -0500
From: Kent Overstreet
To: Linus Torvalds
Cc: Matthew Wilcox, Luis Chamberlain, lsf-pc@lists.linux-foundation.org,
    linux-fsdevel@vger.kernel.org, linux-mm, Daniel Gomez, Pankaj Raghav,
    Jens Axboe, Dave Chinner, Christoph Hellwig, Chris Mason,
    Johannes Weiner
Subject: Re: [LSF/MM/BPF TOPIC] Measuring limits and enhancing buffered IO

On Sun, Feb 25, 2024 at 05:32:14PM -0800, Linus Torvalds wrote:
> On Sun, 25 Feb 2024 at 17:03, Kent Overstreet wrote:
> >
> > We could satisfy the posix atomic writes rule by just having a properly
> > vectorized buffered write path, no need for the inode lock - it really
> > should just be extending writes that have to hit the inode lock, same as
> > O_DIRECT.
> >
> > (whenever people bring up range locks, I keep trying to tell them - we
> > already have that in the form of the folio lock, if you'd just use it
> > properly...)
>
> Sadly, that is *technically* not proper.
>
> IOW, I actually agree with you that the folio lock is sufficient, and
> several filesystems do too.
>
> BUT.
>
> Technically, the POSIX requirements are that the atomicity of writes
> are "all or nothing" being visible, and so ext4, for example, will
> have the whole write operation inside the inode_lock.
...
> (It's not just ext2. It's all the old filesystems: anything that uses
> generic_file_write_iter() without doing inode_lock/unlock around it,
> which is actually most of them).

According to my reading just now, ext4 and btrfs (as well as bcachefs)
also don't take the inode lock in the read path; xfs is the only one
that does.

Perhaps we should just lift read-side inode locking into the VFS and
make it controllable as a mount or open option. As nice a property as it
is in theory, I can't see myself wanting to make everyone pay for it if
ext4 and btrfs aren't doing it and no one's screaming.

I think write vs. write consistency is the more interesting case; the
question there is whether falling back to the inode lock, when we can't
lock all the folios simultaneously, actually works.

Consider thread A doing a 1 MB write that ends up in the fallback path:
it takes the inode lock and is then allowed to write one folio at a
time. Then you have thread B doing some form of overlapping write,
without the inode lock, but with all of its folios locked
simultaneously.

I think everything works. We need the end result to be consistent with
some total ordering of all the writes; IOW, thread B's write (if it lies
entirely within thread A's range) should be either fully overwritten or
not overwritten at all, and that clearly is the case.

But there may be situations involving more than two threads where things
get weirder.
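For the two-writer case, "consistent with some total ordering" just
means the final file contents must equal what you'd get by applying A
then B, or B then A. A minimal sketch of that check, assuming a toy
model that is not kernel code: the file is a 16-byte array and each
write fills its range with one byte value. `struct write_op`,
`apply_write()` and `serializable2()` are hypothetical names made up for
illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy model (hypothetical): the whole file is FILE_SIZE bytes. */
#define FILE_SIZE 16

/* A write fills [off, off + len) with a single byte value, so whose
 * data ended up where is easy to see. */
struct write_op {
	size_t off;
	size_t len;
	unsigned char val;
};

static void apply_write(unsigned char *buf, const struct write_op *w)
{
	/* Keep the write inside the toy file. */
	assert(w->off <= FILE_SIZE && w->len <= FILE_SIZE - w->off);
	memset(buf + w->off, w->val, w->len);
}

/*
 * Returns true if @result equals @initial with the two writes applied
 * in either order, i.e. the outcome is consistent with some total
 * ordering of A and B. A torn result (B's data only partly overwritten
 * by A) matches neither order and is rejected.
 */
static bool serializable2(const unsigned char *initial,
			  const unsigned char *result,
			  const struct write_op *a,
			  const struct write_op *b)
{
	unsigned char a_then_b[FILE_SIZE], b_then_a[FILE_SIZE];

	memcpy(a_then_b, initial, FILE_SIZE);
	apply_write(a_then_b, a);
	apply_write(a_then_b, b);

	memcpy(b_then_a, initial, FILE_SIZE);
	apply_write(b_then_a, b);
	apply_write(b_then_a, a);

	return memcmp(result, a_then_b, FILE_SIZE) == 0 ||
	       memcmp(result, b_then_a, FILE_SIZE) == 0;
}
```

With A covering the whole file and B covering bytes 4..11, a torn
outcome (A's first 8 bytes, then all of B, then A's last 8 bytes,
giving AAAABBBBAAAAAAAA) makes serializable2() return false, while
either clean ordering returns true. It only states the criterion; it
says nothing about which interleavings a particular locking scheme can
actually produce.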