From: Linus Torvalds
Date: Sun, 25 Feb 2024 18:34:26 -0800
Subject: Re: [LSF/MM/BPF TOPIC] Measuring limits and enhancing buffered IO
To: Kent Overstreet
Cc: Matthew Wilcox, Luis Chamberlain, lsf-pc@lists.linux-foundation.org, linux-fsdevel@vger.kernel.org, linux-mm, Daniel Gomez, Pankaj Raghav, Jens Axboe, Dave Chinner, Christoph Hellwig, Chris Mason, Johannes Weiner

On Sun, 25 Feb 2024 at 17:58, Kent Overstreet wrote:
>
> According to my reading just now, ext4 and btrfs (as well as bcachefs)
> also don't take the inode lock in the read path - xfs is the only one
> that does.
Yeah, I should have remembered that detail - DaveC has pointed out at some point that other filesystems don't actually honor the whole "all or nothing visible to read" rule.

And I was actually wrong about the common cases like ext2 - they use generic_file_write_iter(), which does take that inode lock, and I was confusing it with generic_perform_write() (which does not). It was always the read side that didn't care, as you point out. It's been some time since I looked at that.

But as mentioned, nobody has actually ever shown any real interest in the fact that we don't honor that POSIX technicality.

> I think write vs. write consistency is the more interesting case; the
> question there is does falling back to the inode lock when we can't lock
> all the folios simultaneously work.

I really don't think the write-write consistency is all that interesting either, and enforcing it really does hurt. If you're some toy database that would love to use buffered writes on just a DB file, that "no concurrent writes" rule can hurt a lot. So then people say "use DIO", but that has its own issues...

There is one obvious special case, and I think it's the primary reason we end up having that inode_lock: O_APPEND or any other write extending the size of the file. THAT one obviously has to work right, and that's the case where multiple writers actually do want write-write consistency, and where it makes total sense to serialize them all. That's the one case that even DIO cares about.

In the other cases, it's hard to argue that "one or the other wins the whole range" is seriously hugely better than "one or the other wins at some granularity". What the hell are you doing overlapping write ranges for if you have a "one or the other" mentality?

Of course, maybe in practice it would be fine to do the "lock all the folios, with the fallback being the inode lock" thing - and we could even start with "all" being a pretty small number (perhaps starting with "one" ;^).

            Linus