Date: Wed, 28 Feb 2024 13:18:35 -0500
From: Kent Overstreet
To: Linus Torvalds
Cc: Dave Chinner, Luis Chamberlain, lsf-pc@lists.linux-foundation.org,
    linux-fsdevel@vger.kernel.org, linux-mm, Daniel Gomez, Pankaj Raghav,
    Jens Axboe, Christoph Hellwig, Chris Mason, Johannes Weiner,
    Matthew Wilcox
Subject: Re: [LSF/MM/BPF TOPIC] Measuring limits and enhancing buffered IO
Message-ID: <4uiwkuqkx3lt7cbqlqchhxjq4pxxb3kdt6foblkkhxxpohlolb@iqhjdbz2oy22>

On Tue, Feb 27, 2024 at 02:46:11PM -0800, Linus Torvalds wrote:
> On Tue, 27 Feb 2024 at 14:21, Kent Overstreet wrote:
> >
> > ext4 code doesn't do that. it takes the inode lock in exclusive mode,
> > just like everyone else.
>
> Not for dio, it doesn't.
>
> > > The real question is how much of userspace will that break, because
> > > of implicit assumptions that the kernel has always serialised
> > > buffered writes?
> >
> > What would break?
>
> Well, at least in theory you could have concurrent overlapping writes
> of folio crossing records, and currently you do get the guarantee that
> one or the other record is written, but relying just on page locking
> would mean that you might get a mix of them at page boundaries.

I think we can keep that guarantee.

The tricky case was -EFAULT from copy_from_user_nofault(), where we have
to bail out, drop locks, re-fault in the user buffer - and redo the rest
of the write, this time holding the inode lock.

We can't guarantee that partial writes don't happen, but what we can do
is restart the write from the beginning, so the partial write gets
overwritten with a full atomic write.

This way after writes complete we'll never have weird torn writes left
around.
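
To make the restart idea concrete, here is a rough user-space toy model in Python. It is not kernel code and not the actual patch: File, UserBuffer, UserFault, copy_nofault() and buffered_write() are all hypothetical names invented for this sketch, the locks are only markers in a single-threaded model, and taking the inode lock for the restarted pass is an assumption carried over from the "tricky case" described above.

```python
import threading

PAGE_SIZE = 4  # tiny "page" size so the example stays readable


class UserFault(Exception):
    """Stands in for -EFAULT from a non-faulting copy from user memory."""


class UserBuffer:
    """Hypothetical user buffer; some of its pages may not be resident."""

    def __init__(self, data, missing_pages=()):
        self.data = bytes(data)
        self.missing = set(missing_pages)

    def copy_nofault(self, off, length):
        # Fail instead of faulting if any touched page is not resident.
        first = off // PAGE_SIZE
        last = (off + length - 1) // PAGE_SIZE
        for page in range(first, last + 1):
            if page in self.missing:
                raise UserFault(page)
        return self.data[off:off + length]

    def fault_in(self):
        # Model of faulting the whole buffer back in with no locks held.
        self.missing.clear()


class File:
    def __init__(self, size):
        self.pages = bytearray(size)
        self.inode_lock = threading.Lock()  # marker only in this model
        self.log = []  # records which path each write took


def _copy_pages(f, pos, ubuf, length):
    """Copy page by page; a fault midway leaves a partial write behind."""
    done = 0
    while done < length:
        file_off = pos + done
        chunk = min(PAGE_SIZE - file_off % PAGE_SIZE, length - done)
        data = ubuf.copy_nofault(done, chunk)  # may raise UserFault
        f.pages[file_off:file_off + chunk] = data
        done += chunk


def buffered_write(f, pos, ubuf):
    """Write ubuf at pos and return the number of bytes written."""
    length = len(ubuf.data)
    try:
        # Fast path: no inode lock, only (implicit) per-page locking.
        _copy_pages(f, pos, ubuf, length)
        f.log.append("fast")
    except UserFault:
        # A partial write may already be visible. Drop locks, fault the
        # buffer in, then restart from the beginning (not from the point
        # of failure) so the partial data is fully overwritten.
        ubuf.fault_in()
        with f.inode_lock:  # assumption: the restart is serialised
            _copy_pages(f, pos, ubuf, length)
        f.log.append("slow")
    return length
```

In this model, writing b"AAAAAAAA" at offset 0 with the second buffer page missing first copies four bytes, hits the fault, and then rewrites all eight bytes on the slow path, so nothing torn is left once the write returns.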