Date: Thu, 29 Feb 2024 12:07:02 +1100
From: Dave Chinner
To: Theodore Ts'o
Cc: Matthew Wilcox, lsf-pc@lists.linux-foundation.org, linux-fsdevel@vger.kernel.org, linux-mm
Subject: Re: [LSF/MM/BPF TOPIC] untorn buffered writes
References: <20240228061257.GA106651@mit.edu> <20240228233354.GC177082@mit.edu>
In-Reply-To: <20240228233354.GC177082@mit.edu>

On Wed, Feb 28, 2024 at 05:33:54PM -0600, Theodore Ts'o wrote:
> On Wed, Feb 28, 2024 at 02:11:06PM +0000, Matthew Wilcox wrote:
> > I'm not entirely sure that it does become a mess. If our implementation
> > of this ensures that each write ends up in a single folio (even if the
> > entire folio is larger than the write), then we will have satisfied the
> > semantics of the flag.
>
> What if we do a 32k write which spans two folios? And what
> if the physical pages for those 32k in the buffer cache are not
> contiguous?
> Are you going to have to join the two 16k folios
> together, or maybe two 8k folios and an 16k folio, and relocate pages
> to make a contiguous 32k folio when we do a buffered RWF_ATOMIC write
> of size 32k?

RWF_ATOMIC defines constraints that a 32kB write must be 32kB
aligned. So the only way a 32kB write would span two folios is if
a 16kB write had already been done in this space.

We are already dealing with this problem for bs > ps with the min
order mapping constraint, and we can deal with it easily here when we
set the inode as supporting atomic writes. That step already ensures
physical extent allocation alignment; we can also set the mapping
folio order at that time to ensure that we only allocate RWF_ATOMIC
compatible aligned/sized folios...

> > I think we'd be better off treating RWF_ATOMIC like it's a bs>PS device.

Which is why Willy says this...

> > That takes two somewhat special cases and makes them use the same code
> > paths, which probably means fewer bugs as both camps will be testing
> > the same code.
>
> But for a bs > PS device, where the logical block size is greater than
> the page size, you don't need the RWF_ATOMIC flag at all.

Yes we do: hardware already supports REQ_ATOMIC sizes larger than
64kB filesystem blocks. That is, RWF_ATOMIC is not restricted to 64kB
or to any specific filesystem block size, and an atomic write may be
larger than the filesystem block size.

> All direct
> I/O writes *must* be a multiple of the logical sector size, and
> buffered writes, if they are smaller than the block size, *must* be
> handled as a read-modify-write, since you can't send writes to the
> device smaller than the logical sector size.

The filesystem will likely need to constrain minimum RWF_ATOMIC
sizes to a single filesystem block. That's the whole point of having
the statx interface: the application is going to have to query the
minimum and maximum atomic write sizes supported and adjust to those.
Applications will not be able to use 2kB RWF_ATOMIC writes on a 4kB
block size filesystem, and it's no different with larger filesystem
block sizes.

-Dave.
-- 
Dave Chinner
david@fromorbit.com
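A minimal C sketch of the alignment argument above, for illustration only. It assumes that an atomic write must be a power-of-two size within a [unit_min, unit_max] range that an application would learn from statx, and that it must be naturally aligned to its own size. The helper names (atomic_write_valid, folios_spanned, min_folio_order_for) are hypothetical and are not kernel or libc APIs. The sketch shows that a naturally aligned write never straddles a naturally aligned folio at least as large as the write, so it can only span several folios when smaller folios are already present, which is what pinning a minimum folio order would prevent.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if x is a non-zero power of two. */
static bool is_pow2(uint64_t x)
{
    return x != 0 && (x & (x - 1)) == 0;
}

/*
 * Hypothetical check mirroring the constraints discussed above:
 * the length is a power of two within [unit_min, unit_max] (values an
 * application would obtain from statx) and the file offset is
 * naturally aligned to the length.
 */
static bool atomic_write_valid(uint64_t off, uint64_t len,
                               uint64_t unit_min, uint64_t unit_max)
{
    return is_pow2(len) && len >= unit_min && len <= unit_max &&
           off % len == 0;
}

/*
 * Number of naturally aligned folios of folio_size bytes touched by
 * the byte range [off, off + len). Requires len > 0.
 */
static uint64_t folios_spanned(uint64_t off, uint64_t len,
                               uint64_t folio_size)
{
    assert(len > 0 && is_pow2(folio_size));
    return (off + len - 1) / folio_size - off / folio_size + 1;
}

/*
 * Hypothetical helper: smallest folio order (log2 of pages per folio)
 * whose folios are at least unit_max bytes, so that any valid atomic
 * write fits inside a single folio.
 */
static unsigned int min_folio_order_for(uint64_t unit_max,
                                        uint64_t page_size)
{
    unsigned int order = 0;

    assert(is_pow2(unit_max) && is_pow2(page_size));
    while ((page_size << order) < unit_max)
        order++;
    return order;
}
```

For example, with 4kB pages and an assumed 64kB unit_max, min_folio_order_for(65536, 4096) returns 4 (64kB folios). A 32kB write at offset 32kB then touches one 64kB folio, whereas with 16kB folios the same write touches two. Assuming unit_min is the 4kB filesystem block size, atomic_write_valid() rejects a 2kB write, matching the 2kB-on-4kB example above.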