Date: Tue, 7 May 2024 09:19:44 +1000
From: Dave Chinner
To: Zhang Yi
Cc: linux-ext4@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org, tytso@mit.edu,
	adilger.kernel@dilger.ca, jack@suse.cz, ritesh.list@gmail.com,
	hch@infradead.org, djwong@kernel.org, willy@infradead.org,
	zokeefe@google.com, yi.zhang@huawei.com, chengzhihao1@huawei.com,
	yukuai3@huawei.com, wangkefeng.wang@huawei.com
Subject: Re: [RFC PATCH v4 24/34] ext4: implement buffered write iomap path
References: <20240410142948.2817554-1-yi.zhang@huaweicloud.com>
	<20240410142948.2817554-25-yi.zhang@huaweicloud.com>
	<96bbdb25-b420-67b1-d4c4-b838a5c70f9f@huaweicloud.com>
In-Reply-To: <96bbdb25-b420-67b1-d4c4-b838a5c70f9f@huaweicloud.com>

On Mon, May 06, 2024 at 07:44:44PM +0800, Zhang Yi wrote:
> On 2024/5/1 16:33, Dave Chinner wrote:
> > On Wed, May 01, 2024 at 06:11:13PM +1000, Dave Chinner wrote:
> >> On Wed, Apr 10, 2024 at 10:29:38PM +0800, Zhang Yi wrote:
> >>> From: Zhang Yi
> >>>
> >>> Implement buffered write iomap path, use ext4_da_map_blocks() to map
> >>> delalloc extents and add ext4_iomap_get_blocks() to allocate blocks if
> >>> delalloc is disabled or free space is about to run out.
> >>>
> >>> Note that we always allocate unwritten extents for new blocks in the
> >>> iomap write path, this means that the allocation type is no longer
> >>> controlled by the dioread_nolock mount option. After that, we could
> >>> postpone the i_disksize updating to the writeback path, and drop journal
> >>> handle in the buffered dealloc write path completely.
> > .....
> >>> +/*
> >>> + * Drop the staled delayed allocation range from the write failure,
> >>> + * including both start and end blocks. If not, we could leave a range
> >>> + * of delayed extents covered by a clean folio, it could lead to
> >>> + * inaccurate space reservation.
> >>> + */
> >>> +static int ext4_iomap_punch_delalloc(struct inode *inode, loff_t offset,
> >>> +				     loff_t length)
> >>> +{
> >>> +	ext4_es_remove_extent(inode, offset >> inode->i_blkbits,
> >>> +			DIV_ROUND_UP_ULL(length, EXT4_BLOCK_SIZE(inode->i_sb)));
> >>>  	return 0;
> >>>  }
> >>>
> >>> +static int ext4_iomap_buffered_write_end(struct inode *inode, loff_t offset,
> >>> +					 loff_t length, ssize_t written,
> >>> +					 unsigned int flags,
> >>> +					 struct iomap *iomap)
> >>> +{
> >>> +	handle_t *handle;
> >>> +	loff_t end;
> >>> +	int ret = 0, ret2;
> >>> +
> >>> +	/* delalloc */
> >>> +	if (iomap->flags & IOMAP_F_EXT4_DELALLOC) {
> >>> +		ret = iomap_file_buffered_write_punch_delalloc(inode, iomap,
> >>> +			offset, length, written, ext4_iomap_punch_delalloc);
> >>> +		if (ret)
> >>> +			ext4_warning(inode->i_sb,
> >>> +			     "Failed to clean up delalloc for inode %lu, %d",
> >>> +			     inode->i_ino, ret);
> >>> +		return ret;
> >>> +	}
> >>
> >> Why are you creating a delalloc extent for the write operation and
> >> then immediately deleting it from the extent tree once the write
> >> operation is done?
> >
> > Ignore this, I mixed up the ext4_iomap_punch_delalloc() code
> > directly above with iomap_file_buffered_write_punch_delalloc().
> >
> > In hindsight, iomap_file_buffered_write_punch_delalloc() is poorly
> > named, as it is handling a short write situation which requires
> > newly allocated delalloc blocks to be punched.
> > iomap_file_buffered_write_finish() would probably be a better name
> > for it....
> >
> >> Also, why do you need IOMAP_F_EXT4_DELALLOC? Isn't a delalloc iomap
> >> set up with iomap->type = IOMAP_DELALLOC? Why can't that be used?
> >
> > But this still stands - the first thing
> > iomap_file_buffered_write_punch_delalloc() is:
> >
> > 	if (iomap->type != IOMAP_DELALLOC)
> > 		return 0;
> >
>
> Thanks for the suggestion, the delalloc and non-delalloc write paths
> share the same ->iomap_end() now (i.e. ext4_iomap_buffered_write_end()),
> I use the IOMAP_F_EXT4_DELALLOC to identify the write path.

Again, you don't need that. iomap tracks newly allocated
IOMAP_DELALLOC extents via the IOMAP_F_NEW flag that should be
getting set in the ->iomap_begin() call when it creates a new
delalloc extent. Please look at the second check in
iomap_file_buffered_write_punch_delalloc():

	if (iomap->type != IOMAP_DELALLOC)
		return 0;

	/* If we didn't reserve the blocks, we're not allowed to punch them. */
	if (!(iomap->flags & IOMAP_F_NEW))
		return 0;

> For
> non-delalloc path, If we have allocated more blocks and copied less, we
> should truncate extra blocks that newly allocated by ->iomap_begin().

Why? If they were allocated as unwritten, then you can just leave
them there as unwritten extents, same as XFS. Keep in mind that if
we get a short write, it is extremely likely the application is
going to rewrite the remaining data immediately, so if we allocated
blocks they are likely to still be needed, anyway....

> If we use IOMAP_DELALLOC, we can't tell if the blocks are
> pre-existing or newly allocated, we can't truncate the
> pre-existing blocks, so I have to introduce IOMAP_F_EXT4_DELALLOC.
> But if we split the delalloc and non-delalloc handler, we could
> drop IOMAP_F_EXT4_DELALLOC.

As per above: IOMAP_F_NEW tells us -exactly- this. IOMAP_F_NEW
should be set on any newly allocated block - delalloc or real -
because that's the flag that tells the iomap infrastructure whether
zero-around is needed for partial block writes. If ext4 is not
setting this flag on delalloc regions allocated by
->iomap_begin(), then that's a serious bug.

> I also checked xfs, IIUC, xfs doesn't free the extra blocks beyond EOF
> in xfs_buffered_write_iomap_end() for non-delalloc case since they will
> be freed by xfs_free_eofblocks in some other inactive paths, like
> xfs_release()/xfs_inactive()/..., is that right?

XFS doesn't care about real blocks beyond EOF existing -
xfs_free_eofblocks() is an optimistic operation that does not
guarantee that it will remove blocks beyond EOF. Similarly, we don't
care about real blocks within EOF because we always allocate data
extents as unwritten, so we don't have any stale data exposure
issues to worry about on short writes leaving allocated blocks
behind.

OTOH, delalloc extents without dirty page cache pages over them
cannot be allowed to exist. Without dirty pages, there is no trigger
to convert those to real extents (i.e. nothing to write back). Hence
the only sane thing that can be done with them on a write error or
short write is to remove them in the context where they were
created.

This is the only reason that
iomap_file_buffered_write_punch_delalloc() exists - it abstracts
this nasty corner case away from filesystems that support delalloc
so they don't have to worry about getting this right. That's the
whole point of having delalloc aware infrastructure - individual
filesystems don't need to handle all these weird corner cases
themselves because the infrastructure takes care of them...

-Dave.
-- 
Dave Chinner
david@fromorbit.com
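
For reference, the two checks quoted earlier in this mail and the
short-write policy described above can be modelled in plain userspace
C. This is only a sketch under stated assumptions: the MODEL_* names,
struct model_iomap, model_must_punch() and model_demo() are
hypothetical stand-ins invented for illustration, not the kernel's
definitions, and the block-size rounding of the punched range is
deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's iomap types and flags. */
enum model_iomap_type {
	MODEL_IOMAP_HOLE,
	MODEL_IOMAP_DELALLOC,
	MODEL_IOMAP_MAPPED,
	MODEL_IOMAP_UNWRITTEN,
};

#define MODEL_IOMAP_F_NEW	(1U << 0)

struct model_iomap {
	enum model_iomap_type type;
	unsigned int flags;
};

/*
 * Returns true when a write that asked for 'length' bytes but copied
 * only 'written' (negative on error) leaves behind delalloc blocks
 * that this write reserved itself and so has to punch out again.
 */
static bool model_must_punch(const struct model_iomap *iomap,
			     long long length, long long written)
{
	/* Real or unwritten extents are simply left in place. */
	if (iomap->type != MODEL_IOMAP_DELALLOC)
		return false;

	/* Blocks this write did not reserve are not ours to punch. */
	if (!(iomap->flags & MODEL_IOMAP_F_NEW))
		return false;

	/* A full write leaves dirty page cache over the whole range. */
	if (written >= length)
		return false;

	return true;
}

/* Illustrative usage: only a short write over new delalloc punches. */
static void model_demo(void)
{
	struct model_iomap new_da = { MODEL_IOMAP_DELALLOC, MODEL_IOMAP_F_NEW };
	struct model_iomap unwritten = { MODEL_IOMAP_UNWRITTEN, MODEL_IOMAP_F_NEW };

	assert(model_must_punch(&new_da, 4096, 1024));
	assert(!model_must_punch(&unwritten, 4096, 1024));
}
```

In this model a shared ->iomap_end() handler would not need a private
flag to tell the cases apart: the extent type plus the "new" flag
already carry the distinction, which is the argument made above.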