Date: Wed, 15 May 2024 05:04:14 +0100
From: Matthew Wilcox
To: Keith Busch
Cc: "Pankaj Raghav (Samsung)", hch@lst.de, mcgrof@kernel.org,
	akpm@linux-foundation.org, brauner@kernel.org,
	chandan.babu@oracle.com, david@fromorbit.com, djwong@kernel.org,
	gost.dev@samsung.com, hare@suse.de, john.g.garry@oracle.com,
	linux-block@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	linux-mm@kvack.org, linux-xfs@vger.kernel.org, p.raghav@samsung.com,
	ritesh.list@gmail.com, ziy@nvidia.com
Subject: Re: [RFC] iomap: use huge zero folio in iomap_dio_zero
References: <20240503095353.3798063-8-mcgrof@kernel.org>
	<20240507145811.52987-1-kernel@pankajraghav.com>

On Tue, May 14, 2024 at 08:34:09PM -0600, Keith Busch wrote:
> On Wed, May 15, 2024 at 01:50:53AM +0100, Matthew Wilcox wrote:
> > On Tue, May 07, 2024 at 04:58:12PM +0200, Pankaj Raghav (Samsung) wrote:
> > > Instead of looping with ZERO_PAGE, use a huge zero folio to zero pad the
> > > block. Fallback to ZERO_PAGE if mm_get_huge_zero_folio() fails.
> >
> > So the block people say we're doing this all wrong. We should be
> > issuing a REQ_OP_WRITE_ZEROES bio, and the block layer will take care of
> > using the ZERO_PAGE if the hardware doesn't natively support
> > WRITE_ZEROES or a DISCARD that zeroes or ...
>
> Wait a second, I think you've gone too far if you're setting the bio op
> to REQ_OP_WRITE_ZEROES. The block layer handles the difference only
> through the blkdev_issue_zeroout() helper. If you actually submit a bio
> with that op to a block device that doesn't support it, you'll just get
> a BLK_STS_NOTSUPP error from submit_bio_noacct().

Ohh. This is a bit awkward, because this is the iomap direct IO path.
I don't see an obvious way to get the semantics we want with the current
blkdev_issue_zeroout().
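To make Keith's distinction concrete, here is a minimal userspace sketch of it. Nothing in it is kernel code: struct toy_dev, toy_submit(), toy_issue_zeroout() and the TOY_* constants are all hypothetical stand-ins. toy_submit() models handing one request straight to the device, and toy_issue_zeroout() models a helper that retries with ordinary writes of a zero buffer when the zeroing op is rejected.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-ins for illustration only; these are NOT kernel definitions. */
enum toy_status { TOY_STS_OK = 0, TOY_STS_NOTSUPP = 1 };
enum toy_op { TOY_OP_WRITE, TOY_OP_WRITE_ZEROES };

struct toy_dev {
	bool has_write_zeroes;		/* device can zero ranges natively */
	unsigned char media[4096];	/* pretend storage */
};

/* Models submitting one request directly: no fallback of any kind. */
static enum toy_status toy_submit(struct toy_dev *dev, enum toy_op op,
				  size_t off, size_t len, const void *data)
{
	assert(off <= sizeof(dev->media) && len <= sizeof(dev->media) - off);

	if (op == TOY_OP_WRITE_ZEROES) {
		if (!dev->has_write_zeroes)
			return TOY_STS_NOTSUPP;	/* caller has to cope */
		memset(dev->media + off, 0, len);
		return TOY_STS_OK;
	}

	memcpy(dev->media + off, data, len);
	return TOY_STS_OK;
}

/* Models a helper that hides the difference from its caller. */
static enum toy_status toy_issue_zeroout(struct toy_dev *dev, size_t off,
					 size_t len)
{
	static const unsigned char zeroes[512];

	/* Try the native zeroing op first. */
	if (toy_submit(dev, TOY_OP_WRITE_ZEROES, off, len, NULL) == TOY_STS_OK)
		return TOY_STS_OK;

	/* Fall back to plain writes sourced from a buffer of zeroes. */
	while (len) {
		size_t chunk = len < sizeof(zeroes) ? len : sizeof(zeroes);
		enum toy_status st;

		st = toy_submit(dev, TOY_OP_WRITE, off, chunk, zeroes);
		if (st != TOY_STS_OK)
			return st;
		off += chunk;
		len -= chunk;
	}
	return TOY_STS_OK;
}
```

In this model the fallback lives entirely in the helper, so a caller that builds and submits its own request never sees it. That is the shape of the problem for a path that constructs its own bio.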
For reference, here's the current function:

static void iomap_dio_zero(const struct iomap_iter *iter, struct iomap_dio *dio,
		loff_t pos, unsigned len)
{
	struct inode *inode = file_inode(dio->iocb->ki_filp);
	struct page *page = ZERO_PAGE(0);
	struct bio *bio;

	bio = iomap_dio_alloc_bio(iter, dio, 1, REQ_OP_WRITE | REQ_SYNC | REQ_IDLE);
	fscrypt_set_bio_crypt_ctx(bio, inode, pos >> inode->i_blkbits,
				  GFP_KERNEL);

	bio->bi_iter.bi_sector = iomap_sector(&iter->iomap, pos);
	bio->bi_private = dio;
	bio->bi_end_io = iomap_dio_bio_end_io;

	__bio_add_page(bio, page, len, 0);
	iomap_dio_submit_bio(iter, dio, bio, pos);
}

and then:

static void iomap_dio_submit_bio(const struct iomap_iter *iter,
		struct iomap_dio *dio, struct bio *bio, loff_t pos)
{
	struct kiocb *iocb = dio->iocb;

	atomic_inc(&dio->ref);

	/* Sync dio can't be polled reliably */
	if ((iocb->ki_flags & IOCB_HIPRI) && !is_sync_kiocb(iocb)) {
		bio_set_polled(bio, iocb);
		WRITE_ONCE(iocb->private, bio);
	}

	if (dio->dops && dio->dops->submit_io)
		dio->dops->submit_io(iter, bio, pos);
	else
		submit_bio(bio);
}

so unless submit_bio() can handle the fallback to "create a new bio full
of zeroes and resubmit it to the device" if the original fails, we're a
little mismatched. I'm not really familiar with either part of this
code, so I don't have much in the way of bright ideas. Perhaps we go
back to the "allocate a large folio at filesystem mount" plan.
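As a rough illustration of why the size of the zero source matters for the padding, the sketch below counts how many segments it takes to cover a pad of len bytes when each segment can reference at most zero_buf_size bytes of zeroes. zero_pad_segments() is a hypothetical helper, not a kernel function, and the sizes used with it (a 4 KiB page, a 2 MiB huge zero folio, a 64 KiB pad) are assumptions for the example, not values from this thread.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: number of segments needed to cover len bytes of
 * zero padding when a single segment can reference at most zero_buf_size
 * bytes of zeroes. This is just a ceiling division.
 */
static size_t zero_pad_segments(size_t len, size_t zero_buf_size)
{
	assert(zero_buf_size > 0);

	return len / zero_buf_size + (len % zero_buf_size != 0);
}
```

With those assumed sizes, zero_pad_segments(65536, 4096) gives 16 while zero_pad_segments(65536, 2097152) gives 1: a single large pre-allocated zero buffer can cover in one segment what a single zero page needs a loop for.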