From: Hannes Reinecke
Date: Fri, 17 May 2024 14:36:29 +0200
Subject: Re: [RFC] iomap: use huge zero folio in iomap_dio_zero
To: "Pankaj Raghav (Samsung)" , Matthew Wilcox
Cc: david@fromorbit.com, djwong@kernel.org, hch@lst.de, Keith Busch , mcgrof@kernel.org, akpm@linux-foundation.org, brauner@kernel.org, chandan.babu@oracle.com, gost.dev@samsung.com, john.g.garry@oracle.com, linux-block@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, linux-xfs@vger.kernel.org, p.raghav@samsung.com, ritesh.list@gmail.com, ziy@nvidia.com
References: <20240503095353.3798063-8-mcgrof@kernel.org> <20240507145811.52987-1-kernel@pankajraghav.com> <20240515155943.2uaa23nvddmgtkul@quentin> <20240516150206.d64eezbj3waieef5@quentin>
In-Reply-To: <20240516150206.d64eezbj3waieef5@quentin>
On 5/16/24 17:02, Pankaj Raghav (Samsung) wrote:
> On Wed, May 15, 2024 at 07:03:20PM +0100, Matthew Wilcox wrote:
>> On Wed, May 15, 2024 at 03:59:43PM +0000, Pankaj Raghav (Samsung) wrote:
>>>  static int __init iomap_init(void)
>>>  {
>>> +	void *addr = kzalloc(16 * PAGE_SIZE, GFP_KERNEL);
>>
>> Don't use XFS coding style outside XFS.
>>
>> kzalloc() does not guarantee page alignment much less alignment to
>> a folio. It happens to work today, but that is an implementation
>> artefact.
>>
>>> +
>>> +	if (!addr)
>>> +		return -ENOMEM;
>>> +
>>> +	zero_fsb_folio = virt_to_folio(addr);
>>
>> We also don't guarantee that calling kzalloc() gives you a virtual
>> address that can be converted to a folio. You need to allocate a folio
>> to be sure that you get a folio.
>>
>> Of course, you don't actually need a folio. You don't need any of the
>> folio metadata and can just use raw pages.
>>
>>> +	/*
>>> +	 * The zero folio used is 64k.
>>> +	 */
>>> +	WARN_ON_ONCE(len > (16 * PAGE_SIZE));
>>
>> PAGE_SIZE is not necessarily 4KiB.
>>
>>> +	bio = iomap_dio_alloc_bio(iter, dio, BIO_MAX_VECS,
>>> +				  REQ_OP_WRITE | REQ_SYNC | REQ_IDLE);
>>
>> The point was that we now only need one biovec, not MAX.
>>
>
> Thanks for the comments. I think it all makes sense:
>
> diff --git a/fs/internal.h b/fs/internal.h
> index 7ca738904e34..e152b77a77e4 100644
> --- a/fs/internal.h
> +++ b/fs/internal.h
> @@ -35,6 +35,14 @@ static inline void bdev_cache_init(void)
>  int __block_write_begin_int(struct folio *folio, loff_t pos, unsigned len,
>  		get_block_t *get_block, const struct iomap *iomap);
>
> +/*
> + * iomap/buffered-io.c
> + */
> +
> +#define ZERO_FSB_SIZE (65536)
> +#define ZERO_FSB_ORDER (get_order(ZERO_FSB_SIZE))
> +extern struct page *zero_fs_block;
> +
>  /*
>   * char_dev.c
>   */

But why? We already have a perfectly fine hugepage zero page in
huge_memory.c. Shouldn't we rather export that one and use it?
(Actually I have some patches for doing so...)
We might allocate folios

> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index 4e8e41c8b3c0..36d2f7edd310 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -42,6 +42,7 @@ struct iomap_folio_state {
>  };
>
>  static struct bio_set iomap_ioend_bioset;
> +struct page *zero_fs_block;
>
>  static inline bool ifs_is_fully_uptodate(struct folio *folio,
>  		struct iomap_folio_state *ifs)
> @@ -1985,8 +1986,13 @@ iomap_writepages(struct address_space *mapping, struct writeback_control *wbc,
>  }
>  EXPORT_SYMBOL_GPL(iomap_writepages);
>
> +
>  static int __init iomap_init(void)
>  {
> +	zero_fs_block = alloc_pages(GFP_KERNEL | __GFP_ZERO, ZERO_FSB_ORDER);
> +	if (!zero_fs_block)
> +		return -ENOMEM;
> +
>  	return bioset_init(&iomap_ioend_bioset, 4 * (PAGE_SIZE / SECTOR_SIZE),
>  			   offsetof(struct iomap_ioend, io_bio),
>  			   BIOSET_NEED_BVECS);
> diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
> index f3b43d223a46..50c2bca8a347 100644
> --- a/fs/iomap/direct-io.c
> +++ b/fs/iomap/direct-io.c
> @@ -236,17 +236,22 @@ static void iomap_dio_zero(const struct iomap_iter *iter, struct iomap_dio *dio,
>  		loff_t pos, unsigned len)
>  {
>  	struct inode *inode = file_inode(dio->iocb->ki_filp);
> -	struct page *page = ZERO_PAGE(0);
>  	struct bio *bio;
>
> +	/*
> +	 * Max block size supported is 64k
> +	 */
> +	WARN_ON_ONCE(len > ZERO_FSB_SIZE);
> +
>  	bio = iomap_dio_alloc_bio(iter, dio, 1, REQ_OP_WRITE | REQ_SYNC | REQ_IDLE);
>  	fscrypt_set_bio_crypt_ctx(bio, inode, pos >> inode->i_blkbits,
>  				  GFP_KERNEL);
> +
>  	bio->bi_iter.bi_sector = iomap_sector(&iter->iomap, pos);
>  	bio->bi_private = dio;
>  	bio->bi_end_io = iomap_dio_bio_end_io;
>
> -	__bio_add_page(bio, page, len, 0);
> +	__bio_add_page(bio, zero_fs_block, len, 0);
>  	iomap_dio_submit_bio(iter, dio, bio, pos);
>  }
>

-- 
Dr. Hannes Reinecke                  Kernel Storage Architect
hare@suse.de                                +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich
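A small illustration of why sizing the zero buffer in bytes (ZERO_FSB_SIZE) and deriving the allocation order with get_order() is safer than hard-coding 16 * PAGE_SIZE: get_order() yields the smallest order whose size, PAGE_SIZE << order, covers the request, so the same 64 KiB maps to different orders on 4 KiB, 16 KiB and 64 KiB page kernels. The sketch below is a userspace re-implementation of those semantics with the page size passed in explicitly. `order_for_size()` and `bytes_for_order()` are hypothetical helpers for illustration, not kernel APIs.

```c
#include <stddef.h>

/*
 * Hypothetical userspace stand-in for the kernel's get_order():
 * returns the smallest order such that (page_size << order) >= size.
 * Assumes page_size is a non-zero power of two.
 */
static unsigned int order_for_size(size_t size, size_t page_size)
{
	unsigned int order = 0;

	while ((page_size << order) < size)
		order++;
	return order;
}

/* Bytes covered by an allocation of 2^order pages of page_size bytes. */
static size_t bytes_for_order(unsigned int order, size_t page_size)
{
	return page_size << order;
}
```

With 4 KiB pages, a 64 KiB buffer needs order 4 (16 pages); with 64 KiB pages it needs order 0, whereas 16 * PAGE_SIZE would have requested 1 MiB on such a kernel.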
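Matthew's remark that only one biovec is needed comes down to simple arithmetic. If each segment has to reference the same single zero page, zeroing `len` bytes takes ceil(len / PAGE_SIZE) segments, while a zero buffer at least as large as the biggest supported block covers the range in one. The sketch below assumes exactly that model and ignores bvec merging and block-layer splitting. `zero_segments_needed()` and `fits_one_segment()` are hypothetical names; the latter only mirrors the intent of the patch's WARN_ON_ONCE(len > ZERO_FSB_SIZE).

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper: number of segments needed to describe `len` bytes
 * of zeroes when every segment points at the same zero buffer of
 * `zero_buf_size` bytes (round-up division).
 */
static size_t zero_segments_needed(size_t len, size_t zero_buf_size)
{
	return (len + zero_buf_size - 1) / zero_buf_size;
}

/*
 * Mirrors the intent of WARN_ON_ONCE(len > ZERO_FSB_SIZE): a single
 * segment suffices only if the zero buffer covers the whole range.
 */
static bool fits_one_segment(size_t len, size_t zero_buf_size)
{
	return len <= zero_buf_size;
}
```

For a 64 KiB block on a 4 KiB page kernel this gives 16 segments when reusing ZERO_PAGE(0) versus one with a 64 KiB zero buffer, which is why the revised patch keeps the segment count passed to iomap_dio_alloc_bio() at 1 rather than BIO_MAX_VECS.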