Message-ID: <40f8f338-3b88-497e-b622-49cfa6461d30@suse.de>
Date: Wed, 5 Feb 2025 17:21:01 +0100
Subject: Re: [PATCH v2 2/8] fs/buffer: remove batching from async read
To: Luis Chamberlain, willy@infradead.org, dave@stgolabs.net, david@fromorbit.com, djwong@kernel.org, kbusch@kernel.org
Cc: john.g.garry@oracle.com, hch@lst.de, ritesh.list@gmail.com, linux-fsdevel@vger.kernel.org, linux-xfs@vger.kernel.org, linux-mm@kvack.org, linux-block@vger.kernel.org, gost.dev@samsung.com, p.raghav@samsung.com, da.gomez@samsung.com, kernel@pankajraghav.com
References: <20250204231209.429356-1-mcgrof@kernel.org> <20250204231209.429356-3-mcgrof@kernel.org>
From: Hannes Reinecke
In-Reply-To: <20250204231209.429356-3-mcgrof@kernel.org>

On 2/5/25 00:12, Luis Chamberlain wrote:
> From: Matthew Wilcox
>
> The current implementation of a folio async read in block_read_full_folio()
> first batches all buffer-heads which need IOs issued for by putting them on an
> array of max size MAX_BUF_PER_PAGE.
> After collection it locks the batched
> buffer-heads and finally submits the pending reads. On systems with CPUs
> where the system page size is quite larger like Hexagon with 256 KiB this
> batching can lead stack growth warnings so we want to avoid that.
>
> Note the use of folio_end_read() through block_read_full_folio(), its
> used either when the folio is determined to be fully uptodate and no
> pending read is needed, an IO error happened on get_block(), or an out of
> bound read raced against batching collection to make our required reads
> uptodate.
>
> We can simplify this logic considerably and remove the stack growth
> issues of MAX_BUF_PER_PAGE by just replacing the batched logic with
> one which only issues IO for the previous buffer-head keeping in mind
> we'll always have one buffer-head (the current one) on the folio with
> an async flag, this will prevent any calls to folio_end_read().
>
> So we accomplish two things with this:
>
>   o Avoid large stacks arrays with MAX_BUF_PER_PAGE
>   o Make the need for folio_end_read() explicit and easier to read
>
> Suggested-by: Matthew Wilcox
> Signed-off-by: Luis Chamberlain
> ---
>  fs/buffer.c | 51 +++++++++++++++++++++------------------------------
>  1 file changed, 21 insertions(+), 30 deletions(-)
>
> diff --git a/fs/buffer.c b/fs/buffer.c
> index b99560e8a142..167fa3e33566 100644
> --- a/fs/buffer.c
> +++ b/fs/buffer.c
> @@ -2361,9 +2361,8 @@ int block_read_full_folio(struct folio *folio, get_block_t *get_block)
>  {
>  	struct inode *inode = folio->mapping->host;
>  	sector_t iblock, lblock;
> -	struct buffer_head *bh, *head, *arr[MAX_BUF_PER_PAGE];
> +	struct buffer_head *bh, *head, *prev = NULL;
>  	size_t blocksize;
> -	int nr, i;
>  	int fully_mapped = 1;
>  	bool page_error = false;
>  	loff_t limit = i_size_read(inode);
> @@ -2380,7 +2379,6 @@ int block_read_full_folio(struct folio *folio, get_block_t *get_block)
>  	iblock = div_u64(folio_pos(folio), blocksize);
>  	lblock = div_u64(limit + blocksize - 1, blocksize);
>  	bh = head;
> -	nr = 0;
>
>  	do {
>  		if (buffer_uptodate(bh))
> @@ -2410,40 +2408,33 @@ int block_read_full_folio(struct folio *folio, get_block_t *get_block)
>  			if (buffer_uptodate(bh))
>  				continue;
>  		}
> -		arr[nr++] = bh;
> +
> +		lock_buffer(bh);
> +		if (buffer_uptodate(bh)) {
> +			unlock_buffer(bh);
> +			continue;
> +		}
> +
> +		mark_buffer_async_read(bh);
> +		if (prev)
> +			submit_bh(REQ_OP_READ, prev);
> +		prev = bh;
>  	} while (iblock++, (bh = bh->b_this_page) != head);
>
>  	if (fully_mapped)
>  		folio_set_mappedtodisk(folio);
>
> -	if (!nr) {
> -		/*
> -		 * All buffers are uptodate or get_block() returned an
> -		 * error when trying to map them - we can finish the read.
> -		 */
> -		folio_end_read(folio, !page_error);
> -		return 0;
> -	}
> -
> -	/* Stage two: lock the buffers */
> -	for (i = 0; i < nr; i++) {
> -		bh = arr[i];
> -		lock_buffer(bh);
> -		mark_buffer_async_read(bh);
> -	}
> -
>  	/*
> -	 * Stage 3: start the IO.  Check for uptodateness
> -	 * inside the buffer lock in case another process reading
> -	 * the underlying blockdev brought it uptodate (the sct fix).
> +	 * All buffers are uptodate or get_block() returned an error
> +	 * when trying to map them - we must finish the read because
> +	 * end_buffer_async_read() will never be called on any buffer
> +	 * in this folio.
>  	 */
> -	for (i = 0; i < nr; i++) {
> -		bh = arr[i];
> -		if (buffer_uptodate(bh))
> -			end_buffer_async_read(bh, 1);
> -		else
> -			submit_bh(REQ_OP_READ, bh);
> -	}
> +	if (prev)
> +		submit_bh(REQ_OP_READ, prev);
> +	else
> +		folio_end_read(folio, !page_error);
> +
>  	return 0;
>  }
>  EXPORT_SYMBOL(block_read_full_folio);

Similar here; as we have now removed batching (which technically could
result in I/O being completed while executing the various stages) there
really is nothing preventing us from using plugging here, no?

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                  Kernel Storage Architect
hare@suse.de                                +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich