Date: Thu, 19 Dec 2024 03:51:34 +0000
From: Matthew Wilcox <willy@infradead.org>
To: Luis Chamberlain
Cc: hare@suse.de, dave@stgolabs.net, david@fromorbit.com,
	djwong@kernel.org, kbusch@kernel.org, john.g.garry@oracle.com,
	hch@lst.de, ritesh.list@gmail.com, linux-fsdevel@vger.kernel.org,
	linux-xfs@vger.kernel.org, linux-mm@kvack.org,
	linux-block@vger.kernel.org, gost.dev@samsung.com,
	p.raghav@samsung.com, da.gomez@samsung.com, kernel@pankajraghav.com
Subject: Re: [PATCH 0/5] fs/buffer: strack reduction on async read
References: <20241218022626.3668119-1-mcgrof@kernel.org>

On Wed, Dec 18, 2024 at 06:27:36PM -0800, Luis Chamberlain wrote:
> On Wed, Dec 18, 2024 at 08:05:29PM +0000, Matthew Wilcox
wrote:
> > On Tue, Dec 17, 2024 at 06:26:21PM -0800, Luis Chamberlain wrote:
> > > This splits up a minor enhancement from the bs > ps device support
> > > series into its own series for better review / focus / testing.
> > > This series just addresses reducing the array size used and cleaning
> > > up the async read to be easier to read and maintain.
> >
> > How about this approach instead -- get rid of the batch entirely?
>
> Less is more! I wish it worked, but we end up with a null pointer on
> ext4/032 (and indeed this is the test that helped me find most bugs in
> what I was working on):

Yeah, I did no testing; just wanted to give people a different approach
to consider.

> [  106.034851] BUG: kernel NULL pointer dereference, address: 0000000000000000
> [  106.046300] RIP: 0010:end_buffer_async_read_io+0x11/0x90
> [  106.047819] Code: f2 ff 0f 1f 80 00 00 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 53 48 8b 47 10 48 89 fb 48 8b 40 18 <48> 8b 00 f6 40 0d 40 74 0d 0f b7 00 66 25 00 f0 66 3d 00 80 74 09

That decodes as:

   5:	53                	push   %rbx
   6:	48 8b 47 10       	mov    0x10(%rdi),%rax
   a:	48 89 fb          	mov    %rdi,%rbx
   d:	48 8b 40 18       	mov    0x18(%rax),%rax
  11:*	48 8b 00          	mov    (%rax),%rax	<-- trapping instruction
  14:	f6 40 0d 40       	testb  $0x40,0xd(%rax)

   6: bh->b_folio
   d: b_folio->mapping
  11: mapping->host

So folio->mapping is NULL.

Ah, I see the problem.  end_buffer_async_read() uses the
buffer_async_read test to decide if all buffers on the page are uptodate
or not.  So both having no batch (ie this patch) and having a batch
which is smaller than the number of buffers in the folio can lead to
folio_end_read() being called prematurely (ie we'll unlock the folio
before finishing reading every buffer in the folio).

Once the folio is unlocked, it can be truncated.  That's a second-order
problem, but it's the one your test happened to hit.
This should fix the problem; we always have at least one BH held in
the submission path with the async_read flag set, so
end_buffer_async_read() will not end it prematurely.

By the way, do you have CONFIG_DEBUG_VM enabled in your testing?  The
VM_BUG_ON_FOLIO(!folio_test_locked(folio), folio);
in folio_end_read() should have tripped before hitting the race with
truncate.

diff --git a/fs/buffer.c b/fs/buffer.c
index cc8452f60251..fd2633e4a5d2 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -2361,9 +2361,9 @@ int block_read_full_folio(struct folio *folio, get_block_t *get_block)
 {
 	struct inode *inode = folio->mapping->host;
 	sector_t iblock, lblock;
-	struct buffer_head *bh, *head, *arr[MAX_BUF_PER_PAGE];
+	struct buffer_head *bh, *head, *prev = NULL;
 	size_t blocksize;
-	int nr, i;
+	int i;
 	int fully_mapped = 1;
 	bool page_error = false;
 	loff_t limit = i_size_read(inode);
@@ -2380,7 +2380,6 @@ int block_read_full_folio(struct folio *folio, get_block_t *get_block)
 	iblock = div_u64(folio_pos(folio), blocksize);
 	lblock = div_u64(limit + blocksize - 1, blocksize);
 	bh = head;
-	nr = 0;
 	i = 0;
 
 	do {
@@ -2411,40 +2410,33 @@ int block_read_full_folio(struct folio *folio, get_block_t *get_block)
 			if (buffer_uptodate(bh))
 				continue;
 		}
-		arr[nr++] = bh;
+
+		lock_buffer(bh);
+		if (buffer_uptodate(bh)) {
+			unlock_buffer(bh);
+			continue;
+		}
+
+		mark_buffer_async_read(bh);
+		if (prev)
+			submit_bh(REQ_OP_READ, prev);
+		prev = bh;
 	} while (i++, iblock++, (bh = bh->b_this_page) != head);
 
 	if (fully_mapped)
 		folio_set_mappedtodisk(folio);
 
-	if (!nr) {
-		/*
-		 * All buffers are uptodate or get_block() returned an
-		 * error when trying to map them - we can finish the read.
-		 */
-		folio_end_read(folio, !page_error);
-		return 0;
-	}
-
-	/* Stage two: lock the buffers */
-	for (i = 0; i < nr; i++) {
-		bh = arr[i];
-		lock_buffer(bh);
-		mark_buffer_async_read(bh);
-	}
-
 	/*
-	 * Stage 3: start the IO.  Check for uptodateness
-	 * inside the buffer lock in case another process reading
-	 * the underlying blockdev brought it uptodate (the sct fix).
+	 * All buffers are uptodate or get_block() returned an error
+	 * when trying to map them - we must finish the read because
+	 * end_buffer_async_read() will never be called on any buffer
+	 * in this folio.
 	 */
-	for (i = 0; i < nr; i++) {
-		bh = arr[i];
-		if (buffer_uptodate(bh))
-			end_buffer_async_read(bh, 1);
-		else
-			submit_bh(REQ_OP_READ, bh);
-	}
+	if (prev)
+		submit_bh(REQ_OP_READ, prev);
+	else
+		folio_end_read(folio, !page_error);
+
 	return 0;
 }
 EXPORT_SYMBOL(block_read_full_folio);