From: Matthew Wilcox <willy@infradead.org>
To: Luis Chamberlain <mcgrof@kernel.org>
Cc: hare@suse.de, dave@stgolabs.net, david@fromorbit.com,
djwong@kernel.org, kbusch@kernel.org, john.g.garry@oracle.com,
hch@lst.de, ritesh.list@gmail.com, linux-fsdevel@vger.kernel.org,
linux-xfs@vger.kernel.org, linux-mm@kvack.org,
linux-block@vger.kernel.org, gost.dev@samsung.com,
p.raghav@samsung.com, da.gomez@samsung.com,
kernel@pankajraghav.com
Subject: Re: [PATCH 0/5] fs/buffer: stack reduction on async read
Date: Thu, 19 Dec 2024 03:51:34 +0000
Message-ID: <Z2OYRkpRcUFIOFog@casper.infradead.org>
In-Reply-To: <Z2OEmALBGB8ARLlc@bombadil.infradead.org>
On Wed, Dec 18, 2024 at 06:27:36PM -0800, Luis Chamberlain wrote:
> On Wed, Dec 18, 2024 at 08:05:29PM +0000, Matthew Wilcox wrote:
> > On Tue, Dec 17, 2024 at 06:26:21PM -0800, Luis Chamberlain wrote:
> > > This splits up a minor enhancement from the bs > ps device support
> > > series into its own series for better review / focus / testing.
> > > This series just addresses reducing the array size used and cleaning
> > > up the async read path to be easier to read and maintain.
> >
> > How about this approach instead -- get rid of the batch entirely?
>
> Less is more! I wish it worked, but we end up with a null pointer on
> ext4/032 (and indeed this is the test that helped me find most bugs in
> what I was working on):
Yeah, I did no testing; just wanted to give people a different approach
to consider.
> [ 106.034851] BUG: kernel NULL pointer dereference, address: 0000000000000000
> [ 106.046300] RIP: 0010:end_buffer_async_read_io+0x11/0x90
> [ 106.047819] Code: f2 ff 0f 1f 80 00 00 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 53 48 8b 47 10 48 89 fb 48 8b 40 18 <48> 8b 00 f6 40 0d 40 74 0d 0f b7 00 66 25 00 f0 66 3d 00 80 74 09
That decodes as:
5: 53 push %rbx
6: 48 8b 47 10 mov 0x10(%rdi),%rax
a: 48 89 fb mov %rdi,%rbx
d: 48 8b 40 18 mov 0x18(%rax),%rax
11:* 48 8b 00 mov (%rax),%rax <-- trapping instruction
14: f6 40 0d 40 testb $0x40,0xd(%rax)
Mapping those offsets back to the source:
6: bh->b_folio
d: b_folio->mapping
11: mapping->host
So folio->mapping is NULL.
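For reference, that dereference chain lines up with the first statement of
end_buffer_async_read_io(), the function in the RIP. A paraphrase (not the
verbatim source; the fscrypt/fsverity postprocessing in the middle is
elided):

	/* Paraphrase of end_buffer_async_read_io() from fs/buffer.c. */
	static void end_buffer_async_read_io(struct buffer_head *bh, int uptodate)
	{
		/* The oops is here: bh->b_folio->mapping is NULL. */
		struct inode *inode = bh->b_folio->mapping->host;

		/* ... decrypt/verify postprocessing elided ... */

		end_buffer_async_read(bh, uptodate);
	}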
Ah, I see the problem.  end_buffer_async_read() uses the buffer_async_read
test to decide whether all buffers in the folio are uptodate.  So both
having no batch (i.e. this patch) and having a batch smaller than the
number of buffers in the folio can lead to folio_end_read() being called
prematurely, i.e. we unlock the folio before every buffer in it has been
read.
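To make the mechanism concrete, here is a paraphrase of the completion-side
check in fs/buffer.c (locking, error accounting and the fscrypt path are
elided; this is not the verbatim source):

	/* Paraphrase of end_buffer_async_read(); locking elided. */
	static void end_buffer_async_read(struct buffer_head *bh, int uptodate)
	{
		struct folio *folio = bh->b_folio;
		struct buffer_head *tmp;
		bool folio_uptodate = true;

		/* ... set or clear this buffer's uptodate state from @uptodate ... */
		clear_buffer_async_read(bh);
		unlock_buffer(bh);

		/*
		 * Walk the ring of buffers on the folio.  A buffer still marked
		 * async_read means a read is outstanding, so bail out.  If no
		 * other buffer was ever marked async_read, nothing here says
		 * "still busy" and we end the folio read too early.
		 */
		tmp = bh;
		do {
			if (!buffer_uptodate(tmp))
				folio_uptodate = false;
			if (buffer_async_read(tmp))
				return;		/* another read still in flight */
			tmp = tmp->b_this_page;
		} while (tmp != bh);

		folio_end_read(folio, folio_uptodate);
	}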
Once the folio is unlocked, it can be truncated; truncation removes the
folio from the page cache and sets folio->mapping to NULL, which is the
NULL pointer you hit above.  That's a second-order problem, but it's the
one your test happened to hit.
This should fix the problem: the submission path always keeps at least one
async_read-marked buffer unsubmitted until the loop has marked them all, so
end_buffer_async_read() cannot end the folio read prematurely.
By the way, do you have CONFIG_DEBUG_VM enabled in your testing? The assertion
VM_BUG_ON_FOLIO(!folio_test_locked(folio), folio);
in folio_end_read() should have tripped before hitting the race with
truncate.
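That assertion is only built in when CONFIG_DEBUG_VM is enabled; roughly
(paraphrased from include/linux/mmdebug.h, the real macro also dumps the
folio before BUG()ing):

	#ifdef CONFIG_DEBUG_VM
	#define VM_BUG_ON_FOLIO(cond, folio)	BUG_ON(cond)
	#else
	#define VM_BUG_ON_FOLIO(cond, folio)	do { } while (0)
	#endif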
diff --git a/fs/buffer.c b/fs/buffer.c
index cc8452f60251..fd2633e4a5d2 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -2361,9 +2361,9 @@ int block_read_full_folio(struct folio *folio, get_block_t *get_block)
 {
 	struct inode *inode = folio->mapping->host;
 	sector_t iblock, lblock;
-	struct buffer_head *bh, *head, *arr[MAX_BUF_PER_PAGE];
+	struct buffer_head *bh, *head, *prev = NULL;
 	size_t blocksize;
-	int nr, i;
+	int i;
 	int fully_mapped = 1;
 	bool page_error = false;
 	loff_t limit = i_size_read(inode);
@@ -2380,7 +2380,6 @@ int block_read_full_folio(struct folio *folio, get_block_t *get_block)
 	iblock = div_u64(folio_pos(folio), blocksize);
 	lblock = div_u64(limit + blocksize - 1, blocksize);
 	bh = head;
-	nr = 0;
 	i = 0;
 
 	do {
@@ -2411,40 +2410,33 @@ int block_read_full_folio(struct folio *folio, get_block_t *get_block)
 			if (buffer_uptodate(bh))
 				continue;
 		}
-		arr[nr++] = bh;
+
+		lock_buffer(bh);
+		if (buffer_uptodate(bh)) {
+			unlock_buffer(bh);
+			continue;
+		}
+
+		mark_buffer_async_read(bh);
+		if (prev)
+			submit_bh(REQ_OP_READ, prev);
+		prev = bh;
 	} while (i++, iblock++, (bh = bh->b_this_page) != head);
 
 	if (fully_mapped)
 		folio_set_mappedtodisk(folio);
 
-	if (!nr) {
-		/*
-		 * All buffers are uptodate or get_block() returned an
-		 * error when trying to map them - we can finish the read.
-		 */
-		folio_end_read(folio, !page_error);
-		return 0;
-	}
-
-	/* Stage two: lock the buffers */
-	for (i = 0; i < nr; i++) {
-		bh = arr[i];
-		lock_buffer(bh);
-		mark_buffer_async_read(bh);
-	}
-
 	/*
-	 * Stage 3: start the IO. Check for uptodateness
-	 * inside the buffer lock in case another process reading
-	 * the underlying blockdev brought it uptodate (the sct fix).
+	 * All buffers are uptodate or get_block() returned an error
+	 * when trying to map them - we must finish the read because
+	 * end_buffer_async_read() will never be called on any buffer
+	 * in this folio.
 	 */
-	for (i = 0; i < nr; i++) {
-		bh = arr[i];
-		if (buffer_uptodate(bh))
-			end_buffer_async_read(bh, 1);
-		else
-			submit_bh(REQ_OP_READ, bh);
-	}
+	if (prev)
+		submit_bh(REQ_OP_READ, prev);
+	else
+		folio_end_read(folio, !page_error);
+
 	return 0;
 }
 EXPORT_SYMBOL(block_read_full_folio);
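The shape of the fix is worth spelling out: each iteration marks the current
buffer async_read but only submits the *previous* one, and the last buffer is
submitted only after the loop, once every buffer has been marked.  So the
completion path above always finds at least one async_read buffer still
outstanding until the final submission.  A minimal, self-contained sketch of
that "hold back the last item" pattern, using made-up names rather than
kernel APIs:

	#include <stdio.h>

	/* Illustrative stand-ins, not kernel code. */
	struct item {
		int id;
		int pending;
	};

	static void mark_pending(struct item *it)
	{
		it->pending = 1;
	}

	static void submit(struct item *it)
	{
		printf("submit item %d (pending=%d)\n", it->id, it->pending);
	}

	/*
	 * Mirror of the patch's loop: mark each item, but only submit the
	 * one before it; the final item is submitted after the loop, once
	 * every item has been marked.
	 */
	static void submit_all(struct item *items, int n)
	{
		struct item *prev = NULL;

		for (int i = 0; i < n; i++) {
			mark_pending(&items[i]);
			if (prev)
				submit(prev);
			prev = &items[i];
		}
		if (prev)
			submit(prev);
		/* else: nothing needed reading at all */
	}

	int main(void)
	{
		struct item items[4] = { {0, 0}, {1, 0}, {2, 0}, {3, 0} };

		submit_all(items, 4);
		return 0;
	}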
Thread overview: 17+ messages
2024-12-18 2:26 Luis Chamberlain
2024-12-18 2:26 ` [PATCH 1/5] fs/buffer: move async batch read code into a helper Luis Chamberlain
2024-12-18 2:26 ` [PATCH 2/5] fs/buffer: simplify block_read_full_folio() with bh_offset() Luis Chamberlain
2024-12-18 2:26 ` [PATCH 3/5] fs/buffer: add a for_each_bh() for block_read_full_folio() Luis Chamberlain
2024-12-18 19:20 ` Matthew Wilcox
2024-12-18 2:26 ` [PATCH 4/5] fs/buffer: add iteration support " Luis Chamberlain
2024-12-18 2:26 ` [PATCH 5/5] fs/buffer: reduce stack usage on bh_read_iter() Luis Chamberlain
2024-12-18 2:47 ` Luis Chamberlain
2024-12-18 20:05 ` [PATCH 0/5] fs/buffer: stack reduction on async read Matthew Wilcox
2024-12-19 2:27 ` Luis Chamberlain
2024-12-19 3:51 ` Matthew Wilcox [this message]
2024-12-30 17:30 ` Luis Chamberlain
2025-01-31 16:54 ` Luis Chamberlain
2025-01-31 22:01 ` Matthew Wilcox
2025-02-03 14:00 ` Luis Chamberlain
2024-12-19 6:28 ` Christoph Hellwig
2024-12-19 17:53 ` Luis Chamberlain