linux-mm.kvack.org archive mirror
From: Hannes Reinecke <hare@suse.de>
To: Luis Chamberlain <mcgrof@kernel.org>,
	willy@infradead.org, dave@stgolabs.net, david@fromorbit.com,
	djwong@kernel.org, kbusch@kernel.org
Cc: john.g.garry@oracle.com, hch@lst.de, ritesh.list@gmail.com,
	linux-fsdevel@vger.kernel.org, linux-xfs@vger.kernel.org,
	linux-mm@kvack.org, linux-block@vger.kernel.org,
	gost.dev@samsung.com, p.raghav@samsung.com, da.gomez@samsung.com,
	kernel@pankajraghav.com
Subject: Re: [PATCH v2 2/8] fs/buffer: remove batching from async read
Date: Fri, 7 Feb 2025 08:08:09 +0100	[thread overview]
Message-ID: <a023ce9a-12bf-475b-9c34-3218c80b6ff3@suse.de> (raw)
In-Reply-To: <40f8f338-3b88-497e-b622-49cfa6461d30@suse.de>

On 2/5/25 17:21, Hannes Reinecke wrote:
> On 2/5/25 00:12, Luis Chamberlain wrote:
>> From: Matthew Wilcox <willy@infradead.org>
>>
>> The current implementation of an async folio read in
>> block_read_full_folio() first batches all buffer-heads which need I/O
>> by collecting them in an on-stack array of size MAX_BUF_PER_PAGE.
>> After collection it locks the batched buffer-heads and finally submits
>> the pending reads. On systems with a large page size, such as Hexagon
>> with 256 KiB pages, this batching can trigger stack growth warnings,
>> so we want to avoid it.
>>
>> Note the use of folio_end_read() throughout block_read_full_folio():
>> it is used either when the folio is determined to be fully uptodate
>> and no pending read is needed, when an I/O error happened in
>> get_block(), or when an out-of-bounds read raced against the batching
>> collection and made our required reads uptodate.
>>
>> We can simplify this logic considerably and remove the stack growth
>> issue of MAX_BUF_PER_PAGE by replacing the batched logic with one
>> that only issues I/O for the previous buffer-head. Keeping in mind
>> that we always have one buffer-head (the current one) marked async on
>> the folio, this prevents any calls to folio_end_read() while I/O is
>> still being issued.
>>
>> So we accomplish two things with this:
>>
>>   o Avoid large on-stack arrays sized by MAX_BUF_PER_PAGE
>>   o Make the need for folio_end_read() explicit and easier to read
>>
>> Suggested-by: Matthew Wilcox <willy@infradead.org>
>> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
>> ---
>>   fs/buffer.c | 51 +++++++++++++++++++++------------------------------
>>   1 file changed, 21 insertions(+), 30 deletions(-)
>>
>> diff --git a/fs/buffer.c b/fs/buffer.c
>> index b99560e8a142..167fa3e33566 100644
>> --- a/fs/buffer.c
>> +++ b/fs/buffer.c
>> @@ -2361,9 +2361,8 @@ int block_read_full_folio(struct folio *folio, get_block_t *get_block)
>>   {
>>       struct inode *inode = folio->mapping->host;
>>       sector_t iblock, lblock;
>> -    struct buffer_head *bh, *head, *arr[MAX_BUF_PER_PAGE];
>> +    struct buffer_head *bh, *head, *prev = NULL;
>>       size_t blocksize;
>> -    int nr, i;
>>       int fully_mapped = 1;
>>       bool page_error = false;
>>       loff_t limit = i_size_read(inode);
>> @@ -2380,7 +2379,6 @@ int block_read_full_folio(struct folio *folio, get_block_t *get_block)
>>       iblock = div_u64(folio_pos(folio), blocksize);
>>       lblock = div_u64(limit + blocksize - 1, blocksize);
>>       bh = head;
>> -    nr = 0;
>>       do {
>>           if (buffer_uptodate(bh))
>> @@ -2410,40 +2408,33 @@ int block_read_full_folio(struct folio *folio, get_block_t *get_block)
>>               if (buffer_uptodate(bh))
>>                   continue;
>>           }
>> -        arr[nr++] = bh;
>> +
>> +        lock_buffer(bh);
>> +        if (buffer_uptodate(bh)) {
>> +            unlock_buffer(bh);
>> +            continue;
>> +        }
>> +
>> +        mark_buffer_async_read(bh);
>> +        if (prev)
>> +            submit_bh(REQ_OP_READ, prev);
>> +        prev = bh;
>>       } while (iblock++, (bh = bh->b_this_page) != head);
>>       if (fully_mapped)
>>           folio_set_mappedtodisk(folio);
>> -    if (!nr) {
>> -        /*
>> -         * All buffers are uptodate or get_block() returned an
>> -         * error when trying to map them - we can finish the read.
>> -         */
>> -        folio_end_read(folio, !page_error);
>> -        return 0;
>> -    }
>> -
>> -    /* Stage two: lock the buffers */
>> -    for (i = 0; i < nr; i++) {
>> -        bh = arr[i];
>> -        lock_buffer(bh);
>> -        mark_buffer_async_read(bh);
>> -    }
>> -
>>       /*
>> -     * Stage 3: start the IO.  Check for uptodateness
>> -     * inside the buffer lock in case another process reading
>> -     * the underlying blockdev brought it uptodate (the sct fix).
>> +     * All buffers are uptodate or get_block() returned an error
>> +     * when trying to map them - we must finish the read because
>> +     * end_buffer_async_read() will never be called on any buffer
>> +     * in this folio.
>>        */
>> -    for (i = 0; i < nr; i++) {
>> -        bh = arr[i];
>> -        if (buffer_uptodate(bh))
>> -            end_buffer_async_read(bh, 1);
>> -        else
>> -            submit_bh(REQ_OP_READ, bh);
>> -    }
>> +    if (prev)
>> +        submit_bh(REQ_OP_READ, prev);
>> +    else
>> +        folio_end_read(folio, !page_error);
>> +
>>       return 0;
>>   }
>>   EXPORT_SYMBOL(block_read_full_folio);
> 
> Similar here; as we have now removed batching (which technically could
> result in I/O being completed while executing the various stages), there
> really is nothing preventing us from using plugging here, no?
> 
In light of the discussion on the previous patch we should defer that
to a later point. So:

Reviewed-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                  Kernel Storage Architect
hare@suse.de                                +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich


