From: David Hildenbrand <david@redhat.com>
To: Pavel Begunkov <asml.silence@gmail.com>,
Jens Axboe <axboe@kernel.dk>,
Alexander Potapenko <glider@google.com>
Cc: syzbot <syzbot+1d335893772467199ab6@syzkaller.appspotmail.com>,
akpm@linux-foundation.org, catalin.marinas@arm.com, jgg@ziepe.ca,
jhubbard@nvidia.com, linux-kernel@vger.kernel.org,
linux-mm@kvack.org, peterx@redhat.com,
syzkaller-bugs@googlegroups.com
Subject: Re: [syzbot] [mm?] kernel BUG in sanity_check_pinned_pages
Date: Mon, 23 Jun 2025 18:59:19 +0200 [thread overview]
Message-ID: <6f92b7d6-7d3c-4830-a591-75dc4d55c46c@redhat.com> (raw)
In-Reply-To: <ea748461-2059-4aca-81a0-4d01c34926da@gmail.com>
On 23.06.25 18:48, Pavel Begunkov wrote:
> On 6/23/25 16:11, David Hildenbrand wrote:
>> On 23.06.25 16:58, Jens Axboe wrote:
>>> On 6/23/25 6:22 AM, David Hildenbrand wrote:
>>>> On 23.06.25 12:10, David Hildenbrand wrote:
>>>>> On 23.06.25 11:53, Alexander Potapenko wrote:
>>>>>> On Mon, Jun 23, 2025 at 11:29?AM 'David Hildenbrand' via
>>>>>> syzkaller-bugs <syzkaller-bugs@googlegroups.com> wrote:
>>>>>>>
> ...>>> When only pinning a single tail page (iovec.iov_len = pagesize), it works as expected.
>>>>
>>>> So, if we pinned two tail pages but end up calling io_release_ubuf()->unpin_user_page()
>>>> on the head page, meaning that "imu->bvec[i].bv_page" points at the wrong folio page
>>>> (IOW, one we never pinned).
>>>>
>>>> So it's related to the io_coalesce_buffer() machinery.
>>>>
>>>> And in fact, in there, we have this weird logic:
>>>>
>>>> /* Store head pages only*/
>>>> new_array = kvmalloc_array(nr_folios, sizeof(struct page *), GFP_KERNEL);
>>>> ...
>>>>
>>>>
>>>> Essentially discarding the subpage information when coalescing tail pages.
>>>>
>>>>
>>>> I am afraid the whole io_check_coalesce_buffer + io_coalesce_buffer() logic might be
>>>> flawed (we can -- in theory -- coalesc different folio page ranges in
>>>> a GUP result?).
>>>>
>>>> @Jens, not sure if this only triggers a warning when unpinning or if we actually mess up
>>>> imu->bvec[i].bv_page, to end up pointing at (reading/writing) pages we didn't even pin in the first
>>>> place.
>>>>
>>>> Can you look into that, as you are more familiar with the logic?
>>>
>>> Leaving this all quoted and adding Pavel, who wrote that code. I'm
>>> currently away, so can't look into this right now.
>
> Chenliang Li did, but not like it matters
>
>> I did some more digging, but ended up being all confused about io_check_coalesce_buffer() and io_imu_folio_data().
>>
>> Assuming we pass a bunch of consecutive tail pages that all belong to the same folio, then the loop in io_check_coalesce_buffer() will always
>> run into the
>>
>> if (page_folio(page_array[i]) == folio &&
>> page_array[i] == page_array[i-1] + 1) {
>> count++;
>> continue;
>> }
>>
>> case, making the function return "true" ... in io_coalesce_buffer(), we then store the head page ... which seems very wrong.
>>
>> In general, storing head pages when they are not the first page to be coalesced seems wrong.
>
> Yes, it stores the head page even if the range passed to
> pin_user_pages() doesn't cover the head page.
> > It should be converted to unpin_user_folio(), which doesn't seem
> to do sanity_check_pinned_pages(). Do you think that'll be enough
> (conceptually)? Nobody is actually touching the head page in those
> cases apart from the final unpin, and storing the head page is
> more convenient than keeping folios. I'll take a look if it can
> be fully converted to folios w/o extra overhead.
Assuming we had from GUP
nr_pages = 2
pages[0] = folio_page(folio, 1)
pages[1] = folio_page(folio, 2)
After io_coalesce_buffer() we have
nr_pages = 1
pages[0] = folio_page(folio, 0)
Using unpin_user_folio() in all places where we could see something like
that would be the right thing to do. The sanity checks are not in
unpin_user_folio() for exactly that reason: we don't know which folio
pages we pinned.
But now I wonder where you make sure that "Nobody is actually touching
the head page"?
How do you get back the "which folio range" information after
io_coalesce_buffer() ?
If you rely on alignment in virtual address space for you, combined with
imu->folio_shift, that might not work reliably ...
--
Cheers,
David / dhildenb
next prev parent reply other threads:[~2025-06-23 16:59 UTC|newest]
Thread overview: 17+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-06-03 15:31 syzbot
2025-06-03 16:22 ` David Hildenbrand
2025-06-03 17:20 ` Jens Axboe
2025-06-03 17:25 ` David Hildenbrand
2025-06-03 17:36 ` Jens Axboe
2025-06-21 21:52 ` syzbot
2025-06-23 9:29 ` David Hildenbrand
2025-06-23 9:53 ` Alexander Potapenko
2025-06-23 10:10 ` David Hildenbrand
2025-06-23 12:22 ` David Hildenbrand
2025-06-23 12:47 ` David Hildenbrand
2025-06-23 14:58 ` Jens Axboe
2025-06-23 15:11 ` David Hildenbrand
2025-06-23 16:48 ` Pavel Begunkov
2025-06-23 16:59 ` David Hildenbrand [this message]
2025-06-23 17:36 ` David Hildenbrand
2025-06-23 18:02 ` Pavel Begunkov
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=6f92b7d6-7d3c-4830-a591-75dc4d55c46c@redhat.com \
--to=david@redhat.com \
--cc=akpm@linux-foundation.org \
--cc=asml.silence@gmail.com \
--cc=axboe@kernel.dk \
--cc=catalin.marinas@arm.com \
--cc=glider@google.com \
--cc=jgg@ziepe.ca \
--cc=jhubbard@nvidia.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=peterx@redhat.com \
--cc=syzbot+1d335893772467199ab6@syzkaller.appspotmail.com \
--cc=syzkaller-bugs@googlegroups.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox