linux-mm.kvack.org archive mirror
From: David Hildenbrand <david@redhat.com>
To: Alex Williamson <alex.williamson@redhat.com>
Cc: Yisheng Xie <ethan.xys@linux.alibaba.com>,
	akpm@linux-foundation.org, kvm@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH] vfio/type1: unpin PageReserved page
Date: Fri, 1 Mar 2024 11:57:16 +0100	[thread overview]
Message-ID: <97d53a8b-1b8c-47a7-977f-4fc4977ef236@redhat.com> (raw)
In-Reply-To: <20240229150406.4d41db01.alex.williamson@redhat.com>

On 29.02.24 23:04, Alex Williamson wrote:
> On Tue, 27 Feb 2024 13:25:56 -0700
> Alex Williamson <alex.williamson@redhat.com> wrote:
> 
>> On Tue, 27 Feb 2024 11:27:08 +0100
>> David Hildenbrand <david@redhat.com> wrote:
>>
>>> On 26.02.24 18:32, Alex Williamson wrote:
>>>> On Tue, 27 Feb 2024 01:14:54 +0800
>>>> Yisheng Xie <ethan.xys@linux.alibaba.com> wrote:
>>>>      
>>>>> On 2024/2/27 00:14, Alex Williamson wrote:
>>>>>> On Tue, 27 Feb 2024 00:01:06 +0800
>>>>>> Yisheng Xie <ethan.xys@linux.alibaba.com> wrote:
>>>>>>        
>>>>>>> We hit the following warning:
>>>>>>>     WARNING: CPU: 99 PID: 1766859 at mm/gup.c:209 try_grab_page.part.0+0xe8/0x1b0
>>>>>>>     CPU: 99 PID: 1766859 Comm: qemu-kvm Kdump: loaded Tainted: GOE  5.10.134-008.2.x86_64 #1
>>>>>>                                                                       ^^^^^^^^
>>>>>>
>>>>>> Does this issue reproduce on mainline?  Thanks,
>>>>>
>>>>> I have checked the mainline code, and the logic seems the same as in my
>>>>> version.
>>>>>
>>>>> So I think it can be reproduced there, if I understand correctly.
>>>>
>>>> I obviously can't speak to what's in your 5.10.134-008.2 kernel, but I
>>>> do know there's a very similar issue resolved in v6.0 mainline and
>>>> included in v5.10.146 of the stable tree.  Please test.  Thanks,
>>>
>>> This commit, to be precise:
>>>
>>> commit 873aefb376bbc0ed1dd2381ea1d6ec88106fdbd4
>>> Author: Alex Williamson <alex.williamson@redhat.com>
>>> Date:   Mon Aug 29 21:05:40 2022 -0600
>>>
>>>       vfio/type1: Unpin zero pages
>>>       
>>>       There's currently a reference count leak on the zero page.  We increment
>>>       the reference via pin_user_pages_remote(), but the page is later handled
>>>       as an invalid/reserved page, therefore it's not accounted against the
>>>       user and not unpinned by our put_pfn().
>>>       
>>>       Introducing special zero page handling in put_pfn() would resolve the
>>>       leak, but without accounting of the zero page, a single user could
>>>       still create enough mappings to generate a reference count overflow.
>>>       
>>>       The zero page is always resident, so for our purposes there's no reason
>>>       to keep it pinned.  Therefore, add a loop to walk pages returned from
>>>       pin_user_pages_remote() and unpin any zero pages.
>>>
>>>
>>> BUT
>>>
>>> in the meantime, we also have
>>>
>>> commit c8070b78751955e59b42457b974bea4a4fe00187
>>> Author: David Howells <dhowells@redhat.com>
>>> Date:   Fri May 26 22:41:40 2023 +0100
>>>
>>>       mm: Don't pin ZERO_PAGE in pin_user_pages()
>>>       
>>>       Make pin_user_pages*() leave a ZERO_PAGE unpinned if it extracts a pointer
>>>       to it from the page tables and make unpin_user_page*() correspondingly
>>>       ignore a ZERO_PAGE when unpinning.  We don't want to risk overrunning a
>>>       zero page's refcount as we're only allowed ~2 million pins on it -
>>>       something that userspace can conceivably trigger.
>>>       
>>>       Add a pair of functions to test whether a page or a folio is a ZERO_PAGE.
>>>
>>>
>>> So the unpin_user_page_* won't do anything with the shared zeropage.
>>>
>>> (likely, we could revert 873aefb376bbc0ed1dd2381ea1d6ec88106fdbd4)
>>
>>
>> Yes, according to the commit log, it seems the unpin has just been
>> wasted work since v6.5.  Thanks!
> 
> I dusted off an old unit test for mapping the zeropage through vfio and
> started working on posting a revert for 873aefb376bb but I actually
> found that this appears to be resolved even before c8070b787519.  I
> bisected it to:
> 
> commit 84209e87c6963f928194a890399e24e8ad299db1
> Author: David Hildenbrand <david@redhat.com>
> Date:   Wed Nov 16 11:26:48 2022 +0100
> 
>      mm/gup: reliable R/O long-term pinning in COW mappings
>      
>      We already support reliable R/O pinning of anonymous memory.
>      However, assume we end up pinning (R/O long-term) a pagecache page
>      or the shared zeropage inside a writable private ("COW") mapping.
>      The next write access will trigger a write-fault and replace the
>      pinned page by an exclusive anonymous page in the process page
>      tables to break COW: the pinned page no longer corresponds to the
>      page mapped into the process' page table.
>      Now that FAULT_FLAG_UNSHARE can break COW on anything mapped into a
>      COW mapping, let's properly break COW first before R/O long-term
>      pinning something that's not an exclusive anon page inside a COW
>      mapping. FAULT_FLAG_UNSHARE will break COW and map an exclusive
>      anon page instead that can get pinned safely.
>      
>      With this change, we can stop using FOLL_FORCE|FOLL_WRITE for
>      reliable R/O long-term pinning in COW mappings.
> 
> [...]
> 
>      Note 3: For users that use FOLL_LONGTERM right now without
>      FOLL_WRITE, such as VFIO, we'd now no longer pin the shared
>      zeropage. Instead, we'd populate exclusive anon pages that we can
>      pin. There was a concern that this could affect the memlock limit
>      of existing setups.
> 
>      For example, a VM running with VFIO could run into the memlock
>      limit and fail to run. However, we essentially had the same
>      behavior already in commit 17839856fd58 ("gup: document and work
>      around "COW can break either way" issue") which got merged into
>      some enterprise distros, and there were not any such complaints. So
>      most probably, we're fine.
> 

Oh, I almost forgot about that one :)

Indeed, 84209e87c696 landed in v6.2 and c8070b787519 in v6.5.

... and c8070b787519 was primarily concerned with !FOLL_LONGTERM usage,
so it makes sense that such users would still run into the shared zeropage.

For vfio, 84209e87c696 did the trick.

-- 
Cheers,

David / dhildenb


