Re: [PATCH 1/4] fs/proc/task_mmu: use folio API in pte_is_pinned()

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

From: Kefeng Wang <wangkefeng.wang@huawei.com>
To: David Hildenbrand <david@redhat.com>,
	Andrew Morton <akpm@linux-foundation.org>
Cc: Helge Deller <deller@gmx.de>, Daniel Vetter <daniel@ffwll.ch>,
	Matthew Wilcox <willy@infradead.org>, <linux-mm@kvack.org>,
	Jonathan Corbet <corbet@lwn.net>
Subject: Re: [PATCH 1/4] fs/proc/task_mmu: use folio API in pte_is_pinned()
Date: Wed, 5 Jun 2024 09:30:59 +0800	[thread overview]
Message-ID: <f5a3c6d2-6287-452a-b852-2aaddc9ce5e4@huawei.com> (raw)
In-Reply-To: <f466aae2-9a3e-4081-b2d8-fa5930b8bc29@redhat.com>



On 2024/6/4 23:47, David Hildenbrand wrote:
> On 04.06.24 16:26, Kefeng Wang wrote:
>>
>>
>> On 2024/6/4 19:51, David Hildenbrand wrote:
>>> On 04.06.24 13:48, Kefeng Wang wrote:
>>>> Convert to use vm_normal_folio() and folio_maybe_dma_pinned() API,
>>>> which helps to remove page_maybe_dma_pinned() in the subsequent change.
>>>>
>>>> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
>>>> ---
>>>>    fs/proc/task_mmu.c | 8 ++++----
>>>>    1 file changed, 4 insertions(+), 4 deletions(-)
>>>>
>>>> diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
>>>> index f8d35f993fe5..5aceb3db7565 100644
>>>> --- a/fs/proc/task_mmu.c
>>>> +++ b/fs/proc/task_mmu.c
>>>> @@ -1088,7 +1088,7 @@ struct clear_refs_private {
>>>>    static inline bool pte_is_pinned(struct vm_area_struct *vma,
>>>> unsigned long addr, pte_t pte)
>>>>    {
>>>> -    struct page *page;
>>>> +    struct folio *folio;
>>>>        if (!pte_write(pte))
>>>>            return false;
>>>> @@ -1096,10 +1096,10 @@ static inline bool pte_is_pinned(struct
>>>> vm_area_struct *vma, unsigned long addr,
>>>>            return false;
>>>>        if (likely(!test_bit(MMF_HAS_PINNED, &vma->vm_mm->flags)))
>>>>            return false;
>>>> -    page = vm_normal_page(vma, addr, pte);
>>>> -    if (!page)
>>>> +    folio = vm_normal_folio(vma, addr, pte);
>>>> +    if (!folio)
>>>>            return false;
>>>> -    return page_maybe_dma_pinned(page);
>>>> +    return folio_maybe_dma_pinned(folio);
>>>>    }
>>>>    static inline void clear_soft_dirty(struct vm_area_struct *vma,
>>>
>>> Likely we should just get rid of the pte_is_pinned() check completely
>>> now. We don't perform the same for PMDs, we don't sync against GUP-fast,
>>
>> Yes, no handle for clear PMDs.
>>
>>> and the original COW vs. GUP issue was resolved.
>>
>> Agree, but I'm not sure about removing it, from the commit 9348b73c2e1bf,
>>
>>     "The whole "track page soft dirty" state doesn't work with pinned 
>> pages
>>     anyway, since the page might be dirtied by the pinning entity without
>>     ever being noticed in the page tables."
>>
>> The issue is between the pin mechanism and "track page soft dirty", if
>> the page is pinned, the pining entiry(DMA?) could change the page but
>> the pte dirty won't be set, so maybe we still need it, even add some
>> similar thing for PMD? Correct me if I'm wrong, thanks.
> 
> Yes, but it doesn't work with any mechanism that write-protects PTEs,
> including mprotect() and uffd-wp.
> 
> Then, we never synced agaisnt concurrent GUP-fast, concurrent O_DIRECT
> that might still use !FOLL_PIN, never handled PMD ... so it's all not
> consistent nor really helpful nowdays.
> 
> ... and if you have read-only pinned pages (which we cannot distinguish)
> 

OK, many cases need to be addressed.

> I have a proper patch lying around here for quite a while:
> 
> commit 9ef578b7aba8bba626b904fe90e5be0690842fd3
> Author: David Hildenbrand <david@redhat.com>
> Date:   Wed Feb 16 20:39:43 2022 +0100
> 
>      fs/proc/task_mmu: allow setting pinned pages R/O for softdirty 
> tracking
>      Before we had PG_anon_exclusive, our new COW logic would end up
>      replacing a pinned page in the page tables, detecting it as possibly
>      shared. Consequently, we'd lost synchronicity between the page mapped
>      into the page table and the page pin.
>      We tried preventing mapping pinned pages R/O, however, history told us
>      that that is impossible -- and we added PG_anon_exclusive to have 
> it all
>      working reliably again.
>      Now that we have PG_anon_exclusive in place, let's get rid of the 
> check for
>      pinned pages and revert it to the old state.
>      Yes, we won't be able to detect the following scenario correctly:
>      (1) R/W-pin a page.
>      (2) Clear softdirty.
>      (3) Modify the page via the R/W-pin.
>      However, that isn't working reliably right now either way, because
>      * The current check is racy, because we can race with GUP-fast 
> taking a
>        R/W pin while we're clearing the softdirty marker and mapping the 
> page
>        R/O.
>      * The current check cannot handle FOLL_GET R/W references, as used for
>        O_DIRECT. So if there is concurrent I/O the PTE will get marked as
>        !softdirty, yet, direct I/O will modify page content afterwards.
>      * We check for pins only when handling PTEs, but not for PMDs etc.
>      Also, this handling is in no way different to other mechanisms 
> (mprotect,
>      uffd-wp) that map pages R/O to catch successive write access via the
>      page table -- because acceses via the page pin no longer go logically
>      via the page table, the page table is bypassed.
>      With this change, the interface now works again as expected when we 
> have
>      R/O pins, and behaves just like any other mechanism that uses write
>      protection to catch successive writes (mprotect, uffd-wp) -- and 
> they all
>      face the same issue regarding R/W access via GUP (FOLL_PIN and
>      FOLL_GET).
>      User space better be aware that using read-protection to catch 
> writes to
>      a page can miss writes via GUP. Softdirt tracking cannot reliably 
> catch
>      modifications via GUP after clearing softdirty and returning to user
>      space.
> 
> But I understand if you want to be careful :) So I might send that patch 
> out at
> some point myself ...
> 

Thank for your detail explanation, let's wait it out.

next prev parent reply	other threads:[~2024-06-05  1:31 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-06-04 11:48 [PATCH 0/4] mm: remove page_maybe_dma_pinned() and page_mkclean() Kefeng Wang
2024-06-04 11:48 ` [PATCH 1/4] fs/proc/task_mmu: use folio API in pte_is_pinned() Kefeng Wang
2024-06-04 11:51   ` David Hildenbrand
2024-06-04 14:26     ` Kefeng Wang
     [not found]       ` <f466aae2-9a3e-4081-b2d8-fa5930b8bc29@redhat.com>
2024-06-05  1:30         ` Kefeng Wang [this message]
2024-06-25 22:38           ` Andrew Morton
2024-06-26  4:57             ` David Hildenbrand
2024-06-04 11:48 ` [PATCH 2/4] mm: remove page_maybe_dma_pinned() Kefeng Wang
2024-06-04 11:48 ` [PATCH 3/4] fb_defio: use a folio in fb_deferred_io_work() Kefeng Wang
2024-06-04 11:53   ` David Hildenbrand
2024-06-04 11:48 ` [PATCH 4/4] mm: remove page_mkclean() Kefeng Wang
2024-06-07  6:46   ` [PATCH] pin_user_pages: fix underline length Kefeng Wang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=f5a3c6d2-6287-452a-b852-2aaddc9ce5e4@huawei.com \
    --to=wangkefeng.wang@huawei.com \
    --cc=akpm@linux-foundation.org \
    --cc=corbet@lwn.net \
    --cc=daniel@ffwll.ch \
    --cc=david@redhat.com \
    --cc=deller@gmx.de \
    --cc=linux-mm@kvack.org \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox