Date: Wed, 5 Jun 2024 09:30:59 +0800
Subject: Re: [PATCH 1/4] fs/proc/task_mmu: use folio API in pte_is_pinned()
To: David Hildenbrand, Andrew Morton
CC: Helge Deller, Daniel Vetter, Matthew Wilcox, Jonathan Corbet
References: <20240604114822.2089819-1-wangkefeng.wang@huawei.com>
 <20240604114822.2089819-2-wangkefeng.wang@huawei.com>
 <70ce5e9a-ddd7-4e21-9ca9-cd0e72e1df60@redhat.com>
 <5ec7f8b0-777c-44d2-874f-9332432129b4@huawei.com>
From: Kefeng Wang

On 2024/6/4 23:47, David Hildenbrand wrote:
> On 04.06.24 16:26, Kefeng Wang wrote:
>>
>>
>> On 2024/6/4 19:51, David Hildenbrand wrote:
>>> On 04.06.24 13:48, Kefeng Wang wrote:
>>>> Convert to use vm_normal_folio() and folio_maybe_dma_pinned() API,
>>>> which helps to remove
>>>> page_maybe_dma_pinned() in the subsequent change.
>>>>
>>>> Signed-off-by: Kefeng Wang
>>>> ---
>>>>    fs/proc/task_mmu.c | 8 ++++----
>>>>    1 file changed, 4 insertions(+), 4 deletions(-)
>>>>
>>>> diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
>>>> index f8d35f993fe5..5aceb3db7565 100644
>>>> --- a/fs/proc/task_mmu.c
>>>> +++ b/fs/proc/task_mmu.c
>>>> @@ -1088,7 +1088,7 @@ struct clear_refs_private {
>>>>    static inline bool pte_is_pinned(struct vm_area_struct *vma, unsigned long addr, pte_t pte)
>>>>    {
>>>> -    struct page *page;
>>>> +    struct folio *folio;
>>>>        if (!pte_write(pte))
>>>>            return false;
>>>> @@ -1096,10 +1096,10 @@ static inline bool pte_is_pinned(struct vm_area_struct *vma, unsigned long addr,
>>>>            return false;
>>>>        if (likely(!test_bit(MMF_HAS_PINNED, &vma->vm_mm->flags)))
>>>>            return false;
>>>> -    page = vm_normal_page(vma, addr, pte);
>>>> -    if (!page)
>>>> +    folio = vm_normal_folio(vma, addr, pte);
>>>> +    if (!folio)
>>>>            return false;
>>>> -    return page_maybe_dma_pinned(page);
>>>> +    return folio_maybe_dma_pinned(folio);
>>>>    }
>>>>    static inline void clear_soft_dirty(struct vm_area_struct *vma,
>>>
>>> Likely we should just get rid of the pte_is_pinned() check completely
>>> now. We don't perform the same for PMDs, we don't sync against GUP-fast,
>>
>> Yes, there is no handling for PMDs when clearing.
>>
>>> and the original COW vs. GUP issue was resolved.
>>
>> Agreed, but I'm not sure about removing it; from commit 9348b73c2e1bf:
>>
>>     "The whole "track page soft dirty" state doesn't work with pinned pages
>>     anyway, since the page might be dirtied by the pinning entity without
>>     ever being noticed in the page tables."
>>
>> The issue is between the pin mechanism and "track page soft dirty": if
>> the page is pinned, the pinning entity (DMA?) could change the page but
>> the PTE dirty bit won't be set, so maybe we still need it, and even need
>> to add something similar for PMDs? Correct me if I'm wrong, thanks.
>
> Yes, but it doesn't work with any mechanism that write-protects PTEs,
> including mprotect() and uffd-wp.
>
> Then, we never synced against concurrent GUP-fast, concurrent O_DIRECT
> that might still use !FOLL_PIN, never handled PMD ... so it's all neither
> consistent nor really helpful nowadays.
>
> ... and if you have read-only pinned pages (which we cannot distinguish)
>

OK, many cases need to be addressed.

> I have a proper patch lying around here for quite a while:
>
> commit 9ef578b7aba8bba626b904fe90e5be0690842fd3
> Author: David Hildenbrand
> Date:   Wed Feb 16 20:39:43 2022 +0100
>
>     fs/proc/task_mmu: allow setting pinned pages R/O for softdirty tracking
>
>     Before we had PG_anon_exclusive, our new COW logic would end up
>     replacing a pinned page in the page tables, detecting it as possibly
>     shared. Consequently, we'd lose synchronicity between the page mapped
>     into the page table and the page pin.
>
>     We tried preventing mapping pinned pages R/O, however, history told us
>     that that is impossible -- and we added PG_anon_exclusive to have it all
>     working reliably again.
>
>     Now that we have PG_anon_exclusive in place, let's get rid of the check for
>     pinned pages and revert it to the old state.
>
>     Yes, we won't be able to detect the following scenario correctly:
>     (1) R/W-pin a page.
>     (2) Clear softdirty.
>     (3) Modify the page via the R/W-pin.
>
>     However, that isn't working reliably right now either way, because
>     * The current check is racy, because we can race with GUP-fast taking a
>       R/W pin while we're clearing the softdirty marker and mapping the page
>       R/O.
>     * The current check cannot handle FOLL_GET R/W references, as used for
>       O_DIRECT. So if there is concurrent I/O the PTE will get marked as
>       !softdirty, yet, direct I/O will modify page content afterwards.
>     * We check for pins only when handling PTEs, but not for PMDs etc.
>
>     Also, this handling is in no way different to other mechanisms (mprotect,
>     uffd-wp) that map pages R/O to catch successive write access via the
>     page table -- because accesses via the page pin no longer go logically
>     via the page table, the page table is bypassed.
>
>     With this change, the interface now works again as expected when we have
>     R/O pins, and behaves just like any other mechanism that uses write
>     protection to catch successive writes (mprotect, uffd-wp) -- and they all
>     face the same issue regarding R/W access via GUP (FOLL_PIN and
>     FOLL_GET).
>
>     User space better be aware that using write-protection to catch writes to
>     a page can miss writes via GUP. Softdirty tracking cannot reliably catch
>     modifications via GUP after clearing softdirty and returning to user
>     space.
>
> But I understand if you want to be careful :) So I might send that patch
> out at some point myself ...
>

Thanks for your detailed explanation, let's wait for it.
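
For reference, the user-space side of the soft-dirty interface discussed above can be sketched as follows. This is a minimal sketch and not code from this thread: it assumes the interface described in the kernel's soft-dirty documentation (writing "4" to /proc/PID/clear_refs clears the soft-dirty bits, and bit 55 of each 64-bit /proc/PID/pagemap entry reports soft-dirty) and a kernel built with CONFIG_MEM_SOFT_DIRTY. All helper names are hypothetical.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

/* Bit 55 of a /proc/PID/pagemap entry is the soft-dirty flag. */
#define PM_SOFT_DIRTY_BIT 55

/* Hypothetical helper: extract the soft-dirty flag from one pagemap entry. */
int pagemap_entry_soft_dirty(uint64_t entry)
{
	return (int)((entry >> PM_SOFT_DIRTY_BIT) & 1);
}

/* Hypothetical helper: byte offset in pagemap of the entry covering vaddr. */
uint64_t pagemap_offset(uint64_t vaddr, uint64_t page_size)
{
	assert(page_size != 0);
	return (vaddr / page_size) * sizeof(uint64_t);
}

/* Clear soft-dirty bits of the calling process. Returns 0 on success. */
int clear_soft_dirty_self(void)
{
	int fd = open("/proc/self/clear_refs", O_WRONLY);
	ssize_t n;

	if (fd < 0)
		return -1;
	n = write(fd, "4", 1);
	close(fd);
	return n == 1 ? 0 : -1;
}

/* Returns 1 or 0 for the soft-dirty state of addr's page, -1 on error. */
int page_is_soft_dirty_self(const void *addr)
{
	long page_size = sysconf(_SC_PAGESIZE);
	uint64_t entry;
	ssize_t n;
	off_t off;
	int fd;

	if (page_size <= 0)
		return -1;
	fd = open("/proc/self/pagemap", O_RDONLY);
	if (fd < 0)
		return -1;
	off = (off_t)pagemap_offset((uint64_t)(uintptr_t)addr,
				    (uint64_t)page_size);
	if (lseek(fd, off, SEEK_SET) != off) {
		close(fd);
		return -1;
	}
	n = read(fd, &entry, sizeof(entry));
	close(fd);
	if (n != (ssize_t)sizeof(entry))
		return -1;
	return pagemap_entry_soft_dirty(entry);
}
```

A tracker would call clear_soft_dirty_self(), let the workload run, and then query page_is_soft_dirty_self() for each page of interest. As the quoted commit message points out, writes made through a GUP pin bypass the page table, so a check like this can miss them.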