From: Peter Xu <peterx@redhat.com>
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Nadav Amit, Miaohe Lin, Mike Rapoport, Andrea Arcangeli, Hugh Dickins, peterx@redhat.com, Jerome Glisse, Mike Kravetz, Jason Gunthorpe, Matthew Wilcox, Andrew Morton, Axel Rasmussen, "Kirill A. Shutemov"
Subject: [PATCH v2 11/24] shmem/userfaultfd: Allow wr-protect none pte for file-backed mem
Date: Tue, 27 Apr 2021 12:13:04 -0400
Message-Id: <20210427161317.50682-12-peterx@redhat.com>
In-Reply-To: <20210427161317.50682-1-peterx@redhat.com>
References: <20210427161317.50682-1-peterx@redhat.com>

File-backed memory differs from anonymous memory in that even if the pte
is missing, the data could still
reside either in the file or in the page/swap cache.  So when
wr-protecting a pte, we need to consider none ptes too.

We do that by installing the uffd-wp special swap pte as a marker.  So when
there's a future write to the pte, the fault handler will go the special path
to first fault in the page as read-only, then report to the userfaultfd server
with the wr-protect message.

On the other hand, when unprotecting a page, it's also possible that the pte
got unmapped but replaced by the special uffd-wp marker.  Then we'll need to
be able to recover from a uffd-wp special swap pte into a none pte, so that
the next access to the page will fault in correctly as usual through the
fault handler, rather than sending a uffd-wp message.

Special care needs to be taken throughout the change_protection_range()
process.  Since we now allow the user to wr-protect a none pte, we need to be
able to pre-populate the page table entries if we see !anonymous &&
MM_CP_UFFD_WP requests; otherwise change_protection_range() will always skip
when the pgtable entry does not exist.

Note that this patch only covers small pages (pte level) and does not yet
cover any of the transparent huge pages, but it will be a base for THPs too.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 mm/mprotect.c | 48 ++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 48 insertions(+)

diff --git a/mm/mprotect.c b/mm/mprotect.c
index b3def0a102bf4..6b63e3544b470 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -29,6 +29,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -176,6 +177,32 @@ static unsigned long change_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
 				set_pte_at(vma->vm_mm, addr, pte, newpte);
 				pages++;
 			}
+		} else if (unlikely(is_swap_special_pte(oldpte))) {
+			if (uffd_wp_resolve && !vma_is_anonymous(vma) &&
+			    pte_swp_uffd_wp_special(oldpte)) {
+				/*
+				 * This is uffd-wp special pte and we'd like to
+				 * unprotect it.  What we need to do is simply
+				 * recover the pte into a none pte; the next
+				 * page fault will fault in the page.
+				 */
+				pte_clear(vma->vm_mm, addr, pte);
+				pages++;
+			}
+		} else {
+			/* It must be a none page, or what else?.. */
+			WARN_ON_ONCE(!pte_none(oldpte));
+			if (unlikely(uffd_wp && !vma_is_anonymous(vma))) {
+				/*
+				 * For file-backed mem, we need to be able to
+				 * wr-protect even for a none pte!  Because
+				 * even if the pte is null, the page/swap cache
+				 * could exist.
+				 */
+				set_pte_at(vma->vm_mm, addr, pte,
+					   pte_swp_mkuffd_wp_special(vma));
+				pages++;
+			}
 		}
 	} while (pte++, addr += PAGE_SIZE, addr != end);
 	arch_leave_lazy_mmu_mode();
@@ -209,6 +236,25 @@ static inline int pmd_none_or_clear_bad_unless_trans_huge(pmd_t *pmd)
 	return 0;
 }
 
+/*
+ * File-backed vma allows uffd wr-protect upon none ptes, because even if pte
+ * is missing, page/swap cache could exist.  When that happens, the wr-protect
+ * information will be stored in the page table entries with the marker (e.g.,
+ * PTE_SWP_UFFD_WP_SPECIAL).  Prepare for that by always populating the page
+ * tables to pte level, so that we'll install the markers in change_pte_range()
+ * where necessary.
+ *
+ * Note that we only need to do this in pmd level, because if pmd does not
+ * exist, it means the whole range covered by the pmd entry (of a pud) does not
+ * contain any valid data but all zeros.  Then nothing to wr-protect.
+ */
+#define change_protection_prepare(vma, pmd, addr, cp_flags)		\
+	do {								\
+		if (unlikely((cp_flags & MM_CP_UFFD_WP) && pmd_none(*pmd) && \
+			     !vma_is_anonymous(vma)))			\
+			WARN_ON_ONCE(pte_alloc(vma->vm_mm, pmd));	\
+	} while (0)
+
 static inline unsigned long change_pmd_range(struct vm_area_struct *vma,
 		pud_t *pud, unsigned long addr, unsigned long end,
 		pgprot_t newprot, unsigned long cp_flags)
@@ -227,6 +273,8 @@ static inline unsigned long change_pmd_range(struct vm_area_struct *vma,
 
 		next = pmd_addr_end(addr, end);
 
+		change_protection_prepare(vma, pmd, addr, cp_flags);
+
 		/*
 		 * Automatic NUMA balancing walks the tables with mmap_lock
 		 * held for read. It's possible a parallel update to occur
-- 
2.26.2