From: Qi Zheng <qi.zheng@linux.dev>
Date: Mon, 22 May 2023 15:15:32 +0800
Subject: Re: [PATCH] mmu_notifiers: Notify on pte permission upgrades
To: Alistair Popple
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, robin.murphy@arm.com,
 will@kernel.org, nicolinc@nvidia.com, linux-arm-kernel@lists.infradead.org,
 kvm@vger.kernel.org, kvmarm@lists.cs.columbia.edu, jgg@nvidia.com,
 John Hubbard, Andrew Morton
Message-ID: <620eef7f-ee01-673c-b097-243d6fe25b09@linux.dev>
In-Reply-To: <20230522063725.284686-1-apopple@nvidia.com>
References: <20230522063725.284686-1-apopple@nvidia.com>

Hi Alistair,

On 2023/5/22 14:37, Alistair Popple wrote:
> Some architectures, specifically ARM and perhaps Sparc and IA64,
> require TLB invalidates when upgrading pte permission from read-only
> to read-write.
>
> The current mmu_notifier implementation assumes that upgrades do not
> need notifications.
> Typically though, mmu_notifiers are used to
> implement TLB invalidations for secondary MMUs that comply with the
> main CPU architecture.
>
> Therefore, if the main CPU architecture requires an invalidation for
> a permission upgrade, the secondary MMU will as well, and an
> mmu_notifier should be sent for the upgrade.
>
> Currently CPU invalidations for permission upgrades occur in
> ptep_set_access_flags(). Unfortunately MMU notifiers cannot be called
> directly from this architecture-specific code, as the notifier
> callbacks can sleep and ptep_set_access_flags() is usually called
> whilst holding the PTL spinlock. Therefore add the notifier calls
> after the PTL is dropped, and only if the PTE actually changed. This
> allows secondary MMUs to obtain an updated PTE with the appropriate
> permissions.
>
> This problem was discovered during testing of an ARM SMMU
> implementation that does not support broadcast TLB maintenance
> (BTM). In this case the SMMU driver uses notifiers to issue TLB
> invalidates. For read-only to read-write pte upgrades the SMMU
> continually returned a read-only PTE to the device, even though the
> CPU had a read-write PTE installed.
>
> Sending an mmu notifier event to the SMMU driver fixes the problem by
> flushing secondary TLB entries. A new notifier event type is added so
> drivers may filter out these invalidations if not required. Note that
> a driver should never upgrade or install a PTE in response to this
> mmu notifier event, as it is not synchronised against other PTE
> operations.
>
> Signed-off-by: Alistair Popple
> ---
>  include/linux/mmu_notifier.h |  6 +++++
>  mm/hugetlb.c                 | 24 ++++++++++++++++++-
>  mm/memory.c                  | 45 ++++++++++++++++++++++++++++++++++--
>  3 files changed, 72 insertions(+), 3 deletions(-)
>
> diff --git a/include/linux/mmu_notifier.h b/include/linux/mmu_notifier.h
> index d6c06e140277..f14d68f119d8 100644
> --- a/include/linux/mmu_notifier.h
> +++ b/include/linux/mmu_notifier.h
> @@ -31,6 +31,11 @@ struct mmu_interval_notifier;
>   * pages in the range so to mirror those changes the user must inspect the CPU
>   * page table (from the end callback).
>   *
> + * @MMU_NOTIFY_PROTECTION_UPGRADE: update is due to a change from read-only to
> + * read-write for pages in the range. This must not be used to upgrade
> + * permissions on secondary PTEs, rather it should only be used to invalidate
> + * caches such as secondary TLBs that may cache old read-only entries.
> + *
>   * @MMU_NOTIFY_SOFT_DIRTY: soft dirty accounting (still same page and same
>   * access flags).
>   * User should soft dirty the page in the end callback to make
>   * sure that anyone relying on soft dirtiness catch pages that might be written
> @@ -53,6 +58,7 @@ enum mmu_notifier_event {
>  	MMU_NOTIFY_CLEAR,
>  	MMU_NOTIFY_PROTECTION_VMA,
>  	MMU_NOTIFY_PROTECTION_PAGE,
> +	MMU_NOTIFY_PROTECTION_UPGRADE,
>  	MMU_NOTIFY_SOFT_DIRTY,
>  	MMU_NOTIFY_RELEASE,
>  	MMU_NOTIFY_MIGRATE,
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index bdbfeb6fb393..e5d467c7bff7 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -5987,6 +5987,7 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
>  	vm_fault_t ret;
>  	u32 hash;
>  	pgoff_t idx;
> +	bool changed = false;
>  	struct page *page = NULL;
>  	struct page *pagecache_page = NULL;
>  	struct hstate *h = hstate_vma(vma);
> @@ -6122,6 +6123,9 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
>  	if (!huge_pte_write(entry)) {
>  		ret = hugetlb_wp(mm, vma, address, ptep, flags,
>  				 pagecache_page, ptl);
> +		if (!ret)
> +			changed = true;
> +
>  		goto out_put_page;
>  	} else if (likely(flags & FAULT_FLAG_WRITE)) {
>  		entry = huge_pte_mkdirty(entry);
> @@ -6129,8 +6133,11 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
>  	}
>  	entry = pte_mkyoung(entry);
>  	if (huge_ptep_set_access_flags(vma, haddr, ptep, entry,
> -				       flags & FAULT_FLAG_WRITE))
> +				       flags & FAULT_FLAG_WRITE)) {
>  		update_mmu_cache(vma, haddr, ptep);
> +		changed = true;
> +	}
> +
>  out_put_page:
>  	if (page != pagecache_page)
>  		unlock_page(page);
> @@ -6138,6 +6145,21 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
>  out_ptl:
>  	spin_unlock(ptl);
>
> +	if (changed) {
> +		struct mmu_notifier_range range;
> +		unsigned long hpage_mask = huge_page_mask(h);
> +		unsigned long hpage_size = huge_page_size(h);
> +
> +		update_mmu_cache(vma, haddr, ptep);
> +
> +		mmu_notifier_range_init(&range, MMU_NOTIFY_PROTECTION_UPGRADE,
> +					0, vma, mm, haddr & hpage_mask,
> +					(haddr & hpage_mask) + hpage_size);
> +		mmu_notifier_invalidate_range_start(&range);
> +		mmu_notifier_invalidate_range_end(&range);
> +	}
> +
> +
>  	if (pagecache_page) {
>  		unlock_page(pagecache_page);
>  		put_page(pagecache_page);
> diff --git a/mm/memory.c b/mm/memory.c
> index f526b9152bef..0ac78c6a232c 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -2098,6 +2098,7 @@ static vm_fault_t insert_pfn(struct vm_area_struct *vma, unsigned long addr,
>  	struct mm_struct *mm = vma->vm_mm;
>  	pte_t *pte, entry;
>  	spinlock_t *ptl;
> +	bool changed = false;
>
>  	pte = get_locked_pte(mm, addr, &ptl);
>  	if (!pte)
> @@ -2120,8 +2121,10 @@ static vm_fault_t insert_pfn(struct vm_area_struct *vma, unsigned long addr,
>  		}
>  		entry = pte_mkyoung(*pte);
>  		entry = maybe_mkwrite(pte_mkdirty(entry), vma);
> -		if (ptep_set_access_flags(vma, addr, pte, entry, 1))
> +		if (ptep_set_access_flags(vma, addr, pte, entry, 1)) {
>  			update_mmu_cache(vma, addr, pte);
> +			changed = true;
> +		}
>  	}
>  	goto out_unlock;
>  }
> @@ -2142,6 +2145,17 @@ static vm_fault_t insert_pfn(struct vm_area_struct *vma, unsigned long addr,
>
>  out_unlock:
>  	pte_unmap_unlock(pte, ptl);
> +
> +	if (changed) {
> +		struct mmu_notifier_range range;
> +
> +		mmu_notifier_range_init(&range, MMU_NOTIFY_PROTECTION_UPGRADE,
> +					0, vma, mm, addr & PAGE_MASK,
> +					(addr & PAGE_MASK) + PAGE_SIZE);
> +		mmu_notifier_invalidate_range_start(&range);
> +		mmu_notifier_invalidate_range_end(&range);
> +	}
> +
>  	return VM_FAULT_NOPAGE;
>  }
>
> @@ -2820,6 +2834,7 @@ static inline int __wp_page_copy_user(struct page *dst, struct page *src,
>  	struct vm_area_struct *vma = vmf->vma;
>  	struct mm_struct *mm = vma->vm_mm;
>  	unsigned long addr = vmf->address;
> +	bool changed = false;
>
>  	if (likely(src)) {
>  		if (copy_mc_user_highpage(dst, src, addr, vma)) {
> @@ -2858,8 +2873,10 @@ static inline int __wp_page_copy_user(struct page *dst, struct page *src,
>  		}
>
>  		entry = pte_mkyoung(vmf->orig_pte);
> -		if (ptep_set_access_flags(vma, addr, vmf->pte, entry, 0))
> +		if (ptep_set_access_flags(vma, addr, vmf->pte, entry, 0)) {
>  			update_mmu_cache(vma, addr, vmf->pte);
> +			changed = true;
> +		}
>  	}
>
>  	/*
> @@ -2897,6 +2914,16 @@ static inline int __wp_page_copy_user(struct page *dst, struct page *src,
>  		}
>  	}
>
> +	if (changed) {
> +		struct mmu_notifier_range range;
> +
> +		mmu_notifier_range_init(&range, MMU_NOTIFY_PROTECTION_UPGRADE,
> +					0, vma, vma->vm_mm, addr & PAGE_MASK,
> +					(addr & PAGE_MASK) + PAGE_SIZE);
> +		mmu_notifier_invalidate_range_start(&range);
> +		mmu_notifier_invalidate_range_end(&range);
> +	}
> +
>  	ret = 0;
>
>  pte_unlock:
> @@ -4877,6 +4904,7 @@ static vm_fault_t wp_huge_pud(struct vm_fault *vmf, pud_t orig_pud)
>  static vm_fault_t handle_pte_fault(struct vm_fault *vmf)
>  {
>  	pte_t entry;
> +	bool changed = false;
>
>  	if (unlikely(pmd_none(*vmf->pmd))) {
>  		/*
> @@ -4957,6 +4985,7 @@ static vm_fault_t handle_pte_fault(struct vm_fault *vmf)
>  	if (ptep_set_access_flags(vmf->vma, vmf->address, vmf->pte, entry,
>  				  vmf->flags & FAULT_FLAG_WRITE)) {
>  		update_mmu_cache(vmf->vma, vmf->address, vmf->pte);
> +		changed = true;
>  	} else {
>  		/* Skip spurious TLB flush for retried page fault */
>  		if (vmf->flags & FAULT_FLAG_TRIED)
> @@ -4972,6 +5001,18 @@ static vm_fault_t handle_pte_fault(struct vm_fault *vmf)
>  	}
>  unlock:
>  	pte_unmap_unlock(vmf->pte, vmf->ptl);
> +
> +	if (changed) {
> +		struct mmu_notifier_range range;
> +
> +		mmu_notifier_range_init(&range, MMU_NOTIFY_PROTECTION_UPGRADE,
> +					0, vmf->vma, vmf->vma->vm_mm,
> +					vmf->address & PAGE_MASK,
> +					(vmf->address & PAGE_MASK) + PAGE_SIZE);
> +		mmu_notifier_invalidate_range_start(&range);
> +		mmu_notifier_invalidate_range_end(&range);
> +	}

There are four similar patterns here. Can we introduce a helper function
to deduplicate them?

> +
>  	return 0;
>  }
> --

Thanks,
Qi