Subject: Re: [PATCH v8 3/3] mm: fix double page fault on arm64 if PTE_AF is cleared
From: Jia He
Date: Tue, 24 Sep 2019 23:29:07 +0800
To: Catalin Marinas, "Justin He (Arm Technology China)"
Cc: Will Deacon, Mark Rutland, James Morse, Marc Zyngier, Matthew Wilcox,
    "Kirill A. Shutemov", linux-arm-kernel@lists.infradead.org,
    linux-kernel@vger.kernel.org, linux-mm@kvack.org, Suzuki Poulose,
    Punit Agrawal, Anshuman Khandual, Alex Van Brunt, Robin Murphy,
    Thomas Gleixner, Andrew Morton, Jérôme Glisse, Ralph Campbell,
    "Kaly Xin (Arm Technology China)", nd
Message-ID: <6267b685-5162-85ac-087f-112303bb7035@gmail.com>
In-Reply-To: <20190924103324.GB41214@arrakis.emea.arm.com>

Hi Catalin,

On 2019/9/24 18:33, Catalin Marinas wrote:
> On Tue, Sep 24, 2019 at 06:43:06AM +0000, Justin He (Arm Technology China) wrote:
>> Catalin Marinas wrote:
>>> On Sat, Sep 21, 2019 at 09:50:54PM +0800, Jia He wrote:
>>>> @@ -2151,21 +2163,53 @@ static inline void cow_user_page(struct page *dst, struct page *src, unsigned lo
>>>>      * fails, we just zero-fill it. Live with it.
>>>>      */
>>>>     if (unlikely(!src)) {
>>>> -       void *kaddr = kmap_atomic(dst);
>>>> -       void __user *uaddr = (void __user *)(va & PAGE_MASK);
>>>> +       void *kaddr;
>>>> +       pte_t entry;
>>>> +       void __user *uaddr = (void __user *)(addr & PAGE_MASK);
>>>>
>>>> +       /* On architectures with software "accessed" bits, we would
>>>> +        * take a double page fault, so mark it accessed here.
>>>> +        */
> [...]
>>>> +       if (arch_faults_on_old_pte() && !pte_young(vmf->orig_pte)) {
>>>> +           vmf->pte = pte_offset_map_lock(mm, vmf->pmd, addr,
>>>> +                                          &vmf->ptl);
>>>> +           if (likely(pte_same(*vmf->pte, vmf->orig_pte))) {
>>>> +               entry = pte_mkyoung(vmf->orig_pte);
>>>> +               if (ptep_set_access_flags(vma, addr,
>>>> +                                         vmf->pte, entry, 0))
>>>> +                   update_mmu_cache(vma, addr, vmf->pte);
>>>> +           } else {
>>>> +               /* Other thread has already handled the fault
>>>> +                * and we don't need to do anything. If it's
>>>> +                * not the case, the fault will be triggered
>>>> +                * again on the same address.
>>>> +                */
>>>> +               pte_unmap_unlock(vmf->pte, vmf->ptl);
>>>> +               return false;
>>>> +           }
>>>> +           pte_unmap_unlock(vmf->pte, vmf->ptl);
>>>> +       }
> [...]
>>>> +
>>>> +       kaddr = kmap_atomic(dst);
>>> Since you moved the kmap_atomic() here, could the above
>>> arch_faults_on_old_pte() run in a preemptible context? I suggested
>>> adding a WARN_ON in patch 2 to be sure.
>> Should I move kmap_atomic() back to the original line? That way we can
>> make sure that arch_faults_on_old_pte() runs with preemption disabled.
>> Otherwise, arch_faults_on_old_pte() may cause plenty of warnings if I add
>> a WARN_ON in arch_faults_on_old_pte(). I tested it with PREEMPT=y enabled
>> on a ThunderX2 qemu guest.
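
(For reference, the experiment mentioned above only adds the warning to the
helper from patch 2. A minimal sketch of what I tested, assuming the
cpu_has_hw_af() helper from patch 1 of this series; illustrative only, not
the code as posted:)

/*
 * Sketch only: the per-CPU arch_faults_on_old_pte() from patch 2 with the
 * suggested WARN_ON(). cpu_has_hw_af() is assumed to be the helper from
 * patch 1. With CONFIG_PREEMPT=y the warning fires whenever the caller is
 * preemptible, which is why the test above produced so many warnings.
 */
static inline bool arch_faults_on_old_pte(void)
{
    WARN_ON(preemptible());

    return !cpu_has_hw_af();
}
#define arch_faults_on_old_pte arch_faults_on_old_pte
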
> So we have two options here:
>
> 1. Change arch_faults_on_old_pte() scope to the whole system rather than
>    just the current CPU. You'd have to wire up a new arm64 capability
>    for the access flag but this way we don't care whether it's
>    preemptible or not.
>
> 2. Keep the arch_faults_on_old_pte() per-CPU but make sure we are not
>    preempted here. The kmap_atomic() move would do but you'd have to
>    kunmap_atomic() before the return.
>
> I think the answer to my question below also has some implication on
> which option to pick:
>
>>>>      /*
>>>>       * This really shouldn't fail, because the page is there
>>>>       * in the page tables. But it might just be unreadable,
>>>>       * in which case we just give up and fill the result with
>>>>       * zeroes.
>>>>       */
>>>> -    if (__copy_from_user_inatomic(kaddr, uaddr, PAGE_SIZE))
>>>> +    if (__copy_from_user_inatomic(kaddr, uaddr, PAGE_SIZE)) {
>>>> +        /* Give a warn in case there can be some obscure
>>>> +         * use-case
>>>> +         */
>>>> +        WARN_ON_ONCE(1);
>>> That's more of a question for the mm guys: at this point we do the
>>> copying with the ptl released; is there anything else that could have
>>> made the pte old in the meantime? I think unuse_pte() is only called on
>>> anonymous vmas, so it shouldn't be the case here.
> If we need to hold the ptl here, you could as well have an enclosing
> kmap/kunmap_atomic (option 2) with some goto instead of "return false".

I am not 100% sure that I understand your suggestion well, so I drafted the
patch here.

Changes:
- optimize the indentation
- hold the ptl longer

-static inline void cow_user_page(struct page *dst, struct page *src, unsigned long va, struct vm_area_struct *vma)
+static inline bool cow_user_page(struct page *dst, struct page *src,
+                                 struct vm_fault *vmf)
 {
+    struct vm_area_struct *vma = vmf->vma;
+    struct mm_struct *mm = vma->vm_mm;
+    unsigned long addr = vmf->address;
+    bool ret;
+    pte_t entry;
+    void *kaddr;
+    void __user *uaddr;
+
     debug_dma_assert_idle(src);

+    if (likely(src)) {
+        copy_user_highpage(dst, src, addr, vma);
+        return true;
+    }
+
     /*
      * If the source page was a PFN mapping, we don't have
      * a "struct page" for it. We do a best-effort copy by
      * just copying from the original user address. If that
      * fails, we just zero-fill it. Live with it.
      */
-    if (unlikely(!src)) {
-        void *kaddr = kmap_atomic(dst);
-        void __user *uaddr = (void __user *)(va & PAGE_MASK);
+    kaddr = kmap_atomic(dst);
+    uaddr = (void __user *)(addr & PAGE_MASK);
+
+    /*
+     * On architectures with software "accessed" bits, we would
+     * take a double page fault, so mark it accessed here.
+     */
+    vmf->pte = pte_offset_map_lock(mm, vmf->pmd, addr, &vmf->ptl);
+    if (arch_faults_on_old_pte() && !pte_young(vmf->orig_pte)) {
+        if (!likely(pte_same(*vmf->pte, vmf->orig_pte))) {
+            /*
+             * Other thread has already handled the fault
+             * and we don't need to do anything. If it's
+             * not the case, the fault will be triggered
+             * again on the same address.
+             */
+            ret = false;
+            goto pte_unlock;
+        }
+
+        entry = pte_mkyoung(vmf->orig_pte);
+        if (ptep_set_access_flags(vma, addr, vmf->pte, entry, 0))
+            update_mmu_cache(vma, addr, vmf->pte);
+    }
+    /*
+     * This really shouldn't fail, because the page is there
+     * in the page tables. But it might just be unreadable,
+     * in which case we just give up and fill the result with
+     * zeroes.
+     */
+    if (__copy_from_user_inatomic(kaddr, uaddr, PAGE_SIZE)) {
         /*
-         * This really shouldn't fail, because the page is there
-         * in the page tables. But it might just be unreadable,
-         * in which case we just give up and fill the result with
-         * zeroes.
+         * Give a warn in case there can be some obscure
+         * use-case
          */
-        if (__copy_from_user_inatomic(kaddr, uaddr, PAGE_SIZE))
-            clear_page(kaddr);
-        kunmap_atomic(kaddr);
-        flush_dcache_page(dst);
-    } else
-        copy_user_highpage(dst, src, va, vma);
+        WARN_ON_ONCE(1);
+        clear_page(kaddr);
+    }
+
+    ret = true;
+
+pte_unlock:
+    pte_unmap_unlock(vmf->pte, vmf->ptl);
+    kunmap_atomic(kaddr);
+    flush_dcache_page(dst);
+
+    return ret;
 }
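
The caller side is not shown above. My assumption is that wp_page_copy()
would simply drop the newly allocated page when cow_user_page() returns
false and let user space re-fault on the same address; a rough, untested
sketch of that handling (error-path details still need to be checked
against wp_page_copy()):

    /*
     * Illustrative only: consuming the new bool return value of
     * cow_user_page() in wp_page_copy(). If another thread already
     * handled the fault, drop the page references taken earlier and
     * return; the task will re-fault if anything is still missing.
     */
    if (!cow_user_page(new_page, old_page, vmf)) {
        put_page(new_page);
        if (old_page)
            put_page(old_page);
        return 0;
    }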
---

Cheers,
Justin (Jia He)