From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 62D93EB64DA for ; Wed, 5 Jul 2023 13:13:41 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id D82488D0002; Wed, 5 Jul 2023 09:13:40 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id D30F38D0001; Wed, 5 Jul 2023 09:13:40 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id C20228D0002; Wed, 5 Jul 2023 09:13:40 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0014.hostedemail.com [216.40.44.14]) by kanga.kvack.org (Postfix) with ESMTP id AC9A78D0001 for ; Wed, 5 Jul 2023 09:13:40 -0400 (EDT) Received: from smtpin08.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay10.hostedemail.com (Postfix) with ESMTP id 70C5CC0B3A for ; Wed, 5 Jul 2023 13:13:40 +0000 (UTC) X-FDA: 80977600200.08.FBC8D9A Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by imf16.hostedemail.com (Postfix) with ESMTP id 8798B18000D for ; Wed, 5 Jul 2023 13:13:37 +0000 (UTC) Authentication-Results: imf16.hostedemail.com; dkim=none; spf=pass (imf16.hostedemail.com: domain of ryan.roberts@arm.com designates 217.140.110.172 as permitted sender) smtp.mailfrom=ryan.roberts@arm.com; dmarc=pass (policy=none) header.from=arm.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1688562817; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=7hRPBi8cnNeOQDHz+aVKss0R0bZGyzNTGhjNINEMslk=; b=cZofplVCaHpt6sjLujNMGrONlNkyPkKAELR3i4MyoVMa9rz2inagT8mAS+6Gs9UWkTc/dA Ry7iiVPwctzF3f0UT0Kp3BdZhu+Nd2zpfJ4XSgNF2fcod1SC3Zyt8JeUZ6IVP6yfHDblUz 6sVl9pZSHaiKPni3mp2dCr/528Jrvew= ARC-Authentication-Results: i=1; imf16.hostedemail.com; dkim=none; spf=pass (imf16.hostedemail.com: domain of ryan.roberts@arm.com designates 217.140.110.172 as permitted sender) smtp.mailfrom=ryan.roberts@arm.com; dmarc=pass (policy=none) header.from=arm.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1688562817; a=rsa-sha256; cv=none; b=rsoZYuBXJuQKCaAHsQ7Wm5hePN+Jh7pzpfiC3xvAygbHRtrvM4KcnYelJcpIzYtFWqXQVO MQnHSzcM6WWnNmgCw2apxILNQDN/RKa3dicH4Ol42ZW6Phzu+NdrPhyJ9oMqLKseT25VmI AIvz0i+whxdPYsVtAMP/Fe/cvX6oUJo= Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 95C3F16A3; Wed, 5 Jul 2023 06:14:18 -0700 (PDT) Received: from [10.57.76.116] (unknown [10.57.76.116]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 205843F73F; Wed, 5 Jul 2023 06:13:33 -0700 (PDT) Message-ID: <7aa05d5e-c4ea-cef1-f20e-ead77be1a027@arm.com> Date: Wed, 5 Jul 2023 14:13:30 +0100 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:102.0) Gecko/20100101 Thunderbird/102.12.0 Subject: Re: [PATCH v1 11/14] arm64/mm: Wire up PTE_CONT for user mappings From: Ryan Roberts To: Catalin Marinas Cc: Will Deacon , Ard Biesheuvel , Marc Zyngier , Oliver Upton , James Morse , Suzuki K Poulose , Zenghui Yu , Andrey Ryabinin , Alexander Potapenko , Andrey Konovalov , Dmitry Vyukov , Vincenzo Frascino , Andrew Morton , Anshuman Khandual , Matthew Wilcox , Yu Zhao , Mark Rutland , linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org References: <20230622144210.2623299-1-ryan.roberts@arm.com> <20230622144210.2623299-12-ryan.roberts@arm.com> In-Reply-To: Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit X-Rspamd-Queue-Id: 8798B18000D X-Rspam-User: X-Stat-Signature: u3om45xgkiie6ck3opnyzbc8utcg3kdq X-Rspamd-Server: rspam01 X-HE-Tag: 1688562817-492857 X-HE-Meta: U2FsdGVkX1+EcVHQDbyykvwP/ZIIgOxKqQR8RuGDnXoXJluITdlBsOK7t6t3ZNHHn0quS7Z8gY0vC8PLhkBfwSvKb1e4rLjWaPniWl5kLH2a31eCIyNbBzMWGB2TRUzxJY9RLklt5X1XFbjrigxzoGMu1SFU5eZeBnO6pI+TduSL7PbKjThZL91XlWYF/88toAkRD99z1wa3lV8VFBvnsS4+5asXcEZIbwOLL41Co1VfjJ0YOHEC9M6XtiLO0eoloLmXmeOe/tHd2VmwnXCYseNTXgeyBVORTT1SfMcsilkFDoKbQLrntqiwy6plkcOPVxZeQ6aTM2vTGWqCgb1TvX07dJpPNuynyt6/kxpIdZoliOqX7Vo3fprMYO45Zpg7ZTwTG5suwgtG25kRuHyJxeoGtr4b7TgEQMGKuFiXEd+qiWUe58h/kJLlfIQI/yVQJ5hwduT6onTFBHqGjrN+SDJ6s5+9/rVxKefZieQ9dpwkJbG11vUbmv9sUWyl48kGLo9EUrqsmA/A8qDHkLBcOjVO7h+N9rZZActlADzWCYJB1j4ZRe308Xt2FtI9iGxtv3X1tjVzMzZQnQhG7F/cA9UCfaQYWl9HNpSgFAKyvcz5yqax5OVlAR1Z4OLNoq4LBPUw6cX8gbKydXWGfR46+buDOfpwyhYaM6rp0RgJwoUx7FqwsHflaAl/85LC3T9J+EDNm37IyoB511J1UuKxCCCHSKzlLsaN1meQpQVheZDhaod88MMSl3VajLf4bdvGqBgPuDrxrAfNiTexZYlG/CSpnbygWkp2zhudVXbOaMcL4AN+qw17uucTUVoi7H+Lq2VxknMHUX08Jy/bV89Beu4YwweSFloeM9TZuxKVnDNMKYvsIZ/PekF4YCYU+NwODi0yYT4rCXG1xODW9N5/Uhlhw2s4ZZp4wHGGWn6qu7/FhLXBmapr/2Od6kRgU84WszuzFEhW/2H3nu/lTQL bfOqhMY/ 0SY1Nct8CSjBOmxNlmh8+AhvcKyYrBafKjYh+rBjlBPRpEzIudVsKV4nw2kQgobIINPD5Lx1n9eKX8Y3nm8j72xQRdYtMarTnIIfK3jLGMSJaFlLVQ39dQAnK3r8qwncsElz3q2ogTLRIOUjnzhkXyzNxPxJlsmlPi7szS0ms5XnMupK3jnj3qmdrh0IN3RQuvZGpx0I0wsPlOze9sApNLJ2yz+nFUhvpjQRKZBUuFoZ7NfM= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On 04/07/2023 12:09, Ryan Roberts wrote: > On 03/07/2023 16:17, Catalin Marinas wrote: >> Hi Ryan, >> ... >>> + >>> +int contpte_ptep_set_access_flags(struct vm_area_struct *vma, >>> + unsigned long addr, pte_t *ptep, >>> + pte_t entry, int dirty) >>> +{ >>> + pte_t orig_pte; >>> + int i; >>> + >>> + /* >>> + * Gather the access/dirty bits for the contiguous range. If nothing has >>> + * changed, its a noop. >>> + */ >>> + orig_pte = ptep_get(ptep); >>> + if (pte_val(orig_pte) == pte_val(entry)) >>> + return 0; >>> + >>> + /* >>> + * We can fix up access/dirty bits without having to unfold/fold the >>> + * contig range. But if the write bit is changing, we need to go through >>> + * the full unfold/fold cycle. >>> + */ >>> + if (pte_write(orig_pte) == pte_write(entry)) { >> >> Depending on the architecture version, pte_write() either checks a >> software only bit or it checks the DBM one. >> >>> + /* >>> + * No need to flush here; This is always "more permissive" so we >>> + * can only be _adding_ the access or dirty bit. And since the >>> + * tlb can't cache an entry without the AF set and the dirty bit >>> + * is a SW bit, there can be no confusion. For HW access >>> + * management, we technically only need to update the flag on a >>> + * single pte in the range. But for SW access management, we >>> + * need to update all the ptes to prevent extra faults. >>> + */ >> >> On pre-DBM hardware, a PTE_RDONLY entry (writable from the kernel >> perspective but clean) may be cached in the TLB and we do need flushing. > > I don't follow; The Arm ARM says: > > IPNQBP When an Access flag fault is generated, the translation table entry > causing the fault is not cached in a TLB. > > So the entry can only be in the TLB if AF is already 1. And given the dirty bit > is SW, it shouldn't affect the TLB state. And this function promises to only > change the bits so they are more permissive (so AF=0 -> AF=1, D=0 -> D=1). > > So I'm not sure what case you are describing here? Ahh sorry, I get your point now - on pre-DBM hardware, the HW sees a read-only PTE when the kernel considers it clean and this can be in the TLB. Then when making it dirty (from kernel's perspective), we are removing the read-only protection from the HW perspective, so we need to flush the TLB entry. > >> >>> + ptep = contpte_align_down(ptep); >>> + addr = ALIGN_DOWN(addr, CONT_PTE_SIZE); >>> + >>> + for (i = 0; i < CONT_PTES; i++, ptep++, addr += PAGE_SIZE) >>> + __ptep_set_access_flags(vma, addr, ptep, entry, 0); Fixed by adding this after iterating though the ptes, intent it to avoid the per-page tlb flash and instead flush the whole range at the end: if (dirty) __flush_tlb_range(vma, start_addr, addr, PAGE_SIZE, true, 3); >>> + } else { >>> + /* >>> + * No need to flush in __ptep_set_access_flags() because we just >>> + * flushed the whole range in __contpte_try_unfold(). >>> + */ >>> + __contpte_try_unfold(vma->vm_mm, addr, ptep, orig_pte); >>> + __ptep_set_access_flags(vma, addr, ptep, entry, 0); I also think this is wrong; we must pass `dirty` as the last parameter so that __ptep_set_access_flags will flush if neccessary. My comment about having just done the flush is incorrect - we have just done a flush, but the ptes are still valid with their old value so the HW could pull this into the TLB before we modify the value. >>> + contpte_try_fold(vma->vm_mm, addr, ptep, entry); >>> + } >>> + >>> + return 1; >>> +} >> > > Thanks, > Ryan >