From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 44A6BC021A4 for ; Mon, 24 Feb 2025 12:12:46 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id CAEB06B0083; Mon, 24 Feb 2025 07:12:45 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id C5DF46B0085; Mon, 24 Feb 2025 07:12:45 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id B4D426B0088; Mon, 24 Feb 2025 07:12:45 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0014.hostedemail.com [216.40.44.14]) by kanga.kvack.org (Postfix) with ESMTP id 947266B0083 for ; Mon, 24 Feb 2025 07:12:45 -0500 (EST) Received: from smtpin27.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay08.hostedemail.com (Postfix) with ESMTP id CC66A141319 for ; Mon, 24 Feb 2025 12:11:31 +0000 (UTC) X-FDA: 83154723582.27.6A11A6E Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by imf14.hostedemail.com (Postfix) with ESMTP id D025E10000D for ; Mon, 24 Feb 2025 12:11:27 +0000 (UTC) Authentication-Results: imf14.hostedemail.com; dkim=none; spf=pass (imf14.hostedemail.com: domain of ryan.roberts@arm.com designates 217.140.110.172 as permitted sender) smtp.mailfrom=ryan.roberts@arm.com; dmarc=pass (policy=none) header.from=arm.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1740399088; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=gTeyJbfR/Gq50f5fhY7HAT6qWjO2hwhLr1N+6WGPJkU=; b=ng9/BJY26WRQ+mcjEnYFw0IJAOi3BbEvXPoz0liRqvQtegmUHSkAODJNoqP5ZwYC9TSGSx nUYftugTauQvlllQOCgMsIKZXN+cslY9vtshAxLJ+npu8tOSLNb0R8Pa+aqcgk4EclT/Bk fNTjDhWP28IL93ii2h4W97+bEdhxbA8= ARC-Authentication-Results: i=1; imf14.hostedemail.com; dkim=none; spf=pass (imf14.hostedemail.com: domain of ryan.roberts@arm.com designates 217.140.110.172 as permitted sender) smtp.mailfrom=ryan.roberts@arm.com; dmarc=pass (policy=none) header.from=arm.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1740399088; a=rsa-sha256; cv=none; b=GPPawXyngX3AgvVNfI4/zEw4s7dO/qtaQtsw1waqMS9R64INZ+Mf8IfXV87RgMjOq9AhK8 wdVzgU1KSSYtv6dS0wyEi53pW4ZP78m1UZ/asf2Ave9FZf+gVpvNCgAQja3PTpubQMxnn1 7gpraNhw1GvDH52cNwbTtplJDXkRzH0= Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 554841756; Mon, 24 Feb 2025 04:11:43 -0800 (PST) Received: from [10.1.27.150] (XHFQ2J9959.cambridge.arm.com [10.1.27.150]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 5D6963F6A8; Mon, 24 Feb 2025 04:11:21 -0800 (PST) Message-ID: <6ebf36f2-2e55-49b2-8764-90fd972d6e66@arm.com> Date: Mon, 24 Feb 2025 12:11:19 +0000 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: [PATCH v2 2/4] arm64: hugetlb: Fix huge_ptep_get_and_clear() for non-present ptes Content-Language: en-GB To: Will Deacon Cc: Catalin Marinas , Huacai Chen , WANG Xuerui , Thomas Bogendoerfer , "James E.J. Bottomley" , Helge Deller , Madhavan Srinivasan , Michael Ellerman , Nicholas Piggin , Christophe Leroy , Naveen N Rao , Paul Walmsley , Palmer Dabbelt , Albert Ou , Heiko Carstens , Vasily Gorbik , Alexander Gordeev , Christian Borntraeger , Sven Schnelle , Gerald Schaefer , "David S. Miller" , Andreas Larsson , Arnd Bergmann , Muchun Song , Andrew Morton , Uladzislau Rezki , Christoph Hellwig , David Hildenbrand , "Matthew Wilcox (Oracle)" , Mark Rutland , Anshuman Khandual , Dev Jain , Kevin Brodsky , Alexandre Ghiti , linux-arm-kernel@lists.infradead.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, stable@vger.kernel.org References: <20250217140419.1702389-1-ryan.roberts@arm.com> <20250217140419.1702389-3-ryan.roberts@arm.com> <20250221153156.GC20567@willie-the-truck> From: Ryan Roberts In-Reply-To: <20250221153156.GC20567@willie-the-truck> Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit X-Rspamd-Queue-Id: D025E10000D X-Stat-Signature: 7y5t96cqjkoezeuxm4y5z9eo7as5f48x X-Rspam-User: X-Rspamd-Server: rspam01 X-HE-Tag: 1740399087-582021 X-HE-Meta: U2FsdGVkX1+8Knc3aLtEQzL9doG69HIzWp6zn/MVfneKf6rCuYOdCr9yxtInrPBWVZBLcHSTYRKoHbOSUtTDoVh7K7RcZk3bY79UXI6sE1psL/H1rl7vmHxpr67r9MxtkSFTckHTD2ivVuVJ2dsWG68Y/4/LV81S6W7SpVd1/WLvp5QCIHFJwcbI15f/r0FCry4FilKe2ls5uAYrDWH9r/TWKWOxRTXsKoAYgVmWGWdOwfm4EcHtleFs+1y10iXNXRYhj9UPF1lC8f5qu+t2gTOPybqiqpxnA4eu4cSSr9nemyRvHy0lYEGK5FJOXe62Kpq0XCEYy24QO/WX4g2WUQWejZboffcSQXssN812wC2z4LNjdsUahwBcoMYCZ5TiSAQEIxPPPQOhBDLdda2FEoz2UyPHD3YjuSY6ULmtzMnIJR2nB+BQKOq6K1ZTj1VAJAPNtpFso3uqwdYkOHCkJSeZoK3hVYGekpUDciWR9J/DOD7r/yNzIsUxoEOwoKfIdKlGAFbQu//5+HoP2ouxplcF3xC3386y2BHt7+pQfBoXLKUwbxhjqjzd2HeBJT0OoCZ9JXyGPp6UDOYpj3XgUptCgoMaiRkQ5YxV5dEvbpFf3115Z5KILh7SYbCBuxk+gxPdf1DBS1UUK6gm6tbutXjk4qEVCdj6FcXl7VJm5uc6b5H5N8SsQmcqhU+pFWuhg7J4H8z3+5G/dOdRu+W6CO02U9a5maaF7jMbOdunQrPKiExa7AO+17UA9++NHRsj5i8pOTXZxTv8jNK8zcU/rD8BUMdGuGrRiKlTJahyyNKGsPMVQj/1VcQiSADhQN7mjiWNKIEyefMyQTlV04tmAZZHQvIGytvbONdwJV2nP6z/gbNNNOTHerz1zP/wIM9PIZYRL6MlhxvdHyLeZ1W/dg2512NCsT0xVyycNLjbluWZqyBO4hNEGspmGoT120ELxQKmYxPjD8TzsuGp9gz UZPL0FTs hpkGxTAFS4bWnHX2MbdGZMLKGLWkc4QKw70Sq6LFe7fzKQ47ZDu2JrGHUF8Cg+B5RP9fe/s4BjGlBsGwm/Qt3RjHlKfLBS2/M4gnnI611JpAbB48tcqv1nv7KD6DQeknEN2k0IadPQHMaKnId8QPfeejjWuFMJOf4ahqrn1YsyMFv/bbTRA5CwMRGS5nW/ts5NNoY99r+IqRPiH0VnUdmgolaWJyQAiQfrNS1aavNT/B/oKvnKcmIehNceyMg8wzoZJiPThZHbXePrXROm5ttbXDNgg== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On 21/02/2025 15:31, Will Deacon wrote: > On Mon, Feb 17, 2025 at 02:04:15PM +0000, Ryan Roberts wrote: >> arm64 supports multiple huge_pte sizes. Some of the sizes are covered by >> a single pte entry at a particular level (PMD_SIZE, PUD_SIZE), and some >> are covered by multiple ptes at a particular level (CONT_PTE_SIZE, >> CONT_PMD_SIZE). So the function has to figure out the size from the >> huge_pte pointer. This was previously done by walking the pgtable to >> determine the level and by using the PTE_CONT bit to determine the >> number of ptes at the level. >> >> But the PTE_CONT bit is only valid when the pte is present. For >> non-present pte values (e.g. markers, migration entries), the previous >> implementation was therefore erroniously determining the size. There is >> at least one known caller in core-mm, move_huge_pte(), which may call >> huge_ptep_get_and_clear() for a non-present pte. So we must be robust to >> this case. Additionally the "regular" ptep_get_and_clear() is robust to >> being called for non-present ptes so it makes sense to follow the >> behaviour. >> >> Fix this by using the new sz parameter which is now provided to the >> function. Additionally when clearing each pte in a contig range, don't >> gather the access and dirty bits if the pte is not present. >> >> An alternative approach that would not require API changes would be to >> store the PTE_CONT bit in a spare bit in the swap entry pte for the >> non-present case. But it felt cleaner to follow other APIs' lead and >> just pass in the size. >> >> As an aside, PTE_CONT is bit 52, which corresponds to bit 40 in the swap >> entry offset field (layout of non-present pte). Since hugetlb is never >> swapped to disk, this field will only be populated for markers, which >> always set this bit to 0 and hwpoison swap entries, which set the offset >> field to a PFN; So it would only ever be 1 for a 52-bit PVA system where >> memory in that high half was poisoned (I think!). So in practice, this >> bit would almost always be zero for non-present ptes and we would only >> clear the first entry if it was actually a contiguous block. That's >> probably a less severe symptom than if it was always interpretted as 1 >> and cleared out potentially-present neighboring PTEs. >> >> Cc: stable@vger.kernel.org >> Fixes: 66b3923a1a0f ("arm64: hugetlb: add support for PTE contiguous bit") >> Signed-off-by: Ryan Roberts >> --- >> arch/arm64/mm/hugetlbpage.c | 40 ++++++++++++++++--------------------- >> 1 file changed, 17 insertions(+), 23 deletions(-) >> >> diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c >> index 06db4649af91..614b2feddba2 100644 >> --- a/arch/arm64/mm/hugetlbpage.c >> +++ b/arch/arm64/mm/hugetlbpage.c >> @@ -163,24 +163,23 @@ static pte_t get_clear_contig(struct mm_struct *mm, >> unsigned long pgsize, >> unsigned long ncontig) >> { >> - pte_t orig_pte = __ptep_get(ptep); >> - unsigned long i; >> - >> - for (i = 0; i < ncontig; i++, addr += pgsize, ptep++) { >> - pte_t pte = __ptep_get_and_clear(mm, addr, ptep); >> - >> - /* >> - * If HW_AFDBM is enabled, then the HW could turn on >> - * the dirty or accessed bit for any page in the set, >> - * so check them all. >> - */ >> - if (pte_dirty(pte)) >> - orig_pte = pte_mkdirty(orig_pte); >> - >> - if (pte_young(pte)) >> - orig_pte = pte_mkyoung(orig_pte); >> + pte_t pte, tmp_pte; >> + bool present; >> + >> + pte = __ptep_get_and_clear(mm, addr, ptep); >> + present = pte_present(pte); >> + while (--ncontig) { >> + ptep++; >> + addr += pgsize; >> + tmp_pte = __ptep_get_and_clear(mm, addr, ptep); >> + if (present) { >> + if (pte_dirty(tmp_pte)) >> + pte = pte_mkdirty(pte); >> + if (pte_young(tmp_pte)) >> + pte = pte_mkyoung(pte); >> + } >> } > > nit: With the loop now structured like this, we really can't handle > num_contig_ptes() returning 0 if it gets an unknown size. Granted, that > really shouldn't happen, but perhaps it would be better to add a 'default' > case with a WARN() to num_contig_ptes() and then add an early return here? Looking at other users of num_contig_ptes() it looks like huge_ptep_get() already assumes at least 1 pte (it calls __ptep_get() before calling num_contig_ptes()) and set_huge_pte_at() assumes 1 pte for the "present and non-contig" case. So num_contig_ptes() returning 0 is already not really consumed consistently. How about we change the default num_contig_ptes() return value to 1 and add a warning if size is invalid: ---8<--- diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c index 614b2feddba2..b3a7fafe8892 100644 --- a/arch/arm64/mm/hugetlbpage.c +++ b/arch/arm64/mm/hugetlbpage.c @@ -100,20 +100,11 @@ static int find_num_contig(struct mm_struct *mm, unsigned long addr, static inline int num_contig_ptes(unsigned long size, size_t *pgsize) { - int contig_ptes = 0; + int contig_ptes = 1; *pgsize = size; switch (size) { -#ifndef __PAGETABLE_PMD_FOLDED - case PUD_SIZE: - if (pud_sect_supported()) - contig_ptes = 1; - break; -#endif - case PMD_SIZE: - contig_ptes = 1; - break; case CONT_PMD_SIZE: *pgsize = PMD_SIZE; contig_ptes = CONT_PMDS; @@ -122,6 +113,8 @@ static inline int num_contig_ptes(unsigned long size, size_t *pgsize) *pgsize = PAGE_SIZE; contig_ptes = CONT_PTES; break; + default: + WARN_ON(!__hugetlb_valid_size(size)); } return contig_ptes; ---8<--- > > Will