From mboxrd@z Thu Jan 1 00:00:00 1970
From: Anshuman Khandual <anshuman.khandual@arm.com>
Date: Fri, 7 Feb 2025 14:49:50 +0530
Subject: Re: [PATCH v1 11/16] mm/vmalloc: Gracefully unmap huge ptes
To: Ryan Roberts <ryan.roberts@arm.com>, Catalin Marinas, Will Deacon,
 Muchun Song, Pasha Tatashin, Andrew Morton, Uladzislau Rezki,
 Christoph Hellwig, Mark Rutland, Ard Biesheuvel, Dev Jain,
 Alexandre Ghiti, Steve Capper, Kevin Brodsky
Cc: linux-arm-kernel@lists.infradead.org, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org
References: <20250205151003.88959-1-ryan.roberts@arm.com>
 <20250205151003.88959-12-ryan.roberts@arm.com>
In-Reply-To: <20250205151003.88959-12-ryan.roberts@arm.com>
Content-Type: text/plain; charset=UTF-8
On 2/5/25 20:39, Ryan Roberts wrote:
> Commit f7ee1f13d606 ("mm/vmalloc: enable mapping of huge pages at pte
> level in vmap") added its support by reusing the set_huge_pte_at() API,
> which is otherwise only used for user mappings. But when unmapping those
> huge ptes, it continued to call ptep_get_and_clear(), which is a
> layering violation. To date, the only arch to implement this support is
> powerpc and it all happens to work ok for it.
>
> But arm64's implementation of ptep_get_and_clear() can not be safely
> used to clear a previous set_huge_pte_at().
> So let's introduce a new arch opt-in function,
> arch_vmap_pte_range_unmap_size(), which can provide the size of a
> (present) pte. Then we can call huge_ptep_get_and_clear() to tear it
> down properly.
>
> Note that if vunmap_range() is called with a range that starts in the
> middle of a huge pte-mapped page, we must unmap the entire huge page so
> the behaviour is consistent with pmd and pud block mappings. In this
> case emit a warning just like we do for pmd/pud mappings.
>
> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
> ---
>  include/linux/vmalloc.h |  8 ++++++++
>  mm/vmalloc.c            | 18 ++++++++++++++++--
>  2 files changed, 24 insertions(+), 2 deletions(-)
>
> diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
> index 31e9ffd936e3..16dd4cba64f2 100644
> --- a/include/linux/vmalloc.h
> +++ b/include/linux/vmalloc.h
> @@ -113,6 +113,14 @@ static inline unsigned long arch_vmap_pte_range_map_size(unsigned long addr, uns
>  }
>  #endif
>
> +#ifndef arch_vmap_pte_range_unmap_size
> +static inline unsigned long arch_vmap_pte_range_unmap_size(unsigned long addr,
> +							   pte_t *ptep)
> +{
> +	return PAGE_SIZE;
> +}
> +#endif
> +
>  #ifndef arch_vmap_pte_supported_shift
>  static inline int arch_vmap_pte_supported_shift(unsigned long size)
>  {
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index fcdf67d5177a..6111ce900ec4 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -350,12 +350,26 @@ static void vunmap_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
>  			     pgtbl_mod_mask *mask)
>  {
>  	pte_t *pte;
> +	pte_t ptent;
> +	unsigned long size = PAGE_SIZE;

The default fallback size remains PAGE_SIZE, as before.
>
>  	pte = pte_offset_kernel(pmd, addr);
>  	do {
> -		pte_t ptent = ptep_get_and_clear(&init_mm, addr, pte);
> +#ifdef CONFIG_HUGETLB_PAGE
> +		size = arch_vmap_pte_range_unmap_size(addr, pte);
> +		if (size != PAGE_SIZE) {
> +			if (WARN_ON(!IS_ALIGNED(addr, size))) {
> +				addr = ALIGN_DOWN(addr, size);
> +				pte = PTR_ALIGN_DOWN(pte, sizeof(*pte) * (size >> PAGE_SHIFT));
> +			}
> +			ptent = huge_ptep_get_and_clear(&init_mm, addr, pte, size);
> +			if (WARN_ON(end - addr < size))
> +				size = end - addr;
> +		} else
> +#endif
> +			ptent = ptep_get_and_clear(&init_mm, addr, pte);

ptep_get_and_clear() gets used in both cases: when !CONFIG_HUGETLB_PAGE, and
when CONFIG_HUGETLB_PAGE is enabled but arch_vmap_pte_range_unmap_size()
returns PAGE_SIZE. That makes sense.

>  		WARN_ON(!pte_none(ptent) && !pte_present(ptent));
> -	} while (pte++, addr += PAGE_SIZE, addr != end);
> +	} while (pte += (size >> PAGE_SHIFT), addr += size, addr != end);
>  	*mask |= PGTBL_PTE_MODIFIED;
>  }
>

LGTM

Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>