From: Muchun Song <muchun.song@linux.dev>
Date: Tue, 19 Sep 2023 14:27:44 +0800
Message-ID: <7d0129fb-551f-e37a-f6cd-8fd96c896851@linux.dev>
Subject: Re: [PATCH v4 6/8] hugetlb: batch PMD split for bulk vmemmap dedup
To: Mike Kravetz,
    linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Muchun Song, Joao Martins, Oscar Salvador, David Hildenbrand,
    Miaohe Lin, David Rientjes, Anshuman Khandual, Naoya Horiguchi,
    Barry Song <21cnbao@gmail.com>, Michal Hocko, Matthew Wilcox,
    Xiongchun Duan, Andrew Morton
References: <20230918230202.254631-1-mike.kravetz@oracle.com>
 <20230918230202.254631-7-mike.kravetz@oracle.com>
In-Reply-To: <20230918230202.254631-7-mike.kravetz@oracle.com>

On 2023/9/19 07:01, Mike Kravetz wrote:
> From: Joao Martins
>
> In an effort to minimize the number of TLB flushes, batch all PMD splits
> belonging to a range of pages in order to perform only 1 (global) TLB
> flush.
>
> Add a flags field to the walker and pass whether it's a bulk allocation
> or just a single page to decide to remap. The first value
> (VMEMMAP_SPLIT_NO_TLB_FLUSH) designates the request to not do the TLB
> flush when we split the PMD.
>
> Rebased and updated by Mike Kravetz
>
> Signed-off-by: Joao Martins
> Signed-off-by: Mike Kravetz
> ---
>  mm/hugetlb_vmemmap.c | 79 +++++++++++++++++++++++++++++++++++++++++---
>  1 file changed, 75 insertions(+), 4 deletions(-)
>
> diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
> index 147ed15bcae4..e8bc2f7567db 100644
> --- a/mm/hugetlb_vmemmap.c
> +++ b/mm/hugetlb_vmemmap.c
> @@ -27,6 +27,7 @@
>   * @reuse_addr:	the virtual address of the @reuse_page page.
>   * @vmemmap_pages:	the list head of the vmemmap pages that can be freed
>   *			or is mapped from.
> + * @flags:		used to modify behavior in bulk operations
>   */
>  struct vmemmap_remap_walk {
>  	void (*remap_pte)(pte_t *pte, unsigned long addr,
> @@ -35,9 +36,11 @@ struct vmemmap_remap_walk {
>  	struct page *reuse_page;
>  	unsigned long reuse_addr;
>  	struct list_head *vmemmap_pages;
> +#define VMEMMAP_SPLIT_NO_TLB_FLUSH	BIT(0)

Please add a brief comment following this macro to explain what the
behavior is.
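Something along these lines should be enough (just a sketch of the wording
I have in mind, based on the commit message; feel free to adjust):

	struct list_head *vmemmap_pages;
	/* Skip the TLB flush when we split the PMD */
#define VMEMMAP_SPLIT_NO_TLB_FLUSH	BIT(0)
	unsigned long flags;

i.e. one line saying the flag asks split_vmemmap_huge_pmd() to skip the
per-PMD TLB flush so the caller can do a single flush for the whole range.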
> +	unsigned long flags;
>  };
>
> -static int split_vmemmap_huge_pmd(pmd_t *pmd, unsigned long start)
> +static int split_vmemmap_huge_pmd(pmd_t *pmd, unsigned long start, bool flush)
>  {
>  	pmd_t __pmd;
>  	int i;
> @@ -80,7 +83,8 @@ static int split_vmemmap_huge_pmd(pmd_t *pmd, unsigned long start)
>  		/* Make pte visible before pmd. See comment in pmd_install(). */
>  		smp_wmb();
>  		pmd_populate_kernel(&init_mm, pmd, pgtable);
> -		flush_tlb_kernel_range(start, start + PMD_SIZE);
> +		if (flush)
> +			flush_tlb_kernel_range(start, start + PMD_SIZE);
>  	} else {
>  		pte_free_kernel(&init_mm, pgtable);
>  	}
> @@ -127,11 +131,20 @@ static int vmemmap_pmd_range(pud_t *pud, unsigned long addr,
>  	do {
>  		int ret;
>
> -		ret = split_vmemmap_huge_pmd(pmd, addr & PMD_MASK);
> +		ret = split_vmemmap_huge_pmd(pmd, addr & PMD_MASK,
> +				walk->flags & VMEMMAP_SPLIT_NO_TLB_FLUSH);

!(walk->flags & VMEMMAP_SPLIT_NO_TLB_FLUSH)? (There is a small sketch of
what I mean after the quoted patch.)

Thanks.

>  		if (ret)
>  			return ret;
>
>  		next = pmd_addr_end(addr, end);
> +
> +		/*
> +		 * We are only splitting, not remapping the hugetlb vmemmap
> +		 * pages.
> +		 */
> +		if (!walk->remap_pte)
> +			continue;
> +
>  		vmemmap_pte_range(pmd, addr, next, walk);
>  	} while (pmd++, addr = next, addr != end);
>
> @@ -198,7 +211,8 @@ static int vmemmap_remap_range(unsigned long start, unsigned long end,
>  			return ret;
>  	} while (pgd++, addr = next, addr != end);
>
> -	flush_tlb_kernel_range(start, end);
> +	if (walk->remap_pte)
> +		flush_tlb_kernel_range(start, end);
>
>  	return 0;
>  }
> @@ -300,6 +314,36 @@ static void vmemmap_restore_pte(pte_t *pte, unsigned long addr,
>  	set_pte_at(&init_mm, addr, pte, mk_pte(page, pgprot));
>  }
>
> +/**
> + * vmemmap_remap_split - split the vmemmap virtual address range [@start, @end)
> + *                       backing PMDs of the directmap into PTEs
> + * @start:	start address of the vmemmap virtual address range that we want
> + *		to remap.
> + * @end:	end address of the vmemmap virtual address range that we want to
> + *		remap.
> + * @reuse:	reuse address.
> + *
> + * Return: %0 on success, negative error code otherwise.
> + */
> +static int vmemmap_remap_split(unsigned long start, unsigned long end,
> +				unsigned long reuse)
> +{
> +	int ret;
> +	struct vmemmap_remap_walk walk = {
> +		.remap_pte = NULL,
> +		.flags = VMEMMAP_SPLIT_NO_TLB_FLUSH,
> +	};
> +
> +	/* See the comment in the vmemmap_remap_free(). */
> +	BUG_ON(start - reuse != PAGE_SIZE);
> +
> +	mmap_read_lock(&init_mm);
> +	ret = vmemmap_remap_range(reuse, end, &walk);
> +	mmap_read_unlock(&init_mm);
> +
> +	return ret;
> +}
> +
>  /**
>   * vmemmap_remap_free - remap the vmemmap virtual address range [@start, @end)
>   *                      to the page which @reuse is mapped to, then free vmemmap
> @@ -323,6 +367,7 @@ static int vmemmap_remap_free(unsigned long start, unsigned long end,
>  		.remap_pte = vmemmap_remap_pte,
>  		.reuse_addr = reuse,
>  		.vmemmap_pages = vmemmap_pages,
> +		.flags = 0,
>  	};
>  	int nid = page_to_nid((struct page *)reuse);
>  	gfp_t gfp_mask = GFP_KERNEL | __GFP_NORETRY | __GFP_NOWARN;
> @@ -371,6 +416,7 @@ static int vmemmap_remap_free(unsigned long start, unsigned long end,
>  			.remap_pte = vmemmap_restore_pte,
>  			.reuse_addr = reuse,
>  			.vmemmap_pages = vmemmap_pages,
> +			.flags = 0,
>  		};
>
>  		vmemmap_remap_range(reuse, end, &walk);
> @@ -422,6 +468,7 @@ static int vmemmap_remap_alloc(unsigned long start, unsigned long end,
>  		.remap_pte = vmemmap_restore_pte,
>  		.reuse_addr = reuse,
>  		.vmemmap_pages = &vmemmap_pages,
> +		.flags = 0,
>  	};
>
>  	/* See the comment in the vmemmap_remap_free(). */
> @@ -630,11 +677,35 @@ void hugetlb_vmemmap_optimize(const struct hstate *h, struct page *head)
>  	free_vmemmap_page_list(&vmemmap_pages);
>  }
>
> +static void hugetlb_vmemmap_split(const struct hstate *h, struct page *head)
> +{
> +	unsigned long vmemmap_start = (unsigned long)head, vmemmap_end;
> +	unsigned long vmemmap_reuse;
> +
> +	if (!vmemmap_should_optimize(h, head))
> +		return;
> +
> +	vmemmap_end = vmemmap_start + hugetlb_vmemmap_size(h);
> +	vmemmap_reuse = vmemmap_start;
> +	vmemmap_start += HUGETLB_VMEMMAP_RESERVE_SIZE;
> +
> +	/*
> +	 * Split PMDs on the vmemmap virtual address range [@vmemmap_start,
> +	 * @vmemmap_end]
> +	 */
> +	vmemmap_remap_split(vmemmap_start, vmemmap_end, vmemmap_reuse);
> +}
> +
>  void hugetlb_vmemmap_optimize_folios(struct hstate *h, struct list_head *folio_list)
>  {
>  	struct folio *folio;
>  	LIST_HEAD(vmemmap_pages);
>
> +	list_for_each_entry(folio, folio_list, lru)
> +		hugetlb_vmemmap_split(h, &folio->page);
> +
> +	flush_tlb_all();
> +
>  	list_for_each_entry(folio, folio_list, lru) {
>  		int ret = __hugetlb_vmemmap_optimize(h, &folio->page,
>  						&vmemmap_pages);
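To spell out the flag check I questioned above: with the current code,
"flush" is true exactly when VMEMMAP_SPLIT_NO_TLB_FLUSH is set, so the
per-PMD flush happens in the one case the caller asked to skip it. I would
expect the call site to look roughly like this (untested, only to show the
intent):

		ret = split_vmemmap_huge_pmd(pmd, addr & PMD_MASK,
				!(walk->flags & VMEMMAP_SPLIT_NO_TLB_FLUSH));

With that, vmemmap_remap_split() still avoids the per-PMD flushes (its walk
sets the flag) and hugetlb_vmemmap_optimize_folios() does the single
flush_tlb_all() afterwards, while the existing single-page paths (which
pass .flags = 0) keep flushing per PMD as before.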