Message-ID: <0b0609d8-bc87-0463-bafd-9613f0053039@linux.dev>
Date: Wed, 6 Sep 2023 16:24:49 +0800
Subject: Re: [PATCH v2 09/11] hugetlb: batch PMD split for bulk vmemmap dedup
To: Mike Kravetz, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Muchun Song, Joao Martins, Oscar Salvador, David Hildenbrand,
    Miaohe Lin, David Rientjes, Anshuman Khandual, Naoya Horiguchi,
    Barry Song, Michal Hocko, Matthew Wilcox, Xiongchun Duan, Andrew Morton
References: <20230905214412.89152-1-mike.kravetz@oracle.com>
    <20230905214412.89152-10-mike.kravetz@oracle.com>
From: Muchun Song <muchun.song@linux.dev>
In-Reply-To: <20230905214412.89152-10-mike.kravetz@oracle.com>

On 2023/9/6 05:44, Mike Kravetz wrote:
> From: Joao Martins
>
> In an effort to minimize amount of TLB flushes, batch all PMD splits
> belonging to a range of pages in order to perform only 1 (global) TLB
> flush.
>
> Rebased and updated by Mike Kravetz
>
> Signed-off-by: Joao Martins
> Signed-off-by: Mike Kravetz
> ---
>   mm/hugetlb_vmemmap.c | 72 +++++++++++++++++++++++++++++++++++++++++---
>   1 file changed, 68 insertions(+), 4 deletions(-)
>
> diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
> index a715712df831..d956551699bc 100644
> --- a/mm/hugetlb_vmemmap.c
> +++ b/mm/hugetlb_vmemmap.c
> @@ -37,7 +37,7 @@ struct vmemmap_remap_walk {
>   	struct list_head	*vmemmap_pages;
>   };
>
> -static int split_vmemmap_huge_pmd(pmd_t *pmd, unsigned long start)
> +static int split_vmemmap_huge_pmd(pmd_t *pmd, unsigned long start, bool flush)
>   {
>   	pmd_t __pmd;
>   	int i;
> @@ -80,7 +80,8 @@ static int split_vmemmap_huge_pmd(pmd_t *pmd, unsigned long start)
>   		/* Make pte visible before pmd. See comment in pmd_install(). */
>   		smp_wmb();
>   		pmd_populate_kernel(&init_mm, pmd, pgtable);
> -		flush_tlb_kernel_range(start, start + PMD_SIZE);
> +		if (flush)
> +			flush_tlb_kernel_range(start, start + PMD_SIZE);
>   	} else {
>   		pte_free_kernel(&init_mm, pgtable);
>   	}
> @@ -127,11 +128,20 @@ static int vmemmap_pmd_range(pud_t *pud, unsigned long addr,
>   	do {
>   		int ret;
>
> -		ret = split_vmemmap_huge_pmd(pmd, addr & PMD_MASK);
> +		ret = split_vmemmap_huge_pmd(pmd, addr & PMD_MASK,
> +					     walk->remap_pte != NULL);

It is better to make @walk->remap_pte indicate only whether we should go
down to the last (PTE) page table level. I suggest reusing
VMEMMAP_NO_TLB_FLUSH to indicate whether we should flush the TLB at PMD
level; that would be clearer. A rough sketch of what I mean follows below.
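Untested, only to illustrate the idea; it assumes VMEMMAP_NO_TLB_FLUSH is
carried in a @flags member of struct vmemmap_remap_walk (the field name is
just illustrative, adjust as needed). The split-only caller would set:

	struct vmemmap_remap_walk walk = {
		.remap_pte	= NULL,			/* split only, do not descend to PTEs */
		.flags		= VMEMMAP_NO_TLB_FLUSH,	/* defer the flush to the batched caller */
	};

and vmemmap_pmd_range() would key the PMD-level flush off the flags rather
than off @remap_pte:

	/*
	 * @remap_pte only decides whether to walk down to the PTE level;
	 * the TLB-flush policy is carried in @walk->flags.
	 */
	ret = split_vmemmap_huge_pmd(pmd, addr & PMD_MASK,
				     !(walk->flags & VMEMMAP_NO_TLB_FLUSH));
	if (ret)
		return ret;

	next = pmd_addr_end(addr, end);

	/* We are only splitting, not remapping the hugetlb vmemmap pages. */
	if (!walk->remap_pte)
		continue;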
>   		if (ret)
>   			return ret;
>
>   		next = pmd_addr_end(addr, end);
> +
> +		/*
> +		 * We are only splitting, not remapping the hugetlb vmemmap
> +		 * pages.
> +		 */
> +		if (!walk->remap_pte)
> +			continue;
> +
>   		vmemmap_pte_range(pmd, addr, next, walk);
>   	} while (pmd++, addr = next, addr != end);
>
> @@ -198,7 +208,8 @@ static int vmemmap_remap_range(unsigned long start, unsigned long end,
>   			return ret;
>   	} while (pgd++, addr = next, addr != end);
>
> -	flush_tlb_kernel_range(start, end);
> +	if (walk->remap_pte)
> +		flush_tlb_kernel_range(start, end);
>
>   	return 0;
>   }
> @@ -297,6 +308,35 @@ static void vmemmap_restore_pte(pte_t *pte, unsigned long addr,
>   	set_pte_at(&init_mm, addr, pte, mk_pte(page, pgprot));
>   }
>
> +/**
> + * vmemmap_remap_split - split the vmemmap virtual address range [@start, @end)
> + *                       backing PMDs of the directmap into PTEs
> + * @start:	start address of the vmemmap virtual address range that we want
> + *		to remap.
> + * @end:	end address of the vmemmap virtual address range that we want to
> + *		remap.
> + * @reuse:	reuse address.
> + *
> + * Return: %0 on success, negative error code otherwise.
> + */
> +static int vmemmap_remap_split(unsigned long start, unsigned long end,
> +				unsigned long reuse)
> +{
> +	int ret;
> +	struct vmemmap_remap_walk walk = {
> +		.remap_pte	= NULL,
> +	};
> +
> +	/* See the comment in the vmemmap_remap_free(). */
> +	BUG_ON(start - reuse != PAGE_SIZE);
> +
> +	mmap_read_lock(&init_mm);
> +	ret = vmemmap_remap_range(reuse, end, &walk);
> +	mmap_read_unlock(&init_mm);
> +
> +	return ret;
> +}
> +
>   /**
>    * vmemmap_remap_free - remap the vmemmap virtual address range [@start, @end)
>    * to the page which @reuse is mapped to, then free vmemmap
> @@ -602,11 +642,35 @@ void hugetlb_vmemmap_optimize(const struct hstate *h, struct page *head)
>   	free_vmemmap_page_list(&vmemmap_pages);
>   }
>
> +static void hugetlb_vmemmap_split(const struct hstate *h, struct page *head)
> +{
> +	unsigned long vmemmap_start = (unsigned long)head, vmemmap_end;
> +	unsigned long vmemmap_reuse;
> +
> +	if (!vmemmap_should_optimize(h, head))
> +		return;
> +
> +	vmemmap_end	= vmemmap_start + hugetlb_vmemmap_size(h);
> +	vmemmap_reuse	= vmemmap_start;
> +	vmemmap_start	+= HUGETLB_VMEMMAP_RESERVE_SIZE;
> +
> +	/*
> +	 * Split PMDs on the vmemmap virtual address range [@vmemmap_start,
> +	 * @vmemmap_end]
> +	 */
> +	vmemmap_remap_split(vmemmap_start, vmemmap_end, vmemmap_reuse);
> +}
> +
>   void hugetlb_vmemmap_optimize_folios(struct hstate *h, struct list_head *folio_list)
>   {
>   	struct folio *folio;
>   	LIST_HEAD(vmemmap_pages);
>
> +	list_for_each_entry(folio, folio_list, lru)
> +		hugetlb_vmemmap_split(h, &folio->page);

Maybe it is reasonable to add a return value to hugetlb_vmemmap_split() to
indicate whether it succeeded. If it fails, it must be OOM, in which case
there is no point in continuing to split the page tables and optimize the
vmemmap pages for the remaining folios, right? Something like the sketch at
the end of this mail.

Thanks.

> +
> +	flush_tlb_all();
> +
>   	list_for_each_entry(folio, folio_list, lru)
>   		__hugetlb_vmemmap_optimize(h, &folio->page, &vmemmap_pages);
>
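For completeness, here is the sketch I mentioned above (completely untested,
only to show what I mean; all identifiers are the ones from your patch, the
only changes are the return type and the early break):

static int hugetlb_vmemmap_split(const struct hstate *h, struct page *head)
{
	unsigned long vmemmap_start = (unsigned long)head, vmemmap_end;
	unsigned long vmemmap_reuse;

	if (!vmemmap_should_optimize(h, head))
		return 0;

	vmemmap_end	= vmemmap_start + hugetlb_vmemmap_size(h);
	vmemmap_reuse	= vmemmap_start;
	vmemmap_start	+= HUGETLB_VMEMMAP_RESERVE_SIZE;

	/*
	 * Propagate the error from vmemmap_remap_split(); the only way it
	 * can fail is a page table allocation failure (OOM).
	 */
	return vmemmap_remap_split(vmemmap_start, vmemmap_end, vmemmap_reuse);
}

void hugetlb_vmemmap_optimize_folios(struct hstate *h, struct list_head *folio_list)
{
	struct folio *folio;
	LIST_HEAD(vmemmap_pages);

	list_for_each_entry(folio, folio_list, lru) {
		/*
		 * Stop splitting on the first failure: it can only be OOM,
		 * so splitting the remaining page tables is pointless.
		 */
		if (hugetlb_vmemmap_split(h, &folio->page))
			break;
	}

	flush_tlb_all();

	list_for_each_entry(folio, folio_list, lru)
		__hugetlb_vmemmap_optimize(h, &folio->page, &vmemmap_pages);

	/* ... rest of the function unchanged ... */
}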