From: Muchun Song <muchun.song@linux.dev>
Date: Thu, 7 Sep 2023 14:55:01 +0800
Subject: Re: [PATCH v2 10/11] hugetlb: batch TLB flushes when freeing vmemmap
Message-ID: <77060d35-a6d9-73e2-28a2-e736df00709a@linux.dev>
To: Mike Kravetz,
    linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Muchun Song, Joao Martins, Oscar Salvador, David Hildenbrand,
    Miaohe Lin, David Rientjes, Anshuman Khandual, Naoya Horiguchi,
    Barry Song, Michal Hocko, Matthew Wilcox, Xiongchun Duan,
    Andrew Morton
In-Reply-To: <20230905214412.89152-11-mike.kravetz@oracle.com>
References: <20230905214412.89152-1-mike.kravetz@oracle.com>
 <20230905214412.89152-11-mike.kravetz@oracle.com>

On 2023/9/6 05:44, Mike Kravetz wrote:
> From: Joao Martins
>
> Now that a list of pages is deduplicated at once, the TLB
> flush can be batched for all vmemmap pages that got remapped.
>
> Add a flags field and pass whether it's a bulk allocation or
> just a single page to decide to remap.
>
> The TLB flush is global as we don't have guarantees from caller
> that the set of folios is contiguous, or to add complexity in
> composing a list of kVAs to flush.
>
> Modified by Mike Kravetz to perform TLB flush on single folio if an
> error is encountered.
>
> Signed-off-by: Joao Martins
> Signed-off-by: Mike Kravetz
> ---
>  mm/hugetlb_vmemmap.c | 38 ++++++++++++++++++++++++++++++--------
>  1 file changed, 30 insertions(+), 8 deletions(-)
>
> diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
> index d956551699bc..8c85e2c38538 100644
> --- a/mm/hugetlb_vmemmap.c
> +++ b/mm/hugetlb_vmemmap.c
> @@ -27,6 +27,7 @@
>   * @reuse_addr: the virtual address of the @reuse_page page.
>   * @vmemmap_pages: the list head of the vmemmap pages that can be freed
>   *		or is mapped from.
> + * @flags: used to modify behavior in bulk operations
>   */
>  struct vmemmap_remap_walk {
>  	void (*remap_pte)(pte_t *pte, unsigned long addr,
> @@ -35,6 +36,8 @@ struct vmemmap_remap_walk {
>  	struct page *reuse_page;
>  	unsigned long reuse_addr;
>  	struct list_head *vmemmap_pages;
> +#define VMEMMAP_NO_TLB_FLUSH	BIT(0)
> +	unsigned long flags;
>  };
>
>  static int split_vmemmap_huge_pmd(pmd_t *pmd, unsigned long start, bool flush)
> @@ -208,7 +211,7 @@ static int vmemmap_remap_range(unsigned long start, unsigned long end,
>  			return ret;
>  	} while (pgd++, addr = next, addr != end);
>
> -	if (walk->remap_pte)
> +	if (walk->remap_pte && !(walk->flags & VMEMMAP_NO_TLB_FLUSH))
>  		flush_tlb_kernel_range(start, end);
>
>  	return 0;
> @@ -348,12 +351,14 @@ static int vmemmap_remap_split(unsigned long start, unsigned long end,
>   * @reuse: reuse address.
>   * @vmemmap_pages: list to deposit vmemmap pages to be freed. It is callers
>   *		responsibility to free pages.
> + * @flags: modifications to vmemmap_remap_walk flags
>   *
>   * Return: %0 on success, negative error code otherwise.
>   */
>  static int vmemmap_remap_free(unsigned long start, unsigned long end,
>  			      unsigned long reuse,
> -			      struct list_head *vmemmap_pages)
> +			      struct list_head *vmemmap_pages,
> +			      unsigned long flags)
>  {
>  	int ret;
>  	LIST_HEAD(freed_pages);
> @@ -361,6 +366,7 @@ static int vmemmap_remap_free(unsigned long start, unsigned long end,
>  		.remap_pte = vmemmap_remap_pte,
>  		.reuse_addr = reuse,
>  		.vmemmap_pages = &freed_pages,
> +		.flags = flags,
>  	};
>  	int nid = page_to_nid((struct page *)start);
>  	gfp_t gfp_mask = GFP_KERNEL | __GFP_THISNODE | __GFP_NORETRY |
> @@ -410,6 +416,7 @@ static int vmemmap_remap_free(unsigned long start, unsigned long end,
>  			.remap_pte = vmemmap_restore_pte,
>  			.reuse_addr = reuse,
>  			.vmemmap_pages = &freed_pages,
> +			.flags = 0,
>  		};
>
>  		vmemmap_remap_range(reuse, end, &walk);
> @@ -597,7 +604,8 @@ static bool vmemmap_should_optimize(const struct hstate *h, const struct page *h
>
>  static void __hugetlb_vmemmap_optimize(const struct hstate *h,
>  					struct page *head,
> -					struct list_head *vmemmap_pages)
> +					struct list_head *vmemmap_pages,
> +					unsigned long flags)
>  {
>  	unsigned long vmemmap_start = (unsigned long)head, vmemmap_end;
>  	unsigned long vmemmap_reuse;
> @@ -607,6 +615,18 @@ static void __hugetlb_vmemmap_optimize(const struct hstate *h,
>  		return;
>
>  	static_branch_inc(&hugetlb_optimize_vmemmap_key);
> +	/*
> +	 * Very Subtle
> +	 * If VMEMMAP_NO_TLB_FLUSH is set, TLB flushing is not performed
> +	 * immediately after remapping. As a result, subsequent accesses
> +	 * and modifications to struct pages associated with the hugetlb
> +	 * page could be to the OLD struct pages. Set the vmemmap optimized
> +	 * flag here so that it is copied to the new head page. This keeps
> +	 * the old and new struct pages in sync.
> +	 * If there is an error during optimization, we will immediately FLUSH
> +	 * the TLB and clear the flag below.
> +	 */
> +	SetHPageVmemmapOptimized(head);
>
>  	vmemmap_end = vmemmap_start + hugetlb_vmemmap_size(h);
>  	vmemmap_reuse = vmemmap_start;
> @@ -618,10 +638,10 @@ static void __hugetlb_vmemmap_optimize(const struct hstate *h,
>  	 * mapping the range to vmemmap_pages list so that they can be freed by
>  	 * the caller.
>  	 */
> -	if (vmemmap_remap_free(vmemmap_start, vmemmap_end, vmemmap_reuse, vmemmap_pages))
> +	if (vmemmap_remap_free(vmemmap_start, vmemmap_end, vmemmap_reuse, vmemmap_pages, flags)) {
>  		static_branch_dec(&hugetlb_optimize_vmemmap_key);
> -	else
> -		SetHPageVmemmapOptimized(head);
> +		ClearHPageVmemmapOptimized(head);
> +	}
>  }
>
>  /**
> @@ -638,7 +658,7 @@ void hugetlb_vmemmap_optimize(const struct hstate *h, struct page *head)
>  {
>  	LIST_HEAD(vmemmap_pages);
>
> -	__hugetlb_vmemmap_optimize(h, head, &vmemmap_pages);
> +	__hugetlb_vmemmap_optimize(h, head, &vmemmap_pages, 0UL);

The UL suffix could be dropped, right? (See the note after the quoted
hunks below.)

>  	free_vmemmap_page_list(&vmemmap_pages);
>  }
>
> @@ -672,7 +692,9 @@ void hugetlb_vmemmap_optimize_folios(struct hstate *h, struct list_head *folio_l
>  	flush_tlb_all();
>
>  	list_for_each_entry(folio, folio_list, lru)
> -		__hugetlb_vmemmap_optimize(h, &folio->page, &vmemmap_pages);
> +		__hugetlb_vmemmap_optimize(h, &folio->page, &vmemmap_pages, VMEMMAP_NO_TLB_FLUSH);
> +
> +	flush_tlb_all();
>
>  	free_vmemmap_page_list(&vmemmap_pages);
>  }
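To spell out the suggestion above: the @flags parameter is an unsigned
long, so a plain 0 converts implicitly and

	__hugetlb_vmemmap_optimize(h, head, &vmemmap_pages, 0);

behaves identically.

For readers following along, here is a minimal userspace sketch of the
batching pattern this patch introduces. This is toy code, not kernel
code: the demo_* names and DEMO_NO_TLB_FLUSH are invented for
illustration; the real flag is VMEMMAP_NO_TLB_FLUSH and the real flush
is flush_tlb_all(). Each per-page "optimize" either flushes immediately
or defers via the flag, and the batch caller issues one global flush at
the end:

	#include <stdio.h>

	#define DEMO_NO_TLB_FLUSH (1UL << 0)	/* mirrors VMEMMAP_NO_TLB_FLUSH */

	static unsigned long nr_flushes;

	/* Stand-in for flush_tlb_all(): just count the "flushes". */
	static void demo_flush_all(void)
	{
		nr_flushes++;
	}

	/* Stand-in for __hugetlb_vmemmap_optimize() on one page. */
	static void demo_optimize(int page, unsigned long flags)
	{
		(void)page;	/* the real code would remap this page's vmemmap */
		if (!(flags & DEMO_NO_TLB_FLUSH))
			demo_flush_all();	/* unbatched: flush per page */
	}

	int main(void)
	{
		int i;

		/* Old behaviour: one flush per optimized page. */
		nr_flushes = 0;
		for (i = 0; i < 512; i++)
			demo_optimize(i, 0);
		printf("unbatched: %lu flushes\n", nr_flushes);	/* 512 */

		/* With this patch: defer, then one global flush. */
		nr_flushes = 0;
		for (i = 0; i < 512; i++)
			demo_optimize(i, DEMO_NO_TLB_FLUSH);
		demo_flush_all();
		printf("batched:   %lu flushes\n", nr_flushes);	/* 1 */
		return 0;
	}

The trade-off the commit message describes falls out of the same shape:
since the batch caller cannot assume the folios' vmemmap ranges are
contiguous, its single flush is the global flush_tlb_all() rather than a
ranged flush_tlb_kernel_range().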