Subject: Re: [PATCH v4 6/8] hugetlb: batch PMD split for bulk vmemmap dedup
From: Muchun Song
Date: Wed, 20 Sep 2023 10:47:33 +0800
To: Joao Martins
Cc: Mike Kravetz, Muchun Song, Oscar Salvador, David Hildenbrand, Miaohe Lin,
 David Rientjes, Anshuman Khandual, Naoya Horiguchi, Barry Song <21cnbao@gmail.com>,
 Michal Hocko, Matthew Wilcox, Xiongchun Duan, Linux-MM, Andrew Morton, LKML

> On Sep 19, 2023, at 23:09, Joao Martins wrote:
> 
> On 19/09/2023 09:57, Muchun Song wrote:
>>> On Sep 19, 2023, at 16:55, Joao Martins wrote:
>>> On 19/09/2023 09:41, Muchun Song wrote:
>>>>> On Sep 19, 2023, at 16:26, Joao Martins wrote:
>>>>> On 19/09/2023 07:42, Muchun Song wrote:
>>>>>> On 2023/9/19 07:01, Mike Kravetz wrote:
>>>>>>> 	list_for_each_entry(folio, folio_list, lru) {
>>>>>>> 		int ret = __hugetlb_vmemmap_optimize(h, &folio->page,
>>>>>>> 						     &vmemmap_pages);
>>>>>> 
>>>>>> This is unlikely to fail since the page table allocation
>>>>>> has been moved above.
>>>>> 
>>>>>> (Note that the head vmemmap page allocation
>>>>>> is not mandatory.)
>>>>> 
>>>>> Good point that I almost forgot.
>>>>> 
>>>>>> So we should handle the error case in the above
>>>>>> splitting operation.
>>>>> 
>>>>> But back to the previous discussion in v2... the thinking was that /some/ PMDs
>>>>> got split, which could allow some PTE remapping to occur and free some pages
>>>>> back (each page allows 6 more splits, worst case). Then the next
>>>>> __hugetlb_vmemmap_optimize() will have to split PMD pages again for those
>>>>> hugepages that failed the batch PMD split (as we only defer the PTE remap TLB
>>>>> flush in this stage).
>>>> 
>>>> Oh, yes. Maybe we could break out of the above traversal as early as possible
>>>> once we hit an ENOMEM?
>>>> 
>>> 
>>> Sounds good -- no point in continuing to split if we are failing with OOM.
>>> 
>>> Perhaps a comment in both of these clauses (the early break on split and the OOM
>>> handling in batch optimize) could help make this clear.
>> 
>> Makes sense.
> 
> These are the changes I have so far for this patch based on the discussion so
> far. For the next one, it's at the end:

Code looks good to me. One nit below.
> 
> diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
> index e8bc2f7567db..d9c6f2cf698c 100644
> --- a/mm/hugetlb_vmemmap.c
> +++ b/mm/hugetlb_vmemmap.c
> @@ -27,7 +27,8 @@
>   * @reuse_addr:	the virtual address of the @reuse_page page.
>   * @vmemmap_pages:	the list head of the vmemmap pages that can be freed
>   *			or is mapped from.
> - * @flags:		used to modify behavior in bulk operations
> + * @flags:		used to modify behavior in vmemmap page table walking
> + *			operations.
>   */
>  struct vmemmap_remap_walk {
>  	void			(*remap_pte)(pte_t *pte, unsigned long addr,
> @@ -36,6 +37,8 @@ struct vmemmap_remap_walk {
>  	struct page		*reuse_page;
>  	unsigned long		reuse_addr;
>  	struct list_head	*vmemmap_pages;
> +
> +/* Skip the TLB flush when we split the PMD */
>  #define VMEMMAP_SPLIT_NO_TLB_FLUSH	BIT(0)
>  	unsigned long		flags;
>  };
> @@ -132,7 +135,7 @@ static int vmemmap_pmd_range(pud_t *pud, unsigned long addr,
>  		int ret;
> 
>  		ret = split_vmemmap_huge_pmd(pmd, addr & PMD_MASK,
> -				walk->flags & VMEMMAP_SPLIT_NO_TLB_FLUSH);
> +				!(walk->flags & VMEMMAP_SPLIT_NO_TLB_FLUSH));
>  		if (ret)
>  			return ret;
> 
> @@ -677,13 +680,13 @@ void hugetlb_vmemmap_optimize(const struct hstate *h, struct page *head)
>  	free_vmemmap_page_list(&vmemmap_pages);
>  }
> 
> -static void hugetlb_vmemmap_split(const struct hstate *h, struct page *head)
> +static int hugetlb_vmemmap_split(const struct hstate *h, struct page *head)
>  {
>  	unsigned long vmemmap_start = (unsigned long)head, vmemmap_end;
>  	unsigned long vmemmap_reuse;
> 
>  	if (!vmemmap_should_optimize(h, head))
> -		return;
> +		return 0;
> 
>  	vmemmap_end	= vmemmap_start + hugetlb_vmemmap_size(h);
>  	vmemmap_reuse	= vmemmap_start;
> @@ -693,7 +696,7 @@ static void hugetlb_vmemmap_split(const struct hstate *h, struct page *head)
>  	 * Split PMDs on the vmemmap virtual address range [@vmemmap_start,
>  	 * @vmemmap_end]
>  	 */
> -	vmemmap_remap_split(vmemmap_start, vmemmap_end, vmemmap_reuse);
> +	return vmemmap_remap_split(vmemmap_start, vmemmap_end, vmemmap_reuse);
>  }
> 
>  void hugetlb_vmemmap_optimize_folios(struct hstate *h, struct list_head *folio_list)
> @@ -701,8 +704,18 @@ void hugetlb_vmemmap_optimize_folios(struct hstate *h, struct list_head *folio_l
>  	struct folio *folio;
>  	LIST_HEAD(vmemmap_pages);
> 
> -	list_for_each_entry(folio, folio_list, lru)
> -		hugetlb_vmemmap_split(h, &folio->page);
> +	list_for_each_entry(folio, folio_list, lru) {
> +		int ret = hugetlb_vmemmap_split(h, &folio->page);
> +
> +		/*
> +		 * Spliting the PMD requires allocating a page, thus lets fail
		   ^^^^^^^^                              ^^^^^^
		   Splitting                             page table page

I'd like to specify the functionality of the allocated page.
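Put together with the patch's own wording, the suggested comment would then
read something like this (an illustrative rendering only, combining the two
corrections above with the text quoted below):

	/*
	 * Splitting the PMD requires allocating a page table page, thus
	 * lets fail early once we encounter the first OOM. No point in
	 * retrying as it can be dynamically done on remap with the memory
	 * we get back from the vmemmap deduplication.
	 */
	if (ret == -ENOMEM)
		break;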
> +		 * early once we encounter the first OOM. No point in retrying
> +		 * as it can be dynamically done on remap with the memory
> +		 * we get back from the vmemmap deduplication.
> +		 */
> +		if (ret == -ENOMEM)
> +			break;
> +	}
> 
>  	flush_tlb_all();
> 
> For patch 7, I only have commentary added, derived from this earlier discussion
> above:
> 
> diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
> index d9c6f2cf698c..f6a1020a4b6a 100644
> --- a/mm/hugetlb_vmemmap.c
> +++ b/mm/hugetlb_vmemmap.c
> @@ -40,6 +40,8 @@ struct vmemmap_remap_walk {
> 
>  /* Skip the TLB flush when we split the PMD */
>  #define VMEMMAP_SPLIT_NO_TLB_FLUSH	BIT(0)
> +/* Skip the TLB flush when we remap the PTE */
>  #define VMEMMAP_REMAP_NO_TLB_FLUSH	BIT(1)
>  	unsigned long		flags;
>  };
> 
> @@ -721,19 +739,28 @@ void hugetlb_vmemmap_optimize_folios(struct hstate *h, struct list_head *folio_l
> 
>  	list_for_each_entry(folio, folio_list, lru) {
>  		int ret = __hugetlb_vmemmap_optimize(h, &folio->page,
>  						     &vmemmap_pages,
>  						     VMEMMAP_REMAP_NO_TLB_FLUSH);
> 
>  		/*
>  		 * Pages to be freed may have been accumulated. If we
>  		 * encounter an ENOMEM, free what we have and try again.
> +		 * This can occur in the case that both spliting fails
							^^^^^^^^
							splitting

> +		 * halfway and head page allocation also failed. In this
			       ^^^^^^^^^
			       head vmemmap page

Otherwise:

Reviewed-by: Muchun Song

Thanks.

> +		 * case __hugetlb_vmemmap_optimize() would free memory
> +		 * allowing more vmemmap remaps to occur.
>  		 */
>  		if (ret == -ENOMEM && !list_empty(&vmemmap_pages)) {
> 
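To make the batching shape concrete, here is a minimal self-contained sketch
of the control flow the two patches converge on. This is toy userspace C, not
the kernel code: split_sim(), remap_sim(), flush_tlb_all_sim(), NFOLIOS and
the failure points are all invented stand-ins; only the shape -- one TLB
flush per pass instead of one per folio, an early break on ENOMEM while
splitting, and free-and-retry while remapping -- reflects the discussion
above.

	#include <errno.h>
	#include <stdio.h>

	#define NFOLIOS 4

	static int tlb_flushes;

	static void flush_tlb_all_sim(void) { tlb_flushes++; }

	/* Pass 1: splitting may need a new page table page, so it can fail. */
	static int split_sim(int folio, int oom_at)
	{
		return folio == oom_at ? -ENOMEM : 0;
	}

	/*
	 * Pass 2: remapping. Holding many not-yet-freed vmemmap pages models
	 * the memory pressure that can make the head page allocation fail.
	 */
	static int remap_sim(int folio, int *held)
	{
		(void)folio;
		if (*held >= 8)
			return -ENOMEM;
		*held += 7;	/* HVO frees 7 of the 8 vmemmap pages of a 2MB folio */
		return 0;
	}

	int main(void)
	{
		int held = 0;

		for (int i = 0; i < NFOLIOS; i++)
			if (split_sim(i, /*oom_at=*/2) == -ENOMEM)
				break;	/* no point retrying splits under OOM */
		flush_tlb_all_sim();	/* one flush covers all deferred PMD splits */

		for (int i = 0; i < NFOLIOS; i++) {
			if (remap_sim(i, &held) == -ENOMEM && held) {
				flush_tlb_all_sim();	/* flush before pages are reused */
				held = 0;		/* "free what we have"... */
				remap_sim(i, &held);	/* ...and try again */
			}
		}
		flush_tlb_all_sim();	/* one flush covers all deferred PTE remaps */

		printf("simulated flushes: %d (vs %d with a flush per folio per pass)\n",
		       tlb_flushes, 2 * NFOLIOS);
		return 0;
	}

Running it prints 3 simulated flushes for 4 folios, versus the 8 that
per-folio flushing in both passes would cost.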