From: James Houghton
Date: Tue, 28 Jun 2022 08:40:27 -0700
Subject: Re: [RFC PATCH 02/26] hugetlb: sort hstates in hugetlb_init_hstates
To: Mike Kravetz
Cc: Muchun Song, Peter Xu, David Hildenbrand, David Rientjes, Axel Rasmussen,
    Mina Almasry, Jue Wang, Manish Mishra, "Dr .
    David Alan Gilbert", linux-mm@kvack.org, linux-kernel@vger.kernel.org

On Mon, Jun 27, 2022 at 11:42 AM Mike Kravetz wrote:
>
> On 06/24/22 17:36, James Houghton wrote:
> > When using HugeTLB high-granularity mapping, we need to go through the
> > supported hugepage sizes in decreasing order so that we pick the largest
> > size that works. Consider the case where we're faulting in a 1G hugepage
> > for the first time: we want hugetlb_fault/hugetlb_no_page to map it with
> > a PUD. By going through the sizes in decreasing order, we will find that
> > PUD_SIZE works before finding out that PMD_SIZE or PAGE_SIZE work too.
> >
> > Signed-off-by: James Houghton
> > ---
> >  mm/hugetlb.c | 40 +++++++++++++++++++++++++++++++++++++---
> >  1 file changed, 37 insertions(+), 3 deletions(-)
> >
> > diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> > index a57e1be41401..5df838d86f32 100644
> > --- a/mm/hugetlb.c
> > +++ b/mm/hugetlb.c
> > @@ -33,6 +33,7 @@
> >  #include
> >  #include
> >  #include
> > +#include
> >
> >  #include
> >  #include
> > @@ -48,6 +49,10 @@
> >
> >  int hugetlb_max_hstate __read_mostly;
> >  unsigned int default_hstate_idx;
> > +/*
> > + * After hugetlb_init_hstates is called, hstates will be sorted from largest
> > + * to smallest.
> > + */
> >  struct hstate hstates[HUGE_MAX_HSTATE];
> >
> >  #ifdef CONFIG_CMA
> > @@ -3144,14 +3149,43 @@ static void __init hugetlb_hstate_alloc_pages(struct hstate *h)
> >  	kfree(node_alloc_noretry);
> >  }
> >
> > +static int compare_hstates_decreasing(const void *a, const void *b)
> > +{
> > +	const int shift_a = huge_page_shift((const struct hstate *)a);
> > +	const int shift_b = huge_page_shift((const struct hstate *)b);
> > +
> > +	if (shift_a < shift_b)
> > +		return 1;
> > +	if (shift_a > shift_b)
> > +		return -1;
> > +	return 0;
> > +}
> > +
> > +static void sort_hstates(void)
> > +{
> > +	unsigned long default_hstate_sz = huge_page_size(&default_hstate);
> > +
> > +	/* Sort from largest to smallest. */
> > +	sort(hstates, hugetlb_max_hstate, sizeof(*hstates),
> > +	     compare_hstates_decreasing, NULL);
> > +
> > +	/*
> > +	 * We may have changed the location of the default hstate, so we need to
> > +	 * update it.
> > +	 */
> > +	default_hstate_idx = hstate_index(size_to_hstate(default_hstate_sz));
> > +}
> > +
> >  static void __init hugetlb_init_hstates(void)
> >  {
> >  	struct hstate *h, *h2;
> >
> > -	for_each_hstate(h) {
> > -		if (minimum_order > huge_page_order(h))
> > -			minimum_order = huge_page_order(h);
> > +	sort_hstates();
> >
> > +	/* The last hstate is now the smallest. */
> > +	minimum_order = huge_page_order(&hstates[hugetlb_max_hstate - 1]);
> > +
> > +	for_each_hstate(h) {
> >  		/* oversize hugepages were init'ed in early boot */
> >  		if (!hstate_is_gigantic(h))
> >  			hugetlb_hstate_alloc_pages(h);
>
> This may/will cause problems for gigantic hugetlb pages allocated at boot
> time. See alloc_bootmem_huge_page() where a pointer to the associated hstate
> is encoded within the allocated hugetlb page. These pages are added to
> hugetlb pools by the routine gather_bootmem_prealloc() which uses the saved
> hstate to add prep the gigantic page and add to the correct pool. Currently,
> gather_bootmem_prealloc is called after hugetlb_init_hstates. So, changing
> hstate order will cause errors.
>
> I do not see any reason why we could not call gather_bootmem_prealloc before
> hugetlb_init_hstates to avoid this issue.

Thanks for catching this, Mike. Your suggestion certainly seems to work,
but it also seems error-prone: anything that holds an hstate pointer
across the sort would silently end up pointing at a different hstate.
I'll have to look at the code more closely, but it may be better to
maintain a separate `struct hstate *sorted_hstate_ptrs[]` and leave the
hstates in their original locations, so as not to break
gather_bootmem_prealloc() or other code that depends on where each
hstate lives.

> --
> Mike Kravetz
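Below is a minimal userspace sketch of the `sorted_hstate_ptrs[]` idea. It rests on several assumptions. `struct hstate` is reduced to a page order, and base pages are assumed to be 4 KiB. libc `qsort()` stands in for the kernel's `sort()`. `add_hstate()` and `sort_hstate_ptrs()` are hypothetical helpers, not existing hugetlb functions. Only the pointer array is reordered, so a pointer saved into `hstates[]` before sorting (as alloc_bootmem_huge_page() does) still refers to the same size afterwards.

```c
#include <assert.h>
#include <stdlib.h>

#define PAGE_SHIFT 12		/* assumption: 4 KiB base pages */
#define HUGE_MAX_HSTATE 4

/* Simplified stand-in for the kernel's struct hstate (assumption):
 * only the page order is modeled. */
struct hstate {
	unsigned int order;
};

static struct hstate hstates[HUGE_MAX_HSTATE];
static int hugetlb_max_hstate;

/*
 * Side array from the proposal: hstates[] itself is never reordered, so
 * pointers saved into it stay valid; only these pointers get sorted.
 */
static struct hstate *sorted_hstate_ptrs[HUGE_MAX_HSTATE];

static unsigned int huge_page_shift(const struct hstate *h)
{
	return h->order + PAGE_SHIFT;
}

/* Hypothetical helper: register a hugepage size by its order. */
static struct hstate *add_hstate(unsigned int order)
{
	assert(hugetlb_max_hstate < HUGE_MAX_HSTATE);
	hstates[hugetlb_max_hstate].order = order;
	return &hstates[hugetlb_max_hstate++];
}

/* qsort() hands the callback pointers to elements, i.e. struct hstate **. */
static int compare_hstate_ptrs_decreasing(const void *a, const void *b)
{
	unsigned int shift_a = huge_page_shift(*(struct hstate *const *)a);
	unsigned int shift_b = huge_page_shift(*(struct hstate *const *)b);

	if (shift_a < shift_b)
		return 1;
	if (shift_a > shift_b)
		return -1;
	return 0;
}

/* Fill sorted_hstate_ptrs[] from largest to smallest page size. */
static void sort_hstate_ptrs(void)
{
	for (int i = 0; i < hugetlb_max_hstate; i++)
		sorted_hstate_ptrs[i] = &hstates[i];
	qsort(sorted_hstate_ptrs, hugetlb_max_hstate,
	      sizeof(*sorted_hstate_ptrs), compare_hstate_ptrs_decreasing);
}
```

Suppose a 2M hstate (order 9) is registered first and a 1G hstate (order 18) second. After sort_hstate_ptrs(), sorted_hstate_ptrs[0] is &hstates[1] (1G), while a pointer saved earlier to &hstates[1] still refers to the 1G hstate. Because hstates[] never moves in this sketch, default_hstate_idx would not need to be recomputed. Under that approach, the fault path would walk sorted_hstate_ptrs[] rather than hstates[].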