From: Muchun Song
Date: Fri, 6 Nov 2020 00:08:22 +0800
Subject: Re: [External] Re: [PATCH v2 05/19] mm/hugetlb: Introduce pgtable allocation/freeing helpers
To: Oscar Salvador
Cc: Jonathan Corbet, Mike Kravetz, Thomas Gleixner, mingo@redhat.com, bp@alien8.de, x86@kernel.org, hpa@zytor.com, dave.hansen@linux.intel.com, luto@kernel.org, Peter Zijlstra, viro@zeniv.linux.org.uk, Andrew Morton, paulmck@kernel.org, mchehab+huawei@kernel.org, pawan.kumar.gupta@linux.intel.com, Randy Dunlap, oneukum@suse.com, anshuman.khandual@arm.com, jroedel@suse.de, Mina Almasry, David Rientjes, Matthew Wilcox, Xiongchun duan, linux-doc@vger.kernel.org, LKML, Linux Memory Management List, linux-fsdevel
In-Reply-To: <20201105132337.GA7552@linux>
References: <20201026145114.59424-1-songmuchun@bytedance.com> <20201026145114.59424-6-songmuchun@bytedance.com> <20201105132337.GA7552@linux>

On Thu, Nov 5, 2020 at 9:23 PM Oscar Salvador wrote:
>
> On Mon, Oct 26, 2020 at 10:51:00PM +0800, Muchun Song wrote:
> > +#ifdef CONFIG_HUGETLB_PAGE_FREE_VMEMMAP
> > +#define VMEMMAP_HPAGE_SHIFT			PMD_SHIFT
> > +#define arch_vmemmap_support_huge_mapping()	boot_cpu_has(X86_FEATURE_PSE)
>
> I do not think you need this.
> We already have hugepages_supported().

Maybe some architectures support huge pages but do not use huge page
mappings for the vmemmap; in that case we would still need this hook.
I am not sure whether such an architecture exists in the real world,
though. At least x86 can reuse hugepages_supported().

>
> > +#ifdef CONFIG_HUGETLB_PAGE_FREE_VMEMMAP
> > +#ifndef arch_vmemmap_support_huge_mapping
> > +static inline bool arch_vmemmap_support_huge_mapping(void)
> > +{
> > +	return false;
> > +}
>
> Same as above
>
> >  static inline unsigned int nr_free_vmemmap(struct hstate *h)
> >  {
> >  	return h->nr_free_vmemmap_pages;
> >  }
> >
> > +static inline unsigned int nr_vmemmap(struct hstate *h)
> > +{
> > +	return nr_free_vmemmap(h) + RESERVE_VMEMMAP_NR;
> > +}
> > +
> > +static inline unsigned long nr_vmemmap_size(struct hstate *h)
> > +{
> > +	return (unsigned long)nr_vmemmap(h) << PAGE_SHIFT;
> > +}
> > +
> > +static inline unsigned int nr_pgtable(struct hstate *h)
> > +{
> > +	unsigned long vmemmap_size = nr_vmemmap_size(h);
> > +
> > +	if (!arch_vmemmap_support_huge_mapping())
> > +		return 0;
> > +
> > +	/*
> > +	 * No need to pre-allocate page tables when there are no vmemmap
> > +	 * pages to free.
> > +	 */
> > +	if (!nr_free_vmemmap(h))
> > +		return 0;
> > +
> > +	return ALIGN(vmemmap_size, VMEMMAP_HPAGE_SIZE) >> VMEMMAP_HPAGE_SHIFT;
> > +}
>
> IMHO, Mike's naming suggestion fits much better.

I will do that.
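By the way, to make the arithmetic above concrete, here is a quick,
untested user-space sketch of the nr_pgtable() calculation (not the
kernel code itself). It assumes common x86_64 values: 4 KiB base pages
(PAGE_SHIFT = 12), 2 MiB PMD mappings for the vmemmap
(VMEMMAP_HPAGE_SHIFT = 21) and a 64-byte struct page. It skips the
arch-support and nr_free_vmemmap() checks and only mirrors the final
ALIGN()/shift step; the helper names and constants are illustrative only.

/*
 * Untested user-space illustration of the nr_pgtable() calculation
 * quoted above. PAGE_SHIFT, VMEMMAP_HPAGE_SHIFT and the struct page
 * size are hard-coded to typical x86_64 values as an assumption.
 */
#include <stdio.h>

#define PAGE_SHIFT		12
#define PAGE_SIZE		(1UL << PAGE_SHIFT)
#define VMEMMAP_HPAGE_SHIFT	21			/* PMD_SHIFT on x86_64 */
#define VMEMMAP_HPAGE_SIZE	(1UL << VMEMMAP_HPAGE_SHIFT)
#define STRUCT_PAGE_SIZE	64UL			/* assumed sizeof(struct page) */
#define ALIGN(x, a)		(((x) + (a) - 1) & ~((a) - 1))

/* Total number of vmemmap pages backing the struct pages of one huge page. */
static unsigned long nr_vmemmap(unsigned long pages_per_huge_page)
{
	return pages_per_huge_page * STRUCT_PAGE_SIZE / PAGE_SIZE;
}

/* Mirror of the quoted nr_pgtable(): one PTE page per PMD-sized vmemmap chunk. */
static unsigned long nr_pgtable(unsigned long pages_per_huge_page)
{
	unsigned long vmemmap_size = nr_vmemmap(pages_per_huge_page) * PAGE_SIZE;

	return ALIGN(vmemmap_size, VMEMMAP_HPAGE_SIZE) >> VMEMMAP_HPAGE_SHIFT;
}

int main(void)
{
	/* 2 MiB huge page: 512 base pages -> 8 vmemmap pages -> 1 page table. */
	printf("2 MiB huge page: %lu page table(s)\n", nr_pgtable(512));
	/* 1 GiB huge page: 262144 base pages -> 4096 vmemmap pages -> 8 page tables. */
	printf("1 GiB huge page: %lu page table(s)\n", nr_pgtable(262144));
	return 0;
}

With those assumptions, one preallocated PTE page is enough to remap the
vmemmap of a 2 MiB huge page, and eight are needed for a 1 GiB huge page.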
>
> > +static void vmemmap_pgtable_deposit(struct page *page, pte_t *pte_p)
> > +{
> > +	pgtable_t pgtable = virt_to_page(pte_p);
> > +
> > +	/* FIFO */
> > +	if (!page_huge_pte(page))
> > +		INIT_LIST_HEAD(&pgtable->lru);
> > +	else
> > +		list_add(&pgtable->lru, &page_huge_pte(page)->lru);
> > +	page_huge_pte(page) = pgtable;
> > +}
>
> I think it would make more sense if this took a pgtable argument
> instead of a pte_t *.

Will do. Thanks for your suggestions.

>
> > +static pte_t *vmemmap_pgtable_withdraw(struct page *page)
> > +{
> > +	pgtable_t pgtable;
> > +
> > +	/* FIFO */
> > +	pgtable = page_huge_pte(page);
> > +	if (unlikely(!pgtable))
> > +		return NULL;
>
> AFAICS, the above check only needs to be run once.
> I think we can move it to vmemmap_pgtable_free, can't we?

Yeah, we can. Thanks.

>
> > +	page_huge_pte(page) = list_first_entry_or_null(&pgtable->lru,
> > +						       struct page, lru);
> > +	if (page_huge_pte(page))
> > +		list_del(&pgtable->lru);
> > +	return page_to_virt(pgtable);
> > +}
>
> At the risk of adding more code, I think it would be nice to return a
> pgtable_t?
> So it is more coherent with the name of the function and with what
> we are doing.

Yeah.

>
> It is a pity we cannot converge these and pgtable_trans_huge_*.
> They share some code but it is different enough.
>
> > +static int vmemmap_pgtable_prealloc(struct hstate *h, struct page *page)
> > +{
> > +	int i;
> > +	pte_t *pte_p;
> > +	unsigned int nr = nr_pgtable(h);
> > +
> > +	if (!nr)
> > +		return 0;
> > +
> > +	vmemmap_pgtable_init(page);
>
> Maybe just open code this one?

Sorry, I don't quite understand what you mean. Could you explain?

>
> > +static inline void vmemmap_pgtable_free(struct hstate *h, struct page *page)
> > +{
> > +	pte_t *pte_p;
> > +
> > +	if (!nr_pgtable(h))
> > +		return;
> > +
> > +	while ((pte_p = vmemmap_pgtable_withdraw(page)))
> > +		pte_free_kernel(&init_mm, pte_p);
>
> As mentioned above, move the pgtable_t check from vmemmap_pgtable_withdraw
> in here.

OK.

>
> >  static void prep_new_huge_page(struct hstate *h, struct page *page, int nid)
> >  {
> > +	/* Must be called before the initialization of @page->lru */
> > +	vmemmap_pgtable_free(h, page);
> > +
> >  	INIT_LIST_HEAD(&page->lru);
> >  	set_compound_page_dtor(page, HUGETLB_PAGE_DTOR);
> >  	set_hugetlb_cgroup(page, NULL);
> > @@ -1783,6 +1892,14 @@ static struct page *alloc_fresh_huge_page(struct hstate *h,
> >  	if (!page)
> >  		return NULL;
> >
> > +	if (vmemmap_pgtable_prealloc(h, page)) {
> > +		if (hstate_is_gigantic(h))
> > +			free_gigantic_page(page, huge_page_order(h));
> > +		else
> > +			put_page(page);
> > +		return NULL;
> > +	}
> > +
>
> I must confess I am a bit puzzled.
>
> IIUC, in this patch we are just adding the helpers to create/tear down the
> page tables.
> I do not think we actually need to call vmemmap_pgtable_prealloc/vmemmap_pgtable_free, do we?
> In the end, we are just allocating pages for page tables and then freeing
> them shortly after.
>
> I think it would make more sense to add the calls when they are actually needed?

OK, will do. Thanks very much.

>
>
> --
> Oscar Salvador
> SUSE L3

--
Yours,
Muchun