From: Muchun Song <songmuchun@bytedance.com>
To: corbet@lwn.net, mike.kravetz@oracle.com, tglx@linutronix.de,
    mingo@redhat.com, bp@alien8.de, x86@kernel.org, hpa@zytor.com,
    dave.hansen@linux.intel.com, luto@kernel.org, peterz@infradead.org,
    viro@zeniv.linux.org.uk, akpm@linux-foundation.org, paulmck@kernel.org,
    mchehab+huawei@kernel.org, pawan.kumar.gupta@linux.intel.com,
    rdunlap@infradead.org, oneukum@suse.com, anshuman.khandual@arm.com,
    jroedel@suse.de, almasrymina@google.com, rientjes@google.com,
    willy@infradead.org, osalvador@suse.de, mhocko@suse.com,
    song.bao.hua@hisilicon.com, david@redhat.com, naoya.horiguchi@nec.com
Cc: duanxiongchun@bytedance.com, linux-doc@vger.kernel.org,
    linux-kernel@vger.kernel.org, linux-mm@kvack.org,
    linux-fsdevel@vger.kernel.org, Muchun Song <songmuchun@bytedance.com>
Subject: [PATCH v14 4/8] mm: hugetlb: alloc the vmemmap pages associated with each HugeTLB page
Date: Thu, 4 Feb 2021 11:50:39 +0800
Message-Id: <20210204035043.36609-5-songmuchun@bytedance.com>
In-Reply-To: <20210204035043.36609-1-songmuchun@bytedance.com>
References: <20210204035043.36609-1-songmuchun@bytedance.com>

When we free a HugeTLB page to the buddy allocator, we should allocate the
vmemmap pages associated with it. But we may be unable to allocate vmemmap
pages when the system is under memory pressure. In this case, we just refuse
to free the HugeTLB page instead of looping forever trying to allocate the
pages.
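To make the policy above concrete, here is a minimal user-space sketch of the
decision the free path makes. It is only an illustration of the fallback, not
kernel code: free_fake_hugepage() and try_alloc_vmemmap() are hypothetical
stand-ins, and a failing malloc() models memory pressure.

#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Stand-in for a HugeTLB page whose vmemmap pages have been discarded. */
struct fake_hugepage {
	void *vmemmap;
};

/* Stand-in for alloc_huge_page_vmemmap(); may fail under memory pressure. */
static bool try_alloc_vmemmap(struct fake_hugepage *hp, size_t size)
{
	hp->vmemmap = malloc(size);
	return hp->vmemmap != NULL;
}

/*
 * Model of the free path: release the page only if its vmemmap could be
 * reallocated, otherwise keep it in the pool and return at once (no retry
 * loop).
 */
static void free_fake_hugepage(struct fake_hugepage *hp, size_t vmemmap_size)
{
	if (!try_alloc_vmemmap(hp, vmemmap_size)) {
		printf("vmemmap allocation failed, page stays in the pool\n");
		return;
	}

	printf("vmemmap restored, page released to the buddy allocator\n");
	free(hp->vmemmap);
}

int main(void)
{
	struct fake_hugepage hp;

	/* One simulated free; the size stands in for the discarded vmemmap. */
	free_fake_hugepage(&hp, 6 * 4096);
	return 0;
}

The kernel-side equivalent of the early return is the "goto enqueue" and
"break" fallbacks in the mm/hugetlb.c hunks below.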
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
---
 include/linux/mm.h   |  2 ++
 mm/hugetlb.c         | 19 ++++++++++++-
 mm/hugetlb_vmemmap.c | 30 +++++++++++++++++++++
 mm/hugetlb_vmemmap.h |  8 ++++++
 mm/sparse-vmemmap.c  | 75 +++++++++++++++++++++++++++++++++++++++++++++++++++-
 5 files changed, 132 insertions(+), 2 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index d7dddf334779..33c5911afe18 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2981,6 +2981,8 @@ static inline void print_vma_addr(char *prefix, unsigned long rip)
 
 void vmemmap_remap_free(unsigned long start, unsigned long end,
 			unsigned long reuse);
+int vmemmap_remap_alloc(unsigned long start, unsigned long end,
+			unsigned long reuse, gfp_t gfp_mask);
 
 void *sparse_buffer_alloc(unsigned long size);
 struct page * __populate_section_memmap(unsigned long pfn,
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 4cfca27c6d32..5518283aa667 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1397,16 +1397,26 @@ static void __free_huge_page(struct page *page)
 		h->resv_huge_pages++;
 
 	if (HPageTemporary(page)) {
-		list_del(&page->lru);
 		ClearHPageTemporary(page);
+
+		if (alloc_huge_page_vmemmap(h, page, GFP_ATOMIC)) {
+			h->surplus_huge_pages++;
+			h->surplus_huge_pages_node[nid]++;
+			goto enqueue;
+		}
+		list_del(&page->lru);
 		update_and_free_page(h, page);
 	} else if (h->surplus_huge_pages_node[nid]) {
+		if (alloc_huge_page_vmemmap(h, page, GFP_ATOMIC))
+			goto enqueue;
+
 		/* remove the page from active list */
 		list_del(&page->lru);
 		update_and_free_page(h, page);
 		h->surplus_huge_pages--;
 		h->surplus_huge_pages_node[nid]--;
 	} else {
+enqueue:
 		arch_clear_hugepage_flags(page);
 		enqueue_huge_page(h, page);
 	}
@@ -1693,6 +1703,10 @@ static int free_pool_huge_page(struct hstate *h, nodemask_t *nodes_allowed,
 			struct page *page =
 				list_entry(h->hugepage_freelists[node].next,
 					   struct page, lru);
+
+			if (alloc_huge_page_vmemmap(h, page, GFP_ATOMIC))
+				break;
+
 			list_del(&page->lru);
 			h->free_huge_pages--;
 			h->free_huge_pages_node[node]--;
@@ -1760,6 +1774,9 @@ int dissolve_free_huge_page(struct page *page)
 			goto retry;
 		}
 
+		if (alloc_huge_page_vmemmap(h, head, GFP_ATOMIC))
+			goto out;
+
 		/*
 		 * Move PageHWPoison flag from head page to the raw error page,
 		 * which makes any subpages rather than the error page reusable.
diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
index ddd872ab6180..0bd6b8d7282d 100644
--- a/mm/hugetlb_vmemmap.c
+++ b/mm/hugetlb_vmemmap.c
@@ -169,6 +169,8 @@
  * (last) level. So this type of HugeTLB page can be optimized only when its
  * size of the struct page structs is greater than 2 pages.
  */
+#define pr_fmt(fmt)	"HugeTLB: " fmt
+
 #include "hugetlb_vmemmap.h"
 
 /*
@@ -198,6 +200,34 @@ static inline unsigned long free_vmemmap_pages_size_per_hpage(struct hstate *h)
 	return (unsigned long)free_vmemmap_pages_per_hpage(h) << PAGE_SHIFT;
 }
 
+int alloc_huge_page_vmemmap(struct hstate *h, struct page *head, gfp_t gfp_mask)
+{
+	int ret;
+	unsigned long vmemmap_addr = (unsigned long)head;
+	unsigned long vmemmap_end, vmemmap_reuse;
+
+	if (!free_vmemmap_pages_per_hpage(h))
+		return 0;
+
+	vmemmap_addr += RESERVE_VMEMMAP_SIZE;
+	vmemmap_end = vmemmap_addr + free_vmemmap_pages_size_per_hpage(h);
+	vmemmap_reuse = vmemmap_addr - PAGE_SIZE;
+
+	/*
+	 * The pages which the vmemmap virtual address range [@vmemmap_addr,
+	 * @vmemmap_end) are mapped to are freed to the buddy allocator, and
+	 * the range is mapped to the page which @vmemmap_reuse is mapped to.
+	 * When a HugeTLB page is freed to the buddy allocator, previously
+	 * discarded vmemmap pages must be allocated and remapped.
+	 */
+	ret = vmemmap_remap_alloc(vmemmap_addr, vmemmap_end, vmemmap_reuse,
+				  gfp_mask | __GFP_NOWARN | __GFP_THISNODE);
+	if (ret == -ENOMEM)
+		pr_info("cannot alloc vmemmap pages\n");
+
+	return ret;
+}
+
 void free_huge_page_vmemmap(struct hstate *h, struct page *head)
 {
 	unsigned long vmemmap_addr = (unsigned long)head;
diff --git a/mm/hugetlb_vmemmap.h b/mm/hugetlb_vmemmap.h
index 6923f03534d5..6f89a9eed02c 100644
--- a/mm/hugetlb_vmemmap.h
+++ b/mm/hugetlb_vmemmap.h
@@ -11,8 +11,16 @@
 #include <linux/hugetlb.h>
 
 #ifdef CONFIG_HUGETLB_PAGE_FREE_VMEMMAP
+int alloc_huge_page_vmemmap(struct hstate *h, struct page *head,
+			    gfp_t gfp_mask);
 void free_huge_page_vmemmap(struct hstate *h, struct page *head);
 #else
+static inline int alloc_huge_page_vmemmap(struct hstate *h, struct page *head,
+					  gfp_t gfp_mask)
+{
+	return 0;
+}
+
 static inline void free_huge_page_vmemmap(struct hstate *h, struct page *head)
 {
 }
diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
index 50c1dc00b686..277eb43aebd5 100644
--- a/mm/sparse-vmemmap.c
+++ b/mm/sparse-vmemmap.c
@@ -40,7 +40,8 @@
  * @remap_pte:		called for each non-empty PTE (lowest-level) entry.
  * @reuse_page:		the page which is reused for the tail vmemmap pages.
  * @reuse_addr:		the virtual address of the @reuse_page page.
- * @vmemmap_pages:	the list head of the vmemmap pages that can be freed.
+ * @vmemmap_pages:	the list head of the vmemmap pages that can be freed
+ *			or is mapped from.
  */
 struct vmemmap_remap_walk {
 	void (*remap_pte)(pte_t *pte, unsigned long addr,
@@ -237,6 +238,78 @@ void vmemmap_remap_free(unsigned long start, unsigned long end,
 	free_vmemmap_page_list(&vmemmap_pages);
 }
 
+static void vmemmap_restore_pte(pte_t *pte, unsigned long addr,
+				struct vmemmap_remap_walk *walk)
+{
+	pgprot_t pgprot = PAGE_KERNEL;
+	struct page *page;
+	void *to;
+
+	BUG_ON(pte_page(*pte) != walk->reuse_page);
+
+	page = list_first_entry(walk->vmemmap_pages, struct page, lru);
+	list_del(&page->lru);
+	to = page_to_virt(page);
+	copy_page(to, (void *)walk->reuse_addr);
+
+	set_pte_at(&init_mm, addr, pte, mk_pte(page, pgprot));
+}
+
+static int alloc_vmemmap_page_list(unsigned long start, unsigned long end,
+				   gfp_t gfp_mask, struct list_head *list)
+{
+	unsigned long addr;
+	int nid = page_to_nid((const void *)start);
+	struct page *page, *next;
+
+	for (addr = start; addr < end; addr += PAGE_SIZE) {
+		page = alloc_pages_node(nid, gfp_mask, 0);
+		if (!page)
+			goto out;
+		list_add_tail(&page->lru, list);
+	}
+
+	return 0;
+out:
+	list_for_each_entry_safe(page, next, list, lru)
+		__free_pages(page, 0);
+	return -ENOMEM;
+}
+
+/**
+ * vmemmap_remap_alloc - remap the vmemmap virtual address range [@start, end)
+ *			 to the page which is from the @vmemmap_pages
+ *			 respectively.
+ * @start:	start address of the vmemmap virtual address range that we want
+ *		to remap.
+ * @end:	end address of the vmemmap virtual address range that we want to
+ *		remap.
+ * @reuse:	reuse address.
+ * @gfp_mask:	GFP flag for allocating vmemmap pages.
+ */
+int vmemmap_remap_alloc(unsigned long start, unsigned long end,
+			unsigned long reuse, gfp_t gfp_mask)
+{
+	LIST_HEAD(vmemmap_pages);
+	struct vmemmap_remap_walk walk = {
+		.remap_pte	= vmemmap_restore_pte,
+		.reuse_addr	= reuse,
+		.vmemmap_pages	= &vmemmap_pages,
+	};
+
+	/* See the comment in vmemmap_remap_free(). */
+	BUG_ON(start - reuse != PAGE_SIZE);
+
+	might_sleep_if(gfpflags_allow_blocking(gfp_mask));
+
+	if (alloc_vmemmap_page_list(start, end, gfp_mask, &vmemmap_pages))
+		return -ENOMEM;
+
+	vmemmap_remap_range(reuse, end, &walk);
+
+	return 0;
+}
+
 /*
  * Allocate a block of memory to be used to back the virtual memory map
  * or to back the page tables that are used to create the mapping.
-- 
2.11.0