From: Muchun Song
Date: Tue, 12 Jan 2021 19:33:33 +0800
Subject: Re: [External] Re: [PATCH v12 04/13] mm/hugetlb: Free the vmemmap pages associated with each HugeTLB page
To: Oscar Salvador
Cc: Jonathan Corbet, Mike Kravetz, Thomas Gleixner, mingo@redhat.com,
	bp@alien8.de, x86@kernel.org, hpa@zytor.com, dave.hansen@linux.intel.com,
	luto@kernel.org, Peter Zijlstra, viro@zeniv.linux.org.uk, Andrew Morton,
	paulmck@kernel.org, mchehab+huawei@kernel.org,
	pawan.kumar.gupta@linux.intel.com, Randy Dunlap, oneukum@suse.com,
	anshuman.khandual@arm.com, jroedel@suse.de, Mina Almasry,
	David Rientjes, Matthew Wilcox, Michal Hocko,
	"Song Bao Hua (Barry Song)", David Hildenbrand,
	HORIGUCHI NAOYA(堀口 直也), Xiongchun duan, linux-doc@vger.kernel.org,
	LKML, Linux Memory Management List, linux-fsdevel
In-Reply-To: <20210112080453.GA10895@linux>
References: <20210106141931.73931-1-songmuchun@bytedance.com>
	<20210106141931.73931-5-songmuchun@bytedance.com>
	<20210112080453.GA10895@linux>

On Tue, Jan 12, 2021 at 4:05 PM Oscar Salvador wrote:
>
> On Wed, Jan 06, 2021 at 10:19:22PM +0800, Muchun Song wrote:
> > Every HugeTLB has more than one struct page structure. We __know__ that
> > we only use the first 4 (HUGETLB_CGROUP_MIN_ORDER) struct page structures
> > to store metadata associated with each HugeTLB.
> >
> > There are a lot of struct page structures associated with each HugeTLB
> > page. For tail pages, the value of compound_head is the same, so we can
> > reuse the first page of the tail page structures. We map the virtual
> > addresses of the remaining tail page structures to the first tail page
> > struct, and then free those page frames. Therefore, we need to reserve
> > two pages as vmemmap areas.
> >
> > When we allocate a HugeTLB page from the buddy, we can free some vmemmap
> > pages associated with each HugeTLB page. It is more appropriate to do it
> > in prep_new_huge_page().
> >
> > The free_vmemmap_pages_per_hpage(), which indicates how many vmemmap
> > pages associated with a HugeTLB page can be freed, returns zero for
> > now, which means the feature is disabled. We will enable it once all
> > the infrastructure is there.
> >
> > Signed-off-by: Muchun Song
>
> My memory may betray me after vacation, so bear with me.

Welcome back. :)

> > +/*
> > + * Any memory allocated via the memblock allocator and not via the
> > + * buddy will be marked reserved already in the memmap. For those
> > + * pages, we can call this function to free them to the buddy allocator.
> > + */
> > +static inline void free_bootmem_page(struct page *page)
> > +{
> > +	unsigned long magic = (unsigned long)page->freelist;
> > +
> > +	/*
> > +	 * The reserve_bootmem_region sets the reserved flag on bootmem
> > +	 * pages.
> > +	 */
> > +	VM_WARN_ON_PAGE(page_ref_count(page) != 2, page);
>
> I have been thinking about this some more.
> And while I think that this macro might have its room somewhere, I do not
> think this is the case.
>
> Here, if we see that the page's refcount differs from 2, it means that we
> had an earlier corruption.
> Now, as a person who has dealt with debugging memory corruptions, I think
> it is of no use to proceed further if such a corruption happened, as it can
> lead to problems somewhere else that manifest in funny ways, and you will
> find yourself scratching your head trying to work out what happened.
>
> I am aware that this is not the root of the problem here, as someone might
> have had to decrease the refcount, but I would definitely change this to
> its VM_BUG_ON_* variant.

OK. I will change this to VM_BUG_ON_PAGE.
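Something like this, then (just a sketch of the intended end state; the
magic handling at the bottom is assumed from the rest of this series and
is not part of the hunk quoted above):

static inline void free_bootmem_page(struct page *page)
{
	unsigned long magic = (unsigned long)page->freelist;

	/*
	 * reserve_bootmem_region() sets the reserved flag on bootmem
	 * pages, and such a page is expected to arrive here with a
	 * refcount of exactly 2. Anything else means an earlier
	 * corruption, so stop hard instead of warning and continuing.
	 */
	VM_BUG_ON_PAGE(page_ref_count(page) != 2, page);

	/* Assumed from the rest of the series: only bootmem-info pages
	 * (vmemmap/section info) are expected here. */
	if (magic == SECTION_INFO || magic == MIX_SECTION_INFO)
		put_page_bootmem(page);
	else
		VM_BUG_ON_PAGE(1, page);
}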
> > --- /dev/null
> > +++ b/mm/hugetlb_vmemmap.c
>
> [...]
>
> > diff --git a/mm/hugetlb_vmemmap.h b/mm/hugetlb_vmemmap.h
> > new file mode 100644
> > index 000000000000..6923f03534d5
> > --- /dev/null
> > +++ b/mm/hugetlb_vmemmap.h
>
> [...]
>
> > +/**
> > + * vmemmap_remap_free - remap the vmemmap virtual address range
> > + *			[@start, @end) to the page which @reuse is
> > + *			mapped to, then free vmemmap pages.
> > + * @start:	start address of the vmemmap virtual address range.
> > + * @end:	end address of the vmemmap virtual address range.
> > + * @reuse:	reuse address.
> > + */
> > +void vmemmap_remap_free(unsigned long start, unsigned long end,
> > +			unsigned long reuse)
> > +{
> > +	LIST_HEAD(vmemmap_pages);
> > +	struct vmemmap_remap_walk walk = {
> > +		.remap_pte	= vmemmap_remap_pte,
> > +		.reuse_addr	= reuse,
> > +		.vmemmap_pages	= &vmemmap_pages,
> > +	};
> > +
> > +	BUG_ON(start != reuse + PAGE_SIZE);
>
> It seems a bit odd to only pass "start" for the BUG_ON.
> Also, I kind of dislike the "addr += PAGE_SIZE" in vmemmap_pte_range.
>
> I wonder if adding a ".remap_start_addr" would make more sense,
> adding it here with the vmemmap_remap_walk init.

How about introducing a new function whose job is to get the reuse page?
In that case, we can drop the BUG_ON() and the "addr += PAGE_SIZE" in
vmemmap_pte_range(), so that vmemmap_remap_range() only does the
remapping.
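Something along these lines (an untested sketch; it assumes struct
vmemmap_remap_walk grows a reuse_page member, and that the vmemmap range
has already been split down to PTE level, as the rest of the series
arranges):

/*
 * Resolve the struct page backing walk->reuse_addr up front, so that
 * vmemmap_remap_range() can walk [start, end) with no BUG_ON() and no
 * "addr += PAGE_SIZE" special case in vmemmap_pte_range().
 */
static void vmemmap_remap_reuse_page(struct vmemmap_remap_walk *walk)
{
	unsigned long addr = walk->reuse_addr;
	pgd_t *pgd = pgd_offset_k(addr);
	p4d_t *p4d = p4d_offset(pgd, addr);
	pud_t *pud = pud_offset(p4d, addr);
	pmd_t *pmd = pmd_offset(pud, addr);

	walk->reuse_page = pte_page(*pte_offset_kernel(pmd, addr));
}

vmemmap_remap_free() would then call this right before
vmemmap_remap_range(start, end, &walk), and the walk itself would only
do the remapping.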