From: Muchun Song <songmuchun@bytedance.com>
Date: Thu, 10 Feb 2022 15:45:06 +0800
Subject: Re: [PATCH v7 0/5] Free the 2nd vmemmap page associated with each HugeTLB page
To: Mike Kravetz
Cc: Andrew Morton, Oscar Salvador, David Hildenbrand, Michal Hocko, Matthew Wilcox, Jonathan Corbet, Xiongchun duan, Fam Zheng, Muchun Song, Qi Zheng, Linux Doc Mailing List, LKML, Linux Memory Management List, "Song Bao Hua (Barry Song)", Barry Song <21cnbao@gmail.com>, "Bodeddula, Balasubramaniam", Jue Wang
On Thu, Feb 10, 2022 at 6:49 AM Mike Kravetz wrote:
>
> On 2/8/22 23:44, Muchun Song wrote:
> > On Wed, Jan 26, 2022 at 4:04 PM Muchun Song wrote:
> >>
> >> On Wed, Nov 24, 2021 at 11:09 AM Andrew Morton wrote:
> >>>
> >>> On Mon, 22 Nov 2021 12:21:32 +0800 Muchun Song wrote:
> >>>
> >>>> On Wed, Nov 10, 2021 at 2:18 PM Muchun Song wrote:
> >>>>>
> >>>>> On Tue, Nov 9, 2021 at 3:33 AM Mike Kravetz wrote:
> >>>>>>
> >>>>>> On 11/8/21 12:16 AM, Muchun Song wrote:
> >>>>>>> On Mon, Nov 1, 2021 at 11:22 AM Muchun Song wrote:
> >>>>>>>>
> >>>>>>>> This series can minimize the overhead of struct page for 2MB HugeTLB
> >>>>>>>> pages significantly. It further reduces the overhead of struct page
> >>>>>>>> by 12.5% for a 2MB HugeTLB compared to the previous approach, which
> >>>>>>>> means 2GB per 1TB HugeTLB. It is a nice gain. Comments and reviews
> >>>>>>>> are welcome. Thanks.
> >>>>>>>>
> >>>>>>>
> >>>>>>> Hi,
> >>>>>>>
> >>>>>>> Ping guys. Does anyone have any comments or suggestions
> >>>>>>> on this series?
> >>>>>>>
> >>>>>>> Thanks.
> >>>>>>>
> >>>>>>
> >>>>>> I did look over the series earlier. I have no issue with the hugetlb
> >>>>>> and vmemmap modifications as they are enhancements to the existing
> >>>>>> optimizations. My primary concern is the (small) increased overhead
> >>>>>> for the helpers as outlined in your cover letter. Since these helpers
> >>>>>> are not limited to hugetlb and used throughout the kernel, I would
> >>>>>> really like to get comments from others with a better understanding
> >>>>>> of the potential impact.
> >>>>>
> >>>>> Thanks Mike. I'd like to hear others' comments about this as well.
> >>>>> From my point of view, maybe the (small) overhead is acceptable
> >>>>> since it only affects the head page; however, Matthew Wilcox's folio
> >>>>> series could mitigate this as well.
> >>>
> >>> I think Mike was inviting you to run some tests to quantify the
> >>> overhead ;)
> >>
> >> Hi Andrew,
> >>
> >> Sorry for the late reply.
> >>
> >> Specific overhead figures are already in the cover letter. Also,
> >> I did some other tests, e.g. kernel compilation, sysbench. I didn't
> >> see any regressions.
> >
> > The overhead is introduced by page_fixed_fake_head(), which
> > has an "if" statement and an access to a possibly cold cache line.
> > I think the main overhead is from the latter. However, probabilistically,
> > only 1/64 of the pages need to do the latter. And
> > page_fixed_fake_head() is already simple (I mean the overhead
> > is small enough), and many performance bottlenecks in mm are
> > not in compound_head(). This also matches the tests I did:
> > I didn't see any regressions after enabling this feature.
> >
> > I know Mike's concern is the increased overhead to use cases
> > beyond HugeTLB. If we really want to avoid the access to
> > a possibly cold cache line, we can introduce a new page
> > flag like PG_hugetlb and test whether it is set in page->flags;
> > if so, read and return the real head page struct. Then
> > page_fixed_fake_head() would look like below.
> >
> > static __always_inline const struct page *
> > page_fixed_fake_head(const struct page *page)
> > {
> >         if (!hugetlb_free_vmemmap_enabled())
> >                 return page;
> >
> >         if (test_bit(PG_hugetlb, &page->flags)) {
> >                 unsigned long head = READ_ONCE(page[1].compound_head);
> >
> >                 if (likely(head & 1))
> >                         return (const struct page *)(head - 1);
> >         }
> >         return page;
> > }
> >
> > But I don't think it's worth doing this.
> >
> > Hi Mike and Andrew,
> >
> > Since these helpers are not limited to hugetlb and used throughout the
> > kernel, I would really like to get comments from others with a better
> > understanding of the potential impact. Do you have any appropriate
> > reviewers to invite?
> >
>
> I think the appropriate people are already on Cc as they provided input
> on the original vmemmap optimization series.
>
> The question that needs to be answered is simple enough: Is the savings
> of one vmemmap page per hugetlb page worth the extra minimal overhead in
> compound_head()? Like most things, this depends on workload.
>
> One thing to note is that compound_head() overhead is only introduced if
> hugetlb vmemmap freeing is enabled. Correct?

Definitely correct.

> During the original vmemmap optimization discussions, people thought it
> important that this be 'opt in'. I do not know if distros will enable
> this by default. But, perhaps the potential overhead can be thought of
> as just part of 'opting in' for vmemmap optimizations.

I agree.

Does anyone else have a different opinion?

Thanks.