From: Muchun Song <songmuchun@bytedance.com>
Date: Mon, 21 Dec 2020 19:07:18 +0800
Subject: Re: [External] Re: [PATCH v10 04/11] mm/hugetlb: Defer freeing of HugeTLB pages
To: Oscar Salvador
Cc: Jonathan Corbet, Mike Kravetz, Thomas Gleixner, mingo@redhat.com,
 bp@alien8.de, x86@kernel.org, hpa@zytor.com, dave.hansen@linux.intel.com,
 luto@kernel.org, Peter Zijlstra, viro@zeniv.linux.org.uk, Andrew Morton,
 paulmck@kernel.org, mchehab+huawei@kernel.org,
 pawan.kumar.gupta@linux.intel.com, Randy Dunlap, oneukum@suse.com,
 anshuman.khandual@arm.com, jroedel@suse.de, Mina Almasry, David Rientjes,
 Matthew Wilcox, Michal Hocko, "Song Bao Hua (Barry Song)",
 David Hildenbrand, naoya.horiguchi@nec.com, Xiongchun duan,
 linux-doc@vger.kernel.org, LKML, Linux Memory Management List, linux-fsdevel
In-Reply-To: <20201221102703.GA15804@linux>
References: <20201217121303.13386-1-songmuchun@bytedance.com>
 <20201217121303.13386-5-songmuchun@bytedance.com>
 <20201221102703.GA15804@linux>

On Mon, Dec 21, 2020 at 6:27 PM Oscar Salvador wrote:
>
> On Thu, Dec 17, 2020 at 08:12:56PM +0800, Muchun Song wrote:
> > In the subsequent patch, we will allocate the vmemmap pages when freeing
> > HugeTLB pages. But update_and_free_page() is called from a non-task
> > context (and holds hugetlb_lock), so we defer the actual freeing to a
> > workqueue to avoid allocating the vmemmap pages with GFP_ATOMIC.
>
> I think we would benefit from a more complete changelog; at least I had
> to stare at the code for a while in order to grasp what we are trying
> to do and the reasons behind it.

OK. Will do.

> > +static void __free_hugepage(struct hstate *h, struct page *page);
> > +
> > +/*
> > + * As update_and_free_page() is called from a non-task context (and holds
> > + * hugetlb_lock), we can defer the actual freeing to a workqueue to avoid
> > + * using GFP_ATOMIC to allocate a lot of vmemmap pages.
>
> The above implies that update_and_free_page() is __always__ called from a
> non-task context, but is that really the case?

IIUC, that is always the case here.

> > +static void update_hpage_vmemmap_workfn(struct work_struct *work)
> >  {
> > -	int i;
> > +	struct llist_node *node;
> > +	struct page *page;
> >
> > +	node = llist_del_all(&hpage_update_freelist);
> > +
> > +	while (node) {
> > +		page = container_of((struct address_space **)node,
> > +				    struct page, mapping);
> > +		node = node->next;
> > +		page->mapping = NULL;
> > +		__free_hugepage(page_hstate(page), page);
> > +
> > +		cond_resched();
> > +	}
> > +}
> > +static DECLARE_WORK(hpage_update_work, update_hpage_vmemmap_workfn);
>
> I wonder if this should be moved to hugetlb_vmemmap.c.

Maybe. I can give it a try.
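For reference, here is the deferral pattern above pulled together as a
minimal, self-contained sketch. The producer helper defer_free_hugepage()
is hypothetical and only shows how a page would be queued from atomic
context; the rest mirrors the quoted diff:

#include <linux/hugetlb.h>
#include <linux/llist.h>
#include <linux/mm.h>
#include <linux/sched.h>
#include <linux/workqueue.h>

static void __free_hugepage(struct hstate *h, struct page *page);

/* Pages waiting to be freed; llist_add() is safe from atomic context. */
static LLIST_HEAD(hpage_update_freelist);

static void update_hpage_vmemmap_workfn(struct work_struct *work)
{
	struct llist_node *node;
	struct page *page;

	/* Atomically take ownership of the whole pending list. */
	node = llist_del_all(&hpage_update_freelist);

	while (node) {
		/*
		 * page->mapping is reused as the llist_node while the page
		 * sits on the freelist, so map it back to the page and
		 * clear it before freeing.
		 */
		page = container_of((struct address_space **)node,
				    struct page, mapping);
		node = node->next;
		page->mapping = NULL;
		__free_hugepage(page_hstate(page), page);

		cond_resched();
	}
}
static DECLARE_WORK(hpage_update_work, update_hpage_vmemmap_workfn);

/*
 * Hypothetical producer: runs with hugetlb_lock held, possibly in
 * softirq context, so it must not sleep or allocate.
 */
static void defer_free_hugepage(struct page *page)
{
	/*
	 * llist_add() returns true only if the list was empty, so the
	 * work item is scheduled once per batch, not once per page.
	 */
	if (llist_add((struct llist_node *)&page->mapping,
		      &hpage_update_freelist))
		schedule_work(&hpage_update_work);
}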
> > +/*
> > + * This is where the call to allocate vmemmap pages will be inserted.
> > + */
>
> I think this should go in the changelog.

OK. Will do.

> > +static void __free_hugepage(struct hstate *h, struct page *page)
> > +{
> > +	int i;
> > +
> >  	for (i = 0; i < pages_per_huge_page(h); i++) {
> >  		page[i].flags &= ~(1 << PG_locked | 1 << PG_error |
> >  				1 << PG_referenced | 1 << PG_dirty |
> > @@ -1313,13 +1377,17 @@ static void update_and_free_page(struct hstate *h, struct page *page)
> >  	set_page_refcounted(page);
> >  	if (hstate_is_gigantic(h)) {
> >  		/*
> > -		 * Temporarily drop the hugetlb_lock, because
> > -		 * we might block in free_gigantic_page().
> > +		 * Temporarily drop the hugetlb_lock only when this type of
> > +		 * HugeTLB page does not support vmemmap optimization (which
> > +		 * context do not hold the hugetlb_lock), because we might
> > +		 * block in free_gigantic_page().
>
> "
> /*
>  * Temporarily drop the hugetlb_lock, because we might block
>  * in free_gigantic_page(). Only drop it in case the vmemmap
>  * optimization is disabled, since that context does not hold
>  * the lock.
>  */
> " ?

Thanks a lot.

>
> Oscar Salvador
> SUSE L3

--
Yours,
Muchun
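The locking rule settled on above, sketched in context: only the path that
enters with hugetlb_lock held (vmemmap optimization disabled) drops and
retakes the lock around the blocking call, while the workqueue path runs
lockless. This assumes free_vmemmap_pages_per_hpage() is the series' helper
returning nonzero when the optimization applies to an hstate; it is a
sketch of the quoted hunk, not the exact patch:

static void __free_hugepage(struct hstate *h, struct page *page)
{
	/* ... page flags and refcount reset as in the quoted hunk ... */

	if (hstate_is_gigantic(h)) {
		/*
		 * Temporarily drop the hugetlb_lock, because we might
		 * block in free_gigantic_page(). Only drop it in case
		 * the vmemmap optimization is disabled, since that
		 * context (the workqueue) does not hold the lock.
		 *
		 * free_vmemmap_pages_per_hpage() is the assumed helper
		 * from this series.
		 */
		if (!free_vmemmap_pages_per_hpage(h))
			spin_unlock(&hugetlb_lock);
		destroy_compound_gigantic_page(page, huge_page_order(h));
		free_gigantic_page(page, huge_page_order(h));
		if (!free_vmemmap_pages_per_hpage(h))
			spin_lock(&hugetlb_lock);
	} else {
		__free_pages(page, huge_page_order(h));
	}
}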