From mboxrd@z Thu Jan 1 00:00:00 1970
From: Muchun Song <songmuchun@bytedance.com>
Date: Wed, 16 Mar 2022 08:42:14 +0800
Subject: Re: [PATCH] hugetlbfs: fix description about atomic allocation of vmemmap pages when free huge page
To: Mike Kravetz
Cc: luofei, Andrew Morton, Linux Memory Management List, LKML
References: <20220315042355.362810-1-luofei@unicloud.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="UTF-8"

On Wed, Mar 16, 2022 at 5:16 AM Mike Kravetz wrote:
>
> On 3/15/22 06:29, Muchun Song wrote:
> > On Tue, Mar 15, 2022 at 12:24 PM luofei wrote:
> >>
> >> No matter what context update_and_free_page() is called in,
> >> the flag for allocating the vmemmap page is fixed
> >> (GFP_KERNEL | __GFP_NORETRY | __GFP_THISNODE), and no atomic
> >> allocation is involved, so the description of atomicity here
> >> is somewhat inappropriate.
> >>
> >> And the atomic parameter naming of update_and_free_page() is
> >> somewhat misleading.
> >>
> >> Signed-off-by: luofei
> >> ---
> >>  mm/hugetlb.c | 10 ++++------
> >>  1 file changed, 4 insertions(+), 6 deletions(-)
> >>
> >> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> >> index f8ca7cca3c1a..239ef82b7897 100644
> >> --- a/mm/hugetlb.c
> >> +++ b/mm/hugetlb.c
> >> @@ -1570,8 +1570,8 @@ static void __update_and_free_page(struct hstate *h, struct page *page)
> >>
> >>  /*
> >>   * As update_and_free_page() can be called under any context, so we cannot
> >> - * use GFP_KERNEL to allocate vmemmap pages. However, we can defer the
> >> - * actual freeing in a workqueue to prevent from using GFP_ATOMIC to allocate
> >> + * use GFP_ATOMIC to allocate vmemmap pages. However, we can defer the
> >> + * actual freeing in a workqueue to prevent waits caused by allocating
> >>   * the vmemmap pages.
> >>   *
> >>   * free_hpage_workfn() locklessly retrieves the linked list of pages to be
> >> @@ -1617,16 +1617,14 @@ static inline void flush_free_hpage_work(struct hstate *h)
> >>  }
> >>
> >>  static void update_and_free_page(struct hstate *h, struct page *page,
> >> -					bool atomic)
> >> +					bool delay)
> >
> > Hi luofei,
> >
> > At least, I don't agree with this change. The "atomic" means if the
> > caller is under atomic context instead of whether using atomic
> > GFP_MASK. The "delay" seems to tell the caller that it can undelay
> > the allocation even if it is under atomic context (actually, it has no
> > choice). But "atomic" can indicate the user is being asked to tell us
> > if it is under atomic context.
>
> There may be some confusion since GFP_ATOMIC is mentioned in the comments
> and GFP_ATOMIC is not used in the allocation of vmemmap pages. IIRC,
> the use of GFP_ATOMIC was discussed at one time but dismissed because of
> undesired side effects such as dipping into "atomic reserves".
>
> How about an update to the comments as follows (sorry mailer may mess up
> formatting)?
>
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index f8ca7cca3c1a..6a4d27e24b21 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -1569,10 +1569,12 @@ static void __update_and_free_page(struct hstate *h, struct page *page)
>  }
>
>  /*
> - * As update_and_free_page() can be called under any context, so we cannot
> - * use GFP_KERNEL to allocate vmemmap pages. However, we can defer the
> - * actual freeing in a workqueue to prevent from using GFP_ATOMIC to allocate
> - * the vmemmap pages.
> + * Freeing hugetlb pages is done in update_and_free_page(). When freeing a
> + * hugetlb page, vmemmap pages may need to be allocated. The routine
> + * alloc_huge_page_vmemmap() can possibly sleep as it uses GFP_KERNEL.
> + * However, update_and_free_page() can be called under any context. To
> + * avoid the possibility of sleeping in a context where sleeping is not
> + * allowed, defer the actual freeing in a workqueue where sleeping is allowed.
>   *
>   * free_hpage_workfn() locklessly retrieves the linked list of pages to be
>   * freed and frees them one-by-one. As the page->mapping pointer is going
> @@ -1616,6 +1618,10 @@ static inline void flush_free_hpage_work(struct hstate *h)
>  	flush_work(&free_hpage_work);
>  }
>
> +/*
> + * atomic == true indicates called from a context where sleeping is
> + * not allowed.
> + */
>  static void update_and_free_page(struct hstate *h, struct page *page,
>  					bool atomic)
>  {
> @@ -1625,7 +1631,8 @@ static void update_and_free_page(struct hstate *h, struct page *page,
>  	}
>
>  	/*
> -	 * Defer freeing to avoid using GFP_ATOMIC to allocate vmemmap pages.
> +	 * Defer freeing to avoid possible sleeping when allocating
> +	 * vmemmap pages.
>  	 *
>  	 * Only call schedule_work() if hpage_freelist is previously
>  	 * empty. Otherwise, schedule_work() had been called but the workfn
>

LGTM. Thanks Mike.
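
For readers skimming the thread, the mechanism the proposed comment documents is a "queue it now, free it later" pattern. Below is a simplified sketch of that pattern, not the actual mm/hugetlb.c implementation: __update_and_free_page(), hpage_freelist, free_hpage_work and free_hpage_workfn() are names taken from the quoted diffs, while the llist/workqueue calls, size_to_hstate() and page_size() are filled in from general kernel APIs for illustration, and the real update_and_free_page() is more selective about when it defers.

#include <linux/hugetlb.h>
#include <linux/llist.h>
#include <linux/mm.h>
#include <linux/sched.h>
#include <linux/workqueue.h>

/* Huge pages queued here are freed later, in a context that may sleep. */
static LLIST_HEAD(hpage_freelist);

static void free_hpage_workfn(struct work_struct *work)
{
	struct llist_node *node = llist_del_all(&hpage_freelist);

	while (node) {
		/* page->mapping doubles as the lockless-list linkage. */
		struct page *page = container_of((void *)node,
						 struct page, mapping);
		struct hstate *h;

		node = node->next;
		page->mapping = NULL;
		h = size_to_hstate(page_size(page));

		/*
		 * Sleeping is allowed in a work item, so vmemmap pages
		 * can be allocated with GFP_KERNEL here.
		 */
		__update_and_free_page(h, page);
		cond_resched();
	}
}
static DECLARE_WORK(free_hpage_work, free_hpage_workfn);

/* atomic == true: the caller cannot sleep, so defer the real work. */
static void update_and_free_page(struct hstate *h, struct page *page,
				 bool atomic)
{
	if (!atomic) {
		/* Safe to allocate vmemmap pages and free right away. */
		__update_and_free_page(h, page);
		return;
	}

	/* Only kick the work item if the list was previously empty. */
	if (llist_add((struct llist_node *)&page->mapping, &hpage_freelist))
		schedule_work(&free_hpage_work);
}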