From: Muchun Song
Date: Tue, 12 Jan 2021 11:24:05 +0800
Subject: Re: [External] [RFC PATCH 1/3] hugetlb: use page.private for hugetlb specific page flags
To: Mike Kravetz
Cc: LKML, Linux Memory Management List, Michal Hocko, Naoya Horiguchi, David Hildenbrand, Andrew Morton
In-Reply-To: <20210111210152.118394-2-mike.kravetz@oracle.com>
References: <20210111210152.118394-1-mike.kravetz@oracle.com> <20210111210152.118394-2-mike.kravetz@oracle.com>

On Tue, Jan 12, 2021 at 5:04 AM Mike Kravetz wrote:
>
> As hugetlbfs evolved, state information about hugetlb pages was added.
> One 'convenient' way of doing this was to use available fields in tail
> pages. Over time, it has become difficult to know the meaning or contents
> of fields simply by looking at a small bit of code. Sometimes, the
> naming is just confusing. For example: The PagePrivate flag indicates
> a huge page reservation was consumed and needs to be restored if an error
> is encountered and the page is freed before it is instantiated. The
> page.private field contains the pointer to a subpool if the page is
> associated with one.
>
> In an effort to make the code more readable, use page.private to contain
> hugetlb specific flags. These flags will have test, set and clear functions
> similar to those used for 'normal' page flags. More importantly, the
> flags will have names which actually reflect their purpose.
>
> In this patch,
> - Create infrastructure for huge page flag functions
> - Move subpool pointer to page[1].private to make way for flags
>   Create routines with meaningful names to modify subpool field
> - Use new HPageRestoreReserve reserve flag instead of PagePrivate
>
> Conversion of other state information will happen in subsequent patches.

Great. We can add more meta information easily in the future. It also
makes the code more readable. Thanks.

>
> Signed-off-by: Mike Kravetz
> ---
>  fs/hugetlbfs/inode.c    | 11 +++---
>  include/linux/hugetlb.h |  2 ++
>  mm/hugetlb.c            | 80 +++++++++++++++++++++++++++++++----------
>  3 files changed, 67 insertions(+), 26 deletions(-)
>
> diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
> index b5c109703daa..8bfb80bc6e96 100644
> --- a/fs/hugetlbfs/inode.c
> +++ b/fs/hugetlbfs/inode.c
> @@ -966,14 +966,11 @@ static int hugetlbfs_migrate_page(struct address_space *mapping,
>                 return rc;
>
>         /*
> -        * page_private is subpool pointer in hugetlb pages. Transfer to
> -        * new page. PagePrivate is not associated with page_private for
> -        * hugetlb pages and can not be set here as only page_huge_active
> -        * pages can be migrated.
> +        * Transfer any subpool pointer to the new page.
>          */
> -       if (page_private(page)) {
> -               set_page_private(newpage, page_private(page));
> -               set_page_private(page, 0);
> +       if (hpage_spool(page)) {
> +               set_hpage_spool(newpage, hpage_spool(page));
> +               set_hpage_spool(page, 0);
>         }
>
>         if (mode != MIGRATE_SYNC_NO_COPY)
> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> index ebca2ef02212..4f0159f1b9cc 100644
> --- a/include/linux/hugetlb.h
> +++ b/include/linux/hugetlb.h
> @@ -104,6 +104,8 @@ extern int hugetlb_max_hstate __read_mostly;
>  struct hugepage_subpool *hugepage_new_subpool(struct hstate *h, long max_hpages,
>                                                 long min_hpages);
>  void hugepage_put_subpool(struct hugepage_subpool *spool);
> +struct hugepage_subpool *hpage_spool(struct page *hpage);
> +void set_hpage_spool(struct page *hpage, struct hugepage_subpool *spool);
>
>  void reset_vma_resv_huge_pages(struct vm_area_struct *vma);
>  int hugetlb_sysctl_handler(struct ctl_table *, int, void *, size_t *, loff_t *);
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 3b38ea958e95..3eb3b102c589 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -52,6 +52,49 @@ static struct cma *hugetlb_cma[MAX_NUMNODES];
>  #endif
>  static unsigned long hugetlb_cma_size __initdata;
>
> +/*
> + * hugetlb specific state flags located in hpage.private
> + */
> +enum htlb_page_flags {
> +       HPAGE_RestoreReserve = 0,

IMHO, can we rename it to HPG_restore_reserve? I just want to make the
name consistent with the page flags (see enum pageflags in
include/linux/page-flags.h). Maybe we should also add a __NR_HPAGEFLAGS
to indicate how many bits are already used, and add a BUILD_BUG_ON to
check that the used bits do not exceed the width of an unsigned long.
It is not strictly necessary now, but if we can check it, I think it
would be better. (A rough sketch of what I have in mind is at the end
of this mail.)

> +};
> +
> +/*
> + * Macros to create function definitions for hpage flags
> + */
> +#define TESTHPAGEFLAG(flname)                                  \
> +static inline int HPage##flname(struct page *page)             \
> +       { return test_bit(HPAGE_##flname, &(page->private)); }
> +
> +#define SETHPAGEFLAG(flname)                                   \
> +static inline void SetHPage##flname(struct page *page)         \
> +       { set_bit(HPAGE_##flname, &(page->private)); }
> +
> +#define CLEARHPAGEFLAG(flname)                                 \
> +static inline void ClearHPage##flname(struct page *page)       \
> +       { clear_bit(HPAGE_##flname, &(page->private)); }
> +
> +#define HPAGEFLAG(flname)                                      \
> +       TESTHPAGEFLAG(flname)                                   \
> +       SETHPAGEFLAG(flname)                                    \
> +       CLEARHPAGEFLAG(flname)
> +
> +HPAGEFLAG(RestoreReserve)
> +
> +/*
> + * hugetlb page subpool pointer located in hpage[1].private
> + */
> +struct hugepage_subpool *hpage_spool(struct page *hpage)
> +{
> +       return (struct hugepage_subpool *)(hpage+1)->private;
> +}
> +
> +void set_hpage_spool(struct page *hpage,
> +                       struct hugepage_subpool *spool)
> +{
> +       set_page_private(hpage+1, (unsigned long)spool);
> +}
> +
>  /*
>   * Minimum page order among possible hugepage sizes, set to a proper value
>   * at boot time.
> @@ -1116,7 +1159,7 @@ static struct page *dequeue_huge_page_vma(struct hstate *h,
>         nid = huge_node(vma, address, gfp_mask, &mpol, &nodemask);
>         page = dequeue_huge_page_nodemask(h, gfp_mask, nid, nodemask);
>         if (page && !avoid_reserve && vma_has_reserves(vma, chg)) {
> -               SetPagePrivate(page);
> +               SetHPageRestoreReserve(page);
>                 h->resv_huge_pages--;
>         }
>
> @@ -1391,20 +1434,19 @@ static void __free_huge_page(struct page *page)
>          */
>         struct hstate *h = page_hstate(page);
>         int nid = page_to_nid(page);
> -       struct hugepage_subpool *spool =
> -               (struct hugepage_subpool *)page_private(page);
> +       struct hugepage_subpool *spool = hpage_spool(page);
>         bool restore_reserve;
>
>         VM_BUG_ON_PAGE(page_count(page), page);
>         VM_BUG_ON_PAGE(page_mapcount(page), page);
>
> -       set_page_private(page, 0);
> +       set_hpage_spool(page, 0);
>         page->mapping = NULL;
> -       restore_reserve = PagePrivate(page);
> -       ClearPagePrivate(page);
> +       restore_reserve = HPageRestoreReserve(page);
> +       ClearHPageRestoreReserve(page);
>
>         /*
> -        * If PagePrivate() was set on page, page allocation consumed a
> +        * If RestoreReserve was set on page, page allocation consumed a
>          * reservation. If the page was associated with a subpool, there
>          * would have been a page reserved in the subpool before allocation
>          * via hugepage_subpool_get_pages(). Since we are 'restoring' the
> @@ -2213,9 +2255,9 @@ static long vma_add_reservation(struct hstate *h,
>   * This routine is called to restore a reservation on error paths. In the
>   * specific error paths, a huge page was allocated (via alloc_huge_page)
>   * and is about to be freed. If a reservation for the page existed,
> - * alloc_huge_page would have consumed the reservation and set PagePrivate
> + * alloc_huge_page would have consumed the reservation and set RestoreReserve
>   * in the newly allocated page. When the page is freed via free_huge_page,
> - * the global reservation count will be incremented if PagePrivate is set.
> + * the global reservation count will be incremented if RestoreReserve is set.
>   * However, free_huge_page can not adjust the reserve map. Adjust the
>   * reserve map here to be consistent with global reserve count adjustments
>   * to be made by free_huge_page.
> @@ -2224,13 +2266,13 @@ static void restore_reserve_on_error(struct hstate *h,
>                         struct vm_area_struct *vma, unsigned long address,
>                         struct page *page)
>  {
> -       if (unlikely(PagePrivate(page))) {
> +       if (unlikely(HPageRestoreReserve(page))) {
>                 long rc = vma_needs_reservation(h, vma, address);
>
>                 if (unlikely(rc < 0)) {
>                         /*
>                          * Rare out of memory condition in reserve map
> -                        * manipulation. Clear PagePrivate so that
> +                        * manipulation. Clear RestoreReserve so that
>                          * global reserve count will not be incremented
>                          * by free_huge_page. This will make it appear
>                          * as though the reservation for this page was
> @@ -2239,7 +2281,7 @@ static void restore_reserve_on_error(struct hstate *h,
>                          * is better than inconsistent global huge page
>                          * accounting of reserve counts.
>                          */
> -                       ClearPagePrivate(page);
> +                       ClearHPageRestoreReserve(page);
>                 } else if (rc) {
>                         rc = vma_add_reservation(h, vma, address);
>                         if (unlikely(rc < 0))
>                                 /*
>                                  * See above comment about rare out of
>                                  * memory condition.
>                                  */
> -                               ClearPagePrivate(page);
> +                               ClearHPageRestoreReserve(page);
>                 } else
>                         vma_end_reservation(h, vma, address);
>         }
> @@ -2328,7 +2370,7 @@ struct page *alloc_huge_page(struct vm_area_struct *vma,
>                 if (!page)
>                         goto out_uncharge_cgroup;
>                 if (!avoid_reserve && vma_has_reserves(vma, gbl_chg)) {
> -                       SetPagePrivate(page);
> +                       SetHPageRestoreReserve(page);
>                         h->resv_huge_pages--;
>                 }
>                 spin_lock(&hugetlb_lock);
> @@ -2346,7 +2388,7 @@ struct page *alloc_huge_page(struct vm_area_struct *vma,
>
>         spin_unlock(&hugetlb_lock);
>
> -       set_page_private(page, (unsigned long)spool);
> +       set_hpage_spool(page, spool);
>
>         map_commit = vma_commit_reservation(h, vma, addr);
>         if (unlikely(map_chg > map_commit)) {
> @@ -4150,7 +4192,7 @@ static vm_fault_t hugetlb_cow(struct mm_struct *mm, struct vm_area_struct *vma,
>                 spin_lock(ptl);
>                 ptep = huge_pte_offset(mm, haddr, huge_page_size(h));
>                 if (likely(ptep && pte_same(huge_ptep_get(ptep), pte))) {
> -                       ClearPagePrivate(new_page);
> +                       ClearHPageRestoreReserve(new_page);
>
>                         /* Break COW */
>                         huge_ptep_clear_flush(vma, haddr, ptep);
> @@ -4217,7 +4259,7 @@ int huge_add_to_page_cache(struct page *page, struct address_space *mapping,
>
>         if (err)
>                 return err;
> -       ClearPagePrivate(page);
> +       ClearHPageRestoreReserve(page);
>
>         /*
>          * set page dirty so that it will not be removed from cache/file
> @@ -4379,7 +4421,7 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
>                 goto backout;
>
>         if (anon_rmap) {
> -               ClearPagePrivate(page);
> +               ClearHPageRestoreReserve(page);
>                 hugepage_add_new_anon_rmap(page, vma, haddr);
>         } else
>                 page_dup_rmap(page, true);
> @@ -4693,7 +4735,7 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
>         if (vm_shared) {
>                 page_dup_rmap(page, true);
>         } else {
> -               ClearPagePrivate(page);
> +               ClearHPageRestoreReserve(page);
>                 hugepage_add_new_anon_rmap(page, dst_vma, dst_addr);
>         }
>
> --
> 2.29.2
>
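
For reference, here is the rough, untested sketch mentioned above of the
renaming plus overflow check. The names hugetlb_page_flags,
HPG_restore_reserve, __NR_HPAGEFLAGS and hugetlb_page_flags_build_check()
are only placeholders to illustrate the idea and are not part of this
patch:

enum hugetlb_page_flags {
	HPG_restore_reserve = 0,	/* was HPAGE_RestoreReserve */
	/* further hugetlb specific flags would be added here */
	__NR_HPAGEFLAGS,		/* keep last: number of bits in use */
};

/*
 * Called once from e.g. hugetlb_init(); BUILD_BUG_ON() (from
 * <linux/build_bug.h>) breaks the build if too many flags are defined,
 * since all of them have to share the single unsigned long in
 * hpage.private.
 */
static inline void hugetlb_page_flags_build_check(void)
{
	BUILD_BUG_ON(__NR_HPAGEFLAGS > BITS_PER_LONG);
}

The existing HPAGEFLAG() wrappers would then test/set/clear
HPG_restore_reserve instead of HPAGE_RestoreReserve, matching the naming
style of enum pageflags.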