From: Muchun Song
Date: Tue, 12 Jan 2021 11:24:05 +0800
Subject: Re: [External] [RFC PATCH 1/3] hugetlb: use page.private for hugetlb specific page flags
To: Mike Kravetz
Cc: LKML, Linux Memory Management List, Michal Hocko, Naoya Horiguchi, David Hildenbrand, Andrew Morton
In-Reply-To: <20210111210152.118394-2-mike.kravetz@oracle.com>
References: <20210111210152.118394-1-mike.kravetz@oracle.com> <20210111210152.118394-2-mike.kravetz@oracle.com>

On Tue, Jan 12, 2021 at 5:04 AM Mike Kravetz wrote:
>
> As hugetlbfs evolved, state information about hugetlb pages was added.
> One 'convenient' way of doing this was to use available fields in tail
> pages. Over time, it has become difficult to know the meaning or contents
> of fields simply by looking at a small bit of code. Sometimes, the
> naming is just confusing. For example: The PagePrivate flag indicates
> a huge page reservation was consumed and needs to be restored if an error
> is encountered and the page is freed before it is instantiated. The
> page.private field contains the pointer to a subpool if the page is
> associated with one.
>
> In an effort to make the code more readable, use page.private to contain
> hugetlb specific flags. These flags will have test, set and clear functions
> similar to those used for 'normal' page flags. More importantly, the
> flags will have names which actually reflect their purpose.
>
> In this patch,
> - Create infrastructure for huge page flag functions
> - Move subpool pointer to page[1].private to make way for flags
>   Create routines with meaningful names to modify subpool field
> - Use new HPageRestoreReserve reserve flag instead of PagePrivate
>
> Conversion of other state information will happen in subsequent patches.

Great. We can add more meta information easily in the future. It also
makes the code more readable. Thanks.

>
> Signed-off-by: Mike Kravetz
> ---
>  fs/hugetlbfs/inode.c    | 11 +++---
>  include/linux/hugetlb.h |  2 ++
>  mm/hugetlb.c            | 80 +++++++++++++++++++++++++++++++----------
>  3 files changed, 67 insertions(+), 26 deletions(-)
>
> diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
> index b5c109703daa..8bfb80bc6e96 100644
> --- a/fs/hugetlbfs/inode.c
> +++ b/fs/hugetlbfs/inode.c
> @@ -966,14 +966,11 @@ static int hugetlbfs_migrate_page(struct address_space *mapping,
>                 return rc;
>
>         /*
> -        * page_private is subpool pointer in hugetlb pages. Transfer to
> -        * new page. PagePrivate is not associated with page_private for
> -        * hugetlb pages and can not be set here as only page_huge_active
> -        * pages can be migrated.
> +        * Transfer any subpool pointer to the new page.
>          */
> -       if (page_private(page)) {
> -               set_page_private(newpage, page_private(page));
> -               set_page_private(page, 0);
> +       if (hpage_spool(page)) {
> +               set_hpage_spool(newpage, hpage_spool(page));
> +               set_hpage_spool(page, 0);
>         }
>
>         if (mode != MIGRATE_SYNC_NO_COPY)
> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> index ebca2ef02212..4f0159f1b9cc 100644
> --- a/include/linux/hugetlb.h
> +++ b/include/linux/hugetlb.h
> @@ -104,6 +104,8 @@ extern int hugetlb_max_hstate __read_mostly;
>  struct hugepage_subpool *hugepage_new_subpool(struct hstate *h, long max_hpages,
>                                                 long min_hpages);
>  void hugepage_put_subpool(struct hugepage_subpool *spool);
> +struct hugepage_subpool *hpage_spool(struct page *hpage);
> +void set_hpage_spool(struct page *hpage, struct hugepage_subpool *spool);
>
>  void reset_vma_resv_huge_pages(struct vm_area_struct *vma);
>  int hugetlb_sysctl_handler(struct ctl_table *, int, void *, size_t *, loff_t *);
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 3b38ea958e95..3eb3b102c589 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -52,6 +52,49 @@ static struct cma *hugetlb_cma[MAX_NUMNODES];
>  #endif
>  static unsigned long hugetlb_cma_size __initdata;
>
> +/*
> + * hugetlb specific state flags located in hpage.private
> + */
> +enum htlb_page_flags {
> +       HPAGE_RestoreReserve = 0,

IMHO, can we rename it to HPG_restore_reserve? I just want to make the
name consistent with the page flags (see enum pageflags in
include/linux/page-flags.h). Maybe we should also add a __NR_HPAGEFLAGS
to indicate how many bits are already used, and add a BUILD_BUG_ON to
check that the used bits do not exceed the width of an unsigned long.
It is not strictly necessary now, but if we can check it, I think it
would be better. (A rough sketch of what I have in mind is at the end
of this mail.)

> +};
> +
> +/*
> + * Macros to create function definitions for hpage flags
> + */
> +#define TESTHPAGEFLAG(flname)                                  \
> +static inline int HPage##flname(struct page *page)             \
> +       { return test_bit(HPAGE_##flname, &(page->private)); }
> +
> +#define SETHPAGEFLAG(flname)                                   \
> +static inline void SetHPage##flname(struct page *page)         \
> +       { set_bit(HPAGE_##flname, &(page->private)); }
> +
> +#define CLEARHPAGEFLAG(flname)                                 \
> +static inline void ClearHPage##flname(struct page *page)       \
> +       { clear_bit(HPAGE_##flname, &(page->private)); }
> +
> +#define HPAGEFLAG(flname)                                      \
> +       TESTHPAGEFLAG(flname)                                   \
> +       SETHPAGEFLAG(flname)                                    \
> +       CLEARHPAGEFLAG(flname)
> +
> +HPAGEFLAG(RestoreReserve)
> +
> +/*
> + * hugetlb page subpool pointer located in hpage[1].private
> + */
> +struct hugepage_subpool *hpage_spool(struct page *hpage)
> +{
> +       return (struct hugepage_subpool *)(hpage+1)->private;
> +}
> +
> +void set_hpage_spool(struct page *hpage,
> +                       struct hugepage_subpool *spool)
> +{
> +       set_page_private(hpage+1, (unsigned long)spool);
> +}
> +
>  /*
>   * Minimum page order among possible hugepage sizes, set to a proper value
>   * at boot time.
> @@ -1116,7 +1159,7 @@ static struct page *dequeue_huge_page_vma(struct hstate *h,
>         nid = huge_node(vma, address, gfp_mask, &mpol, &nodemask);
>         page = dequeue_huge_page_nodemask(h, gfp_mask, nid, nodemask);
>         if (page && !avoid_reserve && vma_has_reserves(vma, chg)) {
> -               SetPagePrivate(page);
> +               SetHPageRestoreReserve(page);
>                 h->resv_huge_pages--;
>         }
>
> @@ -1391,20 +1434,19 @@ static void __free_huge_page(struct page *page)
>          */
>         struct hstate *h = page_hstate(page);
>         int nid = page_to_nid(page);
> -       struct hugepage_subpool *spool =
> -               (struct hugepage_subpool *)page_private(page);
> +       struct hugepage_subpool *spool = hpage_spool(page);
>         bool restore_reserve;
>
>         VM_BUG_ON_PAGE(page_count(page), page);
>         VM_BUG_ON_PAGE(page_mapcount(page), page);
>
> -       set_page_private(page, 0);
> +       set_hpage_spool(page, 0);
>         page->mapping = NULL;
> -       restore_reserve = PagePrivate(page);
> -       ClearPagePrivate(page);
> +       restore_reserve = HPageRestoreReserve(page);
> +       ClearHPageRestoreReserve(page);
>
>         /*
> -        * If PagePrivate() was set on page, page allocation consumed a
> +        * If RestoreReserve was set on page, page allocation consumed a
>          * reservation. If the page was associated with a subpool, there
>          * would have been a page reserved in the subpool before allocation
>          * via hugepage_subpool_get_pages(). Since we are 'restoring' the
> @@ -2213,9 +2255,9 @@ static long vma_add_reservation(struct hstate *h,
>   * This routine is called to restore a reservation on error paths. In the
>   * specific error paths, a huge page was allocated (via alloc_huge_page)
>   * and is about to be freed. If a reservation for the page existed,
> - * alloc_huge_page would have consumed the reservation and set PagePrivate
> + * alloc_huge_page would have consumed the reservation and set RestoreReserve
>   * in the newly allocated page. When the page is freed via free_huge_page,
> - * the global reservation count will be incremented if PagePrivate is set.
> + * the global reservation count will be incremented if RestoreReserve is set.
>   * However, free_huge_page can not adjust the reserve map. Adjust the
>   * reserve map here to be consistent with global reserve count adjustments
>   * to be made by free_huge_page.
> @@ -2224,13 +2266,13 @@ static void restore_reserve_on_error(struct hstate *h,
>                         struct vm_area_struct *vma, unsigned long address,
>                         struct page *page)
>  {
> -       if (unlikely(PagePrivate(page))) {
> +       if (unlikely(HPageRestoreReserve(page))) {
>                 long rc = vma_needs_reservation(h, vma, address);
>
>                 if (unlikely(rc < 0)) {
>                         /*
>                          * Rare out of memory condition in reserve map
> -                        * manipulation. Clear PagePrivate so that
> +                        * manipulation. Clear RestoreReserve so that
>                          * global reserve count will not be incremented
>                          * by free_huge_page. This will make it appear
>                          * as though the reservation for this page was
> @@ -2239,7 +2281,7 @@ static void restore_reserve_on_error(struct hstate *h,
>                          * is better than inconsistent global huge page
>                          * accounting of reserve counts.
>                          */
> -                       ClearPagePrivate(page);
> +                       ClearHPageRestoreReserve(page);
>                 } else if (rc) {
>                         rc = vma_add_reservation(h, vma, address);
>                         if (unlikely(rc < 0))
>                                 /*
>                                  * See above comment about rare out of
>                                  * memory condition.
>                                  */
> -                               ClearPagePrivate(page);
> +                               ClearHPageRestoreReserve(page);
>                 } else
>                         vma_end_reservation(h, vma, address);
>         }
> @@ -2328,7 +2370,7 @@ struct page *alloc_huge_page(struct vm_area_struct *vma,
>                 if (!page)
>                         goto out_uncharge_cgroup;
>                 if (!avoid_reserve && vma_has_reserves(vma, gbl_chg)) {
> -                       SetPagePrivate(page);
> +                       SetHPageRestoreReserve(page);
>                         h->resv_huge_pages--;
>                 }
>                 spin_lock(&hugetlb_lock);
> @@ -2346,7 +2388,7 @@ struct page *alloc_huge_page(struct vm_area_struct *vma,
>
>         spin_unlock(&hugetlb_lock);
>
> -       set_page_private(page, (unsigned long)spool);
> +       set_hpage_spool(page, spool);
>
>         map_commit = vma_commit_reservation(h, vma, addr);
>         if (unlikely(map_chg > map_commit)) {
> @@ -4150,7 +4192,7 @@ static vm_fault_t hugetlb_cow(struct mm_struct *mm, struct vm_area_struct *vma,
>                 spin_lock(ptl);
>                 ptep = huge_pte_offset(mm, haddr, huge_page_size(h));
>                 if (likely(ptep && pte_same(huge_ptep_get(ptep), pte))) {
> -                       ClearPagePrivate(new_page);
> +                       ClearHPageRestoreReserve(new_page);
>
>                         /* Break COW */
>                         huge_ptep_clear_flush(vma, haddr, ptep);
> @@ -4217,7 +4259,7 @@ int huge_add_to_page_cache(struct page *page, struct address_space *mapping,
>
>         if (err)
>                 return err;
> -       ClearPagePrivate(page);
> +       ClearHPageRestoreReserve(page);
>
>         /*
>          * set page dirty so that it will not be removed from cache/file
> @@ -4379,7 +4421,7 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
>                 goto backout;
>
>         if (anon_rmap) {
> -               ClearPagePrivate(page);
> +               ClearHPageRestoreReserve(page);
>                 hugepage_add_new_anon_rmap(page, vma, haddr);
>         } else
>                 page_dup_rmap(page, true);
> @@ -4693,7 +4735,7 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
>         if (vm_shared) {
>                 page_dup_rmap(page, true);
>         } else {
> -               ClearPagePrivate(page);
> +               ClearHPageRestoreReserve(page);
>                 hugepage_add_new_anon_rmap(page, dst_vma, dst_addr);
>         }
>
> --
> 2.29.2
>
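
For reference, here is the rough, untested sketch mentioned above of the
renaming plus overflow check. The names hugetlb_page_flags,
HPG_restore_reserve, __NR_HPAGEFLAGS and hugetlb_page_flags_build_check()
are only placeholders to illustrate the idea and are not part of this
patch:

enum hugetlb_page_flags {
	HPG_restore_reserve = 0,	/* was HPAGE_RestoreReserve */
	/* further hugetlb specific flags would be added here */
	__NR_HPAGEFLAGS,		/* keep last: number of bits in use */
};

/*
 * Called once from e.g. hugetlb_init(); BUILD_BUG_ON() (from
 * <linux/build_bug.h>) breaks the build if too many flags are defined,
 * since all of them have to share the single unsigned long in
 * hpage.private.
 */
static inline void hugetlb_page_flags_build_check(void)
{
	BUILD_BUG_ON(__NR_HPAGEFLAGS > BITS_PER_LONG);
}

The existing HPAGEFLAG() wrappers would then test/set/clear
HPG_restore_reserve instead of HPAGE_RestoreReserve, matching the naming
style of enum pageflags.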