Re: [External] Re: [PATCH v22 6/9] mm: hugetlb: alloc the vmemmap pages associated with each HugeTLB page

From: Muchun Song <songmuchun@bytedance.com>
To: Mike Kravetz <mike.kravetz@oracle.com>
Cc: "Jonathan Corbet" <corbet@lwn.net>,
	"Thomas Gleixner" <tglx@linutronix.de>,
	"Ingo Molnar" <mingo@redhat.com>,
	bp@alien8.de, "X86 ML" <x86@kernel.org>,
	hpa@zytor.com, dave.hansen@linux.intel.com, luto@kernel.org,
	"Peter Zijlstra" <peterz@infradead.org>,
	"Alexander Viro" <viro@zeniv.linux.org.uk>,
	"Andrew Morton" <akpm@linux-foundation.org>,
	paulmck@kernel.org, pawan.kumar.gupta@linux.intel.com,
	"Randy Dunlap" <rdunlap@infradead.org>,
	oneukum@suse.com, anshuman.khandual@arm.com, jroedel@suse.de,
	"Mina Almasry" <almasrymina@google.com>,
	"David Rientjes" <rientjes@google.com>,
	"Matthew Wilcox" <willy@infradead.org>,
	"Oscar Salvador" <osalvador@suse.de>,
	"Michal Hocko" <mhocko@suse.com>,
	"Song Bao Hua (Barry Song)" <song.bao.hua@hisilicon.com>,
	"David Hildenbrand" <david@redhat.com>,
	"HORIGUCHI NAOYA(堀口 直也)" <naoya.horiguchi@nec.com>,
	"Joao Martins" <joao.m.martins@oracle.com>,
	"Xiongchun duan" <duanxiongchun@bytedance.com>,
	fam.zheng@bytedance.com, zhengqi.arch@bytedance.com,
	linux-doc@vger.kernel.org, LKML <linux-kernel@vger.kernel.org>,
	"Linux Memory Management List" <linux-mm@kvack.org>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>
Subject: Re: [External] Re: [PATCH v22 6/9] mm: hugetlb: alloc the vmemmap pages associated with each HugeTLB page
Date: Thu, 6 May 2021 10:52:58 +0800
Message-ID: <CAMZfGtWaSGCUaubv6kwc1hzRoc9=O2eXJBcU9t8bX3XeQtP9Yw@mail.gmail.com>
In-Reply-To: <c2e8bc43-44dc-825d-9f59-0de300815fa4@oracle.com>

On Thu, May 6, 2021 at 6:21 AM Mike Kravetz <mike.kravetz@oracle.com> wrote:
>
> On 4/29/21 8:13 PM, Muchun Song wrote:
> > When we free a HugeTLB page to the buddy allocator, we need to allocate
> > the vmemmap pages associated with it. However, we may not be able to
> > allocate the vmemmap pages when the system is under memory pressure. In
> > this case, we just refuse to free the HugeTLB page. This changes behavior
> > in some corner cases as listed below:
> >
> >  1) Failing to free a huge page triggered by the user (decrease nr_pages).
> >
> >     User needs to try again later.
> >
> >  2) Failing to free a surplus huge page when freed by the application.
> >
> >     Try again later when freeing a huge page next time.
> >
> >  3) Failing to dissolve a free huge page on ZONE_MOVABLE via
> >     offline_pages().
> >
> >     This can happen when we have plenty of ZONE_MOVABLE memory, but
> >     not enough kernel memory to allocate vmemmap pages.  We may even
> >     be able to migrate huge page contents, but will not be able to
> >     dissolve the source huge page.  This will prevent an offline
> >     operation and is unfortunate as memory offlining is expected to
> >     succeed on movable zones.  Users that depend on memory hotplug
> >     to succeed for movable zones should carefully consider whether the
> >     memory savings gained from this feature are worth the risk of
> >     possibly not being able to offline memory in certain situations.
> >
> >  4) Failing to dissolve a huge page on CMA/ZONE_MOVABLE via
> >     alloc_contig_range() - once we have that handling in place. Mainly
> >     affects CMA and virtio-mem.
> >
> >     Similar to 3). virtio-mem will handle migration errors gracefully.
> >     CMA might be able to fall back on other free areas within the CMA
> >     region.
> >
> > Vmemmap pages are allocated from the page freeing context. To keep
> > those allocations from being disruptive (e.g. triggering the OOM killer),
> > __GFP_NORETRY is used. hugetlb_lock is dropped for the allocation
> > because a non-sleeping allocation would be too fragile and could fail
> > too easily under memory pressure. GFP_ATOMIC and other modes that access
> > memory reserves are not used because we want to avoid consuming
> > reserves under heavy hugetlb freeing.
> >
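
The behavior described above (drop the lock, try a sleeping allocation, and
refuse to free the huge page if the allocation fails) can be modeled in
userspace. This is only a rough sketch under stated assumptions: every name
below (hpage_model, pool_model, free_huge_page_model, alloc_ok, alloc_fail)
is hypothetical and is not a kernel API, the lock is a plain boolean, and the
real patch involves more steps than are shown here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names exist in the kernel. */
struct hpage_model {
	bool vmemmap_optimized;	/* models HPG_vmemmap_optimized */
	void *vmemmap;		/* stands in for the restored vmemmap pages */
};

struct pool_model {
	int nr_huge_pages;	/* huge pages currently in the pool */
	bool lock_held;		/* stands in for hugetlb_lock */
};

/* Allocation hook so a caller can simulate memory pressure. */
typedef void *(*alloc_fn)(size_t size);

static char vmemmap_backing[4096];

/* Simulated successful allocation (no real page allocator involved). */
static void *alloc_ok(size_t size)
{
	return size <= sizeof(vmemmap_backing) ? vmemmap_backing : NULL;
}

/* Simulated allocation failure under memory pressure. */
static void *alloc_fail(size_t size)
{
	(void)size;
	return NULL;
}

/*
 * Try to hand one huge page back to the (modeled) buddy allocator.
 * Returns 0 on success, -1 if the page has to stay in the pool.
 */
static int free_huge_page_model(struct pool_model *pool,
				struct hpage_model *page, alloc_fn alloc)
{
	assert(pool->lock_held);

	if (page->vmemmap_optimized) {
		/* Drop the lock so the allocation is allowed to sleep. */
		pool->lock_held = false;
		page->vmemmap = alloc(sizeof(vmemmap_backing));
		pool->lock_held = true;

		if (!page->vmemmap)
			return -1;	/* refuse to free; retry later */
		page->vmemmap_optimized = false;
	}

	pool->nr_huge_pages--;
	return 0;
}
```

Calling free_huge_page_model() with alloc_fail leaves nr_huge_pages and the
vmemmap_optimized flag untouched, which corresponds to cases 1) and 2) above
where the caller simply tries again later. Calling it with alloc_ok clears
the flag and shrinks the pool by one page.
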
> > Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> > ---
> >  Documentation/admin-guide/mm/hugetlbpage.rst    |  8 ++
> >  Documentation/admin-guide/mm/memory-hotplug.rst | 13 ++++
> >  include/linux/hugetlb.h                         |  3 +
> >  include/linux/mm.h                              |  2 +
> >  mm/hugetlb.c                                    | 98 +++++++++++++++++++++----
> >  mm/hugetlb_vmemmap.c                            | 34 +++++++++
> >  mm/hugetlb_vmemmap.h                            |  6 ++
> >  mm/migrate.c                                    |  5 +-
> >  mm/sparse-vmemmap.c                             | 75 ++++++++++++++++++-
> >  9 files changed, 227 insertions(+), 17 deletions(-)
> >
> > diff --git a/Documentation/admin-guide/mm/hugetlbpage.rst b/Documentation/admin-guide/mm/hugetlbpage.rst
> > index f7b1c7462991..6988895d09a8 100644
> > --- a/Documentation/admin-guide/mm/hugetlbpage.rst
> > +++ b/Documentation/admin-guide/mm/hugetlbpage.rst
> > @@ -60,6 +60,10 @@ HugePages_Surp
> >          the pool above the value in ``/proc/sys/vm/nr_hugepages``. The
> >          maximum number of surplus huge pages is controlled by
> >          ``/proc/sys/vm/nr_overcommit_hugepages``.
> > +     Note: When the feature of freeing unused vmemmap pages associated
> > +     with each hugetlb page is enabled, the number of surplus huge pages
> > +     may be temporarily larger than the maximum number of surplus huge
> > +     pages when the system is under memory pressure.
> >  Hugepagesize
> >       is the default hugepage size (in Kb).
> >  Hugetlb
> > @@ -80,6 +84,10 @@ returned to the huge page pool when freed by a task.  A user with root
> >  privileges can dynamically allocate more or free some persistent huge pages
> >  by increasing or decreasing the value of ``nr_hugepages``.
> >
> > +Note: When the feature of freeing unused vmemmap pages associated with each
> > +hugetlb page is enabled, we can fail to free the huge pages triggered by
> > +the user when the system is under memory pressure.  Please try again later.
> > +
> >  Pages that are used as huge pages are reserved inside the kernel and cannot
> >  be used for other purposes.  Huge pages cannot be swapped out under
> >  memory pressure.
> > diff --git a/Documentation/admin-guide/mm/memory-hotplug.rst b/Documentation/admin-guide/mm/memory-hotplug.rst
> > index 05d51d2d8beb..c6bae2d77160 100644
> > --- a/Documentation/admin-guide/mm/memory-hotplug.rst
> > +++ b/Documentation/admin-guide/mm/memory-hotplug.rst
> > @@ -357,6 +357,19 @@ creates ZONE_MOVABLE as following.
> >     Unfortunately, there is no information to show which memory block belongs
> >     to ZONE_MOVABLE. This is TBD.
> >
> > +   Memory offlining can fail when dissolving a free huge page on ZONE_MOVABLE
> > +   and the feature of freeing unused vmemmap pages associated with each hugetlb
> > +   page is enabled.
> > +
> > +   This can happen when we have plenty of ZONE_MOVABLE memory, but not enough
> > +   kernel memory to allocate vmemmap pages.  We may even be able to migrate
> > +   huge page contents, but will not be able to dissolve the source huge page.
> > +   This will prevent an offline operation and is unfortunate as memory offlining
> > +   is expected to succeed on movable zones.  Users that depend on memory hotplug
> > +   to succeed for movable zones should carefully consider whether the memory
> > +   savings gained from this feature are worth the risk of possibly not being
> > +   able to offline memory in certain situations.
> > +
> >  .. note::
> >     Techniques that rely on long-term pinnings of memory (especially, RDMA and
> >     vfio) are fundamentally problematic with ZONE_MOVABLE and, therefore, memory
> > diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> > index d523a345dc86..d3abaaec2a22 100644
> > --- a/include/linux/hugetlb.h
> > +++ b/include/linux/hugetlb.h
> > @@ -525,6 +525,7 @@ unsigned long hugetlb_get_unmapped_area(struct file *file, unsigned long addr,
> >   *   code knows it has only reference.  All other examinations and
> >   *   modifications require hugetlb_lock.
> >   * HPG_freed - Set when page is on the free lists.
> > + * HPG_vmemmap_optimized - Set when the vmemmap pages of the page are freed.
> >   *   Synchronization: hugetlb_lock held for examination and modification.
>
> You just moved the Synchronization comment so that it applies to both
> HPG_freed and HPG_vmemmap_optimized.  However, HPG_vmemmap_optimized is
> checked/modified both with and without hugetlb_lock.  Nothing wrong with
> that, just need to update/fix the comment.
>

Thanks, Mike. I will update the comment.
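
A minimal sketch of one possible wording, assuming the Synchronization line
simply moves back under HPG_freed and HPG_vmemmap_optimized gets its own
note. The *_sketch names and the bit helpers are hypothetical userspace
stand-ins, not the kernel's page flag helpers, and the wording in the next
revision may differ.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Userspace sketch only; the *_sketch names and helpers are hypothetical.
 *
 * HPG_freed - Set when page is on the free lists.
 *	Synchronization: hugetlb_lock held for examination and modification.
 * HPG_vmemmap_optimized - Set when the vmemmap pages of the page are freed.
 *	Examined and modified both with and without hugetlb_lock held, so
 *	the synchronization rule above does not apply to this flag.
 */
enum hpage_flag_sketch {
	HPG_freed_sketch = 0,
	HPG_vmemmap_optimized_sketch,
	NR_HPAGE_FLAGS_SKETCH
};

/* Set one flag bit in a plain flags word. */
static inline void set_hpage_flag_sketch(unsigned long *flags,
					 enum hpage_flag_sketch f)
{
	assert(f < NR_HPAGE_FLAGS_SKETCH);
	*flags |= 1UL << f;
}

/* Clear one flag bit in a plain flags word. */
static inline void clear_hpage_flag_sketch(unsigned long *flags,
					   enum hpage_flag_sketch f)
{
	assert(f < NR_HPAGE_FLAGS_SKETCH);
	*flags &= ~(1UL << f);
}

/* Report whether one flag bit is set. */
static inline bool test_hpage_flag_sketch(unsigned long flags,
					  enum hpage_flag_sketch f)
{
	assert(f < NR_HPAGE_FLAGS_SKETCH);
	return (flags & (1UL << f)) != 0;
}
```

The point of the layout is that each flag carries its own locking note, so a
reader does not assume the hugetlb_lock rule covers both entries.
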

> Everything else looks good to me,
>
> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
>
> --
> Mike Kravetz
