Re: [PATCH 4/6] hugetlb: avoid allocation failed when page reporting is on going


From: Alexander Duyck <alexander.duyck@gmail.com>
To: Liang Li <liliang324@gmail.com>
Cc: Mel Gorman <mgorman@techsingularity.net>,
	Andrew Morton <akpm@linux-foundation.org>,
	 Andrea Arcangeli <aarcange@redhat.com>,
	Dan Williams <dan.j.williams@intel.com>,
	 "Michael S. Tsirkin" <mst@redhat.com>,
	David Hildenbrand <david@redhat.com>,
	Jason Wang <jasowang@redhat.com>,
	 Dave Hansen <dave.hansen@intel.com>,
	Michal Hocko <mhocko@suse.com>,
	 Liang Li <liliangleo@didiglobal.com>,
	Mike Kravetz <mike.kravetz@oracle.com>,
	 linux-mm <linux-mm@kvack.org>,
	LKML <linux-kernel@vger.kernel.org>,
	 virtualization@lists.linux-foundation.org
Subject: Re: [PATCH 4/6] hugetlb: avoid allocation failed when page reporting is on going
Date: Thu, 7 Jan 2021 09:56:18 -0800
Message-ID: <CAKgT0UfQUgZvsw6iQOFuFCGSt1SoU5ij4nC7tsUwbvf4C_0fnA@mail.gmail.com>
In-Reply-To: <CA+2MQi9MxE_DWW3BHLJbvYDsOppCmfL6AHkdRwtHX0gBDpDebA@mail.gmail.com>

On Wed, Jan 6, 2021 at 7:57 PM Liang Li <liliang324@gmail.com> wrote:
>
> > > Page reporting isolates free pages temporarily when reporting
> > > free page information. This reduces the number of free pages
> > > actually available and may cause applications to fail for lack
> > > of available memory. This patch tries to solve the issue: when
> > > there are no free pages and page reporting is ongoing, wait until
> > > it is done.
> > >
> > > Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com>
> >
> > Please don't use this email address for me anymore. Either use
> > alexander.duyck@gmail.com or alexanderduyck@fb.com. I am getting
> > bounces when I reply to this thread because of the old address.
>
> No problem.
>
> > > diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> > > index eb533995cb49..0fccd5f96954 100644
> > > --- a/mm/hugetlb.c
> > > +++ b/mm/hugetlb.c
> > > @@ -2320,6 +2320,12 @@ struct page *alloc_huge_page(struct vm_area_struct *vma,
> > >                 goto out_uncharge_cgroup_reservation;
> > >
> > >         spin_lock(&hugetlb_lock);
> > > +       while (h->free_huge_pages <= 1 && h->isolated_huge_pages) {
> > > +               spin_unlock(&hugetlb_lock);
> > > +               mutex_lock(&h->mtx_prezero);
> > > +               mutex_unlock(&h->mtx_prezero);
> > > +               spin_lock(&hugetlb_lock);
> > > +       }
> >
> > This seems like a bad idea. It kind of defeats the whole point of
> > doing the page zeroing outside of the hugetlb_lock. Also it is
> > operating on the assumption that the only way you might get a page is
> > from the page zeroing logic.
> >
> > With the page reporting code we wouldn't drop the count to zero. We
> > had checks that were going through and monitoring the watermarks and
> > if we started to hit the low watermark we would stop page reporting
> > and just assume there aren't enough pages to report. You might need to
> > look at doing something similar here so that you can avoid colliding
> > with the allocator.
>
> For hugetlb, things are a little different. As Mike points out:
>      "On some systems, hugetlb pages are a precious resource and
>       the sysadmin carefully configures the number needed by
>       applications.  Removing a hugetlb page (even for a very short
>       period of time) could cause serious application failure."
>
> Just keeping some pages in the free list is not enough to prevent that
> from happening, because those pages may be allocated while zeroing is in
> progress, and the application may still find that no free pages are
> available.

I get what you are saying. However, I don't know if it is acceptable
for the allocating thread to be put to sleep in this situation. There
are two scenarios where I can see this being problematic.

The first is that while the allocating thread is asleep, another
thread may free a page. The sleeping thread cannot respond to that
newly freed page and remains stuck waiting on the page being zeroed.

The second is that users may prefer a different option, such as
breaking the request up into smaller pages rather than waiting on the
page zeroing, or doing something else while waiting on the page. So
instead of sitting on the request and waiting, it might make more sense
to return an error pointer like EAGAIN or EBUSY to indicate that there
is a page there, but it is momentarily tied up.
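
To make the second option a bit more concrete, here is a rough
userspace sketch of what I have in mind. It is not the hugetlb code:
demo_pool, demo_alloc() and demo_isolation_done() are hypothetical
names, the ERR_PTR()/PTR_ERR()/IS_ERR() helpers are local stand-ins
modeled on the kernel ones, and the choice of EBUSY over EAGAIN is
arbitrary. The only point is that the caller can tell "no pages at
all" apart from "a page exists but is momentarily isolated" without
being put to sleep.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Userspace stand-ins modeled on the kernel's error-pointer helpers. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error)
{
	return (void *)(intptr_t)error;
}

static inline long PTR_ERR(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

static inline int IS_ERR(const void *ptr)
{
	/* Error codes live in the top MAX_ERRNO addresses. */
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

#define DEMO_PAGE_SIZE 64	/* assumption: tiny fake "page" for the demo */

/* Hypothetical, heavily simplified pool; not the kernel's struct hstate. */
struct demo_pool {
	unsigned long free_pages;	/* pages ready to hand out */
	unsigned long isolated_pages;	/* pages held for zeroing/reporting */
};

/*
 * Never sleeps. Returns a page, ERR_PTR(-EBUSY) if pages exist but are
 * all isolated right now, or ERR_PTR(-ENOMEM) if the pool is truly empty.
 */
static void *demo_alloc(struct demo_pool *p)
{
	void *page;

	if (p->free_pages > 0) {
		page = malloc(DEMO_PAGE_SIZE);
		if (!page)
			return ERR_PTR(-ENOMEM);
		p->free_pages--;
		return page;
	}

	if (p->isolated_pages > 0)
		return ERR_PTR(-EBUSY);	/* momentarily tied up */

	return ERR_PTR(-ENOMEM);
}

/* Zeroing/reporting finished: isolated pages become allocatable again. */
static void demo_isolation_done(struct demo_pool *p)
{
	p->free_pages += p->isolated_pages;
	p->isolated_pages = 0;
}
```

A caller that sees -EBUSY is then free to retry later, fall back to
smaller pages, or go do something else, while -ENOMEM still means the
pool is genuinely exhausted.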
