Re: [RFC PATCH 1/3] mm: support hugetlb free page reporting

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

From: Liang Li <liliang324@gmail.com>
To: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com>,
	 Mel Gorman <mgorman@techsingularity.net>,
	Andrew Morton <akpm@linux-foundation.org>,
	 Andrea Arcangeli <aarcange@redhat.com>,
	Dan Williams <dan.j.williams@intel.com>,
	 "Michael S. Tsirkin" <mst@redhat.com>,
	David Hildenbrand <david@redhat.com>,
	Jason Wang <jasowang@redhat.com>,
	 Dave Hansen <dave.hansen@intel.com>,
	Michal Hocko <mhocko@suse.com>,
	 Liang Li <liliangleo@didiglobal.com>,
	linux-mm <linux-mm@kvack.org>,
	 LKML <linux-kernel@vger.kernel.org>,
	virtualization@lists.linux-foundation.org,
	 qemu-devel@nongnu.org
Subject: Re: [RFC PATCH 1/3] mm: support hugetlb free page reporting
Date: Wed, 23 Dec 2020 11:57:31 +0800	[thread overview]
Message-ID: <CA+2MQi_QDnnsbMdOH5B4Hhak-CWA-Xs6PLhxoGq2f+Vv13sgyg@mail.gmail.com> (raw)
In-Reply-To: <63318bf1-21ea-7202-e060-b4b2517c684e@oracle.com>

> On 12/21/20 11:46 PM, Liang Li wrote:
> > Free page reporting only supports buddy pages, it can't report the
> > free pages reserved for hugetlbfs case. On the other hand, hugetlbfs
> > is a good choice for a system with a huge amount of RAM, because it
> > can help to reduce the memory management overhead and improve system
> > performance.
> > This patch add the support for reporting hugepages in the free list
> > of hugetlb, it canbe used by virtio_balloon driver for memory
> > overcommit and pre zero out free pages for speeding up memory population.
>
> My apologies as I do not follow virtio_balloon driver.  Comments from
> the hugetlb perspective.

Any comments are welcome.


> >  static struct page *dequeue_huge_page_node_exact(struct hstate *h, int nid)
> > @@ -5531,6 +5537,29 @@ follow_huge_pgd(struct mm_struct *mm, unsigned long address, pgd_t *pgd, int fla
> >       return pte_page(*(pte_t *)pgd) + ((address & ~PGDIR_MASK) >> PAGE_SHIFT);
> >  }
> >
> > +bool isolate_free_huge_page(struct page *page, struct hstate *h, int nid)
>
> Looks like this always returns true.  Should it be type void?

will change in the next revision.

> > +{
> > +     bool ret = true;
> > +
> > +     VM_BUG_ON_PAGE(!PageHead(page), page);
> > +
> > +     list_move(&page->lru, &h->hugepage_activelist);
> > +     set_page_refcounted(page);
> > +     h->free_huge_pages--;
> > +     h->free_huge_pages_node[nid]--;
> > +
> > +     return ret;
> > +}
> > +
>
> ...

> > +static void
> > +hugepage_reporting_drain(struct page_reporting_dev_info *prdev,
> > +                      struct hstate *h, struct scatterlist *sgl,
> > +                      unsigned int nents, bool reported)
> > +{
> > +     struct scatterlist *sg = sgl;
> > +
> > +     /*
> > +      * Drain the now reported pages back into their respective
> > +      * free lists/areas. We assume at least one page is populated.
> > +      */
> > +     do {
> > +             struct page *page = sg_page(sg);
> > +
> > +             putback_isolate_huge_page(h, page);
> > +
> > +             /* If the pages were not reported due to error skip flagging */
> > +             if (!reported)
> > +                     continue;
> > +
> > +             __SetPageReported(page);
> > +     } while ((sg = sg_next(sg)));
> > +
> > +     /* reinitialize scatterlist now that it is empty */
> > +     sg_init_table(sgl, nents);
> > +}
> > +
> > +/*
> > + * The page reporting cycle consists of 4 stages, fill, report, drain, and
> > + * idle. We will cycle through the first 3 stages until we cannot obtain a
> > + * full scatterlist of pages, in that case we will switch to idle.
> > + */
>
> As mentioned, I am not familiar with virtio_balloon and the overall design.
> So, some of this does not make sense to me.
>
> > +static int
> > +hugepage_reporting_cycle(struct page_reporting_dev_info *prdev,
> > +                      struct hstate *h, unsigned int nid,
> > +                      struct scatterlist *sgl, unsigned int *offset)
> > +{
> > +     struct list_head *list = &h->hugepage_freelists[nid];
> > +     unsigned int page_len = PAGE_SIZE << h->order;
> > +     struct page *page, *next;
> > +     long budget;
> > +     int ret = 0, scan_cnt = 0;
> > +
> > +     /*
> > +      * Perform early check, if free area is empty there is
> > +      * nothing to process so we can skip this free_list.
> > +      */
> > +     if (list_empty(list))
> > +             return ret;
>
> Do note that not all entries on the hugetlb free lists are free.  Reserved
> entries are also on the free list.  The actual number of free entries is
> 'h->free_huge_pages - h->resv_huge_pages'.
> Is the intention to process reserved pages as well as free pages?

Yes, Reserved pages was treated as 'free pages'

> > +
> > +     spin_lock_irq(&hugetlb_lock);
> > +
> > +     if (huge_page_order(h) > MAX_ORDER)
> > +             budget = HUGEPAGE_REPORTING_CAPACITY;
> > +     else
> > +             budget = HUGEPAGE_REPORTING_CAPACITY * 32;
> > +
> > +     /* loop through free list adding unreported pages to sg list */
> > +     list_for_each_entry_safe(page, next, list, lru) {
> > +             /* We are going to skip over the reported pages. */
> > +             if (PageReported(page)) {
> > +                     if (++scan_cnt >= MAX_SCAN_NUM) {
> > +                             ret = scan_cnt;
> > +                             break;
> > +                     }
> > +                     continue;
> > +             }
> > +
> > +             /*
> > +              * If we fully consumed our budget then update our
> > +              * state to indicate that we are requesting additional
> > +              * processing and exit this list.
> > +              */
> > +             if (budget < 0) {
> > +                     atomic_set(&prdev->state, PAGE_REPORTING_REQUESTED);
> > +                     next = page;
> > +                     break;
> > +             }
> > +
> > +             /* Attempt to pull page from list and place in scatterlist */
> > +             if (*offset) {
> > +                     isolate_free_huge_page(page, h, nid);
>
> Once a hugetlb page is isolated, it can not be used and applications that
> depend on hugetlb pages can start to fail.
> I assume that is acceptable/expected behavior.  Correct?
> On some systems, hugetlb pages are a precious resource and the sysadmin
> carefully configures the number needed by applications.  Removing a hugetlb
> page (even for a very short period of time) could cause serious application
> failure.

That' true, especially for 1G pages. Any suggestions?
Let the hugepage allocator be aware of this situation and retry ?

> My apologies if that is a stupid question.  I really have no knowledge of
> this area.
>
> Mike Kravetz

Thanks for your comments, Mike

Liang

-- 
>

next prev parent reply	other threads:[~2020-12-23  3:57 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-12-22  7:46 Liang Li
2020-12-22 19:59 ` Alexander Duyck
2020-12-22 22:55   ` Mike Kravetz
2020-12-23  3:42     ` Liang Li
2020-12-23  3:38   ` Liang Li
2020-12-23 16:39     ` Alexander Duyck
2020-12-24  0:45       ` Liang Li
2020-12-22 22:30 ` Mike Kravetz
2020-12-23  3:57   ` Liang Li [this message]
2020-12-23 18:47     ` Mike Kravetz
2020-12-24  1:31       ` Liang Li

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CA+2MQi_QDnnsbMdOH5B4Hhak-CWA-Xs6PLhxoGq2f+Vv13sgyg@mail.gmail.com \
    --to=liliang324@gmail.com \
    --cc=aarcange@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=alexander.h.duyck@linux.intel.com \
    --cc=dan.j.williams@intel.com \
    --cc=dave.hansen@intel.com \
    --cc=david@redhat.com \
    --cc=jasowang@redhat.com \
    --cc=liliangleo@didiglobal.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mgorman@techsingularity.net \
    --cc=mhocko@suse.com \
    --cc=mike.kravetz@oracle.com \
    --cc=mst@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=virtualization@lists.linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox