linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Alexander Duyck <alexander.h.duyck@linux.intel.com>
To: Andrew Morton <akpm@linux-foundation.org>,
	Alexander Duyck <alexander.duyck@gmail.com>
Cc: kvm@vger.kernel.org, mst@redhat.com,
	linux-kernel@vger.kernel.org,  willy@infradead.org,
	mhocko@kernel.org, linux-mm@kvack.org,
	 mgorman@techsingularity.net, vbabka@suse.cz,
	yang.zhang.wz@gmail.com,  nitesh@redhat.com,
	konrad.wilk@oracle.com, david@redhat.com, pagupta@redhat.com,
	 riel@surriel.com, lcapitulino@redhat.com, dave.hansen@intel.com,
	 wei.w.wang@intel.com, aarcange@redhat.com, pbonzini@redhat.com,
	 dan.j.williams@intel.com, osalvador@suse.de
Subject: Re: [PATCH v12 3/6] mm: Introduce Reported pages
Date: Tue, 22 Oct 2019 16:25:33 -0700	[thread overview]
Message-ID: <2ee2a9fc42f5d0644ae8fbad3bb57fd84bd60583.camel@linux.intel.com> (raw)
In-Reply-To: <20191022160347.3559936a0a0a4389cfec455e@linux-foundation.org>

On Tue, 2019-10-22 at 16:03 -0700, Andrew Morton wrote:
> On Tue, 22 Oct 2019 15:28:12 -0700 Alexander Duyck <alexander.duyck@gmail.com> wrote:
> 
> > From: Alexander Duyck <alexander.h.duyck@linux.intel.com>
> > 
> > In order to pave the way for free page reporting in virtualized
> > environments we will need a way to get pages out of the free lists and
> > identify those pages after they have been returned. To accomplish this,
> > this patch adds the concept of a Reported Buddy, which is essentially
> > meant to just be the Uptodate flag used in conjunction with the Buddy
> > page type.
> > 
> > It adds a set of pointers we shall call "reported_boundary" which
> > represent the upper boundary between the unreported and reported pages.
> > The general idea is that in order for a page to cross from one side of the
> > boundary to the other it will need to verify that it went through the
> > reporting process. Ultimately a free list has been fully processed when
> > the boundary has been moved from the tail all they way up to occupying the
> > first entry in the list. Without this we would have to manually walk the
> > entire page list until we have find a page that hasn't been reported. In my
> > testing this adds as much as 18% additional overhead which would make this
> > unattractive as a solution.
> > 
> > One limitation to this approach is that it is essentially a linear search
> > and in the case of the free lists we can have pages added to either the
> > head or the tail of the list. In order to place limits on this we only
> > allow pages to be added before the reported_boundary instead of adding
> > to the tail itself. An added advantage to this approach is that we should
> > be reducing the overall memory footprint of the guest as it will be more
> > likely to recycle warm pages versus trying to allocate the reported pages
> > that were likely evicted from the guest memory.
> > 
> > Since we will only be reporting one zone at a time we keep the boundary
> > limited to being defined for just the zone we are currently reporting pages
> > from. Doing this we can keep the number of additional pointers needed quite
> > small. To flag that the boundaries are in place we use a single bit
> > in the zone to indicate that reporting and the boundaries are active.
> > 
> > We store the index of the boundary pointer used to track the reported page
> > in the page->index value. Doing this we can avoid unnecessary computation
> > to determine the index value again. There should be no issues with this as
> > the value is unused when the page is in the buddy allocator, and is reset
> > as soon as the page is removed from the free list.
> 
> This looks like quite a lot of new code in code MM.  Hence previous
> "how valuable is this patchset" question!
> 
> Some silly trivia which I noticed while perusing:

I'll try to answer it here.

My understanding is that this can be very valuable in the case where a
host is oversubscribing guest memory. What I have seen is that memory
overcommit can quickly cause certain workloads to take minutes versus just
seconds depending on the speed at which memory is swapped out and in.

What this patch set is providing is a form of auto-ballooning that allows
the guest to shink its memory footprint so that it can be packed more
tightly with other guests, especially in the case where guests are often
inactive.

> > ...
> > 
> > --- a/include/linux/mmzone.h
> > +++ b/include/linux/mmzone.h
> > @@ -470,6 +470,14 @@ struct zone {
> >  	seqlock_t		span_seqlock;
> >  #endif
> >  
> > +#ifdef CONFIG_PAGE_REPORTING
> > +	/*
> > +	 * Pointer to reported page tracking statistics array. The size of
> > +	 * the array is MAX_ORDER - PAGE_REPORTING_MIN_ORDER. NULL when
> > +	 * unused page reporting is not present.
> > +	 */
> > +	unsigned long		*reported_pages;
> 
> Dumb question.  Why not
> 
> 	unsigned long reported_pages[MAX_ORDER - PAGE_REPORTING_MIN_ORDER];

It was mostly to avoid causing too much change to the zone structure. By
placing it where I did I was essentially just making use of unused space
that would have otherwise been padding. In addition, since this is only
going to be used when in a virtualized environment we keep the size of the
zone smaller on systems that won't be making use of page reporting.

> > +#endif
> >  	int initialized;
> >  
> >  	/* Write-intensive fields used from the page allocator */
> > 
> > ...
> > 
> > +#define page_is_reported(_page)	unlikely(PageReported(_page))
> 
> page_reported() would be more consistent.

Okay, I can do that.

> > ...
> > 
> > +static inline void
> > +add_page_to_reported_list(struct page *page, struct zone *zone,
> > +			  unsigned int order, unsigned int mt)
> > +{
> > +	/*
> > +	 * Default to using index 0, this will be updated later if the zone
> > +	 * is still being processed.
> > +	 */
> > +	page->index = 0;
> > +
> > +	/* flag page as reported */
> > +	__SetPageReported(page);
> > +
> > +	/* update areated page accounting */
> > +	zone->reported_pages[order - PAGE_REPORTING_MIN_ORDER]++;
> 
> nit.  This is an array, not a list.  The function name is a bit screwy.

Yeah. Maybe I should rename this to mark_page_reported(). I think at some
point it was updating the reported_boundary so that the page was pulled
into the list. I gave up on that when we had to start supporting the
boundary being pulled out from under us. The array is just for tracking
the statistics and wasn't a consideration in the naming.



  reply	other threads:[~2019-10-22 23:25 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-10-22 22:27 [PATCH v12 0/6] mm / virtio: Provide support for unused page reporting Alexander Duyck
2019-10-22 22:27 ` [PATCH v12 1/6] mm: Adjust shuffle code to allow for future coalescing Alexander Duyck
2019-10-22 22:28 ` [PATCH v12 2/6] mm: Use zone and order instead of free area in free_list manipulators Alexander Duyck
2019-10-23  8:26   ` David Hildenbrand
2019-10-23 15:16     ` Alexander Duyck
2019-10-24  9:32       ` David Hildenbrand
2019-10-24 15:19         ` Alexander Duyck
2019-10-22 22:28 ` [PATCH v12 3/6] mm: Introduce Reported pages Alexander Duyck
2019-10-22 23:03   ` Andrew Morton
2019-10-22 23:25     ` Alexander Duyck [this message]
2019-10-22 22:28 ` [PATCH v12 4/6] mm: Add device side and notifier for unused page reporting Alexander Duyck
2019-10-22 22:28 ` [PATCH v12 5/6] virtio-balloon: Pull page poisoning config out of free page hinting Alexander Duyck
2019-10-22 22:28 ` [PATCH v12 6/6] virtio-balloon: Add support for providing unused page reports to host Alexander Duyck
2019-10-22 22:29 ` [PATCH v12 QEMU 1/3] virtio-ballon: Implement support for page poison tracking feature Alexander Duyck
2019-10-22 22:29 ` [PATCH v12 QEMU 2/3] virtio-balloon: Add bit to notify guest of unused page reporting Alexander Duyck
2019-10-22 22:29 ` [PATCH v12 QEMU 3/3] virtio-balloon: Provide a interface for " Alexander Duyck
2019-10-22 23:01 ` [PATCH v12 0/6] mm / virtio: Provide support " Andrew Morton
2019-10-22 23:43   ` Alexander Duyck
2019-10-23 11:19     ` Nitesh Narayan Lal
2019-10-23 11:35 ` Nitesh Narayan Lal
2019-10-23 22:24   ` Alexander Duyck
2019-10-28 14:34 ` Nitesh Narayan Lal
2019-10-28 15:24   ` Alexander Duyck

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=2ee2a9fc42f5d0644ae8fbad3bb57fd84bd60583.camel@linux.intel.com \
    --to=alexander.h.duyck@linux.intel.com \
    --cc=aarcange@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=alexander.duyck@gmail.com \
    --cc=dan.j.williams@intel.com \
    --cc=dave.hansen@intel.com \
    --cc=david@redhat.com \
    --cc=konrad.wilk@oracle.com \
    --cc=kvm@vger.kernel.org \
    --cc=lcapitulino@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mgorman@techsingularity.net \
    --cc=mhocko@kernel.org \
    --cc=mst@redhat.com \
    --cc=nitesh@redhat.com \
    --cc=osalvador@suse.de \
    --cc=pagupta@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=riel@surriel.com \
    --cc=vbabka@suse.cz \
    --cc=wei.w.wang@intel.com \
    --cc=willy@infradead.org \
    --cc=yang.zhang.wz@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox