Re: [PATCH] 2/2 Prezeroing large blocks of pages during allocation

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

From: Mel Gorman <mel@csn.ul.ie>
To: Christoph Lameter <christoph@lameter.com>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] 2/2 Prezeroing large blocks of pages during allocation
Date: Mon, 7 Mar 2005 00:35:56 +0000 (GMT)	[thread overview]
Message-ID: <Pine.LNX.4.58.0503070014300.21439@skynet> (raw)
In-Reply-To: <Pine.LNX.4.58.0502281859400.8523@server.graphe.net>

On Mon, 28 Feb 2005, Christoph Lameter wrote:

> On Sun, 27 Feb 2005, Mel Gorman wrote:
>
> > The patch also counts how many blocks of each order were zeroed. This gives
> > a rough indicator if large blocks are frequently zeroed or not.  I found
> > that order-0 are the most frequent zeroed block because of the per-cpu
> > caches. This means we rarely win with zeroing in the allocator but the
> > accounting mechanisms are still handy for the scrubber daemon.
>
> Thanks for your efforts in integrating zeroing into your patches to reduce
> fragmentation.

No problem.

> It is true that you do not win with zeroing pages in the
> allocator. However, you may avoid additional zeroing by zeroing higher
> order pages and then breaking them into lower order pages (but this will
> then lead to additional fragmentation).
>

I got around the fragmentation problem by having a userzero and kernzero
pool. I also taught rmqueue_bulk() to allocate memory in as large as
chunks as possible and break it up into appropriate sizes. This means that
when the per-cpu caches are allocating 16 pages, we can now allocate this
as one 2**4 allocation rather than 16 2**0 allocations (which is possibly
a win in general, not just the prezeroing case, but I have not measured
it).  The zeroblock counts after a stress test now look something like
this;

Zeroblock count   96968  145994  125787   75329   32553     110      11       73      26       5     175

That is a big improvement as we are not zeroing order-0 pages nearly as
often as we were. It is no longer regressing in terms of fragmentation
either which is important. I need to test the patch more but hope to post
a new version by tomorrow evening. It will also need careful reviewing to
make sure I did not miss something with the per-cpu caches.

> > This patch seriously regresses how well fragmentation is handled making it
> > perform almost as badly as the standard allocator. It is because the fallback
> > ordering for USERZERO has a tendency to clobber the reserved lists because
> > of the mix of allocation types that need to be zeroed.
>
> Having pages of multiple orders in zeroed and not zeroed state invariably
> leads to more fragmentation. I have also observed that with my patches
> under many configurations. Seems that the only solution is to
> intentionally either zero all free pages (which means you can coalesce
> them all but you are zeroing lots of pages that did not need zeroing
> after all) or you disregard the zeroed state during coalescing, either
> insure that  both are zeroed or mark the results as unzeroed... both
> solutions introduce additional overhead.
>

My approach is to ignore zero pages during free/coalescing and to treat
kernel allocations for zero pages differently to userspace allocations for
zero pages.

> My favorite solution has been so far to try to zero all
> pages from the  highest order downward but only when the system is idle
> (or there is some hardware that does zeroing for us). And maybe we better
> drop the zeroed status if a zeroed and an unzeroed page can be coalesced
> to a higher order page? However, there will still be lots of unnecessary
> zeroing.
>

When splitting, I zero the largest possible block on the assumption it
makes sense to zero larger blocks. During coalesing, I ignore the zero
state altogether as I could not think of a fast way of determining if a
page was zeroed or not.

> Since most of the request for zeroed pages are order-0 requests, we could
> do a similar thing to that M$ Windows does
> (http://www.windowsitpro.com/Articles/Index.cfm?ArticleID=3774&pg=2): Keep
> a list of zeroed order 0 pages around, only put things on that list if
> the system is truly idle and pick pages up for order 0 zeroed accesses.
>
> These zero lists would needed to be managed more like cpu hotlists and
> not like we do currently as buddy allocator freelists.
>

I went with a variation of this approach. In my latest tree, I have
pageset and pageset_zero to represent a per-CPU cache of normal pages and
of zeroed pages. I have the buddy free lists to zero pages that are only
filled when splitting up a large block of pages for a zero page
allocation.

-- 
Mel Gorman
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"aart@kvack.org"> aart@kvack.org </a>

     prev parent reply	other threads:[~2005-03-07  0:35 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2005-02-27 13:43 Mel Gorman
2005-02-28 16:46 ` Dave Hansen
2005-02-28 19:01   ` Mel Gorman
2005-03-01  3:51 ` Christoph Lameter
2005-03-07  0:35   ` Mel Gorman [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Pine.LNX.4.58.0503070014300.21439@skynet \
    --to=mel@csn.ul.ie \
    --cc=christoph@lameter.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox