From: Nick Piggin <nickpiggin@yahoo.com.au>
To: Mel Gorman <mel@csn.ul.ie>
Cc: Linux Memory Management List <linux-mm@kvack.org>,
Pekka Enberg <penberg@cs.helsinki.fi>,
Rik van Riel <riel@redhat.com>,
KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
Christoph Lameter <cl@linux-foundation.org>,
Johannes Weiner <hannes@cmpxchg.org>,
Nick Piggin <npiggin@suse.de>,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
Lin Ming <ming.m.lin@intel.com>,
Zhang Yanmin <yanmin_zhang@linux.intel.com>
Subject: Re: [RFC PATCH 00/20] Cleanup and optimise the page allocator
Date: Tue, 24 Feb 2009 01:46:01 +1100
Message-ID: <200902240146.03456.nickpiggin@yahoo.com.au>
In-Reply-To: <1235344649-18265-1-git-send-email-mel@csn.ul.ie>
Hi Mel,
Seems like a nice patchset.
On Monday 23 February 2009 10:17:09 Mel Gorman wrote:
> The complexity of the page allocator has been increasing for some time
> and it has now reached the point where the SLUB allocator is doing strange
> tricks to avoid the page allocator. This is obviously bad as it may
> encourage other subsystems to try avoiding the page allocator as well.
>
> This series of patches is intended to reduce the cost of the page
> allocator by doing the following.
>
> Patches 1-3 iron out the entry paths slightly and remove stupid sanity
> checks from the fast path.
>
> Patch 4 uses a lookup table instead of a number of branches to decide what
> zones are usable given the GFP flags.
>
> Patch 5 avoids repeated checks of the zonelist
>
> Patch 6 breaks the allocator up into a fast and slow path where the fast
> path later becomes one long inlined function.
>
> Patches 7-10 avoid calculating the same things repeatedly and instead
> calculate them once.
>
> Patches 11-13 inline the whole allocator fast path
>
> Patch 14 avoids calling get_pageblock_migratetype() potentially twice on
> every page free
>
> Patch 15 reduces the number of times interrupts are disabled by reworking
> what free_page_mlock() does. However, I notice that the cost of calling
> TestClearPageMlocked() is still quite high and I'm guessing it's because
> it's a locked bit operation. It'd be nice if it could be established if
> it's safe to use an unlocked version here. Rik, can you comment?
Yes, it can. page flags are owned entirely by the owner of the page.
free_page_mlock shouldn't really be in free_pages_check, but oh well.
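
To make the locked versus unlocked distinction concrete, here is a minimal user-space sketch. It is not kernel code: the `toy_` names and the bit position are made up for illustration, and C11 atomics stand in for the kernel's bit operations. The plain variant is only correct under the ownership assumption stated above, i.e. nobody else can touch the flags word while the page is being freed.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical bit position, chosen only for this sketch. */
#define TOY_PG_MLOCKED 5UL

/* Atomic read-modify-write: safe against concurrent updaters of the word,
 * but it typically costs a LOCK-prefixed instruction on x86. */
static bool toy_test_and_clear_atomic(atomic_ulong *flags, unsigned long nr)
{
    unsigned long mask = 1UL << nr;
    unsigned long old = atomic_fetch_and(flags, ~mask);

    return (old & mask) != 0;
}

/* Plain load and store: only correct when the caller is the sole owner of
 * the word, which is the situation described for a page being freed. */
static bool toy_test_and_clear_plain(unsigned long *flags, unsigned long nr)
{
    assert(nr < 8 * sizeof(unsigned long));

    unsigned long mask = 1UL << nr;
    bool was_set = (*flags & mask) != 0;

    *flags &= ~mask;
    return was_set;
}
```

Both functions return the same answer for a word nobody else is modifying; the difference is only in what they cost and what they guarantee under concurrency.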
> Patch 16 avoids using the zonelist cache on non-NUMA machines
>
> Patch 17 removes an expensive and excessively paranoid check in the
> allocator fast path
I would be careful of removing useful debug checks completely like
this. What is the cost? Obviously non-zero, but it is also a check
I have seen trigger on quite a lot of occasions (due to both kernel
bugs and hardware bugs), and in each case it is better to warn than
not, even if many other situations can go undetected.
One problem is that some of the calls we're making in page_alloc.c
do the compound_head() thing, whereas we know that we only want to
look at this page. I've attached a patch which cuts out about 150
bytes of text and several branches from these paths.
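
As a rough user-space illustration of the difference (a toy model, not the kernel's struct page; every `toy_` name and field is hypothetical), an accessor that redirects through the head page needs an extra load and branch, while a check that only cares about this page can read its own fields directly and fold the conditions together with bitwise OR:

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for struct page; field names are hypothetical. */
struct toy_page {
    int count;               /* models page->_count */
    int mapcount;            /* models page->_mapcount; -1 means "not mapped" */
    void *mapping;           /* models page->mapping */
    unsigned long flags;
    struct toy_page *head;   /* non-NULL only for a tail page of a compound page */
};

/* Accessor in the style described above: tail pages are redirected to the
 * head page before the count is read. */
static int toy_page_count(const struct toy_page *page)
{
    const struct toy_page *p = page->head ? page->head : page;

    return p->count;
}

/* Check in the style of the patch: read this page's own fields directly and
 * combine the conditions with bitwise OR so a single branch can test them. */
static int toy_page_is_bad(const struct toy_page *page, unsigned long check_flags)
{
    assert(page != NULL);

    return (page->mapcount != -1) |
           (page->mapping != NULL) |
           (page->count != 0) |
           ((page->flags & check_flags) != 0);
}
```

In this model a tail page with a zero count of its own reports the head's count through `toy_page_count()`, but `toy_page_is_bad()` looks only at the page it was handed.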
> Patch 18 avoids a list search in the allocator fast path.
Ah, this was badly needed :)
> o On many machines, I'm seeing a 0-2% improvement on kernbench. The
> dominant cost in kernbench is the compiler and zeroing allocated pages for
> pagetables.
zeroing is a factor, but IIRC page faults and page allocator are among
the top of the profiles.
> o For tbench, I have seen an 8-12% improvement on two x86-64 machines
> (elm3b6 on test.kernel.org gained 8%) but generally it was less dramatic on
> x86-64 in the range of 0-4%. On one PPC64, the difference was also in the
> range of 0-4%. Generally there were gains, but one specific ppc64 showed a
> regression of 7% for one client but a negligible difference for 8 clients.
> It's not clear why this machine regressed and others didn't.
Did you bisect your patchset? The regression could have been random, or
a bisect might point to e.g. the hot/cold removal.
> o hackbench is harder to conclude anything from. Most machines showed
> performance gains in the 5-11% range but one machine in particular showed
> a mix of gains and losses depending on the number of clients. Might be
> a caching thing.
>
> o One machine in particular was a major surprise for sysbench with gains
> of 4-8% there which was drastically higher than I was expecting. However,
> on other machines, it was in the more reasonable 0-4% range, still pretty
> respectable. It's not guaranteed though. While most machines showed some
> sort of gain, one ppc64 showed no difference at all.
>
> So, by and large it's an improvement of some sort.
Most of these benchmarks *really* need to be run quite a few times to get
a reasonable confidence.
But it sounds pretty positive.
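
One way to make the point about repeated runs concrete is sketched below. It is only an assumption about how one might screen results, not a method anyone in this thread used: given several runs before and after a change, it asks whether the gap between the two means exceeds two standard errors of the difference. The `toy_` helpers are hypothetical, and the comparison is done in squared form so no sqrt() is needed.

```c
#include <assert.h>
#include <stddef.h>

/* Arithmetic mean of n benchmark results. */
static double toy_mean(const double *x, size_t n)
{
    double sum = 0.0;

    for (size_t i = 0; i < n; i++)
        sum += x[i];
    return sum / (double)n;
}

/* Unbiased sample variance (divides by n - 1); needs at least two runs. */
static double toy_variance(const double *x, size_t n)
{
    assert(n >= 2);

    double m = toy_mean(x, n);
    double ss = 0.0;

    for (size_t i = 0; i < n; i++)
        ss += (x[i] - m) * (x[i] - m);
    return ss / (double)(n - 1);
}

/* Rough screen: is the gap between the two means more than two standard
 * errors of the difference?  Returns 1 if so, 0 otherwise. */
static int toy_difference_stands_out(const double *a, size_t na,
                                     const double *b, size_t nb)
{
    double diff = toy_mean(a, na) - toy_mean(b, nb);
    double se2 = toy_variance(a, na) / (double)na +
                 toy_variance(b, nb) / (double)nb;

    return diff * diff > 4.0 * se2;
}
```

With a single run per kernel the variance is undefined, which is the practical reason a lone 0-4% delta says little either way.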
---
mm/page_alloc.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)
Index: linux-2.6/mm/page_alloc.c
===================================================================
--- linux-2.6.orig/mm/page_alloc.c
+++ linux-2.6/mm/page_alloc.c
@@ -420,7 +420,7 @@ static inline int page_is_buddy(struct p
 		return 0;
 
 	if (PageBuddy(buddy) && page_order(buddy) == order) {
-		BUG_ON(page_count(buddy) != 0);
+		VM_BUG_ON(page_count(buddy) != 0);
 		return 1;
 	}
 	return 0;
@@ -493,9 +493,9 @@ static inline void __free_one_page(struc
 static inline int free_pages_check(struct page *page)
 {
 	free_page_mlock(page);
-	if (unlikely(page_mapcount(page) |
+	if (unlikely((atomic_read(&page->_mapcount) != -1) |
 		(page->mapping != NULL) |
-		(page_count(page) != 0) |
+		(atomic_read(&page->_count) != 0) |
 		(page->flags & PAGE_FLAGS_CHECK_AT_FREE))) {
 		bad_page(page);
 		return 1;
@@ -633,9 +633,9 @@ static inline void expand(struct zone *z
  */
 static int prep_new_page(struct page *page, int order, gfp_t gfp_flags)
 {
-	if (unlikely(page_mapcount(page) |
+	if (unlikely((atomic_read(&page->_mapcount) != -1) |
 		(page->mapping != NULL) |
-		(page_count(page) != 0) |
+		(atomic_read(&page->_count) != 0) |
 		(page->flags & PAGE_FLAGS_CHECK_AT_PREP))) {
 		bad_page(page);
 		return 1;