Re: [RFC PATCH 00/20] Cleanup and optimise the page allocator

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

From: Mel Gorman <mel@csn.ul.ie>
To: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Linux Memory Management List <linux-mm@kvack.org>,
	Pekka Enberg <penberg@cs.helsinki.fi>,
	Rik van Riel <riel@redhat.com>,
	KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
	Christoph Lameter <cl@linux-foundation.org>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Nick Piggin <npiggin@suse.de>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Lin Ming <ming.m.lin@intel.com>,
	Zhang Yanmin <yanmin_zhang@linux.intel.com>
Subject: Re: [RFC PATCH 00/20] Cleanup and optimise the page allocator
Date: Mon, 23 Feb 2009 20:26:59 +0000	[thread overview]
Message-ID: <20090223202659.GT6740@csn.ul.ie> (raw)
In-Reply-To: <200902240222.04645.nickpiggin@yahoo.com.au>

On Tue, Feb 24, 2009 at 02:22:03AM +1100, Nick Piggin wrote:
> On Tuesday 24 February 2009 02:00:56 Mel Gorman wrote:
> > On Tue, Feb 24, 2009 at 01:46:01AM +1100, Nick Piggin wrote:
> 
> > > free_page_mlock shouldn't really be in free_pages_check, but oh well.
> >
> > Agreed, I took it out of there.
> 
> Oh good. I didn't notice that.
> 
> > > > Patch 16 avoids using the zonelist cache on non-NUMA machines
> > > >
> > > > Patch 17 removes an expensive and excessively paranoid check in the
> > > > allocator fast path
> > >
> > > I would be careful of removing useful debug checks completely like
> > > this. What is the cost? Obviously non-zero, but it is also a check
> >
> > The cost was something like 1/10th the cost of the path. There are atomic
> > operations in there that are causing the problems.
> 
> The only atomic memory operations in there should be atomic loads of
> word or atomic_t sized and aligned locations, which should just be
> normal loads on any architecture?
> 
> The only atomic RMW you might see in that function would come from
> free_page_mlock (which you moved out of there, and anyway can be
> made non-atomic).
> 

You're right, they're normal loads. I wasn't looking at the resulting
assembly closely enough.  I saw a lock and branch in that general area of
free_pages_check(), remembered that it was an atomic read, conflated the
lock-bit-clear with the atomic read and went astray from there. Silly.

> I'd like you to just reevaluate it after your patchset, after the
> patch to make mlock non-atomic, and my patch I just sent.
> 

I re-evaluated with your patch in place of the check being dropped. With
the mlock bit clear moved out of the way, the assembly looks grand and the
amount of time being spent in that check is ok according to profiles

  o roughly 70 samples out of 398 in __free_pages_ok()
  o 2354 samples out of 31295 in free_pcp_pages()
  o 859 samples out of 35362 get_page_from_freelist 

I guess it's 7.5% of the free_pcp_pages() path but it would probably cause
more hassle with hard-to-debug problems the check was removed.

I was momentarily concerned about the compound aspect of page_count. We can
have compound pages in the __free_pages_ok() path and we'll end up checking
the count for each of the sub-pages instead of the head page. It shouldn't be
a problem as the count should be zero for each of the tail pages. A positive
count is a bug and will now trigger where in fact we would have missed it
before. I convinced myself that this change is ok but if anyone can spot a
problem with this reasoning, please shout now.

Is the page_mapcount() change in your patch really necessary? Unlikely
page_count(), it does not check for a compound page so it's not branching
like page_count() is. I don't think it is so I dropped that part of the
patch for the moment.

> 
> > > I have seen trigger on quite a lot of occasions (due to kernel bugs
> > > and hardware bugs, and in each case it is better to warn than not,
> > > even if many other situations can go undetected).
> >
> > Have you really seen it trigger for the allocation path or did it
> > trigger in teh free path? Essentially we are making the same check on
> > every allocation and free which is why I considered it excessivly
> > paranoid.
> 
> Yes I've seen it trigger in the allocation path. Kernel memory scribbles
> or RAM errors.
> 

That's the type of situation I expected it to occur but felt that the free
path would be sufficient. However, I'm convinced now to leave it in place,
particularly as its cost is not as excessive as I initially believed.

> 
> > > One problem is that some of the calls we're making in page_alloc.c
> > > do the compound_head() thing, wheras we know that we only want to
> > > look at this page. I've attached a patch which cuts out about 150
> > > bytes of text and several branches from these paths.
> >
> > Nice, I should have spotted that. I'm going to fold this into the series
> > if that is ok with you? I'll replace patch 17 with it and see does it
> > still show up on profiles.
> 
> Great! Sure fold it in (and put SOB: me on there if you like).
> 

Done, thanks. The version I'm currently using is below.

> 
> > > > So, by and large it's an improvement of some sort.
> > >
> > > Most of these benchmarks *really* need to be run quite a few times to get
> > > a reasonable confidence.
> >
> > Most are run repeatedly and an average taken but I should double check
> > what is going on. It's irritating that gains/regressions are
> > inconsistent between different machine types but that is nothing new.
> 
> Yeah. Cache behaviour maybe. One thing you might try is to size the struct
> page out to 64 bytes if it isn't already. This could bring down any skews
> if one kernel is lucky to get a nice packing of pages, or another is unlucky
> to get lots of struct pages spread over 2 cachelines. Maybe I'm just
> thinking wishfully :)
> 

It's worth an investigate :)

> I think with many of your changes, common sense will tell us that it is a
> better code sequence. Sometimes it's just impossible to really get
> "scientific proof" :)
> 

Sounds good to me but I'm hoping that it'll be possible to show a gains in
a few benchmarks on a few machines without large regressions showing up.

The replacement patch now looks like

=====
    Do not check for compound pages during the page allocator sanity checks
    
    A number of sanity checks are made on each page allocation and free
    including that the page count is zero. page_count() checks for
    compound pages and checks the count of the head page if true. However,
    in these paths, we do not care if the page is compound or not as the
    count of each tail page should also be zero.
    
    This patch makes two changes to the use of page_count() in the free path. It
    converts one check of page_count() to a VM_BUG_ON() as the count should
    have been unconditionally checked earlier in the free path. It also avoids
    checking for compound pages.
    
    [mel@csn.ul.ie: Wrote changelog]
    Signed-off-by: Nick Piggin <nickpiggin@yahoo.com.au>

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index e598da8..8a8db71 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -426,7 +426,7 @@ static inline int page_is_buddy(struct page *page, struct page *buddy,
 		return 0;
 
 	if (PageBuddy(buddy) && page_order(buddy) == order) {
-		BUG_ON(page_count(buddy) != 0);
+		VM_BUG_ON(page_count(buddy) != 0);
 		return 1;
 	}
 	return 0;
@@ -503,7 +503,7 @@ static inline int free_pages_check(struct page *page)
 {
 	if (unlikely(page_mapcount(page) |
 		(page->mapping != NULL)  |
-		(page_count(page) != 0)  |
+		(atomic_read(&page->_count) != 0) |
 		(page->flags & PAGE_FLAGS_CHECK_AT_FREE))) {
 		bad_page(page);
 		return 1;
@@ -648,7 +648,7 @@ static int prep_new_page(struct page *page, int order, gfp_t gfp_flags)
 {
 	if (unlikely(page_mapcount(page) |
 		(page->mapping != NULL)  |
-		(page_count(page) != 0)  |
+		(atomic_read(&page->_count) != 0)  |
 		(page->flags & PAGE_FLAGS_CHECK_AT_PREP))) {
 		bad_page(page);
 		return 1;

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

     prev parent reply	other threads:[~2009-02-23 20:27 UTC|newest]

Thread overview: 95+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-02-22 23:17 Mel Gorman
2009-02-22 23:17 ` [PATCH 01/20] Replace __alloc_pages_internal() with __alloc_pages_nodemask() Mel Gorman
2009-02-22 23:17 ` [PATCH 02/20] Do not sanity check order in the fast path Mel Gorman
2009-02-22 23:17 ` [PATCH 03/20] Do not check NUMA node ID when the caller knows the node is valid Mel Gorman
2009-02-23 15:01   ` Christoph Lameter
2009-02-23 16:24     ` Mel Gorman
2009-02-22 23:17 ` [PATCH 04/20] Convert gfp_zone() to use a table of precalculated values Mel Gorman
2009-02-23 11:55   ` [PATCH] mm: clean up __GFP_* flags a bit Peter Zijlstra
2009-02-23 18:01     ` Mel Gorman
2009-02-23 20:27       ` Vegard Nossum
2009-02-23 15:23   ` [PATCH 04/20] Convert gfp_zone() to use a table of precalculated values Christoph Lameter
2009-02-23 15:41     ` Nick Piggin
2009-02-23 15:43       ` [PATCH 04/20] Convert gfp_zone() to use a table of precalculated value Christoph Lameter
2009-02-23 16:40         ` Mel Gorman
2009-02-23 17:03           ` Christoph Lameter
2009-02-24  1:32           ` KAMEZAWA Hiroyuki
2009-02-24  3:59             ` Nick Piggin
2009-02-24  5:20               ` KAMEZAWA Hiroyuki
2009-02-24 11:36             ` Mel Gorman
2009-02-23 16:33     ` [PATCH 04/20] Convert gfp_zone() to use a table of precalculated values Mel Gorman
2009-02-23 16:33       ` [PATCH 04/20] Convert gfp_zone() to use a table of precalculated value Christoph Lameter
2009-02-23 17:41         ` Mel Gorman
2009-02-22 23:17 ` [PATCH 05/20] Check only once if the zonelist is suitable for the allocation Mel Gorman
2009-02-22 23:17 ` [PATCH 06/20] Break up the allocator entry point into fast and slow paths Mel Gorman
2009-02-22 23:17 ` [PATCH 07/20] Simplify the check on whether cpusets are a factor or not Mel Gorman
2009-02-23  7:14   ` Pekka J Enberg
2009-02-23  9:07     ` Peter Zijlstra
2009-02-23  9:13       ` Pekka Enberg
2009-02-23 11:39         ` Mel Gorman
2009-02-23 13:19           ` Pekka Enberg
2009-02-23  9:14   ` Li Zefan
2009-02-22 23:17 ` [PATCH 08/20] Move check for disabled anti-fragmentation out of fastpath Mel Gorman
2009-02-22 23:17 ` [PATCH 09/20] Calculate the preferred zone for allocation only once Mel Gorman
2009-02-22 23:17 ` [PATCH 10/20] Calculate the migratetype " Mel Gorman
2009-02-22 23:17 ` [PATCH 11/20] Inline get_page_from_freelist() in the fast-path Mel Gorman
2009-02-23  7:21   ` Pekka Enberg
2009-02-23 11:42     ` Mel Gorman
2009-02-23 15:32   ` Nick Piggin
2009-02-24 13:32     ` Mel Gorman
2009-02-24 14:08       ` Nick Piggin
2009-02-24 15:03         ` Mel Gorman
2009-02-22 23:17 ` [PATCH 12/20] Inline __rmqueue_smallest() Mel Gorman
2009-02-22 23:17 ` [PATCH 13/20] Inline buffered_rmqueue() Mel Gorman
2009-02-23  7:24   ` Pekka Enberg
2009-02-23 11:44     ` Mel Gorman
2009-02-22 23:17 ` [PATCH 14/20] Do not call get_pageblock_migratetype() more than necessary Mel Gorman
2009-02-22 23:17 ` [PATCH 15/20] Do not disable interrupts in free_page_mlock() Mel Gorman
2009-02-23  9:19   ` Peter Zijlstra
2009-02-23 12:23     ` Mel Gorman
2009-02-23 12:44       ` Peter Zijlstra
2009-02-23 14:25         ` Mel Gorman
2009-02-22 23:17 ` [PATCH 16/20] Do not setup zonelist cache when there is only one node Mel Gorman
2009-02-22 23:17 ` [PATCH 17/20] Do not double sanity check page attributes during allocation Mel Gorman
2009-02-22 23:17 ` [PATCH 18/20] Split per-cpu list into one-list-per-migrate-type Mel Gorman
2009-02-22 23:17 ` [PATCH 19/20] Batch free pages from migratetype per-cpu lists Mel Gorman
2009-02-22 23:17 ` [PATCH 20/20] Get rid of the concept of hot/cold page freeing Mel Gorman
2009-02-23  9:37   ` Andrew Morton
2009-02-23 23:30     ` Mel Gorman
2009-02-23 23:53       ` Andrew Morton
2009-02-24 11:51         ` Mel Gorman
2009-02-25  0:01           ` Andrew Morton
2009-02-25 16:01             ` Mel Gorman
2009-02-25 16:19               ` Andrew Morton
2009-02-26 16:37                 ` Mel Gorman
2009-02-26 17:00                   ` Christoph Lameter
2009-02-26 17:15                     ` Mel Gorman
2009-02-26 17:30                       ` Christoph Lameter
2009-02-27 11:33                         ` Nick Piggin
2009-02-27 15:40                           ` Christoph Lameter
2009-03-03 13:52                             ` Mel Gorman
2009-03-03 18:53                               ` Christoph Lameter
2009-02-27 11:38                       ` Nick Piggin
2009-03-01 10:37                         ` KOSAKI Motohiro
2009-02-25 18:33               ` Christoph Lameter
2009-02-22 23:57 ` [RFC PATCH 00/20] Cleanup and optimise the page allocator Andi Kleen
2009-02-23 12:34   ` Mel Gorman
2009-02-23 15:34   ` [RFC PATCH 00/20] Cleanup and optimise the page allocato Christoph Lameter
2009-02-23  0:02 ` [RFC PATCH 00/20] Cleanup and optimise the page allocator Andi Kleen
2009-02-23 14:32   ` Mel Gorman
2009-02-23 17:49     ` Andi Kleen
2009-02-24 14:32       ` Mel Gorman
2009-02-23  7:29 ` Pekka Enberg
2009-02-23  8:34   ` Zhang, Yanmin
2009-02-23  9:10   ` KOSAKI Motohiro
2009-02-23 11:55 ` [PATCH] mm: gfp_to_alloc_flags() Peter Zijlstra
2009-02-23 14:00   ` Pekka Enberg
2009-02-23 18:17   ` Mel Gorman
2009-02-23 20:09     ` Peter Zijlstra
2009-02-23 22:59   ` Andrew Morton
2009-02-24  8:59     ` Peter Zijlstra
2009-02-23 14:38 ` [RFC PATCH 00/20] Cleanup and optimise the page allocator Christoph Lameter
2009-02-23 14:46 ` Nick Piggin
2009-02-23 15:00   ` Mel Gorman
2009-02-23 15:22     ` Nick Piggin
2009-02-23 20:26       ` Mel Gorman [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20090223202659.GT6740@csn.ul.ie \
    --to=mel@csn.ul.ie \
    --cc=cl@linux-foundation.org \
    --cc=hannes@cmpxchg.org \
    --cc=kosaki.motohiro@jp.fujitsu.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=ming.m.lin@intel.com \
    --cc=nickpiggin@yahoo.com.au \
    --cc=npiggin@suse.de \
    --cc=penberg@cs.helsinki.fi \
    --cc=riel@redhat.com \
    --cc=yanmin_zhang@linux.intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox