From: Mel Gorman <mel@csn.ul.ie>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-mm@kvack.org, penberg@cs.helsinki.fi, riel@redhat.com,
kosaki.motohiro@jp.fujitsu.com, cl@linux-foundation.org,
hannes@cmpxchg.org, npiggin@suse.de,
linux-kernel@vger.kernel.org, ming.m.lin@intel.com,
yanmin_zhang@linux.intel.com
Subject: Re: [PATCH 20/20] Get rid of the concept of hot/cold page freeing
Date: Thu, 26 Feb 2009 16:37:51 +0000
Message-ID: <20090226163751.GG32756@csn.ul.ie>
In-Reply-To: <20090225081954.8776ba9b.akpm@linux-foundation.org>
On Wed, Feb 25, 2009 at 08:19:54AM -0800, Andrew Morton wrote:
> On Wed, 25 Feb 2009 16:01:25 +0000 Mel Gorman <mel@csn.ul.ie> wrote:
>
> > ...
> >
> > > That would rub out the benefit which that microbenchmark
> > > demonstrated?
> > >
> >
> > It'd impact it for sure. Due to the non-temporal stores, I'm surprised
> > there is any measurable impact from the patch. This has likely been the
> > case since commit 0812a579c92fefa57506821fa08e90f47cb6dbdd. My reading of
> > this (someone correct/enlighten) is that even if the data was cache hot,
> > it is pushed out as a result of the non-temporal access.
>
> yup, that's my understanding.
>
> > The changelog doesn't give the reasoning for using uncached accesses but maybe
> > it's because for filesystem writes, it is not expected that the data will be
> > accessed by the CPU any more and the storage device driver has less work to
> > do to ensure the data in memory is not dirty in cache (this is speculation,
> > I don't know for sure what the expected benefit is meant to be but it might
> > be in the manual, I'll check later).
> >
> > Thinking of alternative microbenchmarks that might show this up....
>
> Well, 0812a579c92fefa57506821fa08e90f47cb6dbdd is being actively
> discussed over in the "Performance regression in write() syscall"
> thread. There are patches there which disable the movnt for
> less-than-PAGE_SIZE copies. Perhaps adapt those to disable movnt
> altogether to then see whether the use of movnt broke the advantages
> which hot-cold-pages gave us?
>
I checked just what that patch was doing with write-truncate and the results
show that using temporal access for small files appeared to make a huge
positive difference for the microbenchmark. It also showed that hot/cold
freeing (i.e. the current code) was a gain when temporal accesses were used,
but then I saw a big problem with the benchmark.

The deviations between runs are huge - really huge - and I had missed that
before. I redid the test to run a larger number of iterations, then ran it 20
times in a row on a kernel with hot/cold freeing, and I got:

   size        avg     stddev
     64   3.337564   0.619085
    128   2.753963   0.461398
    256   2.556934   0.461848
    512   2.736831   0.475484
   1024   2.561668   0.470887
   2048   2.719766   0.478039
   4096   2.963039   0.407311
   8192   4.043475   0.236713
  16384   6.098094   0.249132
  32768   9.439190   0.143978

where size is the size of the write/truncate, avg is the average time and
stddev is the standard deviation. For small sizes, the deviation is too large
to draw any reasonable conclusion from the microbenchmark. Factors like
scheduling, whether sync happened and a host of other issues muck up the
results.
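
To put a number on "too large", the deviation can be expressed as a
percentage of the average for each row of the table above. This is only a
quick sketch over the posted figures; the names RESULTS and relative_stddev
are made up for illustration and are not part of the benchmark.

```python
# (size, avg, stddev) rows copied from the results table in this mail.
RESULTS = [
    (64, 3.337564, 0.619085),
    (128, 2.753963, 0.461398),
    (256, 2.556934, 0.461848),
    (512, 2.736831, 0.475484),
    (1024, 2.561668, 0.470887),
    (2048, 2.719766, 0.478039),
    (4096, 2.963039, 0.407311),
    (8192, 4.043475, 0.236713),
    (16384, 6.098094, 0.249132),
    (32768, 9.439190, 0.143978),
]


def relative_stddev(avg, stddev):
    """Return stddev as a percentage of avg (coefficient of variation)."""
    return 100.0 * stddev / avg


for size, avg, stddev in RESULTS:
    print(f"{size:>6} {relative_stddev(avg, stddev):6.1f}%")
```

On these figures the deviation is roughly 14% to 19% of the average for every
size up to 4096 and drops below 6% from 8192 upward, so any hot/cold effect
smaller than that at small sizes is lost in the noise.
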
More importantly, I then checked how many times we freed cold pages during
the test and the answer is ..... *never*. They were all hot page releases,
which is what my patch originally forced, and the profiles agreed because they
showed no samples in the "if (cold)" branch. Cold pages were only freed if I
made kswapd kick off, which matches my original expectation: a system that is
reclaiming is already polluting the cache with scanning, so whether a page is
freed hot or cold is not important there.
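
For anyone not familiar with the code in question, the hot/cold distinction
boils down to which end of the per-cpu list a page is placed on and taken
from. The following is a toy model in Python, not the kernel code: the names
PerCpuList, free_page and alloc_page are invented here, and strings stand in
for struct page.

```python
from collections import deque


class PerCpuList:
    """Toy model of a per-cpu free list with a hot end and a cold end."""

    def __init__(self):
        self.pages = deque()

    def free_page(self, page, cold=False):
        # Hot pages go to the head so they are reused first, while they
        # may still be in cache; cold pages go to the tail.
        if cold:
            self.pages.append(page)
        else:
            self.pages.appendleft(page)

    def alloc_page(self, cold=False):
        # A hot allocation takes from the head, a cold one from the tail.
        return self.pages.pop() if cold else self.pages.popleft()


pcp = PerCpuList()
pcp.free_page("A")             # hot free
pcp.free_page("B")             # hot free, now the hottest page
pcp.free_page("C", cold=True)  # cold free, e.g. from reclaim
print(pcp.alloc_page())        # most recently freed hot page
print(pcp.alloc_page(cold=True))
```

If nothing ever calls the free side with cold=True, as was the case in the
test above, the tail-insertion branch does no useful work.
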

Based on that nugget, the patch makes common sense because we never take the
cold branch at a time we care about. Common sense also tells me the patch
should be an improvement because pagevec is smaller. Proving it's a good
change is not working out very well at all.

> argh.
>
> Sorry to be pushing all this kernel archeology at you.
Don't be. This was time well spent in my opinion.
> Sometimes I
> think we're insufficiently careful about removing old stuff - it can
> turn out that it wasn't that bad after all! (cf slab.c...)
>
Agreed. Better safe than sorry.
> > Repeated setup, populate and teardown of pagetables might show up something
> > as it should benefit if the pages were cache hot but the cost of faulting
> > and zeroing might hide it.
>
> Well, pagetables have been churning for years, with special-cased
> magazining, quicklists, magazining of known-to-be-zeroed pages, etc.
>
The known-to-be-zeroed pages is interesting and something I tried but didn't
get far enough with. One patch I did but didn't release would zero pages on
the free path if the process was exiting or if it was kswapd. It tracked
whether the page was zero by using page->index to record the order of the
zeroed page. On allocation, it would check index and, if the order matched,
would not zero a second time. I got this working reliably for order-0 pages
but it didn't gain anything because we were zeroing even more than we had to
in the free path. I should have gone at the pagetable pages as a source of
zeroed pages that required no additional work and said "screw it, I'll release
what I have and see what happens".
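
The bookkeeping described above can be sketched roughly as follows. This is a
toy model and not the unreleased patch: Page, Allocator, zero_on_free and the
NOT_ZEROED sentinel are all assumptions made for illustration, and only a
single order-0 page size is modelled.

```python
PAGE_SIZE = 4096
NOT_ZEROED = -1  # hypothetical sentinel; the mail does not say what was used


class Page:
    def __init__(self):
        self.data = bytearray(b"\xff" * PAGE_SIZE)  # start out dirty
        self.index = NOT_ZEROED  # stands in for page->index


class Allocator:
    def __init__(self):
        self.free_list = []
        self.zero_ops = 0  # how many times a page was actually cleared

    def _zero(self, page):
        page.data[:] = bytes(PAGE_SIZE)
        self.zero_ops += 1

    def free_page(self, page, order=0, zero_on_free=False):
        # zero_on_free models "the process is exiting or this is kswapd".
        if zero_on_free:
            self._zero(page)
            page.index = order  # remember the order that was zeroed
        else:
            page.index = NOT_ZEROED
        self.free_list.append(page)

    def alloc_page(self, order=0, want_zero=True):
        page = self.free_list.pop()
        # Skip the second zeroing if the recorded order matches.
        if want_zero and page.index != order:
            self._zero(page)
        page.index = NOT_ZEROED  # the caller may now dirty the page
        return page


allocator = Allocator()
allocator.free_page(Page(), zero_on_free=True)
page = allocator.alloc_page(want_zero=True)
print(allocator.zero_ops)  # zeroed once, on the free path only
```

The model also shows where the cost goes: a page that is zeroed on free and
then handed to a caller with want_zero=False has been cleared for nothing,
which is the "zeroing even more than we had to" problem.
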
> I've always felt that we're doing that wrong, or at least awkwardly.
> If the magazining which the core page allocator does is up-to-snuff
> then it _should_ be usable for pagetables.
>
If pagetable pages were known to be zero and handed back to an allocator
that remembers zeroed pages, I bet we'd get a win.
> The magazining of known-to-be-zeroed pages is a new requirement.
They don't even need a separate magazine. Put them back on the lists and
record if they are zero with page->index. Granted, it means a caller will
sometimes get pages that are zeroed when they don't need to be but I think
it'd be better than larger structures or searches.
> But
> it absolutely should not be done private to the pagetable page
> allocator (if it still is - I forget), because there are other
> callsites in the kernel which want cache-hot zeroed pages, and there
> are probably other places which free up known-to-be-zeroed pages.
>
Agreed. I believe we can do it in the allocator too using page->index if I
am understanding you properly.
--
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab