linux-mm.kvack.org archive mirror
From: Andrew Morton <akpm@linux-foundation.org>
To: Dave Hansen <dave@sr71.net>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	penberg@kernel.org, cl@linux-foundation.org
Subject: Re: [PATCH 0/9] re-shrink 'struct page' when SLUB is on.
Date: Fri, 10 Jan 2014 15:39:13 -0800	[thread overview]
Message-ID: <20140110153913.844e84755256afd271371493@linux-foundation.org> (raw)
In-Reply-To: <52D05D90.3060809@sr71.net>

On Fri, 10 Jan 2014 12:52:32 -0800 Dave Hansen <dave@sr71.net> wrote:

> On 01/05/2014 08:32 PM, Joonsoo Kim wrote:
> > On Fri, Jan 03, 2014 at 02:18:16PM -0800, Andrew Morton wrote:
> >> On Fri, 03 Jan 2014 10:01:47 -0800 Dave Hansen <dave@sr71.net> wrote:
> >>> SLUB depends on a 16-byte cmpxchg for an optimization which
> >>> allows it to not disable interrupts in its fast path.  This
> >>> optimization has some small but measurable benefits:
> >>>
> >>> 	http://lkml.kernel.org/r/52B345A3.6090700@sr71.net
> >>
> >> So really the only significant benefit from the cmpxchg16 is with
> >> cache-cold eight-byte kmalloc/kfree?  8% faster in this case?  But with
> >> cache-hot kmalloc/kfree the benefit of cmpxchg16 is precisely zero.
> > 
> > I guess the 16-byte cmpxchg is not exercised in the cache-hot
> > kmalloc/kfree test, because kfree() completes in the free fast path.
> > In that case only this_cpu_cmpxchg_double() is called, so the effect
> > of cmpxchg16 doesn't show up.
> 
> That's a good point.  I also confirmed this theory with the
> free_{fast,slow}path slub counters.  So, I ran another round of tests.
> 
> One important difference from the last round: I'm now writing to each
> allocation.  I originally did this so that I could store the allocations
> in a linked-list, but I also realized that it's important.  It's rare in
> practice to do an allocation and not write _something_ to it.  This
> change adds a bit of cache pressure which changed the results pretty
> substantially.
> 
> I tested 4 cases, all of these on the "cache-cold kfree()" case.  The
> first 3 are with vanilla upstream kernel source.  The 4th is patched
> with my new slub code (all single-threaded):
> 
> 	http://www.sr71.net/~dave/intel/slub/slub-perf-20140109.png

So we're converging on the most complex option.  argh.

> There are a few important takeaways here:
> 1. The double-cmpxchg optimization has a measurable benefit
> 2. 64-byte 'struct page' is faster than the 56-byte one independent of
>    the cmpxchg optimization.  Maybe because foo/sizeof(struct page) is
>    then a simple shift.
> 3. My new code is probably _slightly_ slower than the existing code,
>    but still has the huge space savings
> 4. All of these deltas are greatly magnified here and are hard or
>    impossible to demonstrate in practice.
> 
> Why does the double-cmpxchg help?  The extra cache references that it
> takes to go out and touch the paravirt structures and task struct to
> disable interrupts in the spinlock cases start to show up and hurt our
> allocation rates by about 30%.

So all this testing was performed in a VM?  If so, how much is that
likely to have impacted the results?

>  This advantage starts to evaporate when
> there is more noise in the caches, or when we start to run the tests
> across more cores.
> 
> But the real question here is whether we can shrink 'struct page'.  The
> existing (64-byte struct page) slub code wins on allocations under 256b
> by as much as 5% (the 32-byte kmalloc()), but my new code wins on
> allocations over 1k.  4k allocations just happen to be the most common
> on my systems, and they're also very near the "sweet spot" for the new
> code.  But, the delta here is _much_ smaller than it was in the spinlock
> vs. cmpxchg cases.  This advantage also evaporates when we run things
> across more cores or in less synthetic benchmarks.
> 
> I also explored that 5% hit that my code caused in the 32-byte
> allocation case.  It looked to me to be mostly explained by the code
> that I added.  There were more instructions executed and the
> instructions-per-cycle went down.  This looks to be mostly due to a ~15%
> increase in branch misses, probably from the increased code size and
> complexity.
> 
> This is the perf stat output for a run doing 16.8M kmalloc(32)/kfree()'s:
> vanilla:
> >            883,412 LLC-loads                 #    0.296 M/sec                   [39.76%]
> >            566,546 LLC-load-misses           #   64.13% of all LL-cache hits    [39.98%]
> patched:
> >            556,751 LLC-loads                 #    0.186 M/sec                   [39.86%]
> >            339,106 LLC-load-misses           #   60.91% of all LL-cache hits    [39.72%]
> 
> My best guess is that most of the LLC references are going out and
> touching the struct pages for slab management.  It's why we see such a
> big change.


Thread overview: 28+ messages
2014-01-03 18:01 Dave Hansen
2014-01-03 18:01 ` [PATCH 1/9] mm: slab/slub: use page->list consistently instead of page->lru Dave Hansen
2014-01-03 18:01 ` [PATCH 2/9] mm: blk-mq: uses page->list incorrectly Dave Hansen
2014-01-03 18:01 ` [PATCH 3/9] mm: page->pfmemalloc only used by slab/skb Dave Hansen
2014-01-03 18:01 ` [PATCH 4/9] mm: slabs: reset page at free Dave Hansen
2014-01-03 18:01 ` [PATCH 5/9] mm: rearrange struct page Dave Hansen
2014-01-03 18:01 ` [PATCH 6/9] mm: slub: rearrange 'struct page' fields Dave Hansen
2014-01-03 18:02 ` [PATCH 7/9] mm: slub: abstract out double cmpxchg option Dave Hansen
2014-01-03 18:02 ` [PATCH 8/9] mm: slub: remove 'struct page' alignment restrictions Dave Hansen
2014-01-03 18:02 ` [PATCH 9/9] mm: slub: cleanups after code churn Dave Hansen
2014-01-03 22:18 ` [PATCH 0/9] re-shrink 'struct page' when SLUB is on Andrew Morton
2014-01-06  4:32   ` Joonsoo Kim
2014-01-10 20:52     ` Dave Hansen
2014-01-10 23:39       ` Andrew Morton [this message]
2014-01-10 23:42         ` Dave Hansen
2014-01-11  9:26           ` Pekka Enberg
2014-01-12  0:55             ` Christoph Lameter
2014-01-13  1:44               ` Joonsoo Kim
2014-01-13  3:36                 ` Davidlohr Bueso
2014-01-13 13:46                   ` Fengguang Wu
2014-01-13 15:42                     ` Dave Hansen
2014-01-13 17:16                 ` Dave Hansen
2014-01-14 20:07                   ` Christoph Lameter
2014-01-14 22:05                     ` Dave Hansen
2014-01-16 16:44                       ` Christoph Lameter
2014-01-16 17:08                         ` Dave Hansen
2014-01-16 18:26                           ` Christoph Lameter
2014-01-14 17:40               ` Christoph Lameter
