Re: [RFC]numa: improve I/O performance by optimizing numa interleave allocation

linux-mm.kvack.org archive mirror

From: Andi Kleen <ak@linux.intel.com>
To: Shaohua Li <shaohua.li@intel.com>
Cc: lkml <linux-kernel@vger.kernel.org>,
	linux-mm <linux-mm@kvack.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Jens Axboe <axboe@kernel.dk>, Christoph Lameter <cl@linux.com>,
	lee.schermerhorn@hp.com
Subject: Re: [RFC]numa: improve I/O performance by optimizing numa interleave allocation
Date: Fri, 18 Nov 2011 09:30:14 -0800
Message-ID: <20111118173013.GB25022@alboin.amr.corp.intel.com>
In-Reply-To: <1321600332.22361.309.camel@sli10-conroe>

On Fri, Nov 18, 2011 at 03:12:12PM +0800, Shaohua Li wrote:
> If the memory policy is interleave, we allocate pages from nodes in a
> round-robin way. This certainly interleaves fairly, but it is not optimal.
> 
> Say the pages will be used for I/O later. With interleave allocation, two
> consecutive pages come from two different nodes, so they are not physically
> contiguous. Later, each page needs its own segment for DMA scatter-gather,
> but the maximum number of hardware segments is limited. The non-contiguous
> pages soon use up the maximum hardware segment count, and we can't merge
> I/O into bigger DMA transfers. Allocating pages from one node doesn't have
> this issue: the memory allocator's pcp list makes it quite likely that we
> get physically contiguous pages across several allocations.

FWIW it depends a lot on the IO hardware whether the SG limitation
really makes a measurable difference for IO performance. I saw some wins from
clustering using the IOMMU before, but that was a long time ago. I wouldn't
take it as a given without strong numbers, and then also only
for the particular device measured.

My understanding is that modern IO devices like NVM Express will
be faster at large SG lists.
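
To make the segment argument in the quoted text concrete, here is a minimal
sketch that counts how many scatter-gather segments a buffer needs when
pages are handed out per node in batches. Everything in it is an assumption
for illustration: the names count_sg_segments, interleaved_pfns and
NODE_BASE are hypothetical, each node is modeled as an ideal allocator that
always returns the next consecutive page frame (the real pcp list only makes
this "quite likely"), and per-segment size limits are ignored.

```python
def count_sg_segments(pfns):
    """Count scatter-gather segments for pages submitted in this order.

    A page extends the current segment only if its page frame number is
    exactly one higher than the previous page's; otherwise it starts a
    new segment.
    """
    segments = 0
    prev = None
    for pfn in pfns:
        if prev is None or pfn != prev + 1:
            segments += 1
        prev = pfn
    return segments


# Hypothetical layout: two nodes whose free memory starts at these frames.
NODE_BASE = {0: 1000, 1: 5000}


def interleaved_pfns(num_pages, batch):
    """Model allocation that switches node after every `batch` pages.

    Assumes each node always returns its next consecutive frame, which is
    an idealization of real allocator behavior.
    """
    next_free = dict(NODE_BASE)
    nodes = sorted(NODE_BASE)
    pfns = []
    for i in range(num_pages):
        node = nodes[(i // batch) % len(nodes)]
        pfns.append(next_free[node])
        next_free[node] += 1
    return pfns


if __name__ == "__main__":
    for batch in (1, 8):
        pfns = interleaved_pfns(16, batch)
        print(f"batch={batch}: {count_sg_segments(pfns)} segments")
```

Under these assumptions a 16-page buffer needs 16 segments with page-by-page
interleave and 2 segments with a batch of 8. Whether that reduction shows up
as throughput is exactly the part that needs numbers from the device in
question.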

> So can we make both interleave fairness and contiguous allocation happy?
> Simply, we can adjust the round-robin algorithm: we switch to another node
> after several (N) allocations. If N isn't too big, we still get fair
> allocation, and we get N contiguous pages. I use N=8 in the patch below.
> I think 8 isn't too big for a modern NUMA machine. Applications that use
> interleave are unlikely to be short-running, so I think fairness still holds.

It depends a lot on the CPU access pattern.

Some workloads seem to do reasonably well with 2MB huge page interleaving.
But others actually prefer the cache line interleaving supplied by
the BIOS.

So you can have a trade-off between IO and CPU performance.
When in doubt I usually opt for CPU performance by default.

I definitely wouldn't make it the default, but if there are workloads
that benefit a lot it could be an additional parameter to the
interleave policy.
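
As a rough illustration of what such a parameter would mean, the sketch
below models the node selection from the quoted proposal with the batch size
N as a tunable that defaults to 1, which is plain round robin. This is not
the patch under discussion and not kernel code; the class BatchedInterleave,
its batch argument and the helper node_counts are hypothetical names chosen
for the example.

```python
from collections import Counter


class BatchedInterleave:
    """Round-robin node selection that moves on after `batch` allocations.

    batch=1 is ordinary page-by-page interleave; a larger batch keeps
    consecutive allocations on one node before switching to the next.
    """

    def __init__(self, nodes, batch=1):
        if not nodes:
            raise ValueError("need at least one node")
        if batch < 1:
            raise ValueError("batch must be >= 1")
        self.nodes = list(nodes)
        self.batch = batch
        self._index = 0   # position of the current node in self.nodes
        self._used = 0    # allocations already served from the current node

    def next_node(self):
        node = self.nodes[self._index]
        self._used += 1
        if self._used == self.batch:
            # Batch exhausted: advance to the next node, wrapping around.
            self._used = 0
            self._index = (self._index + 1) % len(self.nodes)
        return node


def node_counts(policy, allocations):
    """Return how many of `allocations` requests each node served."""
    return Counter(policy.next_node() for _ in range(allocations))


if __name__ == "__main__":
    policy = BatchedInterleave([0, 1], batch=8)
    print([policy.next_node() for _ in range(16)])
    print(node_counts(BatchedInterleave([0, 1, 2, 3], batch=8), 64))
```

Over a whole number of rounds every node serves the same count regardless
of the batch size, so the fairness cost is bounded by one partial round. The
CPU side of the trade-off, where a thread touching N consecutive pages now
hits a single node, is not captured by this model.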

> Run a sequential read workload which accesses disk sdc - sdf,

What IO device is that?

-Andi


