From: Mel Gorman <mel@csn.ul.ie>
To: Marcelo Tosatti <marcelo.tosatti@cyclades.com>
Cc: William Lee Irwin III <wli@holomorphy.com>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] Avoiding fragmentation through different allocator
Date: Sat, 22 Jan 2005 21:48:20 +0000 (GMT)	[thread overview]
Message-ID: <Pine.LNX.4.58.0501222128380.18282@skynet> (raw)
In-Reply-To: <20050121142854.GH19973@logos.cnet>

On Fri, 21 Jan 2005, Marcelo Tosatti wrote:

> On Thu, Jan 20, 2005 at 10:13:00AM +0000, Mel Gorman wrote:
> > <Changelog snipped>
>
> Hi Mel,
>
> I was thinking that it would be nice to have a set of high-order
> intensive workloads, and I wonder what are the most common high-order
> allocation paths which fail.
>

Agreed. As I am not fully sure which workloads require high-order
allocations, I updated VMRegress to keep track of allocation counts and
released 0.11
(http://www.csn.ul.ie/~mel/projects/vmregress/vmregress-0.11.tar.gz). To
use it to track allocations, do the following:

1. Download and unpack vmregress
2. Patch a kernel with kernel_patches/v2.6/trace_pagealloc-count.diff.
The patch currently requires the modified allocator, but I can fix that up
if people want it. Build and deploy the kernel.
3. Build vmregress with:
  ./configure --with-linux=/usr/src/linux-2.6.11-rc1-mbuddy
  (or whatever path is appropriate)
  make
4. Load the modules with:
  insmod src/code/vmregress_core.ko
  insmod src/sense/trace_alloccount.ko
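
For convenience, the whole sequence boils down to something like the
following (assuming the tarball unpacks to vmregress-0.11 and your patched
tree lives at the path used above; adjust both to suit):

  cd vmregress-0.11
  ./configure --with-linux=/usr/src/linux-2.6.11-rc1-mbuddy
  make
  insmod src/code/vmregress_core.ko
  insmod src/sense/trace_alloccount.ko
  lsmod | grep vmregress    # quick sanity check that both modules loaded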

This will create a proc entry /proc/vmregress/trace_alloccount that looks
something like this:

Allocations (V1)
-----------
KernNoRclm   997453      370       50        0        0        0        0        0        0        0        0
KernRclm      35279        0        0        0        0        0        0        0        0        0        0
UserRclm    9870808        0        0        0        0        0        0        0        0        0        0
Total      10903540      370       50        0        0        0        0        0        0        0        0

Frees
-----
KernNoRclm   590965      244       28        0        0        0        0        0        0        0        0
KernRclm     227100       60        5        0        0        0        0        0        0        0        0
UserRclm    7974200       73       17        0        0        0        0        0        0        0        0
Total      19695805      747      100        0        0        0        0        0        0        0        0

To blank the counters, use:

echo 0 > /proc/vmregress/trace_alloccount

Whatever workload we come up with, this proc entry will tell us whether it
is actually exercising high-order allocations.
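
For example, to keep an eye on the counters while a candidate workload is
running, something as simple as this should do:

  watch -n 5 cat /proc/vmregress/trace_alloccount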

> It mostly depends on hardware because most high-order allocations happen
> inside device drivers? What are the kernel codepaths which try to do
> high-order allocations and fallback if failed?
>

I'm not sure. I think the paths we exercise right now will be largely
artificial. For example, you can force order-2 allocations by scp'ing a
large file through localhost (because of the large MTU on that interface).
I have not come up with another meaningful workload that guarantees
high-order allocations yet.
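
To illustrate the localhost trick (the file name and size below are just
placeholders), something along these lines should bump the order-2 column:

  dd if=/dev/zero of=/tmp/order2-test bs=1M count=256
  cat /proc/vmregress/trace_alloccount    # note the starting counts
  scp /tmp/order2-test localhost:/tmp/order2-test.copy
  cat /proc/vmregress/trace_alloccount    # the order-2 column should have grown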

> To measure whether the cost of page migration offsets the ability to be
> able to deliver high-order allocations we want a set of meaningful
> performance tests?
>

Bear in mind that there are more considerations. The allocator potentially
makes hotplug problems easier and could be easily tied into any
page-zeroing system. Some of your own benchmarks also implied that the
modified allocator helps some types of workloads, which is beneficial in
itself. The last consideration is HugeTLB pages, which I am hoping William
will weigh in on.

Right now, I believe that the pool of huge pages is of a fixed size
because of fragmentation difficulties. If we knew we could reliably
allocate huge pages, this pool would not have to be fixed. Some
applications will benefit heavily from this. Databases are the obvious
example, but applications with large heaps, such as Java Virtual Machines,
will also benefit. I can dig up papers that measured this on Solaris,
although I don't have them at hand right now.
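
For reference, today the pool has to be reserved up front (the number
below is only an example), and whatever cannot be reserved at boot or
shortly after is all you get:

  echo 64 > /proc/sys/vm/nr_hugepages
  grep -i huge /proc/meminfo    # HugePages_Total/Free show what was actually reserved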

We know right now that the overhead of this allocator is fairly low
(anyone got benchmarks to disagree?), but I understand that page migration
is relatively expensive. The allocator also avoids the adverse CPU and
cache effects of migration, and the concept is fairly simple.

> Its quite possible that not all unsatisfiable high-order allocations
> want to force page migration (which is quite expensive in terms of
> CPU/cache). Only migrate on __GFP_NOFAIL ?
>

I still believe that, with the allocator, we will only have to migrate in
exceptional circumstances.

> William, that same tradeoff exists for the zone balancing through
> migration idea you propose...
>

-- 
Mel Gorman
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: aart@kvack.org
