From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Wed, 27 Jun 2007 00:21:48 +0200
From: Andrea Arcangeli <andrea@suse.de>
Subject: Re: [PATCH 01 of 16] remove nr_scan_inactive/active
Message-ID: <20070626222148.GB22366@v2.random>
References: <8e38f7656968417dfee0.1181332979@v2.random> <466C36AE.3000101@redhat.com> <20070610181700.GC7443@v2.random> <46814829.8090808@redhat.com> <20070626203743.GG7059@v2.random> <46817DB0.80105@redhat.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <46817DB0.80105@redhat.com>
Sender: owner-linux-mm@kvack.org
Return-Path: <owner-linux-mm@kvack.org>
To: Rik van Riel <riel@redhat.com>
Cc: linux-mm@kvack.org, Andrew Morton <akpm@linux-foundation.org>
List-ID: <linux-mm.kvack.org>

On Tue, Jun 26, 2007 at 04:57:20PM -0400, Rik van Riel wrote:
> Yes, but I would hope that the system would be disk bound
> at that time instead of CPU bound.
> 
> There was no swap IO going on yet, the system was just
> wasting CPU time in the VM.

That seems to be a separate problem: 01 starts wasting cpu sooner, and
that's the regression you discovered, but mainline wastes cpu the same
way later on too. We should do some profiling like Andrew suggested to
see what's going on when it starts thrashing the cpu (perhaps it's some
smp lock? You said you have only 4 cores, so it must be a highly
contended one if it's really a lock).
 
> Oh, I like your simplification of the code, too.
> 
> I was running the test to see if that patch could be
> merged without any negative side effects, because I
> would have liked to see it.

I see. Good that you tested this with this workload so we noticed this
regression. At the moment I hope it's only a tuning knob such as
DEF_PRIORITY (or similar) that needs adjusting. It'd be really sad if
this had a magic racy behavior that wouldn't be reproducible with a
static non-racy algorithm.

If nothing else, if we want to stick with this explicit smp race in
the vm core, somebody should at least attempt to document how they can
predict what the race will do at runtime, because to me it seems quite
an unpredictable beast. On average it will probably reach a stable
state, but this stable state will depend on the speed of the cpu
caches and on the number of cpus, on the architecture and on the
assembly generated by gcc, and then the race will trigger more or less
or in a different way...

> However, neither of the two seems to be IO bound
> at that point...

Yes. For now I'd be happy to see the same results for both to
eliminate the regression.

> Not only is the AIM7 test perfectly repeatable, it also
> causes the VM to show some of the same behaviour that
> customers are seeing in the field with large JVM workloads.

Sounds good, thanks!

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .