From: Rik van Riel <riel@redhat.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Andrea Arcangeli <andrea@suse.de>, linux-mm@kvack.org
Subject: Re: [PATCH 01 of 16] remove nr_scan_inactive/active
Date: Thu, 28 Jun 2007 19:04:05 -0400
Message-ID: <46843E65.3020008@redhat.com>
In-Reply-To: <20070628155715.49d051c9.akpm@linux-foundation.org>

Andrew Morton wrote:
> On Thu, 28 Jun 2007 18:44:56 -0400
> Rik van Riel <riel@redhat.com> wrote:
>
>> Andrew Morton wrote:
>>
>>> Where's the system time being spent?
>> OK, it turns out that there is quite a bit of variability
>> in where the system spends its time. I did a number of
>> reaim runs and averaged the time the system spent in the
>> top functions.
>>
>> This is with the Fedora rawhide kernel config, which has
>> quite a few debugging options enabled.
>>
>> _raw_spin_lock 32.0%
>> page_check_address 12.7%
>> __delay 10.8%
>> mwait_idle 10.4%
>> anon_vma_unlink 5.7%
>> __anon_vma_link 5.3%
>> lockdep_reset_lock 3.5%
>> __kmalloc_node_track_caller 2.8%
>> security_port_sid 1.8%
>> kfree 1.6%
>> anon_vma_link 1.2%
>> page_referenced_one 1.1%
>>
>> In short, the system is waiting on the anon_vma lock.
>
> Sigh. We had a workload (forget which, still unfixed) in which things
> would basically melt down in that linear anon_vma walk, walking 10,000 or
> more vma's. I wonder if that's what's happening here?

That would be a large multi-threaded application that fills up
memory. Customers are reproducing this with JVMs on some very
large systems.
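
To make the cost of that linear walk concrete, here is a minimal
userspace sketch in C. It is not kernel code: the model_* names and
structures are hypothetical stand-ins. All it shows is that scanning
N pages which share one anon_vma with M VMAs visits N * M list
entries, and the assumption is that the anon_vma lock is held for
every one of those visits.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; these are NOT the kernel's structures. */
struct model_vma {
    struct model_vma *next;  /* next VMA attached to the same anon_vma */
    int maps_page;           /* 1 if this VMA maps the page being checked */
};

struct model_anon_vma {
    struct model_vma *head;         /* singly linked list of VMAs */
    unsigned long lock_hold_steps;  /* list entries visited while "locked" */
};

/*
 * Check one page against every VMA on the list, the way a reverse-map
 * lookup has to. Returns how many VMAs map the page.
 */
int model_page_referenced(struct model_anon_vma *av)
{
    int mapped = 0;

    assert(av != NULL);
    /* Assumption: the real lock would be held for this whole loop. */
    for (struct model_vma *v = av->head; v != NULL; v = v->next) {
        av->lock_hold_steps++;
        mapped += v->maps_page;
    }
    return mapped;
}

/*
 * Scan nr_pages pages that all share one anon_vma and return the total
 * number of list entries visited: nr_pages times the list length.
 */
unsigned long model_scan(struct model_anon_vma *av, unsigned long nr_pages)
{
    assert(av != NULL);
    av->lock_hold_steps = 0;
    for (unsigned long i = 0; i < nr_pages; i++)
        (void)model_page_referenced(av);
    return av->lock_hold_steps;
}
```

With 10,000 VMAs on the list, every scanned page costs 10,000 list
steps in this model, which is why scanning fewer pages helps even if
the lock itself stays as it is.
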
> Also, one thing to watch out for here is a problem with the spinlocks
> themselves: the problem wherein the cores in one package keep rattling the
> lock around between them and never let it out for the cores in another
> package to grab.

This is a single-package quad-core system, though.

>> I wonder if Lee Schermerhorn's patch to turn that
>> spinlock into an rwlock would help this workload,
>> or if we simply should scan fewer pages in the
>> pageout code.
>
> Maybe. I'm thinking that the problem here is really due to the huge amount
> of processing which needs to occur when we are in the "all pages active,
> referenced" state and then we hit pages_low. Panic time, we need to scan
> and deactivate a huge amount of stuff.
>
> Would it not be better to prevent that situation from occurring by doing a
> bit of scanning and balancing when adding pages to the LRU? Make sure that
> the lists will be in reasonable shape for when reclaim starts?

Agreed, we need to simply scan fewer pages.

Doing something like SEQ replacement on the anonymous (and other
swap backed) pages might just do the trick here. Page cache, of
course, should continue using a used-once scheme.

I suspect we want to split out the lists for many other reasons
anyway, as detailed on http://linux-mm.org/PageoutFailureModes

I'll whip up a patch that does this...

> That'd deoptimise those workloads which allocate and free pages but never
> enter reclaim. Probably liveable with.

If we do true SEQ replacement for anonymous pages (deactivating
active pages without regard to the referenced bit) and keep the
inactive list reasonably small, that penalty should be negligible.
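
As a rough illustration of what I mean by that, here is a small
userspace sketch in C. Everything in it is hypothetical (the model_*
names, the fixed-size arrays, the inactive_target parameter); it is
not the patch and not the kernel's LRU code. It only shows the policy:
the oldest active pages are moved to the inactive list in FIFO order,
the referenced bit is neither read nor changed, and the refill stops
as soon as the inactive list reaches a small target size.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_PAGES 64

/* Hypothetical stand-in for a page; not the kernel's struct page. */
struct model_page {
    int id;
    bool referenced;
};

/* FIFO list of pages, oldest entry at index 0. */
struct model_list {
    struct model_page pages[MODEL_MAX_PAGES];
    size_t count;
};

/* Append a page at the young end of the list. */
void model_push(struct model_list *l, struct model_page p)
{
    assert(l->count < MODEL_MAX_PAGES);
    l->pages[l->count++] = p;
}

/* Remove and return the oldest page, shifting the rest down. */
struct model_page model_pop_oldest(struct model_list *l)
{
    assert(l->count > 0);
    struct model_page p = l->pages[0];

    for (size_t i = 1; i < l->count; i++)
        l->pages[i - 1] = l->pages[i];
    l->count--;
    return p;
}

/*
 * Top up the inactive list to inactive_target entries by moving the
 * oldest active pages over. The referenced bit is deliberately ignored.
 * Returns the number of pages moved.
 */
size_t model_refill_inactive(struct model_list *active,
                             struct model_list *inactive,
                             size_t inactive_target)
{
    size_t moved = 0;

    while (inactive->count < inactive_target && active->count > 0) {
        model_push(inactive, model_pop_oldest(active));
        moved++;
    }
    return moved;
}
```

Because the work per call is bounded by inactive_target rather than
by the size of the active list, a workload that never enters reclaim
pays very little for it in this model.
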
> We would want to avoid needlessly unmapping pages and causing more minor
> faults.

That's a minor issue; the page fault path is pretty cheap and
very scalable.

--
Politics is the struggle between those who want to make their country
the best in the world, and those who believe it already is. Each group
calls the other unpatriotic.