linux-mm.kvack.org archive mirror
From: Rik van Riel <riel@redhat.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Andrea Arcangeli <andrea@suse.de>, linux-mm@kvack.org
Subject: Re: [PATCH 01 of 16] remove nr_scan_inactive/active
Date: Thu, 28 Jun 2007 20:45:20 -0400
Message-ID: <46845620.6020906@redhat.com>
In-Reply-To: <20070628171922.2c1bd91f.akpm@linux-foundation.org>

Andrew Morton wrote:
> On Thu, 28 Jun 2007 20:00:03 -0400
> Rik van Riel <riel@redhat.com> wrote:
> 
>> Andrew Morton wrote:
>>
>>>> Scanning fewer pages in the pageout path is probably
>>>> the way to go.
>>> I don't see why that would help.  The bottom-line steady-state case is that
>>> we need to reclaim N pages per second, and we need to scan N*M vmas per
>>> second to do so.  How we chunk that up won't affect the aggregate amount of
>>> work which needs to be done.
>>>
>>> Or maybe you're referring to the ongoing LRU balancing thing.  Or to something
>>> else.
>> Yes, I am indeed talking about LRU balancing.
>>
>> We pretty much *know* that an anonymous page on the
>> active list is accessed, so why bother scanning them
>> all?
> 
> Because there might well be pages in there which haven't been accessed in
> days.  Confused.

We won't know that unless we actually do some background
scanning.  Currently, referenced bits that are hours (or
days) old are never cleared from anonymous pages.

>> We could just deactivate the oldest ones and clear
>> their referenced bits.
>>
>> Once they reach the end of the inactive list, we
>> check for the referenced bit again.  If the page
>> was accessed, we move it back to the active list.
> 
> ok.
> 
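To make the deactivate-then-recheck idea quoted above concrete,
here is a minimal sketch in plain userspace C.  It is not kernel
code: struct page_sketch, deactivate_oldest() and
check_inactive_tail() are hypothetical names invented for this
illustration, and the "lists" are reduced to a flag per page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified page descriptor; not the kernel's struct page. */
struct page_sketch {
	bool referenced;	/* stands in for the referenced bit */
	bool active;		/* true: active list, false: inactive list */
};

enum verdict { REACTIVATE, EVICT };

/*
 * Deactivate the n oldest pages without looking at their referenced
 * bits: clear the bit and move each page to the inactive list.
 */
void deactivate_oldest(struct page_sketch *pages, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		pages[i].referenced = false;
		pages[i].active = false;
	}
}

/*
 * Called when a page reaches the end of the inactive list.  A page
 * that was touched since deactivation goes back to the active list;
 * anything else is a candidate for eviction.
 */
enum verdict check_inactive_tail(struct page_sketch *page)
{
	assert(!page->active);
	if (page->referenced) {
		page->active = true;
		return REACTIVATE;
	}
	return EVICT;
}
```

The point of the sketch is that deactivate_oldest() never reads
the referenced bit, so its cost depends only on how many pages we
choose to deactivate, not on how large the active list is.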
>> The only problem with this is that anonymous
>> pages could be easily pushed out of memory by
>> the page cache, because the page cache has
>> totally different locality of reference.
> 
> I don't immediately see why we need to change the fundamental aging design
> at all.   The problems afaict are
> 
> a) that huge burst of activity when we hit pages_high and
> 
> b) the fact that this huge burst happens on lots of CPUs at the same time.
> 
> And balancing the LRUs _prior_ to hitting pages_high can address both
> problems?

That may work on systems with up to a few GB of memory,
but customers are already rolling out systems with 256GB
of RAM for general purpose use.  With 4kB pages, that is
64 million pages!

Even doing a background scan on that many pages will take
insane amounts of CPU time.

In a few years, they will be deploying systems with 1TB
of memory and throwing random workloads at them.
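
For reference, the page counts behind these numbers can be worked
out with a few lines of C.  This is only a sketch: the 4kB page
size is an assumption (the common x86 base page size), and the
per-page scan cost passed to full_scan_seconds() is a made-up
parameter, not a measurement.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4kB base pages. */
#define SKETCH_PAGE_SIZE 4096ULL

/* Number of base pages needed to cover the given amount of memory. */
uint64_t pages_for_bytes(uint64_t bytes)
{
	return bytes / SKETCH_PAGE_SIZE;
}

/*
 * Time to visit every page once, given a hypothetical cost per page
 * in nanoseconds.  The cost is an input, not something this sketch
 * knows.
 */
double full_scan_seconds(uint64_t pages, double ns_per_page)
{
	return (double)pages * ns_per_page / 1e9;
}
```

pages_for_bytes(256ULL << 30) gives 67108864 pages (the "64
million" above) and pages_for_bytes(1ULL << 40) gives 268435456.
Whatever the real per-page cost turns out to be, a full scan on
the 1TB system costs four times what it does on the 256GB one.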

> It will I guess impact the page aging a bit though.

Yes, it will.  However, I believe that the current system
of page aging is simply not sustainable when memory size
gets insanely large.

>> The page cache also benefits from the use-once
>> scheme we have in place today.
>>
>> Because of these three reasons, I want to split
>> the page cache LRU lists from the anonymous
>> memory LRU lists.
>>
>> Does this make sense to you?
> 
> Could do, don't know.    What new problems will it introduce? :(

The obvious problem is how to balance the eviction of
page cache backed pages versus the eviction of swap
backed pages.

The "good news" here is that the current VM does not
really balance this either, but relies on system
administrators to tweak /proc/sys/vm/swappiness on
systems that run a "corner case" workload.

>>>> No matter how efficient we make the scanning of one
>>>> individual page, we simply cannot scan through 1TB
>>>> worth of anonymous pages (which are all referenced
>>>> because they've been there for a week) in order to
>>>> deactivate something.
>>> Sure.  And we could avoid that sudden transition by balancing the LRU prior
>>> to hitting the great pages_high wall.
>> Yes, we will need to do some proactive balancing.
> 
> OK..
> 
> And that huge anon-vma walk might need attention.  At the least we could do
> something to prevent lots of CPUs from piling up in there.

Speaking of which, I have also seen a thousand processes waiting
to grab the iprune_mutex in prune_icache.

Maybe direct reclaim processes should not dive into this cache
at all, but simply increase some variable indicating that kswapd
might want to prune some extra pages from this cache on its next
run?
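
A rough sketch of what I mean, in userspace C11 rather than kernel
code.  The counter and both function names are hypothetical and
only illustrate the hand-off; nothing like this exists in the tree
today as far as this mail is concerned.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical counter: prune work that direct reclaimers asked for. */
static atomic_ulong deferred_prune_requests;

/*
 * Direct reclaim path: record the request and return immediately,
 * instead of queueing up on the mutex.
 */
void request_icache_prune(unsigned long nr)
{
	atomic_fetch_add(&deferred_prune_requests, nr);
}

/*
 * kswapd path: collect everything requested since the last run and
 * reset the counter to zero in one atomic step, so no request is
 * lost or counted twice.
 */
unsigned long kswapd_collect_prune_requests(void)
{
	return atomic_exchange(&deferred_prune_requests, 0);
}
```

With something along these lines only kswapd would ever contend
for the mutex, and a thousand direct reclaimers would each do one
atomic add instead of sleeping.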

-- 
Politics is the struggle between those who want to make their country
the best in the world, and those who believe it already is.  Each group
calls the other unpatriotic.
