From: Rik van Riel <riel@redhat.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Andrea Arcangeli <andrea@suse.de>, linux-mm@kvack.org
Subject: Re: [PATCH 01 of 16] remove nr_scan_inactive/active
Date: Thu, 28 Jun 2007 21:20:40 -0400	[thread overview]
Message-ID: <46845E68.9070508@redhat.com> (raw)
In-Reply-To: <20070628181238.372828fa.akpm@linux-foundation.org>

Andrew Morton wrote:
> On Thu, 28 Jun 2007 20:45:20 -0400
> Rik van Riel <riel@redhat.com> wrote:
> 
>>>> The only problem with this is that anonymous
>>>> pages could be easily pushed out of memory by
>>>> the page cache, because the page cache has
>>>> totally different locality of reference.
>>> I don't immediately see why we need to change the fundamental aging design
>>> at all.   The problems afacit are
>>>
>>> a) that huge burst of activity when we hit pages_high and
>>>
>>> b) the fact that this huge burst happens on lots of CPUs at the same time.
>>>
>>> And balancing the LRUs _prior_ to hitting pages_high can address both
>>> problems?
>> That may work on systems with up to a few GB of memory,
>> but customers are already rolling out systems with 256GB
>> of RAM for general purpose use, that's 64 million pages!
>>
>> Even doing a background scan on that many pages will take
>> insane amounts of CPU time.
>>
>> In a few years, they will be deploying systems with 1TB
>> of memory and throwing random workloads at them.
> 
> I don't see how the amount of memory changes anything here: if there are
> more pages, more work needs to be done regardless of when we do it.
> 
> Still confused.

If we deactivate some of the active pages regardless of
whether or not they were recently referenced, we end up
with "hey, I need to deactivate 1GB worth of pages",
instead of with "I need to scan through 1TB worth of
pages to find 1GB of not recently accessed ones".

Note that this is the exact same argument used against the
used-once cleanups that have been proposed in the past:
it is more work to scan through the whole list than to
have pages end up in a "reclaimable" state by default.
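
To put a number on that cost argument, here is a toy sketch
(plain C, not kernel code; the struct, the list, and the
referenced bit are simplified stand-ins for struct page, the
active LRU and PG_referenced):

#include <stdbool.h>
#include <stddef.h>

struct page {
	struct page *next;	/* toward younger pages */
	bool referenced;	/* recently used? */
};

/*
 * Status quo: walk the active list from the cold end, testing
 * the referenced bit, until 'want' victims are found.  On a
 * big box the walk can cover most of the list first.
 */
static size_t scan_for_victims(struct page *cold_end, size_t want,
			       struct page **victims)
{
	size_t found = 0, scanned = 0;
	struct page *p;

	for (p = cold_end; p && found < want; p = p->next) {
		scanned++;
		if (p->referenced)
			p->referenced = false;	/* second chance */
		else
			victims[found++] = p;
	}
	return scanned;		/* cost scales with memory size */
}

/*
 * What I am arguing for: take 'want' pages off the cold end
 * regardless of the referenced bit.  The work scales with how
 * much we deactivate, not with how much RAM the machine has.
 */
static size_t deactivate_cold_end(struct page *cold_end, size_t want,
				  struct page **victims)
{
	size_t found = 0;
	struct page *p;

	for (p = cold_end; p && found < want; p = p->next)
		victims[found++] = p;
	return found;		/* cost scales with 'want' only */
}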

> But the problem with the vfs caches is that they aren't node/zone-specific.
> We wouldn't want to get into the situation where 1023 CPUs are twiddling
> thumbs waiting for one CPU to free stuff up (or less extreme variants of
> this).

The direct reclaimers can free something else.  Chances are they
don't care about the little bit of memory coming out of these
caches.

We just need to make sure the pressure gets evened out later.

>> Maybe direct reclaim processes should not dive into this cache
>> at all, but simply increase some variable indicating that kswapd
>> might want to prune some extra pages from this cache on its next
>> run?
> 
> Tell the node's kswapd to go off and do VFS reclaim while the CPUs on that
> node wait for it?  That would help I guess, but those thousand processes
> would still need to block _somewhere_ waiting for the memory to come back.

Not for the VFS memory.  They can just recycle some page cache
memory or start IO on anonymous memory going into swap.
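
For the record, a minimal sketch of the variable I suggested
above (names invented, C11 atomics standing in for the kernel's
atomic_long_t; nothing like this exists in the tree):

#include <stdatomic.h>

static atomic_long deferred_vfs_pressure;

/* Direct reclaim: skip the contended cache, just record the debt. */
static void defer_vfs_reclaim(long nr_pages)
{
	atomic_fetch_add(&deferred_vfs_pressure, nr_pages);
}

/* kswapd: collect the debt on its next run and prune that much extra. */
static long kswapd_vfs_scan_target(long base_target)
{
	return base_target + atomic_exchange(&deferred_vfs_pressure, 0);
}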

> So what we could do here is to back off when iprune_mutex is busy and, if
> nothing else works out, block in congestion_wait() (which is becoming
> increasingly misnamed).  Then, add some more smarts to congestion_wait():
> deliver a wakeup when "enough" memory got freed from the VFS caches.

Yeah, that sounds doable.  Not sure if they should wait in
congestion_wait() though, or if they should just return
to __alloc_pages() since they may already have reclaimed
enough pages from the anonymous list.
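
Roughly this shape, I think (everything except congestion_wait()
and __alloc_pages() is invented for illustration;
defer_vfs_reclaim() is from the sketch above):

extern long shrink_inactive(long want);	/* stand-in: inactive list reclaim */
extern int vfs_caches_contended(void);	/* stand-in: iprune_mutex busy? */
extern void defer_vfs_reclaim(long n);	/* from the sketch above */
extern void wait_for_memory(void);	/* stand-in for congestion_wait() */

/*
 * One direct reclaim pass; returns nonzero when the caller
 * should retry __alloc_pages() right away instead of sleeping.
 */
static int direct_reclaim_once(long want)
{
	long freed = shrink_inactive(want);

	/* Back off from the contended VFS caches and leave the
	 * debt for kswapd rather than piling up on the lock. */
	if (vfs_caches_contended() && freed < want)
		defer_vfs_reclaim(want - freed);

	/* Already freed enough?  Then there is nothing to wait for. */
	if (freed >= want)
		return 1;

	wait_for_memory();
	return 0;
}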

> But for now, the question is: is this a reasonable overall design?  Back
> off from contention points, block at the top-level, polling for allocatable
> memory to turn up?

I'm not convinced.  If we have already reclaimed some
pages from the inactive list, why wait in congestion_wait()
AT ALL?

-- 
Politics is the struggle between those who want to make their country
the best in the world, and those who believe it already is.  Each group
calls the other unpatriotic.

Thread overview: 77+ messages
2007-06-08 20:02 [PATCH 00 of 16] OOM related fixes Andrea Arcangeli
2007-06-08 20:02 ` [PATCH 01 of 16] remove nr_scan_inactive/active Andrea Arcangeli
2007-06-10 17:36   ` Rik van Riel
2007-06-10 18:17     ` Andrea Arcangeli
2007-06-11 14:58       ` Rik van Riel
2007-06-26 17:08       ` Rik van Riel
2007-06-26 17:55         ` Andrew Morton
2007-06-26 19:02           ` Rik van Riel
2007-06-28 22:44           ` Rik van Riel
2007-06-28 22:57             ` Andrew Morton
2007-06-28 23:04               ` Rik van Riel
2007-06-28 23:13                 ` Andrew Morton
2007-06-28 23:16                   ` Rik van Riel
2007-06-28 23:29                     ` Andrew Morton
2007-06-29  0:00                       ` Rik van Riel
2007-06-29  0:19                         ` Andrew Morton
2007-06-29  0:45                           ` Rik van Riel
2007-06-29  1:12                             ` Andrew Morton
2007-06-29  1:20                               ` Rik van Riel [this message]
2007-06-29  1:29                                 ` Andrew Morton
2007-06-28 23:25                   ` Andrea Arcangeli
2007-06-29  0:12                     ` Andrew Morton
2007-06-29 13:38             ` Lee Schermerhorn
2007-06-29 14:12               ` Andrea Arcangeli
2007-06-29 14:59                 ` Rik van Riel
2007-06-29 22:39                 ` "Noreclaim Infrastructure" [was Re: [PATCH 01 of 16] remove nr_scan_inactive/active] Lee Schermerhorn
2007-06-29 22:42                 ` RFC "Noreclaim Infrastructure - patch 1/3 basic infrastructure" Lee Schermerhorn
2007-06-29 22:44                 ` RFC "Noreclaim Infrastructure patch 2/3 - noreclaim statistics..." Lee Schermerhorn
2007-06-29 22:49                 ` "Noreclaim - client patch 3/3 - treat pages w/ excessively references anon_vma as nonreclaimable" Lee Schermerhorn
2007-06-26 20:37         ` [PATCH 01 of 16] remove nr_scan_inactive/active Andrea Arcangeli
2007-06-26 20:57           ` Rik van Riel
2007-06-26 22:21             ` Andrea Arcangeli
2007-06-08 20:03 ` [PATCH 02 of 16] avoid oom deadlock in nfs_create_request Andrea Arcangeli
2007-06-10 17:38   ` Rik van Riel
2007-06-10 18:27     ` Andrea Arcangeli
2007-06-08 20:03 ` [PATCH 03 of 16] prevent oom deadlocks during read/write operations Andrea Arcangeli
2007-06-08 20:03 ` [PATCH 04 of 16] serialize oom killer Andrea Arcangeli
2007-06-09  6:43   ` Peter Zijlstra
2007-06-09 15:27     ` Andrea Arcangeli
2007-06-08 20:03 ` [PATCH 05 of 16] avoid selecting already killed tasks Andrea Arcangeli
2007-06-08 20:03 ` [PATCH 06 of 16] reduce the probability of an OOM livelock Andrea Arcangeli
2007-06-08 20:03 ` [PATCH 07 of 16] balance_pgdat doesn't return the number of pages freed Andrea Arcangeli
2007-06-08 20:03 ` [PATCH 08 of 16] don't depend on PF_EXITING tasks to go away Andrea Arcangeli
2007-06-08 20:03 ` [PATCH 09 of 16] fallback killing more tasks if tif-memdie doesn't " Andrea Arcangeli
2007-06-08 21:57   ` Christoph Lameter
2007-06-08 20:03 ` [PATCH 10 of 16] stop useless vm trashing while we wait the TIF_MEMDIE task to exit Andrea Arcangeli
2007-06-08 21:48   ` Christoph Lameter
2007-06-09  1:59     ` Andrea Arcangeli
2007-06-09  3:01       ` Christoph Lameter
2007-06-09 14:05         ` Andrea Arcangeli
2007-06-09 14:38           ` Andrea Arcangeli
2007-06-11 16:07             ` Christoph Lameter
2007-06-11 16:50               ` Andrea Arcangeli
2007-06-11 16:57                 ` Christoph Lameter
2007-06-11 17:51                   ` Andrea Arcangeli
2007-06-11 17:56                     ` Christoph Lameter
2007-06-11 18:22                       ` Andrea Arcangeli
2007-06-11 18:39                         ` Christoph Lameter
2007-06-11 18:58                           ` Andrea Arcangeli
2007-06-11 19:25                             ` Christoph Lameter
2007-06-11 16:04           ` Christoph Lameter
2007-06-08 20:03 ` [PATCH 11 of 16] the oom schedule timeout isn't needed with the VM_is_OOM logic Andrea Arcangeli
2007-06-08 20:03 ` [PATCH 12 of 16] show mem information only when a task is actually being killed Andrea Arcangeli
2007-06-08 20:03 ` [PATCH 13 of 16] simplify oom heuristics Andrea Arcangeli
2007-06-08 20:03 ` [PATCH 14 of 16] oom select should only take rss into account Andrea Arcangeli
2007-06-10 17:17   ` Rik van Riel
2007-06-10 17:30     ` Andrea Arcangeli
2007-06-08 20:03 ` [PATCH 15 of 16] limit reclaim if enough pages have been freed Andrea Arcangeli
2007-06-10 17:20   ` Rik van Riel
2007-06-10 17:32     ` Andrea Arcangeli
2007-06-10 17:52       ` Rik van Riel
2007-06-11 16:23         ` Christoph Lameter
2007-06-11 16:57           ` Rik van Riel
2007-06-08 20:03 ` [PATCH 16 of 16] avoid some lock operation in vm fast path Andrea Arcangeli
2007-06-08 21:26 ` [PATCH 00 of 16] OOM related fixes William Lee Irwin III
2007-06-09 14:55   ` Andrea Arcangeli
2007-06-12  8:58     ` Petr Tesarik
