From: Christoph Lameter <clameter@sgi.com>
To: Matt Mackall <mpm@selenic.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>,
linux-kernel@vger.kernel.org, linux-mm@kvack.org,
David Miller <davem@davemloft.net>,
Andrew Morton <akpm@linux-foundation.org>,
Daniel Phillips <phillips@google.com>,
Pekka Enberg <penberg@cs.helsinki.fi>,
Lee Schermerhorn <Lee.Schermerhorn@hp.com>,
Steve Dickson <SteveD@redhat.com>
Subject: Re: [PATCH 04/10] mm: slub: add knowledge of reserve pages
Date: Wed, 8 Aug 2007 10:13:05 -0700 (PDT)
Message-ID: <Pine.LNX.4.64.0708081004290.12652@schroedinger.engr.sgi.com>
In-Reply-To: <20070808014435.GG30556@waste.org>
On Tue, 7 Aug 2007, Matt Mackall wrote:
> > If you are in an atomic context and bound to a cpu then a per cpu slab is
> > assigned to you and no one else can take objects away from that process
> > since nothing else can run on the cpu.
>
> Servicing I/O over the network requires an allocation to send a buffer
> and an allocation to later receive the acknowledgement. We can't free
> our send buffer (or the memory it's supposed to clean) until the
> relevant ack is received. We have to hold our reserves privately
> throughout, even if an interrupt that wants to do GFP_ATOMIC
> allocation shows up in-between.
If you can take an interrupt then you can move to a different allocation
context. This means reclaim could free up more pages if we tell reclaim
not to allocate any memory.
> > If you are not in an atomic context and are preemptable or can switch
> > allocation context then you can create another context in which reclaim
> > could be run to remove some clean pages and get you more memory. Again no
> > need for the patch.
>
> By the point that this patch is relevant, there are already no clean
> pages. The only way to free up more memory is via I/O.
That is never true. The dirty ratio limit caps the number of dirty pages
in memory. There is always a large percentage of memory that is kept
clean. Pages that are file backed and clean can be freed without any
additional memory allocation. This is true for the executable code that
you must have to execute any instructions. We could guarantee that the
number of pages reclaimable without memory allocs stays above certain
limits by checking VM counters.
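A minimal userspace sketch of such a counter check follows. The struct,
its field names, and the floor parameter are hypothetical and only
illustrate the arithmetic; they are not an existing kernel interface, and
the sketch ignores clean pages that cannot be dropped for other reasons
(for example, pinned or locked pages).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical snapshot of VM counters, in pages. The field names are
 * illustrative and do not correspond to a specific kernel structure.
 */
struct vm_snapshot {
	unsigned long file_pages;	/* file backed pages in the page cache */
	unsigned long dirty;		/* modified, not yet written back */
	unsigned long writeback;	/* write I/O currently in flight */
};

/* Pages that could be dropped without I/O and without allocating memory. */
unsigned long easily_reclaimable(const struct vm_snapshot *s)
{
	unsigned long busy;

	assert(s != NULL);
	busy = s->dirty + s->writeback;

	/* Counters may be sampled without locking; clamp instead of wrapping. */
	return s->file_pages > busy ? s->file_pages - busy : 0;
}

/* True if the clean, file backed pool is still at or above the floor. */
bool reclaim_floor_ok(const struct vm_snapshot *s, unsigned long floor)
{
	return easily_reclaimable(s) >= floor;
}
```

A caller would take a snapshot before entering a path that must not
allocate and refuse to dirty more pages once reclaim_floor_ok() fails.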
I think there are two simpler ways to address this:
1. Allow recursive calls into reclaim. If we are in a PF_MEMALLOC context
then we can still scan lru lists and free up memory of clean pages. Idea
patch follows.
2. Make pageout figure out if the write action requires actual I/O
submission. If so then the submission will *not* immediately free memory
and we have to wait for I/O to complete. In that case do not immediately
initiate I/O (which would not free up memory, and it's bad to initiate
I/O when we do not have enough free memory) but put all those pages on a
pageout list. When reclaim has reclaimed enough memory then go through the
pageout list and trigger I/O. That can be done without PF_MEMALLOC so that
additional reclaim could be triggered as needed. Maybe we can just get rid
of PF_MEMALLOC and some of the contorted code around it?
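The deferred pageout list from option 2 can be sketched in userspace C as
two passes. Everything here is an assumption made for illustration: the
struct sketch_page and struct pageout_list types, the function names, and
the io_submitted flag that stands in for real I/O submission. It is not
code from the kernel or from this patch series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical page descriptor for illustration; this is not struct page. */
struct sketch_page {
	bool dirty;			/* freeing requires write I/O first */
	bool io_submitted;		/* set once write I/O has been started */
	struct sketch_page *next;	/* link on the deferred pageout list */
};

struct pageout_list {
	struct sketch_page *head;
	size_t count;
};

/*
 * Pass 1: count clean pages as freed right away and defer dirty ones.
 * Returns the number of pages freed without any I/O.
 */
size_t reclaim_pass(struct sketch_page *pages, size_t n,
		    struct pageout_list *deferred)
{
	size_t freed = 0;

	assert(pages != NULL && deferred != NULL);
	for (size_t i = 0; i < n; i++) {
		if (!pages[i].dirty) {
			freed++;	/* clean: reclaimable immediately */
			continue;
		}
		/* dirty: do not start I/O while memory is short */
		pages[i].next = deferred->head;
		deferred->head = &pages[i];
		deferred->count++;
	}
	return freed;
}

/*
 * Pass 2: once enough memory is free, start I/O for the deferred pages.
 * Returns the number of writes started.
 */
size_t submit_deferred(struct pageout_list *deferred)
{
	size_t started = 0;

	assert(deferred != NULL);
	while (deferred->head) {
		struct sketch_page *p = deferred->head;

		deferred->head = p->next;
		p->next = NULL;
		p->io_submitted = true;	/* stand-in for real I/O submission */
		started++;
	}
	deferred->count = 0;
	return started;
}
```

The point of the split is that the second pass runs only after the first
has freed memory, so the I/O path is able to allocate and, if necessary,
trigger further reclaim.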
Recursive reclaim concept patch:
---
include/linux/swap.h | 2 ++
mm/page_alloc.c | 11 +++++++++++
mm/vmscan.c | 27 +++++++++++++++++++++++++++
3 files changed, 40 insertions(+)
Index: linux-2.6/include/linux/swap.h
===================================================================
--- linux-2.6.orig/include/linux/swap.h 2007-08-08 04:31:06.000000000 -0700
+++ linux-2.6/include/linux/swap.h 2007-08-08 04:31:28.000000000 -0700
@@ -190,6 +190,8 @@ extern void swap_setup(void);
/* linux/mm/vmscan.c */
extern unsigned long try_to_free_pages(struct zone **zones, int order,
gfp_t gfp_mask);
+extern unsigned long emergency_free_pages(struct zone **zones, int order,
+ gfp_t gfp_mask);
extern unsigned long shrink_all_memory(unsigned long nr_pages);
extern int vm_swappiness;
extern int remove_mapping(struct address_space *mapping, struct page *page);
Index: linux-2.6/mm/page_alloc.c
===================================================================
--- linux-2.6.orig/mm/page_alloc.c 2007-08-08 04:17:33.000000000 -0700
+++ linux-2.6/mm/page_alloc.c 2007-08-08 04:39:26.000000000 -0700
@@ -1306,6 +1306,17 @@ nofail_alloc:
zonelist, ALLOC_NO_WATERMARKS);
if (page)
goto got_pg;
+
+ /*
+ * We cannot go into full synchronous reclaim
+ * but we can still scan for easily reclaimable
+ * pages.
+ */
+ if (p->flags & PF_MEMALLOC &&
+ emergency_free_pages(zonelist->zones, order,
+ gfp_mask))
+ goto nofail_alloc;
+
if (gfp_mask & __GFP_NOFAIL) {
congestion_wait(WRITE, HZ/50);
goto nofail_alloc;
Index: linux-2.6/mm/vmscan.c
===================================================================
--- linux-2.6.orig/mm/vmscan.c 2007-08-08 04:21:14.000000000 -0700
+++ linux-2.6/mm/vmscan.c 2007-08-08 04:42:24.000000000 -0700
@@ -1204,6 +1204,33 @@ out:
}
/*
+ * Emergency reclaim. We are already in the vm write out path
+ * and we have exhausted all memory. We have to free memory without
+ * any additional allocations. So no writes and no swap. Get
+ * as bare bones as we can.
+ */
+unsigned long emergency_free_pages(struct zone **zones, int order, gfp_t gfp_mask)
+{
+ int priority;
+ unsigned long nr_reclaimed = 0;
+ struct scan_control sc = {
+ .gfp_mask = gfp_mask,
+ .swap_cluster_max = SWAP_CLUSTER_MAX,
+ .order = order,
+ };
+
+ for (priority = DEF_PRIORITY; priority >= 0; priority--) {
+ sc.nr_scanned = 0;
+ nr_reclaimed += shrink_zones(priority, zones, &sc);
+ if (nr_reclaimed >= sc.swap_cluster_max)
+ return 1;
+ }
+
+ /* top priority shrink_caches still had more to do? don't OOM, then */
+ return !sc.all_unreclaimable;
+}
+
+/*
* For kswapd, balance_pgdat() will work across all this node's zones until
* they are all at pages_high.
*