From: David Rientjes <rientjes@google.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-mm@kvack.org, linux-numa@vger.kernel.org,
	Lee Schermerhorn <lee.schermerhorn@hp.com>,
	Mel Gorman <mel@csn.ul.ie>,
	Randy Dunlap <randy.dunlap@oracle.com>,
	Nishanth Aravamudan <nacc@us.ibm.com>,
	Andi Kleen <andi@firstfloor.org>, Adam Litke <agl@us.ibm.com>,
	Andy Whitcroft <apw@canonical.com>,
	eric.whitney@hp.com,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Subject: [patch] mm: add gfp flags for NODEMASK_ALLOC slab allocations
Date: Thu, 8 Oct 2009 14:22:21 -0700 (PDT)
Message-ID: <alpine.DEB.1.00.0910081422100.676@chino.kir.corp.google.com>
In-Reply-To: <20091008162527.23192.68825.sendpatchset@localhost.localdomain>

Objects passed to NODEMASK_ALLOC() are relatively small in size and are
backed by slab caches that are not of large order, traditionally never
greater than PAGE_ALLOC_COSTLY_ORDER.

Thus, using GFP_KERNEL for these allocations on large machines when
CONFIG_NODES_SHIFT > 8 will cause the page allocator to loop endlessly in
the allocation attempt, each time invoking both direct reclaim and the oom
killer.
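
The retry behavior described above can be illustrated with a small
user-space model.  This is only a sketch of the decision, not the page
allocator's actual code: should_retry_alloc() and the MODEL_GFP_* bits are
hypothetical names and values chosen for illustration, and treating every
costly-order request as "no retry" is a simplification.  Only
PAGE_ALLOC_COSTLY_ORDER (3) is taken from the kernel.

```c
#include <stdbool.h>

/* Value used by the kernel: orders above this are considered costly. */
#define PAGE_ALLOC_COSTLY_ORDER	3

/* Hypothetical flag bits for this model; not the kernel's real values. */
#define MODEL_GFP_WAIT		0x1u	/* caller may sleep, reclaim allowed */
#define MODEL_GFP_NORETRY	0x2u	/* give up after one failed attempt */

/*
 * Simplified model of whether a failed allocation attempt is retried.
 * Low-order requests that may sleep and do not set NORETRY are retried
 * without bound, which is the looping the changelog describes.
 */
static bool should_retry_alloc(unsigned int gfp, unsigned int order)
{
	if (!(gfp & MODEL_GFP_WAIT))
		return false;	/* cannot sleep: fail immediately */
	if (gfp & MODEL_GFP_NORETRY)
		return false;	/* caller asked for a single attempt */
	return order <= PAGE_ALLOC_COSTLY_ORDER;
}
```

In this model a slab page request of order 0 made with the equivalent of
GFP_KERNEL is always retried, while the same request with the NORETRY bit
set fails after the first attempt.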

This is of particular interest when using NODEMASK_ALLOC() from a
mempolicy context (either directly in mm/mempolicy.c or the mempolicy
constrained hugetlb allocations) since the oom killer always kills
current when allocations are constrained by mempolicies.  So for all
present use cases in the kernel, current would end up being oom killed
when direct reclaim fails.  That would allow the NODEMASK_ALLOC() to
succeed but current would have sacrificed itself upon returning.

This patch adds gfp flags to NODEMASK_ALLOC() to pass to kmalloc() on
CONFIG_NODES_SHIFT > 8; this parameter is a no-op on other configurations.
All current use cases, either directly from hugetlb code or indirectly via
NODEMASK_SCRATCH(), OR in __GFP_NORETRY to avoid looping in direct reclaim
and invoking the oom killer when the slab allocator needs to allocate
additional pages.
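
To make the effect of the new parameter concrete, here is a user-space
sketch of the two macro variants.  The kmalloc()/kfree() functions, the
flag values, the nodemask_t layout and demo_alloc_flags() are stand-ins
invented for this example so that it builds outside the kernel, and
NODES_SHIFT = 10 is an assumed configuration.  With NODES_SHIFT <= 8 the
flags argument is simply dropped by the preprocessor.

```c
#include <stddef.h>
#include <stdlib.h>

/* Stand-ins for kernel definitions; the values are placeholders. */
#define GFP_KERNEL	0x1u
#define __GFP_NORETRY	0x2u

#ifndef NODES_SHIFT
#define NODES_SHIFT	10	/* assumed: 1024 possible nodes */
#endif

#define MAX_NUMNODES	(1UL << NODES_SHIFT)
#define BITS_PER_LONG	(8 * sizeof(unsigned long))

/* One bit per possible node, rounded up to whole longs. */
typedef struct {
	unsigned long bits[(MAX_NUMNODES + BITS_PER_LONG - 1) / BITS_PER_LONG];
} nodemask_t;

static unsigned int last_gfp_flags;	/* what the mock allocator was given */

/* Mock allocator: records the flags and falls back to malloc(). */
static void *kmalloc(size_t size, unsigned int flags)
{
	last_gfp_flags = flags;
	return malloc(size);
}

static void kfree(void *p)
{
	free(p);
}

#if NODES_SHIFT > 8
#define NODEMASK_ALLOC(type, name, gfp_flags)	\
			type *name = kmalloc(sizeof(*name), gfp_flags)
#define NODEMASK_FREE(m)			kfree(m)
#else
#define NODEMASK_ALLOC(type, name, gfp_flags)	type _name, *name = &_name
#define NODEMASK_FREE(m)			do {} while (0)
#endif

/*
 * Returns the flags the allocator saw, or 0 if the allocation failed or
 * the on-stack variant was used and no allocator was called.
 */
static unsigned int demo_alloc_flags(void)
{
	unsigned int seen;

	last_gfp_flags = 0;
	NODEMASK_ALLOC(nodemask_t, nodes_allowed, GFP_KERNEL | __GFP_NORETRY);
	if (!nodes_allowed)
		return 0;
	seen = last_gfp_flags;
	NODEMASK_FREE(nodes_allowed);
	return seen;
}
```

With the assumed NODES_SHIFT of 10, nodemask_t is 128 bytes and the mock
allocator sees both flags.  Building the same file with -DNODES_SHIFT=6
selects the on-stack variant, where the allocator is never called.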

The side-effect of this change is that all current use cases of either
NODEMASK_ALLOC() or NODEMASK_SCRATCH() need appropriate -ENOMEM handling
when the allocation fails (never for CONFIG_NODES_SHIFT <= 8).  All
current use cases were audited and do have appropriate error handling at
this time.
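
The error handling a caller needs might look roughly like the following
user-space sketch.  demo_set_nodes(), the fail_next_alloc hook, the mock
kmalloc()/kfree(), the flag values and the fixed-size nodemask_t are all
hypothetical and exist only so that the failure path can be exercised; it
is not code from any of the audited callers.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Stand-ins for kernel definitions; the values are placeholders. */
#define GFP_KERNEL	0x1u
#define __GFP_NORETRY	0x2u

/* Assumed layout: 16 longs, large enough that it would not live on stack. */
typedef struct {
	unsigned long bits[16];
} nodemask_t;

static bool fail_next_alloc;	/* test hook: simulate memory pressure */

/* Mock allocator that can be told to fail once, as NORETRY permits. */
static void *kmalloc(size_t size, unsigned int flags)
{
	(void)flags;
	if (fail_next_alloc) {
		fail_next_alloc = false;
		return NULL;
	}
	return malloc(size);
}

static void kfree(void *p)
{
	free(p);
}

/* Heap variant of the macros, as used when NODES_SHIFT > 8. */
#define NODEMASK_ALLOC(type, name, gfp_flags)	\
			type *name = kmalloc(sizeof(*name), gfp_flags)
#define NODEMASK_FREE(m)			kfree(m)

/* Hypothetical caller: returns 0 on success, -ENOMEM if the mask is NULL. */
static int demo_set_nodes(unsigned long first_word)
{
	NODEMASK_ALLOC(nodemask_t, nodes_allowed, GFP_KERNEL | __GFP_NORETRY);

	if (!nodes_allowed)
		return -ENOMEM;	/* allocation may now fail instead of looping */

	memset(nodes_allowed, 0, sizeof(*nodes_allowed));
	nodes_allowed->bits[0] = first_word;
	/* A real caller would hand nodes_allowed to the code that needs it. */
	NODEMASK_FREE(nodes_allowed);
	return 0;
}
```

Setting fail_next_alloc before the call makes demo_set_nodes() return
-ENOMEM without touching the mask; the next call succeeds again.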

Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: David Rientjes <rientjes@google.com>
---
 Andrew, this was written on mmotm-09251435 plus Lee's entire patchset.

 include/linux/nodemask.h |   21 ++++++++++++---------
 mm/hugetlb.c             |    5 +++--
 2 files changed, 15 insertions(+), 11 deletions(-)

diff --git a/include/linux/nodemask.h b/include/linux/nodemask.h
--- a/include/linux/nodemask.h
+++ b/include/linux/nodemask.h
@@ -485,15 +485,17 @@ static inline int num_node_state(enum node_states state)
 #define for_each_online_node(node) for_each_node_state(node, N_ONLINE)
 
 /*
- * For nodemask scrach area.(See CPUMASK_ALLOC() in cpumask.h)
- * NODEMASK_ALLOC(x, m) allocates an object of type 'x' with the name 'm'.
+ * For nodemask scratch area.
+ * NODEMASK_ALLOC(type, name, gfp_flags) allocates an object with a
+ * specified type and name.
  */
-#if NODES_SHIFT > 8 /* nodemask_t > 64 bytes */
-#define NODEMASK_ALLOC(x, m)		x *m = kmalloc(sizeof(*m), GFP_KERNEL)
-#define NODEMASK_FREE(m)		kfree(m)
+#if NODES_SHIFT > 8 /* nodemask_t > 32 bytes */
+#define NODEMASK_ALLOC(type, name, gfp_flags)	\
+			type *name = kmalloc(sizeof(*name), gfp_flags)
+#define NODEMASK_FREE(m)			kfree(m)
 #else
-#define NODEMASK_ALLOC(x, m)		x _m, *m = &_m
-#define NODEMASK_FREE(m)		do {} while (0)
+#define NODEMASK_ALLOC(type, name, gfp_flags)	type _name, *name = &_name
+#define NODEMASK_FREE(m)			do {} while (0)
 #endif
 
 /* A example struture for using NODEMASK_ALLOC, used in mempolicy. */
@@ -502,8 +504,9 @@ struct nodemask_scratch {
 	nodemask_t	mask2;
 };
 
-#define NODEMASK_SCRATCH(x)	\
-		NODEMASK_ALLOC(struct nodemask_scratch, x)
+#define NODEMASK_SCRATCH(x)						\
+			NODEMASK_ALLOC(struct nodemask_scratch, x,	\
+					GFP_KERNEL | __GFP_NORETRY)
 #define NODEMASK_SCRATCH_FREE(x)	NODEMASK_FREE(x)
 
 
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1361,7 +1361,7 @@ static ssize_t nr_hugepages_store_common(bool obey_mempolicy,
 	int nid;
 	unsigned long count;
 	struct hstate *h;
-	NODEMASK_ALLOC(nodemask_t, nodes_allowed);
+	NODEMASK_ALLOC(nodemask_t, nodes_allowed, GFP_KERNEL | __GFP_NORETRY);
 
 	err = strict_strtoul(buf, 10, &count);
 	if (err)
@@ -1857,7 +1857,8 @@ static int hugetlb_sysctl_handler_common(bool obey_mempolicy,
 	proc_doulongvec_minmax(table, write, buffer, length, ppos);
 
 	if (write) {
-		NODEMASK_ALLOC(nodemask_t, nodes_allowed);
+		NODEMASK_ALLOC(nodemask_t, nodes_allowed,
+						GFP_KERNEL | __GFP_NORETRY);
 		if (!(obey_mempolicy &&
 			       init_nodemask_of_mempolicy(nodes_allowed))) {
 			NODEMASK_FREE(nodes_allowed);

