linux-mm.kvack.org archive mirror
From: Andrew Morton <akpm@osdl.org>
To: Christoph Lameter <clameter@sgi.com>
Cc: linux-mm@kvack.org, Paul Jackson <pj@sgi.com>
Subject: Re: [PATCH] GFP_THISNODE for the slab allocator
Date: Thu, 14 Sep 2006 22:00:11 -0700
Message-ID: <20060914220011.2be9100a.akpm@osdl.org>
In-Reply-To: <Pine.LNX.4.64.0609131649110.20799@schroedinger.engr.sgi.com>

On Wed, 13 Sep 2006 16:50:41 -0700 (PDT)
Christoph Lameter <clameter@sgi.com> wrote:

> This patch ensures that the slab node lists in the NUMA case only contain
> slabs that belong to that specific node. All slab allocations use
> GFP_THISNODE when calling into the page allocator. If an allocation fails
> then we fall back in the slab allocator according to the zonelists
> appropriate for a certain context.
> 
> This allows the slab layer to replicate the behavior of alloc_pages and
> alloc_pages_node.
> 
> Currently allocations requested from the page allocator may be redirected
> via cpusets to other nodes. This results in remote pages on nodelists and
> that in turn results in interrupt latency issues during cache draining.
> Plus the slab is handing out memory as local when it is really remote.
> 
> Fallback for slab memory allocations will occur within the slab
> allocator and not in the page allocator. This is necessary in order
> to be able to use the existing pools of objects on the nodes that
> we fall back to before adding more pages to a slab.
> 
> The fallback function ensures that the nodes we fall back to obey
> cpuset restrictions of the current context. We no longer allocate
> objects from outside of the current cpuset context as we did before.
> 
> Note that the implementation of locality constraints within the slab
> allocator requires importing logic from the page allocator. This is a
> mishmash that is not that great. Other allocators (uncached allocator,
> vmalloc, huge pages) face similar problems and have similar minimal
> reimplementations of the basic fallback logic of the page allocator.
> There is another way of implementing a slab by avoiding per node lists
> (see modular slab) but this won't work within the existing slab.
> 
> Signed-off-by: Christoph Lameter <clameter@sgi.com>
> 
> Index: linux-2.6.18-rc6-mm2/mm/slab.c
> ===================================================================
> --- linux-2.6.18-rc6-mm2.orig/mm/slab.c	2006-09-13 18:04:57.000000000 -0500
> +++ linux-2.6.18-rc6-mm2/mm/slab.c	2006-09-13 18:20:41.356901622 -0500
> @@ -1566,6 +1566,14 @@ static void *kmem_getpages(struct kmem_c
>  	 */
>  	flags |= __GFP_COMP;
>  #endif
> +#ifdef CONFIG_NUMA
> +	/*
> +	 * Under NUMA we want memory on the indicated node. We will handle
> +	 * the needed fallback ourselves since we want to serve from our
> +	 * per node object lists first for other nodes.
> +	 */
> +	flags |= GFP_THISNODE;
> +#endif

hm.  GFP_THISNODE is dangerous.  For example, its use in
kernel/profile.c:create_hash_tables() has gone and caused non-NUMA machines
to use __GFP_NOWARN | __GFP_NORETRY in this situation, because GFP_THISNODE
carries those two modifiers along with __GFP_THISNODE.

OK, that's relatively harmless here, but why on earth did non-NUMA
machines want to make this change?

Would it not be saner to do away with the dangerous GFP_THISNODE and then
open-code __GFP_THISNODE in those places which want that behaviour?

And to then make non-NUMA __GFP_THISNODE equal literal zero, so we can
remove the above ifdefs?
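A minimal userspace sketch of that proposal, assuming (as the create_hash_tables() complaint implies) that GFP_THISNODE is the composite __GFP_THISNODE | __GFP_NOWARN | __GFP_NORETRY. The bit values and the names OLD_GFP_THISNODE and slab_page_flags() are hypothetical. Built without -DCONFIG_NUMA, the composite still leaks NOWARN and NORETRY into the caller, while OR-ing in a zero-valued __GFP_THISNODE is a no-op, so the call site needs no #ifdef.

```c
typedef unsigned int gfp_t;

/* Illustrative bit values only; the real ones are in include/linux/gfp.h. */
#define __GFP_NOWARN   0x200u
#define __GFP_NORETRY  0x1000u

#ifdef CONFIG_NUMA
#define __GFP_THISNODE 0x40000u
#else
#define __GFP_THISNODE 0u          /* non-NUMA: the modifier compiles away */
#endif

/* Hypothetical old-style composite: on non-NUMA builds it still carries
 * __GFP_NOWARN | __GFP_NORETRY even though "this node" means nothing there. */
#define OLD_GFP_THISNODE (__GFP_THISNODE | __GFP_NOWARN | __GFP_NORETRY)

/* Hypothetical helper modelling the kmem_getpages() hunk without #ifdefs:
 * callers that want node-local pages open-code __GFP_THISNODE. */
static gfp_t slab_page_flags(gfp_t flags)
{
    return flags | __GFP_THISNODE;
}
```

With -DCONFIG_NUMA the same call site sets the node-local bit, and only callers that really want no-warn/no-retry semantics have to ask for them.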

>  	flags |= cachep->gfpflags;
>  
>  	page = alloc_pages_node(nodeid, flags, cachep->gfporder);
> @@ -3085,6 +3093,15 @@ static __always_inline void *__cache_all
>  
>  	objp = ____cache_alloc(cachep, flags);
>  out:
> +
> +#ifdef CONFIG_NUMA
> +	/*
> +	 * We may just have run out of memory on the local node.
> +	 * __cache_alloc_node knows how to locate memory on other nodes
> +	 */
> +	if (!objp)
> +		objp = __cache_alloc_node(cachep, flags, numa_node_id());
> +#endif

What happened to my `#define NUMA_BUILD 0 or 1' proposal?  If we had that,
the above could be

	if (NUMA_BUILD && !objp)
		objp = ...
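A sketch of the NUMA_BUILD idea, assuming it is an always-defined 0/1 constant derived from CONFIG_NUMA. cache_alloc(), local_alloc() and alloc_other_node() are hypothetical stand-ins for the slab paths, not kernel functions. Unlike an #ifdef block, the fallback branch is parsed and type-checked in every configuration, and the compiler discards it as dead code when NUMA_BUILD is 0.

```c
#include <stddef.h>

#ifdef CONFIG_NUMA
#define NUMA_BUILD 1
#else
#define NUMA_BUILD 0
#endif

static int fallback_calls;   /* counts trips into the off-node path */

/* Hypothetical fast path: fails when the local node is out of objects. */
static void *local_alloc(int local_empty)
{
    static int local_obj;
    return local_empty ? NULL : &local_obj;
}

/* Hypothetical stand-in for __cache_alloc_node() on another node. */
static void *alloc_other_node(void)
{
    static int remote_obj;
    fallback_calls++;
    return &remote_obj;
}

static void *cache_alloc(int local_empty)
{
    void *objp = local_alloc(local_empty);

    /* Always compiled and type-checked; dead code when NUMA_BUILD is 0. */
    if (NUMA_BUILD && !objp)
        objp = alloc_other_node();
    return objp;
}
```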


>  /*
> + * Fallback function if there was no memory available and no objects on a
> + * certain node and we are allowed to fall back. We mimic the behavior of
> + * the page allocator. We fall back according to a zonelist determined by
> + * the policy layer while obeying cpuset constraints.
> + */
> +void *fallback_alloc(struct kmem_cache *cache, gfp_t flags)
> +{
> +	struct zonelist *zonelist = &NODE_DATA(slab_node(current->mempolicy))
> +					->node_zonelists[gfp_zone(flags)];
> +	struct zone **z;
> +	void *obj = NULL;
> +
> +	for (z = zonelist->zones; *z && !obj; z++)
> +		if (zone_idx(*z) <= ZONE_NORMAL &&
> +				cpuset_zone_allowed(*z, flags))
> +			obj = __cache_alloc_node(cache,
> +					flags | __GFP_THISNODE,
> +					zone_to_nid(*z));
> +	return obj;
> +}
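A userspace model of the fallback_alloc() walk in the patch, to show which zones it skips. struct zone, node_allowed[], free_objs[] and alloc_on_node() are simplified assumptions: node_allowed[] stands in for cpuset_zone_allowed(), and alloc_on_node() collapses __cache_alloc_node() (per-node object lists first, then growing a slab on that node) into a single counter. As in the patch, zones above ZONE_NORMAL are skipped and the walk stops at the first node that yields an object.

```c
#include <stdbool.h>
#include <stddef.h>

enum zone_type { ZONE_DMA, ZONE_NORMAL, ZONE_HIGHMEM };

struct zone {
    int nid;                 /* node this zone belongs to */
    enum zone_type idx;      /* what zone_idx() would return */
};

#define MAX_NODES 4

static bool node_allowed[MAX_NODES];  /* stand-in for cpuset_zone_allowed() */
static int free_objs[MAX_NODES];      /* objects obtainable on each node */
static int objects[MAX_NODES];        /* dummy storage to hand out */

/* Stand-in for __cache_alloc_node(cache, flags | __GFP_THISNODE, nid). */
static void *alloc_on_node(int nid)
{
    if (free_objs[nid] == 0)
        return NULL;
    free_objs[nid]--;
    return &objects[nid];
}

/* zonelist: NULL-terminated, in the order the policy layer chose. */
static void *fallback_alloc(struct zone **zonelist)
{
    void *obj = NULL;

    for (struct zone **z = zonelist; *z && !obj; z++)
        if ((*z)->idx <= ZONE_NORMAL && node_allowed[(*z)->nid])
            obj = alloc_on_node((*z)->nid);
    return obj;
}
```

For a zonelist of nodes 0, 1 (highmem only), 2 and 3, where node 0 is empty and node 2 is outside the cpuset, the object comes from node 3.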

hm, there's cpuset_zone_allowed() again.

I have a feeling that we need to nuke that thing.  Take a 128-node machine,
create a cpuset which has 64 memnodes, consume all the memory in 60 of
them, do some heavy page allocation, then stick a thermometer into
get_page_from_freelist() and see how much time goes into calling
cpuset_zone_allowed() for every zone it walks past.
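A toy model of that thought experiment, assuming the zonelist visits nodes in id order, the cpuset owns the 64 even-numbered nodes of a 128-node machine, and the cpuset test runs before the free-page test for each zone visited. toy_cpuset_allowed(), toy_node_has_pages() and toy_scan() are hypothetical. With the first 60 of the cpuset's nodes exhausted, every single allocation makes 121 cpuset checks before it reaches a usable node, instead of 1.

```c
#include <stdbool.h>

#define NR_NODES 128

static long cpuset_checks;   /* how many times the cpuset test ran */

/* Toy cpuset: only even-numbered nodes are allowed. */
static bool toy_cpuset_allowed(int nid)
{
    cpuset_checks++;
    return nid % 2 == 0;
}

/* The first `exhausted` nodes of the cpuset have no free pages. */
static bool toy_node_has_pages(int nid, int exhausted)
{
    return nid / 2 >= exhausted;
}

/* Returns the node that satisfies one allocation, or -1 if none can. */
static int toy_scan(int exhausted)
{
    for (int nid = 0; nid < NR_NODES; nid++) {
        if (!toy_cpuset_allowed(nid))
            continue;
        if (toy_node_has_pages(nid, exhausted))
            return nid;
    }
    return -1;
}
```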



--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .

Thread overview: 82+ messages
2006-09-13 23:50 Christoph Lameter
2006-09-15  5:00 ` Andrew Morton [this message]
2006-09-15  6:49   ` Paul Jackson
2006-09-15  7:23     ` Andrew Morton
2006-09-15  7:44       ` Paul Jackson
2006-09-15  8:06         ` Andrew Morton
2006-09-15 15:53           ` David Rientjes
2006-09-15 23:03           ` David Rientjes
2006-09-16  0:04             ` Paul Jackson
2006-09-16  1:36               ` Andrew Morton
2006-09-16  2:23                 ` Christoph Lameter
2006-09-16  4:34                   ` Andrew Morton
2006-09-16  3:28                 ` [PATCH] Add node to zone for the NUMA case Christoph Lameter
2006-09-16  3:40                   ` Paul Jackson
2006-09-16  3:45                 ` [PATCH] GFP_THISNODE for the slab allocator Paul Jackson
2006-09-16  2:47             ` Christoph Lameter
2006-09-17  3:45             ` David Rientjes
2006-09-17 11:17               ` Paul Jackson
2006-09-17 12:41                 ` Christoph Lameter
2006-09-17 13:03                   ` Paul Jackson
2006-09-17 20:36                     ` David Rientjes
2006-09-17 21:20                       ` Paul Jackson
2006-09-17 22:27                       ` Paul Jackson
2006-09-17 23:49                         ` David Rientjes
2006-09-18  2:20                           ` Paul Jackson
2006-09-18 16:34                             ` Paul Jackson
2006-09-18 17:49                               ` David Rientjes
2006-09-18 20:46                                 ` Paul Jackson
2006-09-19 20:52                               ` David Rientjes
2006-09-19 21:26                                 ` Christoph Lameter
2006-09-19 21:50                                   ` David Rientjes
2006-09-21 22:11                                 ` David Rientjes
2006-09-22 10:10                                   ` Nick Piggin
2006-09-22 16:26                                   ` Paul Jackson
2006-09-22 16:36                                     ` Christoph Lameter
2006-09-15  8:28       ` Andrew Morton
2006-09-16  3:38         ` Paul Jackson
2006-09-16  4:42           ` Andi Kleen
2006-09-16 11:38             ` Paul Jackson
2006-09-16  4:48           ` Andrew Morton
2006-09-16 11:30             ` Paul Jackson
2006-09-16 15:18               ` Andrew Morton
2006-09-17  9:28                 ` Paul Jackson
2006-09-17  9:51                   ` Nick Piggin
2006-09-17 11:15                     ` Paul Jackson
2006-09-17 12:44                       ` Nick Piggin
2006-09-17 13:19                         ` Paul Jackson
2006-09-17 13:52                           ` Nick Piggin
2006-09-17 21:19                             ` Paul Jackson
2006-09-18 12:44                             ` [PATCH] mm: exempt pcp alloc from watermarks Peter Zijlstra
2006-09-18 20:20                               ` Christoph Lameter
2006-09-18 20:43                                 ` Peter Zijlstra
2006-09-19 14:35                               ` Nick Piggin
2006-09-19 14:44                                 ` Christoph Lameter
2006-09-19 15:02                                   ` Nick Piggin
2006-09-19 14:51                                 ` Peter Zijlstra
2006-09-19 15:10                                   ` Nick Piggin
2006-09-19 15:05                                     ` Peter Zijlstra
2006-09-19 15:39                                       ` Christoph Lameter
2006-09-17 16:29                   ` [PATCH] GFP_THISNODE for the slab allocator Andrew Morton
2006-09-18  2:11                     ` Paul Jackson
2006-09-18  5:09                       ` Andrew Morton
2006-09-18  7:49                         ` Paul Jackson
2006-09-16 11:48       ` Paul Jackson
2006-09-16 15:38         ` Andrew Morton
2006-09-16 21:51           ` Paul Jackson
2006-09-16 23:10             ` Andrew Morton
2006-09-17  4:37               ` Christoph Lameter
2006-09-17  4:55                 ` Andrew Morton
2006-09-17 12:09                   ` Paul Jackson
2006-09-17 12:36                   ` Christoph Lameter
2006-09-17 13:06                     ` Paul Jackson
2006-09-19 19:17                 ` David Rientjes
2006-09-19 19:19                   ` David Rientjes
2006-09-19 19:31                   ` Christoph Lameter
2006-09-19 21:12                     ` David Rientjes
2006-09-19 21:28                       ` Christoph Lameter
2006-09-19 21:53                         ` Paul Jackson
2006-09-15 17:08   ` Christoph Lameter
2006-09-15 17:37   ` [PATCH] Add NUMA_BUILD definition in kernel.h to avoid #ifdef CONFIG_NUMA Christoph Lameter
2006-09-15 17:38   ` [PATCH] Disable GFP_THISNODE in the non-NUMA case Christoph Lameter
2006-09-15 17:42   ` [PATCH] GFP_THISNODE for the slab allocator V2 Christoph Lameter
