From: Paul Mundt <lethal@linux-sh.org>
To: Nish Aravamudan <nish.aravamudan@gmail.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>,
Paul Jackson <pj@sgi.com>, Adam Litke <agl@us.ibm.com>,
linux-mm@kvack.org, mel@skynet.ie, apw@shadowen.org,
wli@holomorphy.com, clameter@sgi.com, kenchen@google.com
Subject: Re: [PATCH 5/5] [hugetlb] Try to grow pool for MAP_SHARED mappings
Date: Sun, 22 Jul 2007 01:57:55 +0900 [thread overview]
Message-ID: <20070721165755.GB4043@linux-sh.org> (raw)
In-Reply-To: <29495f1d0707201335u5fbc9565o2a53a18e45d8b28@mail.gmail.com>
On Fri, Jul 20, 2007 at 01:35:52PM -0700, Nish Aravamudan wrote:
> On 7/18/07, Paul Mundt <lethal@linux-sh.org> wrote:
> >On Wed, Jul 18, 2007 at 12:02:03PM -0400, Lee Schermerhorn wrote:
> >> On Wed, 2007-07-18 at 08:17 -0700, Nish Aravamudan wrote:
> >> > On 7/18/07, Lee Schermerhorn <Lee.Schermerhorn@hp.com> wrote:
> >> > > I have always considered the huge page pool, as populated by
> >> > > alloc_fresh_huge_page() in response to changes in nr_hugepages, to be a
> >> > > system global resource. I think the system "does the right
> >> > > thing"--well, almost--with Christoph's memoryless patches and your
> >> > > hugetlb patches. Certainly, the huge pages allocated at boot time,
> >> > > based on the command line parameter, are system-wide. cpusets have not
> >> > > been set up at that time.
> >> >
> >> > I fully agree that hugepages are a global resource.
> >> >
> >> > > It requires privilege to write to the nr_hugepages sysctl, so allowing
> >> > > it to spread pages across all available nodes [with memory], regardless
> >> > > of cpusets, makes sense to me. Altho' I don't expect many folks are
> >> > > currently changing nr_hugepages from within a constrained cpuset, I
> >> > > wouldn't want to see us change existing behavior, in this respect. Your
> >> > > per node attributes will provide the mechanism to allocate different
> >> > > numbers of hugepages for, e.g., nodes in cpusets that have applications
> >> > > that need them.
> >> >
> >> > The issue is that with Adam's patches, the hugepage pool will grow on
> >> > demand, presuming the process owner's mlock limit is sufficiently
> >> > high. If said process were running within a constrained cpuset, it
> >> > seems slightly out-of-whack to allow it grow the pool on other nodes
> >> > to satisfy the demand.
> >>
> >> Ah, I see. In that case, it might make sense to grow just for the
> >> cpuset. A couple of things come to mind tho':
> >>
> >> 1) we might want a per cpuset control to enable/disable hugetlb pool
> >> growth on demand, or to limit the max size of the pool--especially if
> >> the memories are not exclusively owned by the cpuset. Otherwise,
> >> non-privileged processes could grow the hugetlb pool in memories shared
> >> with other cpusets [maybe the root cpuset?], thereby reducing the amount
> >> of normal, managed pages available to the other cpusets. Probably want
> >> such a control in the absence of cpusets as well, if on-demand hugetlb
> >> pool growth is implemented.
> >>
> >I don't see that the two are mutually exclusive. Hugetlb pools have to be
> >node-local anyways due to the varying distances, so perhaps the global
> >resource thing is the wrong way to approach it. There are already hooks
> >for spreading slab and page cache pages in cpusets, perhaps it makes
> >sense to add a hugepage spread variant to balance across the constrained
> >set?
>
> I'm not sure I understand why you say "hugetlb pools"? There is no
> plural in the kernel, there is only the global pool. Now, on NUMA
> machines, yes, the pool is spread across nodes, but, well, that's just
> because of where the memory is. We already spread out the allocation
> of hugepages across all NUMA nodes (or will, once my patches go in).
> And I think with my earlier suggestion (of just changing the
> interleave mask used for those allocations to be cpuset-aware), that
> we'd spread across the cpuset too, if there is one. Is that what you
> mean by "spread variant"?
>
Yes, that's what I was referring to. The main thing is that there may
simply be nodes where we don't want to spread the huge pages (mostly due
to size constraints). For instance, nodes that don't make it into
the interleave map are reasonable candidates for also never spreading
huge pages to.
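To make the suggestion concrete: a cpuset-aware spread would amount to
intersecting the set of hugepage-eligible nodes with the caller's cpuset
before interleaving. Here's a rough userspace sketch of the idea -- the
nodemask_t stand-in and both helper names are invented for illustration,
not the kernel's actual types or interfaces:

```c
#include <assert.h>
#include <stdint.h>

/* Toy stand-in for the kernel's nodemask_t: one bit per node. */
typedef uint64_t nodemask_t;

#define NODE_BIT(n) ((nodemask_t)1 << (n))

/*
 * Nodes eligible for hugepage interleave: spread only across nodes the
 * cpuset permits, falling back to the global set if the intersection is
 * empty (e.g. the task is not in a constrained cpuset).
 */
static nodemask_t hugepage_interleave_mask(nodemask_t nodes_with_memory,
                                           nodemask_t cpuset_mems_allowed)
{
	nodemask_t mask = nodes_with_memory & cpuset_mems_allowed;

	return mask ? mask : nodes_with_memory;
}

/* Round-robin pick of the node for the next freshly allocated huge page. */
static int next_interleave_node(nodemask_t mask, int prev)
{
	int i;

	for (i = 1; i <= 64; i++) {
		int node = (prev + i) % 64;

		if (mask & NODE_BIT(node))
			return node;
	}
	return -1;	/* empty mask */
}
```

With nodes 0-3 populated and a cpuset allowing only nodes 1 and 2, the
allocator would alternate between 1 and 2 instead of touching 0 and 3.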
> >It would be quite nice to have some way to have nodes opt-in to the sort
> >of behaviour they're willing to tolerate. Some nodes are never going to
> >tolerate spreading of any sort, hugepages, and so forth. Perhaps it makes
> >more sense to have some flags in the pgdat where we can more strongly
> >type the sort of behaviour the node is willing to put up with (or capable
> >of supporting), at least in this case the nodes that explicitly can't
> >cope are factored out before we even get to cpuset constraints (plus this
> >gives us a hook for setting up the interleave nodes in both the system
> >init and default policies). Thoughts?
>
> I guess I don't understand which nodes you're talking about now? How
> do you spread across any particular single node (how I read "Some
> nodes are never going to tolerate spreading of any sort")? Or do you
> mean that some cpusets aren't going to want to spread (interleave?).
>
> Oh, are you trying to say that some nodes should be dropped from
> interleave masks (explicitly excluded from all possible interleave
> masks)? What kind of nodes would these be? We're doing something
> similar to deal with memoryless nodes, perhaps it could be
> generalized?
>
Correct. You can see some of the changes in mm/mempolicy.c:numa_policy_init()
for keeping nodes out of the system init policy. While we want to be able
to let the kernel manage the node and let applications do node-local
allocation, these nodes will never want slab pages or anything like that
due to the size constraints.
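The pgdat-flag idea from earlier in the thread would slot in here: a
per-node opt-out consulted when the init interleave mask is built. A
userspace sketch of that filtering -- the flag names, the toy_pgdat
struct, and the helper are all hypothetical, not existing kernel
interfaces:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-node capability flags, imagined as bits in pg_data_t. */
#define PGDAT_NO_INTERLEAVE	(1u << 0)	/* keep out of default interleave */
#define PGDAT_NO_SLAB		(1u << 1)	/* too small for slab spreading */

struct toy_pgdat {
	int		node_id;
	int		has_memory;
	unsigned int	flags;
};

/*
 * Mirrors what numa_policy_init()-style mask building could do: include
 * a node in the system-wide interleave policy only if it has memory and
 * has not explicitly opted out.
 */
static uint64_t build_init_interleave_mask(const struct toy_pgdat *nodes,
					   int nr_nodes)
{
	uint64_t mask = 0;
	int i;

	for (i = 0; i < nr_nodes; i++)
		if (nodes[i].has_memory &&
		    !(nodes[i].flags & PGDAT_NO_INTERLEAVE))
			mask |= (uint64_t)1 << nodes[i].node_id;
	return mask;
}
```

This also gives the memoryless-node case a uniform treatment: a node with
no memory simply never makes it into the mask, the same way an opted-out
node doesn't.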
Christoph had posted some earlier slub patches for excluding certain
nodes from slub entirely; that may also be something you want to pick up
and work on for memoryless nodes. I've been opting for SLOB + NUMA on my
platforms, but if something like this is tidied up generically, then slub
is certainly something worth supporting as an alternative.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <dont@kvack.org>
2007-07-13 15:16 [PATCH 0/5] [RFC] Dynamic hugetlb pool resizing Adam Litke
2007-07-13 15:16 ` [PATCH 1/5] [hugetlb] Introduce BASE_PAGES_PER_HPAGE constant Adam Litke
2007-07-23 19:43 ` Christoph Lameter
2007-07-23 19:52 ` Adam Litke
2007-07-13 15:16 ` [PATCH 2/5] [hugetlb] Account for hugepages as locked_vm Adam Litke
2007-07-13 15:16 ` [PATCH 3/5] [hugetlb] Move update_and_free_page so it can be used by alloc functions Adam Litke
2007-07-13 15:17 ` [PATCH 4/5] [hugetlb] Try to grow pool on alloc_huge_page failure Adam Litke
2007-07-13 15:17 ` [PATCH 5/5] [hugetlb] Try to grow pool for MAP_SHARED mappings Adam Litke
2007-07-13 20:05 ` Paul Jackson
2007-07-13 21:05 ` Adam Litke
2007-07-13 21:24 ` Ken Chen
2007-07-13 21:29 ` Christoph Lameter
2007-07-13 21:38 ` Ken Chen
2007-07-13 21:47 ` Christoph Lameter
2007-07-13 22:21 ` Paul Jackson
2007-07-13 21:38 ` Paul Jackson
2007-07-17 23:42 ` Nish Aravamudan
2007-07-18 14:44 ` Lee Schermerhorn
2007-07-18 15:17 ` Nish Aravamudan
2007-07-18 16:02 ` Lee Schermerhorn
2007-07-18 21:16 ` Nish Aravamudan
2007-07-18 21:40 ` Lee Schermerhorn
2007-07-19 1:52 ` Paul Mundt
2007-07-20 20:35 ` Nish Aravamudan
2007-07-20 20:53 ` Lee Schermerhorn
2007-07-20 21:12 ` Nish Aravamudan
2007-07-21 16:57 ` Paul Mundt [this message]
2007-07-13 23:15 ` Nish Aravamudan
2007-07-13 21:09 ` Ken Chen