From: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
To: Nish Aravamudan <nish.aravamudan@gmail.com>
Cc: Paul Jackson <pj@sgi.com>, Adam Litke <agl@us.ibm.com>,
linux-mm@kvack.org, mel@skynet.ie, apw@shadowen.org,
wli@holomorphy.com, clameter@sgi.com, kenchen@google.com,
Paul Mundt <lethal@linux-sh.org>
Subject: Re: [PATCH 5/5] [hugetlb] Try to grow pool for MAP_SHARED mappings
Date: Wed, 18 Jul 2007 12:02:03 -0400 [thread overview]
Message-ID: <1184774524.5899.49.camel@localhost> (raw)
In-Reply-To: <29495f1d0707180817n7a5709dcr78b641a02cb18057@mail.gmail.com>

On Wed, 2007-07-18 at 08:17 -0700, Nish Aravamudan wrote:
> On 7/18/07, Lee Schermerhorn <Lee.Schermerhorn@hp.com> wrote:
> > On Tue, 2007-07-17 at 16:42 -0700, Nish Aravamudan wrote:
> > > On 7/13/07, Paul Jackson <pj@sgi.com> wrote:
> > > > Adam wrote:
> > > > > To be honest, I just don't think a global hugetlb pool and cpusets are
> > > > > compatible, period.
> > > >
> > > > It's not an easy fit, that's for sure ;).
> > >
> > > In the context of my patches to make the hugetlb pool's interleave
> > > work with memoryless nodes, I may have a pseudo-solution for growing the
> > > pool while respecting cpusets.
> > >
> > > Essentially, given that GFP_THISNODE allocations stay on the node
> > > requested (which is the case after Christoph's set of memoryless node
> > > patches go in), we invoke:
> > >
> > > pol = mpol_new(MPOL_INTERLEAVE, &node_states[N_MEMORY])
> > >
> > > in the two callers of alloc_fresh_huge_page(pol) in hugetlb.c.
> > > alloc_fresh_huge_page() in turn invokes interleave_nodes(pol) so that
> > > we request hugepages in an interleaved fashion over all nodes with
> > > memory.
> > >
> > > Now, what I'm wondering is why interleave_nodes() is not cpuset-aware?
> > > Or is it expected that the caller do the right thing with the policy
> > > beforehand? If so, I think I could just make those two callers do
> > >
> > > pol = mpol_new(MPOL_INTERLEAVE, cpuset_mems_allowed(current))
> > >
> > > ?
> > >
> > > Or am I way off here?
> >
> >
> > Nish:
> >
> > I have always considered the huge page pool, as populated by
> > alloc_fresh_huge_page() in response to changes in nr_hugepages, to be a
> > system global resource. I think the system "does the right
> > thing"--well, almost--with Christoph's memoryless patches and your
> > hugetlb patches.  Certainly, the huge pages allocated at boot time,
> > based on the command line parameter, are system-wide. cpusets have not
> > been set up at that time.
>
> I fully agree that hugepages are a global resource.
>
> > It requires privilege to write to the nr_hugepages sysctl, so allowing
> > it to spread pages across all available nodes [with memory], regardless
> > of cpusets, makes sense to me. Altho' I don't expect many folks are
> > currently changing nr_hugepages from within a constrained cpuset, I
> > wouldn't want to see us change existing behavior, in this respect. Your
> > per node attributes will provide the mechanism to allocate different
> > numbers of hugepages for, e.g., nodes in cpusets that have applications
> > that need them.
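For reference, a minimal admin fragment for the global interface being
discussed (privileged; the kernel may allocate fewer pages than
requested, and it spreads them over nodes with memory regardless of the
writer's cpuset):

```shell
# Grow the global hugetlb pool to 64 pages.
echo 64 > /proc/sys/vm/nr_hugepages

# Check what was actually allocated (may be less than requested).
grep HugePages_Total /proc/meminfo
```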
>
> The issue is that with Adam's patches, the hugepage pool will grow on
> demand, presuming the process owner's mlock limit is sufficiently
> high. If said process were running within a constrained cpuset, it
> seems slightly out-of-whack to allow it grow the pool on other nodes
> to satisfy the demand.

Ah, I see.  In that case, it might make sense to grow just for the
cpuset.  A couple of things come to mind tho':
1) we might want a per cpuset control to enable/disable hugetlb pool
growth on demand, or to limit the max size of the pool--especially if
the memories are not exclusively owned by the cpuset. Otherwise,
non-privileged processes could grow the hugetlb pool in memories shared
with other cpusets [maybe the root cpuset?], thereby reducing the amount
of normal, managed pages available to the other cpusets. Probably want
such a control in the absence of cpusets as well, if on-demand hugetlb
pool growth is implemented.
2) per cpuset, on-demand hugetlb pool growth shouldn't affect the
behavior of the nr_hugepages sysctl--IMO, anyway.
3) managed "superpages" keeps sounding better and better ;-)
>
> > Re: the "well, almost": nr_hugepages is still "broken" for me on some
> > of my platforms where the interleaved, dma-only pseudo-node contains
> > sufficient memory to satisfy a hugepage request. I'll end up with a few
> > hugepages consuming most of the dma memory. Consuming the dma isn't the
> > issue--there should be enough remaining for any dma needs. I just want
> > more control over what gets placed on the interleaved pseudo-node by
> > default. I think that Paul Mundt [added to cc list] has similar
> > concerns about default policies on the sh platforms. I have some ideas,
> > but I'm waiting for the memoryless nodes and your patches to stabilize
> > in the mm tree.
>
> And well, we're already 'broken' as far as I can tell with cpusets and
> the hugepage pool. I'm just trying to decide if it's fixable as is, or
> if we need extra cleverness. A simple hack would be to just modify the
> interleave call with a callback that uses the appropriate mask
> depending on whether CPUSETS is on or off (I don't want to use
> cpuset_mems_allowed() unconditionally, because it returns
> node_possible_map if !CPUSETS).

Maybe you want/need a cpuset_hugemems_allowed() that does "the right
thing" with and without cpusets?
>
> Thanks for the feedback. If folks are ok with the way things are, then
> so be it. I was just hoping Paul might have some thoughts on how best
> to avoid violating cpuset constraints with Adam's patches in the
> context of my patches.

I'm not trying to discourage you, here.  I agree that cpusets, as useful
as I find them, do make things, uh, "interesting"--especially with
shared resources.

Lee
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <dont@kvack.org>
Thread overview: 29+ messages
2007-07-13 15:16 [PATCH 0/5] [RFC] Dynamic hugetlb pool resizing Adam Litke
2007-07-13 15:16 ` [PATCH 1/5] [hugetlb] Introduce BASE_PAGES_PER_HPAGE constant Adam Litke
2007-07-23 19:43 ` Christoph Lameter
2007-07-23 19:52 ` Adam Litke
2007-07-13 15:16 ` [PATCH 2/5] [hugetlb] Account for hugepages as locked_vm Adam Litke
2007-07-13 15:16 ` [PATCH 3/5] [hugetlb] Move update_and_free_page so it can be used by alloc functions Adam Litke
2007-07-13 15:17 ` [PATCH 4/5] [hugetlb] Try to grow pool on alloc_huge_page failure Adam Litke
2007-07-13 15:17 ` [PATCH 5/5] [hugetlb] Try to grow pool for MAP_SHARED mappings Adam Litke
2007-07-13 20:05 ` Paul Jackson
2007-07-13 21:05 ` Adam Litke
2007-07-13 21:24 ` Ken Chen
2007-07-13 21:29 ` Christoph Lameter
2007-07-13 21:38 ` Ken Chen
2007-07-13 21:47 ` Christoph Lameter
2007-07-13 22:21 ` Paul Jackson
2007-07-13 21:38 ` Paul Jackson
2007-07-17 23:42 ` Nish Aravamudan
2007-07-18 14:44 ` Lee Schermerhorn
2007-07-18 15:17 ` Nish Aravamudan
2007-07-18 16:02 ` Lee Schermerhorn [this message]
2007-07-18 21:16 ` Nish Aravamudan
2007-07-18 21:40 ` Lee Schermerhorn
2007-07-19 1:52 ` Paul Mundt
2007-07-20 20:35 ` Nish Aravamudan
2007-07-20 20:53 ` Lee Schermerhorn
2007-07-20 21:12 ` Nish Aravamudan
2007-07-21 16:57 ` Paul Mundt
2007-07-13 23:15 ` Nish Aravamudan
2007-07-13 21:09 ` Ken Chen