From: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
To: Nish Aravamudan <nish.aravamudan@gmail.com>
Cc: Paul Jackson <pj@sgi.com>, Adam Litke <agl@us.ibm.com>,
linux-mm@kvack.org, mel@skynet.ie, apw@shadowen.org,
wli@holomorphy.com, clameter@sgi.com, kenchen@google.com,
Paul Mundt <lethal@linux-sh.org>
Subject: Re: [PATCH 5/5] [hugetlb] Try to grow pool for MAP_SHARED mappings
Date: Wed, 18 Jul 2007 17:40:07 -0400
Message-ID: <1184794808.5899.105.camel@localhost>
In-Reply-To: <29495f1d0707181416g182ef877sfbf75d2a20c48e3b@mail.gmail.com>

On Wed, 2007-07-18 at 14:16 -0700, Nish Aravamudan wrote:
> On 7/18/07, Lee Schermerhorn <Lee.Schermerhorn@hp.com> wrote:
> > On Wed, 2007-07-18 at 08:17 -0700, Nish Aravamudan wrote:
> > > On 7/18/07, Lee Schermerhorn <Lee.Schermerhorn@hp.com> wrote:
> > > > On Tue, 2007-07-17 at 16:42 -0700, Nish Aravamudan wrote:
> > > > > On 7/13/07, Paul Jackson <pj@sgi.com> wrote:
> > > > > > Adam wrote:
> > > > > > > To be honest, I just don't think a global hugetlb pool and cpusets are
> > > > > > > compatible, period.
> > > > > >
> > > > > > It's not an easy fit, that's for sure ;).
> > > > >
> > > > > In the context of my patches to make the hugetlb pool's interleave
> > > > > work with memoryless nodes, I may have a pseudo-solution for growing the
> > > > > pool while respecting cpusets.
> > > > >
> > > > > Essentially, given that GFP_THISNODE allocations stay on the node
> > > > > requested (which is the case once Christoph's memoryless node
> > > > > patches go in), we invoke:
> > > > >
> > > > > pol = mpol_new(MPOL_INTERLEAVE, &node_states[N_MEMORY])
> > > > >
> > > > > in the two callers of alloc_fresh_huge_page(pol) in hugetlb.c.
> > > > > alloc_fresh_huge_page() in turn invokes interleave_nodes(pol) so that
> > > > > we request hugepages in an interleaved fashion over all nodes with
> > > > > memory.
> > > > >
> > > > > Now, what I'm wondering is: why is interleave_nodes() not cpuset-aware?
> > > > > Or is it expected that the caller do the right thing with the policy
> > > > > beforehand? If so, I think I could just make those two callers do
> > > > >
> > > > > pol = mpol_new(MPOL_INTERLEAVE, cpuset_mems_allowed(current))
> > > > >
> > > > > ?
> > > > >
> > > > > Or am I way off here?
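
For concreteness, here is roughly what I read that as (just a sketch;
mpol_new() takes a nodemask_t pointer, so the cpuset mask needs a local
copy, and both mpol_new() and interleave_nodes() are currently static
to mm/mempolicy.c, so they'd have to be exposed to hugetlb.c):

/*
 * Sketch only:  pick the next node for dynamic hugetlb pool growth
 * by interleaving over the caller's cpuset.  Assumes mpol_new() and
 * interleave_nodes() keep their current signatures.
 */
static int next_hugepage_nid(void)
{
        nodemask_t allowed = cpuset_mems_allowed(current);
        struct mempolicy *pol;
        int nid;

        pol = mpol_new(MPOL_INTERLEAVE, &allowed);
        if (IS_ERR(pol))
                return numa_node_id();  /* fall back to the local node */
        nid = interleave_nodes(pol);
        mpol_free(pol);
        return nid;
}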
> > > >
> > > >
> > > > Nish:
> > > >
> > > > I have always considered the huge page pool, as populated by
> > > > alloc_fresh_huge_page() in response to changes in nr_hugepages, to be a
> > > > system global resource. I think the system "does the right
> > > > thing"--well, almost--with Christoph's memoryless patches and your
> > > > hugetlb patches.  Certainly, the huge pages allocated at boot time,
> > > > based on the command line parameter, are system-wide. cpusets have not
> > > > been set up at that time.
> > >
> > > I fully agree that hugepages are a global resource.
> > >
> > > > It requires privilege to write to the nr_hugepages sysctl, so allowing
> > > > it to spread pages across all available nodes [with memory], regardless
> > > > of cpusets, makes sense to me. Altho' I don't expect many folks are
> > > > currently changing nr_hugepages from within a constrained cpuset, I
> > > > wouldn't want to see us change existing behavior, in this respect. Your
> > > > per node attributes will provide the mechanism to allocate different
> > > > numbers of hugepages for, e.g., nodes in cpusets that have applications
> > > > that need them.
> > >
> > > The issue is that with Adam's patches, the hugepage pool will grow on
> > > demand, presuming the process owner's mlock limit is sufficiently
> > > high. If said process were running within a constrained cpuset, it
> > > seems slightly out-of-whack to allow it to grow the pool on other nodes
> > > to satisfy the demand.
> >
> > Ah, I see. In that case, it might make sense to grow just for the
> > cpuset. A couple of things come to mind tho':
> >
> > 1) we might want a per cpuset control to enable/disable hugetlb pool
> > growth on demand, or to limit the max size of the pool--especially if
> > the memories are not exclusively owned by the cpuset. Otherwise,
> > non-privileged processes could grow the hugetlb pool in memories shared
> > with other cpusets [maybe the root cpuset?], thereby reducing the amount
> > of normal, managed pages available to the other cpusets. Probably want
> > such a control in the absence of cpusets as well, if on-demand hugetlb
> > pool growth is implemented.
>
> Well, the current restriction is on a per-process basis for locked
> memory. But it might make sense to add a separate rlimit for hugepages
> and then just allow cpusets to restrict that rlimit for processes
> contained therein?
>
> Similar would probably hold for the non-cpuset case?
>
> But that seems like special casing for hugetlb pages where small pages
> don't have the same restriction. If two cpusets share the same node,
> can't one exhaust the node and thus starve the other cpuset? At that
> point you need more than cpusets (arguably) and want resource
> management at some level.
>
The difference I see is that "small pages" are "managed"--i.e., can be
reclaimed if not locked. And you've already pointed out that we have a
resource limit on locking regular/small pages. Huge pages are not
managed [unless Adam plans on tackling that as well!], so they are
effectively locked.  I guess that by limiting the number of pages any
process could attach with another resource limit, we would limit the
growth of the huge page pool. However, multiple processes in a cpuset
could attach different huge pages, thus growing the pool at the expense
of other cpusets. No different from locked pages, huh?
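
(Purely illustrative: neither RLIMIT_HUGEPAGES nor hugepages_attached()
exists today; this is just the shape such a per-process check might
take, by analogy with the existing RLIMIT_MEMLOCK test.)

/*
 * Hypothetical:  refuse to let a task attach more huge pages than a
 * new, made-up rlimit allows.  RLIMIT_HUGEPAGES and
 * hugepages_attached() are invented for illustration only.
 */
static int hugepage_rlimit_ok(struct task_struct *task)
{
        unsigned long limit = task->signal->rlim[RLIMIT_HUGEPAGES].rlim_cur;

        return hugepages_attached(task) < limit;
}
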
Maybe just a system-wide limit on the maximum size of the huge page
pool--i.e., on how large it can grow dynamically--is sufficient.
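
Something like the following, say.  nr_huge_pages_max would be a new,
hypothetical sysctl alongside nr_hugepages; the allocation call mirrors
the one already in mm/hugetlb.c:

/*
 * Hypothetical:  bound dynamic pool growth with a system-wide maximum.
 * nr_huge_pages_max is a made-up sysctl; nr_huge_pages is the existing
 * pool counter.
 */
static struct page *alloc_fresh_huge_page_capped(int nid)
{
        if (nr_huge_pages >= nr_huge_pages_max)
                return NULL;
        return alloc_pages_node(nid,
                        GFP_HIGHUSER|__GFP_COMP|__GFP_NOWARN,
                        HUGETLB_PAGE_ORDER);
}
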
<snip remainder of discussion>
Lee