linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Mel Gorman <mel@csn.ul.ie>
To: Dave Hansen <dave@linux.vnet.ibm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	ebmunson@us.ibm.com, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, linuxppc-dev@ozlabs.org,
	libhugetlbfs-devel@lists.sourceforge.net, abh@cray.com
Subject: Re: [RFC] [PATCH 0/5 V2] Huge page backed user-space stacks
Date: Mon, 11 Aug 2008 09:04:49 +0100	[thread overview]
Message-ID: <20080811080449.GB17452@csn.ul.ie> (raw)
In-Reply-To: <1218130190.10907.188.camel@nimitz>

On (07/08/08 10:29), Dave Hansen didst pronounce:
> On Thu, 2008-08-07 at 17:06 +0100, Mel Gorman wrote:
> > On (06/08/08 12:50), Dave Hansen didst pronounce:
> > > The main thing this set of patches does that I care about is take an
> > > anonymous VMA and replace it with a hugetlb VMA.  It does this on a
> > > special cue, but does it nonetheless.
> > 
> > This is not actually a new thing. For a long time now, it has been possible to
> > back  malloc() with hugepages at a userspace level using the morecore glibc
> > hook. That is replacing anonymous memory with a file-backed VMA. It happens
> > in a different place but it's just as deliberate as backing stack and the
> > end result is very similar. As the file is ram-based, it doesn't have the
> > same types of consequences like dirty page syncing that you'd ordinarily
> > watch for when moving from anonymous to file-backed memory.
> 
> Yes, it has already been done in userspace.  That's fine.  It isn't
> adding any complexity to the kernel.  This is adding behavior that the
> kernel has to support as well as complexity.
> 

The complexity is minimal and the progression logical.
hugetlb_file_setup() is the API shmem was using to create a file on an
internal mount suitable for MAP_SHARED. This patchset adds support for
MAP_PRIVATE and the additional complexity is a lot less than supporting
direct pagetable inserts.

> > > This patch has crossed a line in that it is really the first
> > > *replacement* of a normal VMA with a hugetlb VMA instead of the creation
> > > of the VMAs at the user's request. 
> > 
> > We crossed that line with morecore, it's back there somewhere. We're just
> > doing in kernel this time because backing stacks with hugepages in userspace
> > turned out to be a hairy endevour.
> > 
> > Properly supporting anonymous hugepages would either require larger
> > changes to the core or reimplementing yet more of mm/ in mm/hugetlb.c.
> > Neither is a particularly appealing approach, nor is it likely to be a
> > very popular one.
> 
> I agree.  It is always much harder to write code that can work
> generically??? (and get it accepted) than just write the smallest possible
> hack and stick it in fs/exec.c.
> 
> Could this patch at least get fixed up to look like it could be used
> more generically?  Some code to look up and replace anonymous VMAs with
> hugetlb-backed ones???

Ok, this latter point can be looked into at least although the
underlying principal may still be using hugetlb_file_setup() rather than
direct pagetable insertions.

> > > Because of the limitations like its inability to grow the VMA, I can't
> > > imagine that this would be a generic mechanism that we can use
> > > elsewhere.
> > 
> > What other than a stack even cares about VM_GROWSDOWN working? Besides,
> > VM_GROWSDOWN could be supported in a hugetlbfs file by mapping the end of
> > the file and moving the offset backwards (yeah ok, it ain't the prettiest
> > but it's less churn than introducing significantly different codepaths). It's
> > just not something that needs to be supported at first cut.
> > 
> > brk() if you wanted to back hugepages with it conceivably needs a resizing
> > VMA but in that case it's growing up in which case just extend the end of
> > the VMA and increase the size of the file.
> 
> I'm more worried about a small huge page size (say 64k) and not being
> able to merge the VMAs.  I guess it could start in the *middle* of a
> file and map both directions.
> 
> I guess you could always just have a single (very sparse) hugetlb file
> per mm to do all of this 'anonymous' hugetlb memory memory stuff, and
> just map its offsets 1:1 on to the process's virtual address space.
> That would make sure you could always merge VMAs, no matter how they
> grew together.
> 

That's an interesting idea. It isn't as straight-forward as it sounds
due to reservation tracking but at the face of it, I can't see why it
couldn't be made work.

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2008-08-11  8:04 UTC|newest]

Thread overview: 38+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-07-28 19:17 Eric Munson
2008-07-28 19:17 ` [PATCH 1/5 V2] Align stack boundaries based on personality Eric Munson
2008-07-28 20:09   ` Dave Hansen
2008-07-28 19:17 ` [PATCH 2/5 V2] Add shared and reservation control to hugetlb_file_setup Eric Munson
2008-07-28 19:17 ` [PATCH 3/5] Split boundary checking from body of do_munmap Eric Munson
2008-07-28 19:17 ` [PATCH 4/5 V2] Build hugetlb backed process stacks Eric Munson
2008-07-28 20:37   ` Dave Hansen
2008-07-28 19:17 ` [PATCH 5/5 V2] [PPC] Setup stack memory segment for hugetlb pages Eric Munson
2008-07-28 20:33 ` [RFC] [PATCH 0/5 V2] Huge page backed user-space stacks Dave Hansen
2008-07-28 21:23   ` Eric B Munson
2008-07-30  8:41 ` Andrew Morton
2008-07-30 15:04   ` Eric B Munson
2008-07-30 15:08   ` Eric B Munson
2008-07-30  8:43 ` Andrew Morton
2008-07-30 17:23   ` Mel Gorman
2008-07-30 17:34     ` Andrew Morton
2008-07-30 19:30       ` Mel Gorman
2008-07-30 19:40         ` Christoph Lameter
2008-07-30 20:07         ` Andrew Morton
2008-07-31 10:31           ` Mel Gorman
2008-08-04 21:10             ` Dave Hansen
2008-08-05 11:11               ` Mel Gorman
2008-08-05 16:12                 ` Dave Hansen
2008-08-05 16:28                   ` Mel Gorman
2008-08-05 17:53                     ` Dave Hansen
2008-08-06  9:02                       ` Mel Gorman
2008-08-06 19:50                         ` Dave Hansen
2008-08-07 16:06                           ` Mel Gorman
2008-08-07 17:29                             ` Dave Hansen
2008-08-11  8:04                               ` Mel Gorman [this message]
2008-07-31  6:04       ` Nick Piggin
2008-07-31  6:14         ` Andrew Morton
2008-07-31  6:26           ` Nick Piggin
2008-07-31 11:27             ` Mel Gorman
2008-07-31 11:51               ` Nick Piggin
2008-07-31 13:50                 ` Mel Gorman
2008-07-31 14:32                   ` Michael Ellerman
2008-08-06 18:49       ` Andi Kleen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20080811080449.GB17452@csn.ul.ie \
    --to=mel@csn.ul.ie \
    --cc=abh@cray.com \
    --cc=akpm@linux-foundation.org \
    --cc=dave@linux.vnet.ibm.com \
    --cc=ebmunson@us.ibm.com \
    --cc=libhugetlbfs-devel@lists.sourceforge.net \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linuxppc-dev@ozlabs.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox