From: Nick Piggin <npiggin@suse.de>
To: Nish Aravamudan <nish.aravamudan@gmail.com>
Cc: akpm@linux-foundation.org, Andi Kleen <ak@suse.de>,
linux-kernel@vger.kernel.org, linux-mm@kvack.org, pj@sgi.com,
andi@firstfloor.org, kniht@linux.vnet.ibm.com,
Adam Litke <agl@us.ibm.com>
Subject: Re: [patch 00/17] multi size, and giant hugetlb page support, 1GB hugetlb for x86
Date: Fri, 11 Apr 2008 10:28:58 +0200 [thread overview]
Message-ID: <20080411082858.GB20253@wotan.suse.de> (raw)
In-Reply-To: <29495f1d0804101659r44f4a8c2wa1ec05a84e7876fe@mail.gmail.com>
On Thu, Apr 10, 2008 at 04:59:15PM -0700, Nish Aravamudan wrote:
> Hi Nick,
>
> On 4/10/08, npiggin@suse.de <npiggin@suse.de> wrote:
> > Hi,
> >
> > I'm taking care of Andi's hugetlb patchset now. I've taken a while to appear
> > to do anything with it because I have had other things to do and also needed
> > some time to get up to speed on it.
> >
> > Anyway, from my reviewing of the patchset, I didn't find a great deal
> > wrong with it in the technical aspects. Taking hstate out of the hugetlbfs
> > inode and vma is really the main thing I did.
>
> Have you tested with the libhugetlbfs test suite? We're gearing up for
> libhugetlbfs 1.3, so most of the test are uptodate and expected to run
> cleanly, even with giant hugetlb page support (Jon has been working
> diligently to test with his 16G page support for power). I'm planning
> on pushing the last bits out today for Adam to pick up before we start
> stabilizing for 1.3, so I'm hoping if you grab tomorrow's development
> snapshot from libhugetlbfs.ozlabs.org, things should run ok. Probably
> only with just 1G hugepages, though, we haven't yet taught
> libhugetlbfs about multiple hugepage size availability at run-time,
> but that shouldn't be hard.
Yeah, it should be easy to disable the 2MB default and just make it
look exactly the same but with 1G pages.
Thanks a lot for your suggestion, I'll pull the snapshot over the
weekend and try to make it pass on x86 and work with Jon to ensure it
is working with powerpc...
> > However on the less technical side, I think a few things could be improved,
> > eg. to do with the configuring and reporting, as well as the "administrative"
> > type of code. I tried to make improvements to things in the last patch of
> > the series. I will end up folding this properly into the rest of the patchset
> > where possible.
>
> I've got a few ideas here. Are we sure that
> /proc/sys/vm/nr_{,overcommit}_hugepages is the pool allocation
> interface we want going forward? I'm fairly sure we don't. I think
> we're best off moving to a sysfs-based allocator scheme, while keeping
> /proc/sys/vm/nr_{,overcommit}_hugepages around for the default
> hugepage size (which may be the only for many folks for now).
>
> I'm thinking something like:
>
> /sys/devices/system/[DIRNAME]/nr_hugepages ->
> nr_hugepages_{default_hugepagesize}
> /sys/devices/system/[DIRNAME]/nr_hugepages_default_hugepagesize
> /sys/devices/system/[DIRNAME]/nr_hugepages_other_hugepagesize1
> /sys/devices/system/[DIRNAME]/nr_hugepages_other_hugepagesize2
> /sys/devices/system/[DIRNAME]/nr_overcommit_hugepages ->
> nr_overcommit_hugepages_{default_hugepagesize}
> /sys/devices/system/[DIRNAME]/nr_overcommit_hugepages_default_hugepagesize
> /sys/devices/system/[DIRNAME]/nr_overcommit_hugepages_other_hugepagesize1
> /sys/devices/system/[DIRNAME]/nr_overcommit_hugepages_other_hugepagesize2
>
> That is, nr_hugepages in the directory (should it be called vm?
> memory? hugepages specifically? I'm looking for ideas!) will just be a
> symlink to the underlying default hugepagesize allocator. The files
> themselves would probably be named along the lines of:
>
> nr_hugepages_2M
> nr_hugepages_1G
> nr_hugepages_64K
>
> etc?
Yes I don't like the proc interface, nor the way it has been extended
(although that's not Andi's fault it is just a limitation of the old
API).
I think actually we should have individual directories for each hstate
size, and we can put all other stuff (reservations and per-node stuff
etc) under those directories. Leave the proc stuff just for the default
page size.
I think it should go in /sys/kernel/, because I think /sys/devices is
more of the hardware side of the system (so it makes sense for
reporting eg the actual supported TLB sizes, but for configuring your
page reserves, I think it makes more sense under /sys/kernel/). But
we'll ask the sysfs folk for guidance there.
> We'd want to have a similar layout on a per-node basis, I think (see
> my patchsets to add a per-node interface).
>
> > The other thing I did was try to shuffle the patches around a bit. There
> > were one or two (pretty trivial) points where it wasn't bisectable, and also
> > merge a couple of patches.
> >
> > I will try to get this patchset merged in -mm soon if feedback is positive.
> > I would also like to take patches for other architectures or any other
> > patches or suggestions for improvements.
>
> There are definitely going to be conflicts between my per-node stack
> and your set, but if you agree the interface should be cleaned up for
> multiple hugepage size support, then I'd like to get my sysfs bits
> into -mm and work on putting the global allocator into sysfs properly
> for you to base off. I think there's enough room for discussion that
> -mm may be a bit premature, but that's just my opinion.
>
> Thanks for keeping the patchset uptodate, I hope to do a more careful
> review next week of the individual patches.
Sure, I haven't seen your work but it shouldn't be terribly hard to merge
either way. It should be easy if we work together ;)
Thanks,
Nick
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2008-04-11 8:28 UTC|newest]
Thread overview: 28+ messages / expand[flat|nested] mbox.gz Atom feed top
2008-04-10 17:02 npiggin
2008-04-10 17:02 ` [patch 01/17] hugetlb: modular state npiggin
2008-04-21 20:51 ` Jon Tollefson
2008-04-22 6:45 ` Nick Piggin
2008-04-10 17:02 ` [patch 02/17] hugetlb: multiple hstates npiggin
2008-04-10 17:02 ` [patch 03/17] hugetlb: multi hstate proc files npiggin
2008-04-10 17:02 ` [patch 04/17] hugetlbfs: per mount hstates npiggin
2008-04-10 17:02 ` [patch 05/17] hugetlb: multi hstate sysctls npiggin
2008-04-10 17:02 ` [patch 06/17] hugetlb: abstract numa round robin selection npiggin
2008-04-10 17:02 ` [patch 07/17] mm: introduce non panic alloc_bootmem npiggin
2008-04-10 17:02 ` [patch 08/17] mm: export prep_compound_page to mm npiggin
2008-04-10 17:02 ` [patch 09/17] hugetlb: factor out huge_new_page npiggin
2008-04-10 17:02 ` [patch 10/17] mm: fix bootmem alignment npiggin
2008-04-10 17:33 ` Yinghai Lu
2008-04-10 17:39 ` Nick Piggin
2008-04-11 11:58 ` Nick Piggin
2008-04-10 17:02 ` [patch 11/17] hugetlbfs: support larger than MAX_ORDER npiggin
2008-04-11 8:13 ` Andi Kleen
2008-04-11 8:59 ` Nick Piggin
2008-04-10 17:02 ` [patch 12/17] hugetlb: support boot allocate different sizes npiggin
2008-04-10 17:02 ` [patch 13/17] hugetlb: printk cleanup npiggin
2008-04-10 17:02 ` [patch 14/17] hugetlb: introduce huge_pud npiggin
2008-04-10 17:02 ` [patch 15/17] x86: support GB hugepages on 64-bit npiggin
2008-04-10 17:02 ` [patch 16/17] x86: add hugepagesz option " npiggin
2008-04-10 17:02 ` [patch 17/17] hugetlb: misc fixes npiggin
2008-04-10 23:59 ` [patch 00/17] multi size, and giant hugetlb page support, 1GB hugetlb for x86 Nish Aravamudan
2008-04-11 8:28 ` Nick Piggin [this message]
2008-04-11 19:57 ` Nish Aravamudan
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20080411082858.GB20253@wotan.suse.de \
--to=npiggin@suse.de \
--cc=agl@us.ibm.com \
--cc=ak@suse.de \
--cc=akpm@linux-foundation.org \
--cc=andi@firstfloor.org \
--cc=kniht@linux.vnet.ibm.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=nish.aravamudan@gmail.com \
--cc=pj@sgi.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox