From: Mel Gorman <mel@csn.ul.ie>
To: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
ak@suse.de, clameter@sgi.com, kamezawa.hiroyu@jp.fujitsu.com,
linux-mm@kvack.org, rientjes@google.com, eric.whitney@hp.com
Subject: Re: [PATCH 4/6] Use two zonelist that are filtered by GFP mask
Date: Tue, 4 Mar 2008 18:01:46 +0000
Message-ID: <20080304180145.GB9051@csn.ul.ie>
In-Reply-To: <1204300094.5311.50.camel@localhost>
On (29/02/08 10:48), Lee Schermerhorn didst pronounce:
> On Fri, 2008-02-29 at 14:50 +0000, Mel Gorman wrote:
> > On (28/02/08 13:32), Andrew Morton didst pronounce:
> > > On Wed, 27 Feb 2008 16:47:34 -0500
> > > Lee Schermerhorn <lee.schermerhorn@hp.com> wrote:
> > >
> > > > +/* Returns the first zone at or below highest_zoneidx in a zonelist */
> > > > +static inline struct zone **first_zones_zonelist(struct zonelist *zonelist,
> > > > + enum zone_type highest_zoneidx)
> > > > +{
> > > > + struct zone **z;
> > > > +
> > > > + /* Find the first suitable zone to use for the allocation */
> > > > + z = zonelist->zones;
> > > > + while (*z && zone_idx(*z) > highest_zoneidx)
> > > > + z++;
> > > > +
> > > > + return z;
> > > > +}
> > > > +
> > > > +/* Returns the next zone at or below highest_zoneidx in a zonelist */
> > > > +static inline struct zone **next_zones_zonelist(struct zone **z,
> > > > + enum zone_type highest_zoneidx)
> > > > +{
> > > > + /* Find the next suitable zone to use for the allocation */
> > > > + while (*z && zone_idx(*z) > highest_zoneidx)
> > > > + z++;
> > > > +
> > > > + return z;
> > > > +}
> > > > +
> > > > +/**
> > > > + * for_each_zone_zonelist - helper macro to iterate over valid zones in a zonelist at or below a given zone index
> > > > + * @zone - The current zone in the iterator
> > > > + * @z - The current pointer within zonelist->zones being iterated
> > > > + * @zlist - The zonelist being iterated
> > > > + * @highidx - The zone index of the highest zone to return
> > > > + *
> > > > + * This iterator iterates though all zones at or below a given zone index.
> > > > + */
> > > > +#define for_each_zone_zonelist(zone, z, zlist, highidx) \
> > > > + for (z = first_zones_zonelist(zlist, highidx), zone = *z++; \
> > > > + zone; \
> > > > + z = next_zones_zonelist(z, highidx), zone = *z++)
> > > > +
> > >
> > > omygawd will that thing generate a lot of code!
> > >
> > > It has four call sites in mm/oom_kill.c and the overall patchset increases
> > > mm/oom_kill.o's text section (x86_64 allmodconfig) from 3268 bytes to 3845.
> > >
> >
> > Yeah... that's pretty bad. They were inlined to avoid function call overhead
> > when trying to avoid any additional performance overhead but the text overhead
> > is not helping either. I'll start looking at things to uninline and see what
> > can be gained text-reduction wise without mucking performance.
> >
> > > vmscan.o and page_alloc.o also grew a lot. otoh total vmlinux bloat from
> > > the patchset is only around 700 bytes, so I expect that with a little less
> > > insanity we could actually get an aggregate improvement here.
>
> Mel:
>
> Thinking about this:
>
> for_each_zone_zonelist():
>
> Seems like the call sites to this macro are not hot paths, so maybe
> these can call out to a zonelist iterator func in page_alloc.c or, as
> Kame-san suggested, mmzone.c.
>
I am trying an uninlined version in mmzone.c to see what it looks like. As
expected, there is less text bloat, and I'll know in another day whether
it makes a performance difference or not. As you note below, the
majority of call sites are not in hot paths as such.
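
For illustration only, here is a rough sketch of the shape an uninlined
version could take. It is not the patch under test. The types are
simplified stand-ins for the real mmzone.h definitions, and
count_zones_at_or_below() is a made-up helper that exists only to exercise
the iterator. The idea is that the skip loop is emitted once, out of line,
and first_zones_zonelist() becomes a thin wrapper around it, so each
expansion of the macro costs two calls rather than two copies of the loop.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel types; NOT the real definitions. */
enum zone_type { ZONE_DMA, ZONE_NORMAL, ZONE_HIGHMEM, MAX_NR_ZONES };

struct zone {
	enum zone_type idx;
};

#define MAX_ZONES_PER_ZONELIST 8

struct zonelist {
	struct zone *zones[MAX_ZONES_PER_ZONELIST + 1];	/* NULL terminated */
};

static inline enum zone_type zone_idx(const struct zone *zone)
{
	return zone->idx;
}

/*
 * Out of line: a single copy of the skip loop. In the kernel this would
 * sit in a .c file (mmzone.c was suggested) with a prototype in the header.
 */
struct zone **next_zones_zonelist(struct zone **z,
				  enum zone_type highest_zoneidx)
{
	assert(z != NULL);
	/* Skip zones above the highest zone the caller can use */
	while (*z && zone_idx(*z) > highest_zoneidx)
		z++;

	return z;
}

/* The first suitable zone is the "next" one from the head of the list */
static inline struct zone **first_zones_zonelist(struct zonelist *zonelist,
					enum zone_type highest_zoneidx)
{
	return next_zones_zonelist(zonelist->zones, highest_zoneidx);
}

/* Same iterator as in the patch; only the helpers behind it changed */
#define for_each_zone_zonelist(zone, z, zlist, highidx) \
	for (z = first_zones_zonelist(zlist, highidx), zone = *z++; \
		zone; \
		z = next_zones_zonelist(z, highidx), zone = *z++)

/* Hypothetical helper: count the zones at or below highidx */
int count_zones_at_or_below(struct zonelist *zlist, enum zone_type highidx)
{
	struct zone **z;
	struct zone *zone;
	int nr = 0;

	for_each_zone_zonelist(zone, z, zlist, highidx)
		nr++;

	return nr;
}
```

Whether the extra function call matters is exactly what the performance
run has to show; the sketch says nothing about that either way.
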
> + oom_kill and vmscan call sites: if these are hot, we're already in,
> uh..., slow mode.
>
> + usage in slab.c and slub.c appears to be the fallback/slow path.
> Christoph can chime in, here, if he disagrees.
>
> + in page_alloc.c: waking up of kswapd and counting free zone pages
> [mostly for init code] don't appear to be fast paths.
>
> + The call site in hugetlb.c is in the huge-page allocation path, which
> is under a global spinlock. So, any slowdown here could result in
> longer lock hold time and higher contention. But, I have to believe
> that in the grand scheme of things, huge-page allocation is not that
> hot. [Someone faulting in terabytes of hugepages might contest that.]
>
> That leaves the call to for_each_zone_zonelist_nodemask() in
> get_page_from_freelist(). This might be deserving of inlining?
>
> If this works out, we could end up with these macros being inlined in
> only 2 places: get_page_from_freelist() and a to-be-designed zonelist
> iterator function. [In fact, I believe that such an iterator need not
> expose the details of zonelists outside of page_alloc/mmzone, but that
> would require more rework of the call sites, and additional helper
> functions. Maybe someday...]
>
> Comments?
>
> Right now, I've got to build/test the latest reclaim scalability patches
> that Rik posted, and clean up the issues already pointed out. If you
> don't get to this, I can look at it further next week.
>
> Lee
>
> > >
> > > Some of the inlining in mmzone.h is just comical. Some of it is obvious
> > > (first_zones_zonelist) and some of it is less obvious (pfn_present).
> > >
> > > I applied these for testing but I really don't think we should be merging
> > > such easily-fixed regressions into mainline. Could someone please take a
> > > look at de-porking core MM?
> > >
> > >
> > > Also, I switched all your Tested-by:s to Signed-off-by:s. You were on the
> > > delivery path, so s-o-b is the appropriate tag. I would like to believe
> > > that Signed-off-by: implies Tested-by: anyway (rofl).
> > >
> >
>
--
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab