From: Nishanth Aravamudan <nacc@us.ibm.com>
To: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Mel Gorman <mel@csn.ul.ie>,
agl@us.ibm.com, wli@holomorphy.com, clameter@sgi.com, ak@suse.de,
kamezawa.hiroyu@jp.fujitsu.com, rientjes@google.com,
linux-mm@kvack.org, eric.whitney@hp.com
Subject: Re: [PATCH] Mempolicy: make dequeue_huge_page_vma() obey MPOL_BIND nodemask rework
Date: Fri, 7 Mar 2008 16:27:22 -0800
Message-ID: <20080308002722.GB16868@us.ibm.com>
In-Reply-To: <1204914705.5340.36.camel@localhost>
On 07.03.2008 [13:31:44 -0500], Lee Schermerhorn wrote:
> On Fri, 2008-03-07 at 09:35 -0800, Nishanth Aravamudan wrote:
> > On 06.03.2008 [16:24:53 -0500], Lee Schermerhorn wrote:
> > >
> > > Fix for earlier patch:
> > > "mempolicy-make-dequeue_huge_page_vma-obey-bind-policy"
> > >
> > > Against: 2.6.25-rc3-mm1 atop the above patch.
> > >
> > > As suggested by Nish Aravamudan, remove the mpol_bind_nodemask()
> > > helper and return a pointer to the policy node mask from
> > > huge_zonelist for MPOL_BIND. This hides more of the mempolicy
> > > quirks from hugetlb.
> > >
> > > In making this change, I noticed that the huge_zonelist() stub
> > > for !NUMA wasn't nulling out the mpol. Added that as well.
> >
> > Hrm, I was thinking more of the following (on top of this patch):
> >
> > diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> > index 4c5d41d..3790f5a 100644
> > --- a/mm/mempolicy.c
> > +++ b/mm/mempolicy.c
> > @@ -1298,9 +1298,7 @@ struct zonelist *huge_zonelist(struct vm_area_struct *vma, unsigned long addr,
> >
> > *mpol = NULL; /* probably no unref needed */
> > *nodemask = NULL; /* assume !MPOL_BIND */
> > - if (pol->policy == MPOL_BIND) {
> > - *nodemask = &pol->v.nodes;
> > - } else if (pol->policy == MPOL_INTERLEAVE) {
> > + if (pol->policy == MPOL_INTERLEAVE) {
> > unsigned nid;
> >
> > nid = interleave_nid(pol, vma, addr, HPAGE_SHIFT);
> > @@ -1310,10 +1308,12 @@ struct zonelist *huge_zonelist(struct vm_area_struct *vma, unsigned long addr,
> >
> > zl = zonelist_policy(GFP_HIGHUSER, pol);
> > if (unlikely(pol != &default_policy && pol != current->mempolicy)) {
> > - if (pol->policy != MPOL_BIND)
> > + if (pol->policy != MPOL_BIND) {
> > __mpol_free(pol); /* finished with pol */
> > - else
> > + } else {
> > *mpol = pol; /* unref needed after allocation */
> > + *nodemask = &pol->v.nodes;
> > + }
> > }
> > return zl;
> > }
> >
> > but perhaps that won't do the right thing if pol == current->mempolicy
> > and pol->policy == MPOL_BIND.
>
> Right, you won't return the nodemask for current task policy == MBIND.
>
> > So something like:
> >
> >
> > diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> > index 4c5d41d..7eb77e0 100644
> > --- a/mm/mempolicy.c
> > +++ b/mm/mempolicy.c
> > @@ -1298,9 +1298,7 @@ struct zonelist *huge_zonelist(struct vm_area_struct *vma, unsigned long addr,
> >
> > *mpol = NULL; /* probably no unref needed */
> > *nodemask = NULL; /* assume !MPOL_BIND */
> > - if (pol->policy == MPOL_BIND) {
> > - *nodemask = &pol->v.nodes;
> > - } else if (pol->policy == MPOL_INTERLEAVE) {
> > + if (pol->policy == MPOL_INTERLEAVE) {
> > unsigned nid;
> >
> > nid = interleave_nid(pol, vma, addr, HPAGE_SHIFT);
> > @@ -1309,11 +1307,12 @@ struct zonelist *huge_zonelist(struct vm_area_struct *vma, unsigned long addr,
> > }
> >
> > zl = zonelist_policy(GFP_HIGHUSER, pol);
> > - if (unlikely(pol != &default_policy && pol != current->mempolicy)) {
> > - if (pol->policy != MPOL_BIND)
> > - __mpol_free(pol); /* finished with pol */
> > - else
> > + if (unlikely(pol != &default_policy && pol != current->mempolicy
> > + && pol->policy != MPOL_BIND))
> > + __mpol_free(pol); /* finished with pol */
> > + if (pol->policy == MPOL_BIND) {
> > *mpol = pol; /* unref needed after allocation */
> > + *nodemask = &pol->v.nodes;
> > }
> > return zl;
> > }
> >
> > Still not quite as clean, but I think it's best to keep the *mpol and
> > *nodemask assignments together, as if *mpol is being assigned, that's
> > the only time we should need to set *nodemask, right?
>
> Well, as you've noted, we do have to test MPOL_BIND twice: once to
> return the nodemask for any 'BIND policy and once to return a non-NULL
> mpol ONLY if it's MPOL_BIND and we need an unref. However, I wanted to
> avoid checking the policies twice as well, or storing *nodemask 3rd
> time.
>
> I think that your second change above is not quite right, either.
> You're unconditionally returning the policy when the 'mode' == MBIND,
> even if it does not need a deref. This could result in prematurely
> freeing the task policy, causing a "use after free" error on next
> allocation; or even decrementing the reference on the system_default
> policy, which is probably benign, but not "nice". [Also, check your
> parentheses...]
>
> Anyway you slice it, it's pretty ugly.
Yep. You understand all this mempolicy code much better than I :) I was
just trying to explain my aesthetic goal :)
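
To make the rule from Lee's explanation concrete, here is a minimal
userspace sketch in C. It is a model, not kernel code: struct policy,
struct lookup_result, model_huge_zonelist() and the plain int refcnt are
hypothetical stand-ins, and the "extra reference" test simply mirrors the
pol != &default_policy && pol != current->mempolicy check in the quoted
diff. What it tries to show: the nodemask is handed back for any bind
policy, but the policy pointer is handed back only when the caller really
holds a reference to drop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types; not the real definitions. */
enum policy_mode { MODE_DEFAULT, MODE_BIND, MODE_INTERLEAVE };

struct policy {
	enum policy_mode mode;
	unsigned long nodes;	/* stand-in for the nodemask in pol->v.nodes */
	int refcnt;		/* stand-in for the policy reference count */
};

struct lookup_result {
	struct policy *mpol;		/* non-NULL: caller must unref after allocating */
	unsigned long *nodemask;	/* non-NULL: restrict allocation to these nodes */
};

/*
 * Model of the tail of huge_zonelist() as described in the thread.
 * sys_default and task_pol play the roles of &default_policy and
 * current->mempolicy; both are assumptions of this sketch.
 */
static struct lookup_result
model_huge_zonelist(struct policy *pol,
		    const struct policy *sys_default,
		    const struct policy *task_pol)
{
	struct lookup_result res = { NULL, NULL };
	bool holds_extra_ref;

	assert(pol != NULL);

	/* An extra reference exists only for policies that are neither
	 * the system default nor the current task's policy. */
	holds_extra_ref = (pol != sys_default && pol != task_pol);

	/* First bind test: return the nodemask for ANY bind policy. */
	if (pol->mode == MODE_BIND)
		res.nodemask = &pol->nodes;

	if (holds_extra_ref) {
		if (pol->mode == MODE_BIND)
			res.mpol = pol;	/* nodemask points into pol: unref later */
		else
			pol->refcnt--;	/* finished with pol */
	}
	return res;
}
```

With this split, a task policy in bind mode yields a nodemask but a NULL
mpol, so the caller never drops a reference it does not own. That is the
case the second diff above gets wrong.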
> So, for now, I'd like to keep it the way I have it. I'll be sending
> out a set of patches to rework the reference counting after mempolicy
> settles down--i.e., Mel's and David's patches, which I'm testing now.
> That will clean this area up quite a bit, IMO.
Sounds good. Looking forward to it.
-Nish
--
Nishanth Aravamudan <nacc@us.ibm.com>
IBM Linux Technology Center