linux-mm.kvack.org archive mirror
From: David Rientjes <rientjes@google.com>
To: Vlastimil Babka <vbabka@suse.cz>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Christoph Lameter <cl@linux.com>,
	Pekka Enberg <penberg@kernel.org>,
	Joonsoo Kim <iamjoonsoo.kim@lge.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Mel Gorman <mgorman@suse.de>, Pravin Shelar <pshelar@nicira.com>,
	Jarno Rajahalme <jrajahalme@nicira.com>,
	Greg Thelen <gthelen@google.com>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	netdev@vger.kernel.org, dev@openvswitch.org
Subject: Re: [patch 1/2] mm: remove GFP_THISNODE
Date: Fri, 27 Feb 2015 14:03:41 -0800 (PST)
Message-ID: <alpine.DEB.2.10.1502271335520.4718@chino.kir.corp.google.com>
In-Reply-To: <54F01E02.1090007@suse.cz>

On Fri, 27 Feb 2015, Vlastimil Babka wrote:

> Oh, right. I missed the new trigger. My sanity and career is saved!
> 

Haha.

> Well, no... the flags are still a mess. Aren't GFP_TRANSHUGE | __GFP_THISNODE
> allocations still problematic after this patch and 2/2? Those do include
> __GFP_WAIT (unless !defrag). So with only patch 2/2 without 1/2 they would match
> GFP_THISNODE and bail out (not good for khugepaged at least...).

With both patches: if __GFP_WAIT isn't set, either for page fault or 
khugepaged, then we always exit immediately from __alloc_pages_slowpath(): 
we can't try reclaim or compaction.  If __GFP_WAIT is set, then the new 
conditional fails, and the slowpath proceeds as we want it to with a 
zonelist that only includes local nodes because __GFP_THISNODE is set for 
node_zonelist() in alloc_pages_exact_node().  Those are the only zones 
that get_page_from_freelist() gets to iterate over.
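
For illustration, the early-exit condition described above can be modelled in userspace.  This is only a sketch: the MODEL_* bit values and the helper name are made up for the example and are not the kernel's real gfp bits or the actual code in __alloc_pages_slowpath().

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative bit values only; NOT the kernel's real gfp bit layout. */
#define MODEL_GFP_WAIT     0x1u
#define MODEL_GFP_THISNODE 0x2u

static_assert((MODEL_GFP_WAIT & MODEL_GFP_THISNODE) == 0,
	      "model flags must not overlap");

/*
 * Hypothetical model of the conditional discussed above: a node-specific
 * allocation that cannot block leaves the slowpath immediately, so neither
 * reclaim nor compaction is attempted.
 */
static bool model_slowpath_exits_early(unsigned int gfp_mask)
{
	bool wait = (gfp_mask & MODEL_GFP_WAIT) != 0;

	return (gfp_mask & MODEL_GFP_THISNODE) != 0 && !wait;
}
```

With MODEL_GFP_WAIT set, the helper returns false, which corresponds to the case where the slowpath proceeds with the node-local zonelist.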

With only this patch: we still have the problem that the second patch 
fixes.  thp is preferred on the node of choice, but it can still be 
allocated from any other node as a fallback because the allocations lack 
__GFP_THISNODE.

> With both
> patches they won't bail out and __GFP_NO_KSWAPD will prevent most of the stuff
> described above, including clearing ALLOC_CPUSET.

Yeah, ALLOC_CPUSET is never cleared for thp allocations because atomic == 
false for thp, regardless of this series.

> But __cpuset_node_allowed()
> will allow it to allocate anywhere anyway thanks to the newly passed
> __GFP_THISNODE, which would be a regression of what b104a35d32 fixed... unless
> I'm missing something else that prevents it, which wouldn't surprise me at all.
> 
> There's this outdated comment:
> 
>  * The __GFP_THISNODE placement logic is really handled elsewhere,
>  * by forcibly using a zonelist starting at a specified node, and by
>  * (in get_page_from_freelist()) refusing to consider the zones for
>  * any node on the zonelist except the first.  By the time any such
>  * calls get to this routine, we should just shut up and say 'yes'.
> 
> AFAIK the __GFP_THISNODE zonelist contains *only* zones from the single node and
> there's no other "refusing".

Yes, __cpuset_node_allowed() is never called for a zone from any other 
node when __GFP_THISNODE is passed because of node_zonelist().  It's 
pointless to iterate over those zones since the allocation wants to fail 
instead of allocate on them.
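
A rough userspace model of that point, assuming a four-node machine: the struct, the MODEL_* names and the numeric fallback order are invented for the sketch (the real fallback zonelist is ordered differently and holds zones, not node ids).

```c
#include <assert.h>

/* Illustrative values only; NOT the kernel's real definitions. */
#define MODEL_GFP_THISNODE 0x2u
#define MODEL_NR_NODES     4

/* Hypothetical stand-in for a zonelist: just a list of node ids. */
struct model_zonelist {
	int nodes[MODEL_NR_NODES];
	int nr;
};

/*
 * Model of node_zonelist(): with MODEL_GFP_THISNODE the list holds only
 * the requested node; otherwise the other nodes follow as fallbacks.
 */
static struct model_zonelist model_node_zonelist(int nid, unsigned int gfp_mask)
{
	struct model_zonelist zl = { .nr = 0 };

	assert(nid >= 0 && nid < MODEL_NR_NODES);
	zl.nodes[zl.nr++] = nid;
	if (!(gfp_mask & MODEL_GFP_THISNODE)) {
		for (int n = 0; n < MODEL_NR_NODES; n++)
			if (n != nid)
				zl.nodes[zl.nr++] = n;
	}
	return zl;
}

/* Count the remote nodes an iteration over the list would ever check. */
static int model_remote_nodes_visited(int nid, unsigned int gfp_mask)
{
	struct model_zonelist zl = model_node_zonelist(nid, gfp_mask);
	int remote = 0;

	for (int i = 0; i < zl.nr; i++)
		if (zl.nodes[i] != nid)
			remote++;
	return remote;
}
```

In this model a MODEL_GFP_THISNODE request never visits a remote node, so a per-node cpuset check would only ever be asked about the local node.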

Do you see any issues with either patch 1/2 or patch 2/2 besides the 
s/GFP_TRANSHUGE/GFP_THISNODE/ that is necessary on the changelog?

> And I don't really see why __GFP_THISNODE should
> have this exception, it feels to me like "well we shouldn't reach this but we
> are not sure, so let's play it safe". So maybe we could just remove this
> exception? I don't think any other user of __GFP_THISNODE | __GFP_WAIT
> relies on this allowed memset violation?
> 

Since this function was written, there were other callers to 
cpuset_{node,zone}_allowed_{soft,hard}wall() that may have required it.  I 
looked at all the current callers of cpuset_zone_allowed() and they don't 
appear to need this "exception" (slub calls node_zonelist() itself for the 
iteration and slab never calls it for __GFP_THISNODE).  So, yeah, I think 
it can be removed.
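
As a sketch of what removing the exception changes, here is a simplified userspace model.  The function name, the bitmask representation of the allowed nodes and the thisnode_exception switch are all assumptions for the example; the real __cpuset_node_allowed() has further checks that are left out here.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Illustrative bit value only; NOT the kernel's real gfp bit layout. */
#define MODEL_GFP_THISNODE 0x2u

/*
 * Hypothetical model of the cpuset check: bit N of mems_allowed is set
 * when node N is allowed.  thisnode_exception selects whether the
 * "shut up and say 'yes'" shortcut for MODEL_GFP_THISNODE is present.
 */
static bool model_cpuset_node_allowed(int node, unsigned long mems_allowed,
				      unsigned int gfp_mask,
				      bool thisnode_exception)
{
	assert(node >= 0 && node < (int)(sizeof(mems_allowed) * CHAR_BIT));

	if (thisnode_exception && (gfp_mask & MODEL_GFP_THISNODE))
		return true;

	return ((mems_allowed >> node) & 1UL) != 0;
}
```

With the exception in place, a MODEL_GFP_THISNODE request for a node outside mems_allowed is accepted; without it, the same request is refused and only the mask decides.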
