From: David Rientjes <rientjes@google.com>
To: Vlastimil Babka <vbabka@suse.cz>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Christoph Lameter <cl@linux.com>,
Pekka Enberg <penberg@kernel.org>,
Joonsoo Kim <iamjoonsoo.kim@lge.com>,
Johannes Weiner <hannes@cmpxchg.org>,
Mel Gorman <mgorman@suse.de>, Pravin Shelar <pshelar@nicira.com>,
Jarno Rajahalme <jrajahalme@nicira.com>,
Greg Thelen <gthelen@google.com>,
linux-kernel@vger.kernel.org, linux-mm@kvack.org,
netdev@vger.kernel.org, dev@openvswitch.org
Subject: Re: [patch 1/2] mm: remove GFP_THISNODE
Date: Fri, 27 Feb 2015 14:03:41 -0800 (PST) [thread overview]
Message-ID: <alpine.DEB.2.10.1502271335520.4718@chino.kir.corp.google.com> (raw)
In-Reply-To: <54F01E02.1090007@suse.cz>
On Fri, 27 Feb 2015, Vlastimil Babka wrote:
> Oh, right. I missed the new trigger. My sanity and career is saved!
>
Haha.
> Well, no... the flags are still a mess. Aren't GFP_TRANSHUGE | __GFP_THISNODE
> allocations still problematic after this patch and 2/2? Those do include
> __GFP_WAIT (unless !defrag). So with only patch 2/2 without 1/2 they would match
> GFP_THISNODE and bail out (not good for khugepaged at least...).
With both patches: if __GFP_WAIT isn't set, either for page fault or
khugepaged, then we always exit immediately from __alloc_pages_slowpath():
we can't try reclaim or compaction. If __GFP_WAIT is set, then the new
conditional fails, and the slowpath proceeds as we want it to with a
zonelist that only includes local nodes because __GFP_THISNODE is set for
node_zonelist() in alloc_pages_exact_node(). Those are the only zones
that get_page_from_freelist() gets to iterate over.
With only this patch: we still have the problem that is fixed with the
second patch, thp is preferred on the node of choice but can be allocated
from any other node for fallback because the allocations lack
__GFP_THISNODE.
> With both
> patches they won't bail out and __GFP_NO_KSWAPD will prevent most of the stuff
> described above, including clearing ALLOC_CPUSET.
Yeah, ALLOC_CPUSET is never cleared for thp allocations because atomic ==
false for thp, regardless of this series.
> But __cpuset_node_allowed()
> will allow it to allocate anywhere anyway thanks to the newly passed
> __GFP_THISNODE, which would be a regression of what b104a35d32 fixed... unless
> I'm missing something else that prevents it, which wouldn't surprise me at all.
>
> There's this outdated comment:
>
> * The __GFP_THISNODE placement logic is really handled elsewhere,
> * by forcibly using a zonelist starting at a specified node, and by
> * (in get_page_from_freelist()) refusing to consider the zones for
> * any node on the zonelist except the first. By the time any such
> * calls get to this routine, we should just shut up and say 'yes'.
>
> AFAIK the __GFP_THISNODE zonelist contains *only* zones from the single node and
> there's no other "refusing".
Yes, __cpuset_node_allowed() is never called for a zone from any other
node when __GFP_THISNODE is passed because of node_zonelist(). It's
pointless to iterate over those zones since the allocation wants to fail
instead of allocate on them.
Do you see any issues with either patch 1/2 or patch 2/2 besides the
s/GFP_TRANSHUGE/GFP_THISNODE/ that is necessary on the changelog?
> And I don't really see why __GFP_THISNODE should
> have this exception, it feels to me like "well we shouldn't reach this but we
> are not sure, so let's play it safe". So maybe we could just remove this
> exception? I don't think any other user of __GFP_THISNODE | __GFP_WAIT user
> relies on this allowed memset violation?
>
Since this function was written, there were other callers to
cpuset_{node,zone}_allowed_{soft,hard}wall() that may have required it. I
looked at all the current callers of cpuset_zone_allowed() and they don't
appear to need this "exception" (slub calls node_zonelist() itself for the
iteration and slab never calls it for __GFP_THISNODE). So, yeah, I think
it can be removed.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2015-02-27 22:03 UTC|newest]
Thread overview: 23+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-02-26 0:23 David Rientjes
2015-02-26 0:56 ` Christoph Lameter
2015-02-26 1:04 ` David Rientjes
2015-02-26 8:30 ` Vlastimil Babka
2015-02-27 3:09 ` David Rientjes
2015-02-27 7:34 ` Vlastimil Babka
2015-02-27 22:03 ` David Rientjes [this message]
2015-02-27 22:19 ` Vlastimil Babka
2015-02-27 22:31 ` David Rientjes
2015-02-27 22:52 ` Vlastimil Babka
2015-02-27 22:16 ` [patch v2 1/3] " David Rientjes
2015-02-27 22:17 ` [patch v2 2/3] mm, thp: really limit transparent hugepage allocation to local node David Rientjes
2015-03-02 13:47 ` Vlastimil Babka
2015-02-27 22:17 ` [patch v2 3/3] kernel, cpuset: remove exception for __GFP_THISNODE David Rientjes
2015-03-02 13:47 ` Vlastimil Babka
2015-02-27 22:53 ` [patch v2 1/3] mm: remove GFP_THISNODE Christoph Lameter
2015-02-28 3:21 ` David Rientjes
2015-03-02 13:46 ` Vlastimil Babka
2015-03-02 15:46 ` Christoph Lameter
2015-03-02 16:02 ` Vlastimil Babka
2015-03-02 16:08 ` Christoph Lameter
2015-03-02 16:23 ` Vlastimil Babka
2015-03-02 20:40 ` David Rientjes
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=alpine.DEB.2.10.1502271335520.4718@chino.kir.corp.google.com \
--to=rientjes@google.com \
--cc=akpm@linux-foundation.org \
--cc=cl@linux.com \
--cc=dev@openvswitch.org \
--cc=gthelen@google.com \
--cc=hannes@cmpxchg.org \
--cc=iamjoonsoo.kim@lge.com \
--cc=jrajahalme@nicira.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mgorman@suse.de \
--cc=netdev@vger.kernel.org \
--cc=penberg@kernel.org \
--cc=pshelar@nicira.com \
--cc=vbabka@suse.cz \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox