From: Feng Tang <feng.tang@intel.com>
To: Andrew Morton <akpm@linux-foundation.org>,
Michal Hocko <mhocko@suse.com>,
Johannes Weiner <hannes@cmpxchg.org>,
Matthew Wilcox <willy@infradead.org>,
Mel Gorman <mgorman@suse.de>,
dave.hansen@intel.com, ying.huang@intel.com, linux-mm@kvack.org,
linux-kernel@vger.kernel.org
Cc: Feng Tang <feng.tang@intel.com>
Subject: [RFC PATCH 2/2] mm, page_alloc: loose the node binding check to avoid helpless oom killing
Date: Wed, 4 Nov 2020 14:10:10 +0800
Message-ID: <1604470210-124827-3-git-send-email-feng.tang@intel.com>
In-Reply-To: <1604470210-124827-1-git-send-email-feng.tang@intel.com>
With the advent of memory hotplug and persistent memory, some
platforms have memory nodes that contain only a movable zone.
Users may bind some of their workloads (such as docker/containers) to
these nodes, and there are many reports of OOM kills and page
allocation failures. One call stack is:
[ 1387.877565] runc:[2:INIT] invoked oom-killer: gfp_mask=0x500cc2(GFP_HIGHUSER|__GFP_ACCOUNT), order=0, oom_score_adj=0
[ 1387.877568] CPU: 8 PID: 8291 Comm: runc:[2:INIT] Tainted: G W I E 5.8.2-0.g71b519a-default #1 openSUSE Tumbleweed (unreleased)
[ 1387.877569] Hardware name: Dell Inc. PowerEdge R640/0PHYDR, BIOS 2.6.4 04/09/2020
[ 1387.877570] Call Trace:
[ 1387.877579] dump_stack+0x6b/0x88
[ 1387.877584] dump_header+0x4a/0x1e2
[ 1387.877586] oom_kill_process.cold+0xb/0x10
[ 1387.877588] out_of_memory.part.0+0xaf/0x230
[ 1387.877591] out_of_memory+0x3d/0x80
[ 1387.877595] __alloc_pages_slowpath.constprop.0+0x954/0xa20
[ 1387.877599] __alloc_pages_nodemask+0x2d3/0x300
[ 1387.877602] pipe_write+0x322/0x590
[ 1387.877607] new_sync_write+0x196/0x1b0
[ 1387.877609] vfs_write+0x1c3/0x1f0
[ 1387.877611] ksys_write+0xa7/0xe0
[ 1387.877617] do_syscall_64+0x52/0xd0
[ 1387.877621] entry_SYSCALL_64_after_hwframe+0x44/0xa9
In a full container run, such as installing and running the stress tool
"stress-ng", there are many different kinds of page requests (gfp_masks),
many of which only allow non-movable zones. Some of them can fall back
to other nodes with NORMAL/DMA32/DMA zones, but others are blocked by
the __GFP_HARDWALL or ALLOC_CPUSET check and end up triggering the OOM
killer. OOM killing does not help here, because the problem is not a
lack of free memory: the request is simply blocked by the node binding
policy check.

So loosen the policy check for this case.
Signed-off-by: Feng Tang <feng.tang@intel.com>
---
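Illustration only, not part of the patch: a minimal user-space sketch
of the decision the hunk below makes. Every sk_* name is hypothetical.
The flag values are assumed to match 5.8-era include/linux/gfp.h and
should be checked against your tree. The sketch also assumes a
configuration without CONFIG_HIGHMEM, so "highest_zoneidx <= ZONE_NORMAL"
is equivalent to "the request cannot use ZONE_MOVABLE". It models only
the gfp_mask side; the patch additionally clears ALLOC_CPUSET.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed flag bits (5.8-era gfp.h); verify before relying on them. */
#define SK_GFP_ZONEMASK	0x0fu		/* DMA | HIGHMEM | DMA32 | MOVABLE */
#define SK_GFP_HIGHMEM	0x02u
#define SK_GFP_MOVABLE	0x08u
#define SK_GFP_HARDWALL	0x100000u

/* Hypothetical stand-in for the per-node counters the patch reads. */
struct sk_node {
	unsigned long present_pages;	/* models node_present_pages */
	unsigned long movable_pages;	/* models ZONE_MOVABLE present_pages */
};

/* Sum the present pages outside ZONE_MOVABLE over the allowed nodes. */
static unsigned long sk_unmovable_pages(const struct sk_node *nodes, size_t nr)
{
	unsigned long unmovable = 0;
	size_t i;

	for (i = 0; i < nr; i++) {
		assert(nodes[i].movable_pages <= nodes[i].present_pages);
		unmovable += nodes[i].present_pages - nodes[i].movable_pages;
	}
	return unmovable;
}

/* Only HIGHMEM | MOVABLE (and no DMA bits) selects ZONE_MOVABLE. */
static bool sk_needs_unmovable_zone(unsigned int gfp_mask)
{
	return (gfp_mask & SK_GFP_ZONEMASK) !=
	       (SK_GFP_HIGHMEM | SK_GFP_MOVABLE);
}

/*
 * Drop the hardwall bit when the request needs a non-movable zone but
 * the allowed nodes have no non-movable pages at all. Otherwise the
 * mask is returned unchanged.
 */
static unsigned int sk_relax_binding(unsigned int gfp_mask,
				     const struct sk_node *allowed, size_t nr)
{
	if (sk_needs_unmovable_zone(gfp_mask) &&
	    sk_unmovable_pages(allowed, nr) == 0)
		gfp_mask &= ~SK_GFP_HARDWALL;
	return gfp_mask;
}
```

Under these assumptions, the mask from the report above, 0x500cc2
(GFP_HIGHUSER|__GFP_ACCOUNT), carries the hardwall bit and does not
carry the movable bit. With a binding to a single movable-only node,
sk_relax_binding() returns 0x400cc2; with a node that has any
non-movable pages, the mask is left untouched.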
mm/page_alloc.c | 22 ++++++++++++++++++++++
1 file changed, 22 insertions(+)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index d772206..efd49a9 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4669,6 +4669,28 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 	if (!ac->preferred_zoneref->zone)
 		goto nopage;
 
+	/*
+	 * If the task's target memory nodes only have movable zones, while the
+	 * gfp_mask allowed zone is lower than ZONE_MOVABLE, loosen the check
+	 * for __GFP_HARDWALL and ALLOC_CPUSET, otherwise it could trigger OOM
+	 * killing, which still cannot get past this policy check.
+	 */
+	if (ac->highest_zoneidx <= ZONE_NORMAL) {
+		int nid;
+		unsigned long unmovable = 0;
+
+		/* FIXME: this could be a separate function */
+		for_each_node_mask(nid, cpuset_current_mems_allowed) {
+			unmovable += NODE_DATA(nid)->node_present_pages -
+				NODE_DATA(nid)->node_zones[ZONE_MOVABLE].present_pages;
+		}
+
+		if (!unmovable) {
+			gfp_mask &= ~(__GFP_HARDWALL);
+			alloc_flags &= ~ALLOC_CPUSET;
+		}
+	}
+
 	if (alloc_flags & ALLOC_KSWAPD)
 		wake_all_kswapds(order, gfp_mask, ac);
 
--
2.7.4
Thread overview: 26+ messages
2020-11-04 6:10 [RFC PATCH 0/2] mm: fix OOMs for binding workloads to movable zone only node Feng Tang
2020-11-04 6:10 ` [RFC PATCH 1/2] mm, oom: dump meminfo for all memory nodes Feng Tang
2020-11-04 7:18 ` Michal Hocko
2020-11-04 6:10 ` Feng Tang [this message]
2020-11-04 7:23 ` [RFC PATCH 2/2] mm, page_alloc: loose the node binding check to avoid helpless oom killing Michal Hocko
2020-11-04 7:13 ` [RFC PATCH 0/2] mm: fix OOMs for binding workloads to movable zone only node Michal Hocko
2020-11-04 7:38 ` Feng Tang
2020-11-04 7:58 ` Michal Hocko
2020-11-04 8:40 ` Feng Tang
2020-11-04 8:53 ` Michal Hocko
[not found] ` <20201105014028.GA86777@shbuild999.sh.intel.com>
2020-11-05 12:08 ` Michal Hocko
2020-11-05 12:53 ` Vlastimil Babka
2020-11-05 12:58 ` Michal Hocko
2020-11-05 13:07 ` Feng Tang
2020-11-05 13:12 ` Michal Hocko
2020-11-05 13:43 ` Feng Tang
2020-11-05 16:16 ` Michal Hocko
2020-11-06 7:06 ` Feng Tang
2020-11-06 8:10 ` Michal Hocko
2020-11-06 9:08 ` Feng Tang
2020-11-06 10:35 ` Michal Hocko
2020-11-05 13:14 ` Vlastimil Babka
2020-11-05 13:19 ` Michal Hocko
2020-11-05 13:34 ` Vlastimil Babka
2020-11-06 4:32 ` Huang, Ying
2020-11-06 7:43 ` Michal Hocko