From: Feng Tang
To: Andrew Morton, Michal Hocko, Johannes Weiner, Matthew Wilcox, Mel Gorman, dave.hansen@intel.com, ying.huang@intel.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Feng Tang
Subject: [RFC PATCH 0/2] mm: fix OOMs for binding workloads to movable zone only node
Date: Wed, 4 Nov 2020 14:10:08 +0800
Message-Id: <1604470210-124827-1-git-send-email-feng.tang@intel.com>

Hi,

This patchset reports a problem and asks for suggestions and review of the RFC fix patches.

We recently got an OOM report: when a user tries to bind a docker (container) instance to a memory node that has only movable zones, the page allocation fails, and OOM killing still cannot resolve the failure.

The call stack was:

[ 1387.877565] runc:[2:INIT] invoked oom-killer: gfp_mask=0x500cc2(GFP_HIGHUSER|__GFP_ACCOUNT), order=0, oom_score_adj=0
[ 1387.877568] CPU: 8 PID: 8291 Comm: runc:[2:INIT] Tainted: G W I E 5.8.2-0.g71b519a-default #1 openSUSE Tumbleweed (unreleased)
[ 1387.877569] Hardware name: Dell Inc. PowerEdge R640/0PHYDR, BIOS 2.6.4 04/09/2020
[ 1387.877570] Call Trace:
[ 1387.877579] dump_stack+0x6b/0x88
[ 1387.877584] dump_header+0x4a/0x1e2
[ 1387.877586] oom_kill_process.cold+0xb/0x10
[ 1387.877588] out_of_memory.part.0+0xaf/0x230
[ 1387.877591] out_of_memory+0x3d/0x80
[ 1387.877595] __alloc_pages_slowpath.constprop.0+0x954/0xa20
[ 1387.877599] __alloc_pages_nodemask+0x2d3/0x300
[ 1387.877602] pipe_write+0x322/0x590
[ 1387.877607] new_sync_write+0x196/0x1b0
[ 1387.877609] vfs_write+0x1c3/0x1f0
[ 1387.877611] ksys_write+0xa7/0xe0
[ 1387.877617] do_syscall_64+0x52/0xd0
[ 1387.877621] entry_SYSCALL_64_after_hwframe+0x44/0xa9

The meminfo log shows only the movable-only node, which has plenty of free memory. In our reproduction with patch 1/2 applied, the normal node (which has DMA/DMA32/Normal zones) also has a lot of free memory when the OOM happens.

If we hack the kernel so that this (GFP_HIGHUSER|__GFP_ACCOUNT) request gets a page, a subsequent full docker run (such as installing and running the 'stress-ng' stress test) hits more allocation failures from other kinds of requests (gfp_masks).

Patch 2/2 detects the case where the allowed target nodes have only movable zones and loosens the binding check. Otherwise the allocation triggers an OOM kill that cannot help, because the problem is not a lack of free memory.

Feng Tang (2):
  mm, oom: dump meminfo for all memory nodes
  mm, page_alloc: loose the node binding check to avoid helpless oom killing

 mm/oom_kill.c   |  2 +-
 mm/page_alloc.c | 22 ++++++++++++++++++++++
 2 files changed, 23 insertions(+), 1 deletion(-)

-- 
2.7.4
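
For readers who want to see why the request in the OOM report cannot be served from a movable-only node even though that node has free memory, here is a rough userspace sketch. It is not code from either patch. The zone bit values are assumed to match include/linux/gfp.h in 5.8-era kernels, the zone selection is a simplified imitation of gfp_zone() for a 64-bit machine without highmem, and every name in it (GFP_BIT_*, zone_kind, highest_zone_for, node_has_usable_zone) is made up for illustration. The key point is that 0x500cc2 has __GFP_HIGHMEM set but not __GFP_MOVABLE, so the highest usable zone is Normal, and a node that has only a Movable zone offers nothing to allocate from.

```c
#include <stdbool.h>

/* Zone modifier bits, assumed from include/linux/gfp.h (5.8-era). */
#define GFP_BIT_DMA      0x01u
#define GFP_BIT_HIGHMEM  0x02u
#define GFP_BIT_DMA32    0x04u
#define GFP_BIT_MOVABLE  0x08u

/* Hypothetical zone ordering for a 64-bit system without highmem. */
enum zone_kind {
	ZONE_KIND_DMA,
	ZONE_KIND_DMA32,
	ZONE_KIND_NORMAL,
	ZONE_KIND_MOVABLE,
};

/*
 * Simplified imitation of gfp_zone(): return the highest zone a request
 * may use. Only valid flag combinations are handled. The Movable zone is
 * eligible only when both HIGHMEM and MOVABLE bits are set.
 */
static enum zone_kind highest_zone_for(unsigned int gfp_mask)
{
	if (gfp_mask & GFP_BIT_DMA)
		return ZONE_KIND_DMA;
	if (gfp_mask & GFP_BIT_DMA32)
		return ZONE_KIND_DMA32;
	if ((gfp_mask & GFP_BIT_HIGHMEM) && (gfp_mask & GFP_BIT_MOVABLE))
		return ZONE_KIND_MOVABLE;
	return ZONE_KIND_NORMAL;
}

/*
 * 'populated' is a bitmap of zones present on the node, one bit per
 * enum zone_kind value. A request can use any populated zone at or
 * below its highest allowed zone.
 */
static bool node_has_usable_zone(unsigned int gfp_mask, unsigned int populated)
{
	enum zone_kind highest = highest_zone_for(gfp_mask);

	for (int z = ZONE_KIND_DMA; z <= (int)highest; z++) {
		if (populated & (1u << z))
			return true;
	}
	return false;
}
```

Calling node_has_usable_zone(0x500cc2u, 1u << ZONE_KIND_MOVABLE) models the reported case: the mask from the log against a node whose only populated zone is Movable. Adding GFP_BIT_MOVABLE to the same mask makes the Movable zone eligible, which is why movable user-page allocations work on such a node while this pipe buffer allocation does not.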