Subject: Re: [RFC PATCH 0/2] mm: fix OOMs for binding workloads to movable zone only node
To: Michal Hocko, Feng Tang
Cc: Andrew Morton, Johannes Weiner, Matthew Wilcox, Mel Gorman, dave.hansen@intel.com, ying.huang@intel.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org
References: <1604470210-124827-1-git-send-email-feng.tang@intel.com>
 <20201104071308.GN21990@dhcp22.suse.cz>
 <20201104073826.GA15700@shbuild999.sh.intel.com>
 <20201104075819.GA10052@dhcp22.suse.cz>
 <20201104084021.GB15700@shbuild999.sh.intel.com>
 <20201104085343.GA18718@dhcp22.suse.cz>
 <20201105014028.GA86777@shbuild999.sh.intel.com>
 <20201105120818.GC21348@dhcp22.suse.cz>
From: Vlastimil Babka
Message-ID: <4029c079-b1f3-f290-26b6-a819c52f5200@suse.cz>
Date: Thu, 5 Nov 2020 13:53:24 +0100
In-Reply-To: <20201105120818.GC21348@dhcp22.suse.cz>

On 11/5/20 1:08 PM, Michal Hocko wrote:
> On Thu 05-11-20 09:40:28, Feng Tang wrote:
>> >
>> > Could you be more specific? This sounds like a bug. Allocations
>> > shouldn't spill over to a node which is not in the cpuset. There are few
>> > exceptions like IRQ context but that shouldn't happen regularly.
>>
>> I mean when the docker starts, it will spawn many processes which obey
>> the mem binding set, and they have some kernel page requests, which got
>> successfully allocated, like the following callstack:
>>
>> [ 567.044953] CPU: 1 PID: 2021 Comm: runc:[1:CHILD] Tainted: G W I 5.9.0-rc8+ #6
>> [ 567.044956] Hardware name: /NUC6i5SYB, BIOS SYSKLi35.86A.0051.2016.0804.1114 08/04/2016
>> [ 567.044958] Call Trace:
>> [ 567.044972] dump_stack+0x74/0x9a
>> [ 567.044978] __alloc_pages_nodemask.cold+0x22/0xe5
>> [ 567.044986] alloc_pages_current+0x87/0xe0
>> [ 567.044991] allocate_slab+0x2e5/0x4f0
>> [ 567.044996] ___slab_alloc+0x380/0x5d0
>> [ 567.045021] __slab_alloc+0x20/0x40
>> [ 567.045025] kmem_cache_alloc+0x2a0/0x2e0
>> [ 567.045033] mqueue_alloc_inode+0x1a/0x30
>> [ 567.045041] alloc_inode+0x22/0xa0
>> [ 567.045045] new_inode_pseudo+0x12/0x60
>> [ 567.045049] new_inode+0x17/0x30
>> [ 567.045052] mqueue_get_inode+0x45/0x3b0
>> [ 567.045060] mqueue_fill_super+0x41/0x70
>> [ 567.045067] vfs_get_super+0x7f/0x100
>> [ 567.045074] get_tree_keyed+0x1d/0x20
>> [ 567.045080] mqueue_get_tree+0x1c/0x20
>> [ 567.045086] vfs_get_tree+0x2a/0xc0
>> [ 567.045092] fc_mount+0x13/0x50
>> [ 567.045099] mq_create_mount+0x92/0xe0
>> [ 567.045102] mq_init_ns+0x3b/0x50
>> [ 567.045106] copy_ipcs+0x10a/0x1b0
>> [ 567.045113] create_new_namespaces+0xa6/0x2b0
>> [ 567.045118] unshare_nsproxy_namespaces+0x5a/0xb0
>> [ 567.045124] ksys_unshare+0x19f/0x360
>> [ 567.045129] __x64_sys_unshare+0x12/0x20
>> [ 567.045135] do_syscall_64+0x38/0x90
>> [ 567.045143] entry_SYSCALL_64_after_hwframe+0x44/0xa9
>>
>> For it, the __alloc_pages_nodemask() will first try process's target
>> nodemask(unmovable node here), and there is no available zone, so it
>> goes with the NULL nodemask, and get a page in the slowpath.
>
> OK, I see your point now. I was not aware of the slab allocator not
> following cpusets. Sounds like a bug to me.

SLAB and SLUB seem to not care about cpusets in the fast path.
But this stack shows that it went all the way to the page allocator, so the cpusets should have been obeyed there at least.
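
To make the fallback Feng describes concrete, here is a toy model of it. It is
only a sketch under stated assumptions: it is not kernel code, every name in it
(Zone, first_usable_zone, alloc) is hypothetical, and it collapses the fast
path and slow path into "try the bound nodes, then retry with no nodemask". It
assumes a two-node machine where node 1 has only a movable zone.

```python
# Toy model only: all names here are hypothetical, this is not kernel code.
from typing import NamedTuple

ZONE_NORMAL, ZONE_MOVABLE = 0, 1  # simplified zone ordering


class Zone(NamedTuple):
    node: int
    kind: int
    free_pages: int


def first_usable_zone(zones, nodemask, highest_zone):
    """Return the first zone with free pages that the request may use.

    nodemask is a set of allowed node ids, or None for "any node".
    highest_zone caps the zone type: kernel (unmovable) requests stop at
    ZONE_NORMAL, movable requests may also use ZONE_MOVABLE.
    """
    for z in zones:
        if nodemask is not None and z.node not in nodemask:
            continue
        if z.kind > highest_zone or z.free_pages == 0:
            continue
        return z
    return None


def alloc(zones, bound_nodes, highest_zone):
    """First try the bound nodes, then retry with no nodemask."""
    z = first_usable_zone(zones, bound_nodes, highest_zone)
    if z is not None:
        return z, "bound nodes"
    z = first_usable_zone(zones, None, highest_zone)
    return z, "fallback without nodemask"


# Node 0 has normal memory; node 1 has only a movable zone.
zones = [Zone(0, ZONE_NORMAL, 100), Zone(1, ZONE_MOVABLE, 100)]

# A slab (kernel) request from a task bound to node 1 ends up on node 0.
kernel_zone, kernel_path = alloc(zones, {1}, ZONE_NORMAL)
# A movable (user page) request from the same task stays on node 1.
user_zone, user_path = alloc(zones, {1}, ZONE_MOVABLE)
```

In this model the kernel request cannot be satisfied from the bound node at
all, because that node has no zone it is allowed to use, so it only succeeds
once the nodemask is dropped. That is the spill-over being discussed.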