From: "Huang, Ying"
To: Qi Zheng
Cc: akpm@linux-foundation.org, rppt@kernel.org, david@redhat.com,
 vbabka@suse.cz, mhocko@suse.com, willy@infradead.org,
 mgorman@techsingularity.net, mingo@kernel.org,
 aneesh.kumar@linux.ibm.com, hannes@cmpxchg.org, osalvador@suse.de,
 linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH v3 2/2] mm: memory_hotplug: drop memoryless node from fallback lists
In-Reply-To: (Qi Zheng's message of "Mon, 23 Oct 2023 10:53:08 +0800")
References: <9f1dbe7ee1301c7163b2770e32954ff5e3ecf2c4.1697711415.git.zhengqi.arch@bytedance.com>
 <87bkctg4f4.fsf@yhuang6-desk2.ccr.corp.intel.com>
 <4bfa007c-a20f-9e68-4a9f-935dacf43222@bytedance.com>
 <8734y2f868.fsf@yhuang6-desk2.ccr.corp.intel.com>
Date: Mon, 23 Oct 2023 11:10:58 +0800
Message-ID: <87pm16doe5.fsf@yhuang6-desk2.ccr.corp.intel.com>
User-Agent: Gnus/5.13 (Gnus v5.13)
MIME-Version: 1.0
Content-Type: text/plain; charset=ascii

Qi Zheng writes:

> Hi Ying,
>
> On 2023/10/23 09:18, Huang, Ying wrote:
>> Qi Zheng writes:
>>
>>> Hi Ying,
>>>
>>> On 2023/10/20 15:05, Huang, Ying wrote:
>>>> Qi Zheng writes:
>>>>
>>>>> In offline_pages(), if a node becomes memoryless, we
>>>>> will clear its N_MEMORY state by calling node_states_clear_node().
>>>>> But we do this after rebuilding the zonelists by calling
>>>>> build_all_zonelists(), which will cause this memoryless node to
>>>>> still be in the fallback list of other nodes.
>>>> For fallback list, do you mean pgdat->node_zonelists[]? If so, in
>>>> build_all_zonelists
>>>>   __build_all_zonelists
>>>>     build_zonelists
>>>>       build_zonelists_in_node_order
>>>>         build_zonerefs_node
>>>> populated_zone() will be checked before adding zone into zonelist.
>>>> So, IIUC, we will not try to allocate from the memoryless node.
>>>
>>> Normally yes, but if it is the weird topology mentioned in [1], it's
>>> possible to allocate memory from it: it is a memoryless node, but it
>>> also has memory.
>>>
>>> In addition to the above case, I think it's reasonable to remove
>>> the memoryless node from node_order[] in advance. In this way it will
>>> not be traversed in build_zonelists_in_node_order().
>>>
>>> [1]. https://lore.kernel.org/all/20230212110305.93670-1-zhengqi.arch@bytedance.com/
>> Got it! Thank you for the information. I think that it may be good to
>> include this in the patch description to avoid potential confusion in
>> the future.
>
> OK, maybe the commit message can be changed to the following:
>
> ```
> In offline_pages(), if a node becomes memoryless, we
> will clear its N_MEMORY state by calling node_states_clear_node().
> But we do this after rebuilding the zonelists by calling
> build_all_zonelists(), which will cause this memoryless node to
> still be in the fallback nodes (node_order[]) of other nodes.
>
> To drop memoryless nodes from fallback nodes in this case, just
> call node_states_clear_node() before calling build_all_zonelists().
>
> In this way, we will not try to allocate pages from memoryless
> node0, then the panic mentioned in [1] will also be fixed. Even though
> this problem has been solved by dropping the NODE_MIN_SIZE constraint
> in x86 [2], it would be better to fix it in the core MM as well.
>
> [1]. https://lore.kernel.org/all/20230212110305.93670-1-zhengqi.arch@bytedance.com/
> [2]. https://lore.kernel.org/all/20231017062215.171670-1-rppt@kernel.org/
>
> ```

This is helpful. Thanks!

--
Best Regards,
Huang, Ying

> Thanks,
> Qi
>
>> --
>> Best Regards,
>> Huang, Ying
>>
>>> Thanks,
>>> Qi
>>>
>>>
>>>> --
>>>> Best Regards,
>>>> Huang, Ying
>>>>
>>>>> This will incur
>>>>> some runtime overhead.
>>>>>
>>>>> To drop memoryless node from fallback lists in this case, just
>>>>> call node_states_clear_node() before calling build_all_zonelists().
>>>>>
>>>>> Signed-off-by: Qi Zheng
>>>>> Acked-by: David Hildenbrand
>>>> [snip]
>>>> --
>>>> Best Regards,
>>>> Huang, Ying
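
For readers following the ordering argument in this thread, here is a toy userspace model, not kernel code. It assumes, as the discussion implies, that each node's fallback order (node_order[]) is built only from nodes that currently have N_MEMORY set. The function names `build_fallback_orders` and `offline_node` and the distance metric are hypothetical, and the populated_zone() filtering mentioned above is deliberately left out.

```python
# Toy model only: names and the |a - b| "distance" are illustrative
# assumptions, not the kernel's data structures or API.

def build_fallback_orders(all_nodes, n_memory):
    """Stand-in for build_all_zonelists(): for every node, list the
    nodes currently in the N_MEMORY set, nearest first."""
    return {
        local: sorted(n_memory, key=lambda n: (abs(n - local), n))
        for local in all_nodes
    }


def offline_node(all_nodes, n_memory, node, clear_first):
    """Model offlining the last memory of `node`.

    clear_first=False: rebuild the orders, then clear the N_MEMORY bit
                       (the order described as problematic in the thread).
    clear_first=True:  clear the bit, then rebuild (the proposed order).
    """
    n_memory = set(n_memory)  # work on a copy
    if clear_first:
        n_memory.discard(node)
        orders = build_fallback_orders(all_nodes, n_memory)
    else:
        orders = build_fallback_orders(all_nodes, n_memory)
        n_memory.discard(node)
    return orders, n_memory


old_orders, _ = offline_node([0, 1, 2], {0, 1, 2}, node=0, clear_first=False)
new_orders, _ = offline_node([0, 1, 2], {0, 1, 2}, node=0, clear_first=True)

print(old_orders[1])  # node 0 is still listed as a fallback for node 1
print(new_orders[1])  # node 0 is no longer listed
```

In this sketch the only difference between the two runs is whether the state bit is cleared before or after the rebuild, which is the reordering the patch description proposes.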