From: Haifeng Xu
To: Michal Hocko
Cc: roman.gushchin@linux.dev, hannes@cmpxchg.org, shakeelb@google.com,
    akpm@linux-foundation.org, cgroups@vger.kernel.org, linux-mm@kvack.org,
    linux-kernel@vger.kernel.org
Subject: Re: [PATCH 1/2] mm/memcontrol: do not tweak node in mem_cgroup_init()
Date: Fri, 16 Jun 2023 16:28:38 +0800
Message-ID: <47119364-30ac-cb57-7fd8-d9aa4b230478@shopee.com>
References: <20230615073226.1343-1-haifeng.xu@shopee.com>

On 2023/6/15 16:14, Michal Hocko wrote:
> On Thu 15-06-23 07:32:25, Haifeng Xu wrote:
>> mem_cgroup_init() request for allocations from each possible node, and
>> it's used to be a problem because NODE_DATA is not allocated for offline
>> node.
>> Things have already changed since commit 09f49dca570a9 ("mm: handle
>> uninitialized numa nodes gracefully"), so it's unnecessary to check for
>> !node_online nodes here.
>
> How have you tested this patch?

I started a guest with one empty node:

qemu-system-x86_64 \
  -kernel vmlinux \
  -initrd full.rootfs.cpio.gz \
  -append "console=ttyS0,115200 root=/dev/ram0 nokaslr earlyprintk=serial oops=panic panic_on_warn" \
  -drive format=qcow2,file=vm_disk.qcow2,media=disk,if=ide \
  -enable-kvm \
  -cpu host \
  -m 8G,slots=2,maxmem=16G \
  -smp cores=4,threads=1,sockets=2 \
  -object memory-backend-ram,id=mem0,size=4G \
  -object memory-backend-ram,id=mem1,size=4G \
  -numa node,memdev=mem0,cpus=0-3,nodeid=0 \
  -numa node,memdev=mem1,cpus=4-7,nodeid=1 \
  -numa node,nodeid=2 \
  -net nic,model=virtio,macaddr=52:54:00:12:34:58 \
  -net user \
  -nographic \
  -rtc base=localtime \
  -gdb tcp::6000

Guest state when booting:

[    0.048881] NUMA: Node 0 [mem 0x00000000-0x0009ffff] + [mem 0x00100000-0xbfffffff] -> [mem 0x00000000-0xbfffffff]
[    0.050489] NUMA: Node 0 [mem 0x00000000-0xbfffffff] + [mem 0x100000000-0x13fffffff] -> [mem 0x00000000-0x13fffffff]
[    0.052173] NODE_DATA(0) allocated [mem 0x13fffc000-0x13fffffff]
[    0.053164] NODE_DATA(1) allocated [mem 0x23fffa000-0x23fffdfff]
[    0.054187] Zone ranges:
[    0.054587]   DMA      [mem 0x0000000000001000-0x0000000000ffffff]
[    0.055551]   DMA32    [mem 0x0000000001000000-0x00000000ffffffff]
[    0.056515]   Normal   [mem 0x0000000100000000-0x000000023fffffff]
[    0.057484] Movable zone start for each node
[    0.058149] Early memory node ranges
[    0.058705]   node   0: [mem 0x0000000000001000-0x000000000009efff]
[    0.059679]   node   0: [mem 0x0000000000100000-0x00000000bffdffff]
[    0.060659]   node   0: [mem 0x0000000100000000-0x000000013fffffff]
[    0.061649]   node   1: [mem 0x0000000140000000-0x000000023fffffff]
[    0.062638] Initmem setup node 0 [mem 0x0000000000001000-0x000000013fffffff]
[    0.063745] Initmem setup node 1 [mem 0x0000000140000000-0x000000023fffffff]
[    0.064855]   DMA zone: 158 reserved pages exceeds freesize 0
[    0.065746] Initializing node 2 as memoryless
[    0.066437] Initmem setup node 2 as memoryless
[    0.067132]   DMA zone: 158 reserved pages exceeds freesize 0
[    0.068037] On node 0, zone DMA: 1 pages in unavailable ranges
[    0.068265] On node 0, zone DMA: 97 pages in unavailable ranges
[    0.124755] On node 0, zone Normal: 32 pages in unavailable ranges

cat /sys/devices/system/node/online
0-1
cat /sys/devices/system/node/possible
0-2

In addition, I added a debug message:

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 7ebf64e48b25..3d786281377d 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -7424,7 +7424,7 @@ static int __init mem_cgroup_init(void)
 		rtpn = kzalloc_node(sizeof(*rtpn), GFP_KERNEL, node);
 		if (!rtpn)
 			continue;
-
+		pr_info("allocate rtpn node %d.\n", node);
 		rtpn->rb_root = RB_ROOT;
 		rtpn->rb_rightmost = NULL;
 		spin_lock_init(&rtpn->lock);

With this applied, the allocation is reached for all three possible nodes,
including the memoryless node 2:

[    0.561420] allocate rtpn node 0.
[    0.562324] allocate rtpn node 1.
[    0.563322] allocate rtpn node 2.

>
> I am not saying it is wrong and it looks like the right thing to do. But
> the early init code has proven to be more subtle than expected so it is
> definitely good to know that this has been tested on memory less setup
> and passed.
>
>> Signed-off-by: Haifeng Xu
>> ---
>>  mm/memcontrol.c | 3 +--
>>  1 file changed, 1 insertion(+), 2 deletions(-)
>>
>> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
>> index 4b27e245a055..c73c5fb33f65 100644
>> --- a/mm/memcontrol.c
>> +++ b/mm/memcontrol.c
>> @@ -7421,8 +7421,7 @@ static int __init mem_cgroup_init(void)
>>  	for_each_node(node) {
>>  		struct mem_cgroup_tree_per_node *rtpn;
>>
>> -		rtpn = kzalloc_node(sizeof(*rtpn), GFP_KERNEL,
>> -				    node_online(node) ? node : NUMA_NO_NODE);
>> +		rtpn = kzalloc_node(sizeof(*rtpn), GFP_KERNEL, node);
>>
>>  		rtpn->rb_root = RB_ROOT;
>>  		rtpn->rb_rightmost = NULL;
>> --
>> 2.25.1
>
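
For anyone repeating this test: the node of interest is the one that shows up
in the possible list but not in the online list. Below is a minimal shell
sketch of that comparison. The helper names expand_nodelist and offline_nodes
are hypothetical and not part of the patch, and the sketch assumes the sysfs
files use the usual comma-separated range format (for example "0-1,4").

```shell
# Expand a nodelist string such as "0-1,4" into one node id per line.
expand_nodelist() {
    printf '%s\n' "$1" | tr ',' '\n' | while IFS=- read -r first last; do
        # Skip empty fields (e.g. an empty list).
        [ -n "$first" ] || continue
        # A single id like "4" has no upper bound; treat it as "4-4".
        last=${last:-$first}
        i=$first
        while [ "$i" -le "$last" ]; do
            printf '%s\n' "$i"
            i=$((i + 1))
        done
    done
}

# Print the node ids present in the first list (possible) but missing
# from the second list (online). Both arguments are nodelist strings.
offline_nodes() {
    possible=$1
    online=$2
    online_ids=$(expand_nodelist "$online")
    expand_nodelist "$possible" | while read -r id; do
        # grep -x matches whole lines only, so "1" does not match "10".
        printf '%s\n' "$online_ids" | grep -qx "$id" || printf '%s\n' "$id"
    done
}
```

On a running guest it could be called as
offline_nodes "$(cat /sys/devices/system/node/possible)" "$(cat /sys/devices/system/node/online)".
With the values shown above (possible 0-2, online 0-1) it reports node 2.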