Date: Thu, 18 Nov 2021 09:35:30 +0100
From: Michal Hocko <mhocko@suse.com>
To: Alexey Makhalov
Cc: Dennis Zhou, Eric Dumazet, linux-mm@kvack.org, Andrew Morton,
	David Hildenbrand, Oscar Salvador, Tejun Heo, Christoph Lameter,
	linux-kernel@vger.kernel.org, stable@vger.kernel.org
Subject: Re: [PATCH v3] mm: fix panic in __alloc_pages
References: <908909e0-4815-b580-7ff5-d824d36a141c@redhat.com>
	<20211108202325.20304-1-amakhalov@vmware.com>
	<2e191db3-286f-90c6-bf96-3f89891e9926@gmail.com>
	<5239D699-523C-4F0C-923A-B068E476043E@vmware.com>
In-Reply-To: <5239D699-523C-4F0C-923A-B068E476043E@vmware.com>

On Tue 16-11-21 20:22:49, Alexey Makhalov wrote:
> 
> 
> > On Nov 16, 2021, at 1:17 AM, Michal Hocko wrote:
> > 
> > On Tue 16-11-21 01:31:44, Alexey Makhalov wrote:
> > [...]
> >> diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c
> >> index 6737b1cbf..bbc1a70d5 100644
> >> --- a/drivers/acpi/acpi_processor.c
> >> +++ b/drivers/acpi/acpi_processor.c
> >> @@ -200,6 +200,10 @@ static int acpi_processor_hotadd_init(struct acpi_processor *pr)
> >>  	 * gets online for the first time.
> >>  	 */
> >>  	pr_info("CPU%d has been hot-added\n", pr->id);
> >> +	{
> >> +		int nid = cpu_to_node(pr->id);
> >> +		printk("%s:%d cpu %d, node %d, online %d, ndata %p\n", __FUNCTION__, __LINE__, pr->id, nid, node_online(nid), NODE_DATA(nid));
> >> +	}
> >>  	pr->flags.need_hotplug_init = 1;
> > 
> > OK, IIUC you are adding a processor which is outside of
> > possible_cpu_mask and that means that the node is not allocated for such
> > a future to-be-hotplugged cpu and its memory node. init_cpu_to_node
> > would have done that initialization otherwise.
> 
> It is not correct.
> 
> possible_cpus is 128 for this VM. Look at the SRAT and percpu output for proof:
> [    0.085524] SRAT: PXM 127 -> APIC 0xfe -> Node 127
> [    0.118928] setup_percpu: NR_CPUS:128 nr_cpumask_bits:128 nr_cpu_ids:128 nr_node_ids:128

OK, I see. I had missed that when looking at the boot log you have
sent.

> It is impossible to add a processor outside of possible_cpu_mask. possible_cpus is the absolute maximum
> that the system can support. See Documentation/core-api/cpu_hotplug.rst

That was my understanding, hence the suspicion that you might be doing
something that is not really supported.

> The number of present and onlined CPUs (and nodes) is 4. The other 124 CPUs (and nodes) are not present,
> but can potentially be hot added.

Yes, this is a configuration I have already seen. The cpu->node binding
was configured during boot time though, IIRC.

> The number of initialized nodes is 4, as init_cpu_to_node() will skip nodes that are not yet present,
> see arch/x86/mm/numa.c:798 (numa_cpu_node(CPU #4) == NUMA_NO_NODE)

Isn't this the problem? Why is the cpu->node association missing here?

> 788 void __init init_cpu_to_node(void)
> 789 {
> 790         int cpu;
> 791         u16 *cpu_to_apicid = early_per_cpu_ptr(x86_cpu_to_apicid);
> 792 
> 793         BUG_ON(cpu_to_apicid == NULL);
> 794 
> 795         for_each_possible_cpu(cpu) {
> 796                 int node = numa_cpu_node(cpu);
> 797 
> 798                 if (node == NUMA_NO_NODE)
> 799                         continue;
> 800 
> 
> After CPU (and node) hot plug:
> - CPU 4 is marked as present, but not yet online
> - The new node got ID 4; numa_cpu_node(CPU #4) now returns 4
> - node_online(4) == 0 and NODE_DATA(4) == NULL, but NODE_DATA(4) will be accessed inside the
>   for_each_possible_cpu loop in the percpu allocation
> 
> Digging further:
> Even if the x86/CPU hot add maintainers decide to clean up the memoryless node hot add code to
> initialize the node at the time it is attached (to be aligned with how the mm node is handled
> during memory hot add), this percpu fix is still needed because the percpu allocator is used
> during node onlining. See the chicken and egg problem that I described above.
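Just to make sure we are looking at the same thing: the sequence you describe
boils down to something like the sketch below. This is only an illustration of
the access pattern, not the actual mm/percpu-vm.c code, and
pcpu_alloc_pages_sketch is a made-up name:

#include <linux/cpumask.h>
#include <linux/errno.h>
#include <linux/gfp.h>
#include <linux/topology.h>

/* Illustrative sketch only -- not the real percpu allocation path. */
static int pcpu_alloc_pages_sketch(gfp_t gfp)
{
	unsigned int cpu;

	for_each_possible_cpu(cpu) {
		/* returns 4 for the hot-added CPU in the scenario above */
		int nid = cpu_to_node(cpu);
		struct page *page;

		/*
		 * nid is a valid node id, but the node has never been
		 * onlined, so NODE_DATA(nid) is still NULL and the page
		 * allocator ends up dereferencing it -> panic in
		 * __alloc_pages().
		 */
		page = alloc_pages_node(nid, gfp, 0);
		if (!page)
			return -ENOMEM;
	}

	return 0;
}

In other words, the crash is not really in the percpu code itself; it is
alloc_pages_node() being handed a node id whose pg_data_t has never been
allocated.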
I have to say I do not see the chicken and egg problem.
As long as init_cpu_to_node initializes the memoryless node for the cpu
properly, the pcp allocator doesn't really have to care, as the page
allocator falls back to the first populated node in distance order. So I
believe the whole issue boils down to addressing why init_cpu_to_node
doesn't see a proper cpu->node association.

> Or, as a 2nd option, numa_cpu_node(4) should return NUMA_NO_NODE until node 4 gets fully initialized.

-- 
Michal Hocko
SUSE Labs
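P.S. To illustrate what "falls back to the first populated node" means in
practice, here is a minimal sketch. It is not code from this thread; it assumes
node 4 already has its pg_data_t and zonelists set up even though it owns no
memory, and alloc_on_memoryless_node_sketch is a made-up name:

#include <linux/gfp.h>

/*
 * Sketch only: with the memoryless node properly initialized, this
 * allocation succeeds by falling back to the closest populated node
 * in distance order instead of dereferencing a NULL NODE_DATA().
 */
static struct page *alloc_on_memoryless_node_sketch(gfp_t gfp)
{
	int nid = 4;	/* the hot-added, memoryless node from the log */

	/* note: __GFP_THISNODE in @gfp would defeat this fallback */
	return alloc_pages_node(nid, gfp, 0);
}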