Message-ID: <4fe0984f-74dc-45fe-b2b6-bdd81ec15bac@neon.tech>
Date: Fri, 11 Jul 2025 17:22:46 +0100
To: linux-kernel@vger.kernel.org, x86@kernel.org, linux-mm@kvack.org
Cc: Dave Hansen, Andy Lutomirski, Peter Zijlstra, Thomas Gleixner,
    Ingo Molnar, Borislav Petkov, "H. Peter Anvin", "Edgecombe, Rick P",
    Oleg Vasilev, Arthur Petukhovsky, Stefan Radig, Misha Sakhnov
From: Em Sharnoff
Subject: [PATCH v5 0/4] x86/mm: Improve alloc handling of phys_*_init()

Hi folks,

See the changelog and more context below.

tl;dr:
* Currently, alloc_low_page() uses GFP_ATOMIC after boot, which may fail.
* Those failures aren't currently handled by phys_pud_init() and similar
  functions.
* Those failures can happen during memory hotplug.

So:
1. Add handling for those allocation failures (patches 1-3).
2. Use GFP_KERNEL instead of GFP_ATOMIC (patch 4).

Previous version here:
https://lore.kernel.org/all/7d0d307d-71eb-4913-8023-bccc7a8a4a3d@neon.tech/

=== Changelog ===

v2:
- Switch from special-casing zero values to ERR_PTR()
- Add patch to move from GFP_ATOMIC -> GFP_KERNEL
- Move commentary out of the patch message and into this cover letter

v3:
- Fix -Wint-conversion issues

v4:
- new patch: move 'paddr_last' usage into phys_{pud,pmd}_init() so the
  return value from those functions is no longer needed.
- new patch: make phys_*_init() and their callers return int

v5:
- resend; bumped base commit.

I'm not sure whether patch 2/4 ("Allow error returns ...") should be
separate from patch 3/4 ("Handle alloc failure ..."), but it's easy
enough to combine them if need be.

=== Background ===

We recently started observing the resulting NULL pointer dereferences in
practice (albeit quite rarely), triggered by allocation failures during
virtio-mem hotplug.

We use virtio-mem quite heavily, adding and removing memory based on the
resource usage of customer workloads across a fleet of VMs, so it's
somewhat expected that we see occasional allocation failures here if we
run out of memory before hotplug takes place.

We started seeing this bug after upgrading from 6.6.64 to 6.12.26, but
there didn't appear to be any relevant changes in the codepaths involved,
so we figured the upgrade was triggering a latent issue.
The possibility for this issue was also pointed out a while back:

> For alloc_low_pages(), I noticed the callers don’t check for allocation
> failure. I'm a little surprised that there haven't been reports of the
> allocation failing, because these operations could result in a lot more
> pages getting allocated way past boot, and failure causes a NULL
> pointer dereference.

https://lore.kernel.org/all/5aee7bcdf49b1c6b8ee902dd2abd9220169c694b.camel@intel.com/

For completeness, here's an example stack trace we saw (on 6.12.26):

  BUG: kernel NULL pointer dereference, address: 0000000000000000
  ....
  Call Trace:
   phys_pud_init+0xa0/0x390
   phys_p4d_init+0x93/0x330
   __kernel_physical_mapping_init+0xa1/0x370
   kernel_physical_mapping_init+0xf/0x20
   init_memory_mapping+0x1fa/0x430
   arch_add_memory+0x2b/0x50
   add_memory_resource+0xe6/0x260
   add_memory_driver_managed+0x78/0xc0
   virtio_mem_add_memory+0x46/0xc0
   virtio_mem_sbm_plug_and_add_mb+0xa3/0x160
   virtio_mem_run_wq+0x1035/0x16c0
   process_one_work+0x17a/0x3c0
   worker_thread+0x2c5/0x3f0
   ? _raw_spin_unlock_irqrestore+0x9/0x30
   ? __pfx_worker_thread+0x10/0x10
   kthread+0xdc/0x110
   ? __pfx_kthread+0x10/0x10
   ret_from_fork+0x35/0x60
   ? __pfx_kthread+0x10/0x10
   ret_from_fork_asm+0x1a/0x30

and the allocation failure preceding it:

  kworker/0:2: page allocation failure: order:0, mode:0x920(GFP_ATOMIC|__GFP_ZERO), nodemask=(null),cpuset=/,mems_allowed=0
  ...
  Call Trace:
   dump_stack_lvl+0x5b/0x70
   dump_stack+0x10/0x20
   warn_alloc+0x103/0x180
   __alloc_pages_slowpath.constprop.0+0x738/0xf30
   __alloc_pages_noprof+0x1e9/0x340
   alloc_pages_mpol_noprof+0x47/0x100
   alloc_pages_noprof+0x4b/0x80
   get_free_pages_noprof+0xc/0x40
   alloc_low_pages+0xc2/0x150
   phys_pud_init+0x82/0x390
   ...
  (everything from phys_pud_init and below was the same)

There's some additional context in a GitHub issue we opened on our side:
https://github.com/neondatabase/autoscaling/issues/1391

=== Reproducing / Testing ===

I was able to partially reproduce the original issue we saw by modifying
phys_pud_init() to simulate alloc_low_page() returning NULL after boot,
and then doing memory hotplug to trigger the "failure". Something roughly
like:

-		pmd = alloc_low_page();
+		if (!after_bootmem)
+			pmd = alloc_low_page();
+		else
+			pmd = NULL;

To test recovery, I also tried simulating just one alloc_low_page()
failure after boot. This change seemed to handle it at a basic level
(virtio-mem hotplug succeeded with the right amount, after retrying), but
I didn't dig further.

We have also been running this in our production environment, where we
have observed that it fixes the issue.

Em Sharnoff (4):
  x86/mm: Update mapped addresses in phys_{pmd,pud}_init()
  x86/mm: Allow error returns from phys_*_init()
  x86/mm: Handle alloc failure in phys_*_init()
  x86/mm: Use GFP_KERNEL for alloc_low_pages() after boot

 arch/x86/include/asm/pgtable.h |   3 +-
 arch/x86/mm/init.c             |  29 ++++++---
 arch/x86/mm/init_32.c          |   6 +-
 arch/x86/mm/init_64.c          | 116 ++++++++++++++++++++++-----------
 arch/x86/mm/mem_encrypt_amd.c  |   8 ++-
 arch/x86/mm/mm_internal.h      |  13 ++--
 6 files changed, 113 insertions(+), 62 deletions(-)

base-commit: e04c78d86a9699d136910cfc0bdcf01087e3267e
-- 
2.39.5