From: Pasha Tatashin
Date: Sat, 8 Feb 2025 19:13:40 -0500
Subject: Re: [PATCH v4 00/14] kexec: introduce Kexec HandOver (KHO)
To: Cong Wang
Cc: Mike Rapoport, linux-kernel@vger.kernel.org, Alexander Graf,
 Andrew Morton, Andy Lutomirski, Anthony Yznaga, Arnd Bergmann,
 Ashish Kalra, Benjamin Herrenschmidt, Borislav Petkov, Catalin Marinas,
 Dave Hansen, David Woodhouse, Eric Biederman, Ingo Molnar, James Gowans,
 Jonathan Corbet, Krzysztof Kozlowski, Mark Rutland, Paolo Bonzini,
 "H. Peter Anvin", Peter Zijlstra, Pratyush Yadav, Rob Herring,
 Saravana Kannan, Stanislav Kinsburskii, Steven Rostedt, Thomas Gleixner,
 Tom Lendacky, Usama Arif, Will Deacon, devicetree@vger.kernel.org,
 kexec@lists.infradead.org, linux-arm-kernel@lists.infradead.org,
 linux-doc@vger.kernel.org, linux-mm@kvack.org, x86@kernel.org
References: <20250206132754.2596694-1-rppt@kernel.org>

On Sat, Feb 8, 2025 at 6:39 PM Cong Wang wrote:
>
> Hi Mike,
>
> On Thu, Feb 6, 2025 at 5:28 AM Mike Rapoport wrote:
> >
> > From: "Mike Rapoport (Microsoft)"
> >
> > Hi,
> >
> > This a next version of Alex's "kexec: Allow preservation of ftrace buffers"
> > series (https://lore.kernel.org/all/20240117144704.602-1-graf@amazon.com),
> > just to make things simpler instead of ftrace we decided to preserve
> > "reserve_mem" regions.
> >
> > The patches are also available in git:
> > https://git.kernel.org/rppt/h/kho/v4
> >
> >
> > Kexec today considers itself purely a boot loader: When we enter the new
> > kernel, any state the previous kernel left behind is irrelevant and the
> > new kernel reinitializes the system.
> >
> > However, there are use cases where this mode of operation is not what we
> > actually want. In virtualization hosts for example, we want to use kexec
> > to update the host kernel while virtual machine memory stays untouched.
> > When we add device assignment to the mix, we also need to ensure that
> > IOMMU and VFIO states are untouched. If we add PCIe peer to peer DMA, we
> > need to do the same for the PCI subsystem.
> > If we want to kexec while an
> > SEV-SNP enabled virtual machine is running, we need to preserve the VM
> > context pages and physical memory. See "pkernfs: Persisting guest memory
> > and kernel/device state safely across kexec" Linux Plumbers
> > Conference 2023 presentation for details:
> >
> > https://lpc.events/event/17/contributions/1485/
> >
> > To start us on the journey to support all the use cases above, this patch
> > implements basic infrastructure to allow hand over of kernel state across
> > kexec (Kexec HandOver, aka KHO). As a really simple example target, we use
> > memblock's reserve_mem.
> > With this patch set applied, memory that was reserved using "reserve_mem"
> > command line options remains intact after kexec and it is guaranteed to
> > reside at the same physical address.
>
> Nice work!
>
> One concern there is that using memblock to reserve memory as crashkernel=
> is not flexible. I worked on kdump years ago and one of the biggest pains
> of kdump is how much memory should be reserved with crashkernel=. And
> it is still a pain today.
>
> If we reserve more, that would mean more waste for the 1st kernel. If we
> reserve less, that would induce more OOM for the 2nd kernel.
>
> I'd suggest considering using CMA, where the "reserved" memory can be
> still reusable for other purposes, just that pages can be migrated out of this
> reserved region on demand, that is, when loading a kexec kernel. Of course,
> we need to make sure they are not reused by what you want to preserve here,
> e.g., IOMMU. So you might need additional work to make it work, but still I
> believe this is the right direction.

This is exactly what scratch memory is used for. Unlike crashkernel=,
the entire scratch area is available to user applications as CMA, as
we know that no kernel-reserved memory will come from that area.
This doesn't work for crashkernel=, because in some cases, the user
pages might also need to be preserved in the crash dump.

However, if user pages are going to be discarded from the crash dump
(as is done 99% of the time), then it is better to back crashkernel=
with CMA or ZONE_MOVABLE as well, reserving only the memory that the
crash kernel itself occupies and wasting none. We have an internal
patch at Google that does this, and I think it would be a good
improvement for the upstream kernel to carry as well.

Pasha

>
> Just my two cents.
>
> Thanks!
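
For readers unfamiliar with the CMA idea discussed in this thread, here
is a toy userspace model of it. It is only an illustrative sketch, not
kernel code and not how KHO scratch memory or CMA is implemented: the
class ToyMemory and its methods alloc() and claim_region() are
hypothetical names invented for this example. The model assumes a region
that movable (user) allocations may borrow, that unmovable (kernel)
allocations never enter, and that can be emptied on demand by migrating
the borrowers elsewhere.

```python
from dataclasses import dataclass, field


@dataclass
class ToyMemory:
    """Toy page-frame model; frames [0, cma_frames) form a CMA-like region."""
    total_frames: int
    cma_frames: int
    owner: dict = field(default_factory=dict)  # frame -> (name, movable)

    def alloc(self, name, movable):
        # Movable allocations may borrow the region, so it is not wasted.
        # Unmovable allocations only search the frames outside the region.
        start = 0 if movable else self.cma_frames
        for frame in range(start, self.total_frames):
            if frame not in self.owner:
                self.owner[frame] = (name, movable)
                return frame
        raise MemoryError("no free frame")

    def claim_region(self):
        """Migrate every borrower out of the region and return its frames."""
        for frame in range(self.cma_frames):
            if frame not in self.owner:
                continue
            entry = self.owner.pop(frame)
            for dst in range(self.cma_frames, self.total_frames):
                if dst not in self.owner:
                    self.owner[dst] = entry  # "migrate" the page
                    break
            else:
                self.owner[frame] = entry  # undo: nowhere to migrate to
                raise MemoryError("cannot migrate frame out of region")
        return list(range(self.cma_frames))


if __name__ == "__main__":
    mem = ToyMemory(total_frames=8, cma_frames=4)
    mem.alloc("user-a", movable=True)    # borrows a frame inside the region
    mem.alloc("kernel", movable=False)   # placed outside the region
    print(mem.claim_region(), mem.owner)
```

The point of the sketch is the trade-off raised above: the region stays
useful between reservations, at the cost of a migration step when it is
claimed, and that step can fail if there is no free memory elsewhere.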