From: Cong Wang
Date: Sat, 8 Feb 2025 17:00:15 -0800
Subject: Re: [PATCH v4 00/14] kexec: introduce Kexec HandOver (KHO)
To: Pasha Tatashin
Cc: Mike Rapoport, linux-kernel@vger.kernel.org, Alexander Graf,
 Andrew Morton, Andy Lutomirski, Anthony Yznaga, Arnd Bergmann,
 Ashish Kalra, Benjamin Herrenschmidt, Borislav Petkov, Catalin Marinas,
 Dave Hansen, David Woodhouse, Eric Biederman, Ingo Molnar, James Gowans,
 Jonathan Corbet, Krzysztof Kozlowski, Mark Rutland, Paolo Bonzini,
 "H. Peter Anvin", Peter Zijlstra, Pratyush Yadav, Rob Herring,
 Saravana Kannan, Stanislav Kinsburskii, Steven Rostedt, Thomas Gleixner,
 Tom Lendacky, Usama Arif, Will Deacon, devicetree@vger.kernel.org,
 kexec@lists.infradead.org, linux-arm-kernel@lists.infradead.org,
 linux-doc@vger.kernel.org, linux-mm@kvack.org, x86@kernel.org
References: <20250206132754.2596694-1-rppt@kernel.org>

On Sat, Feb 8, 2025 at 4:14 PM Pasha Tatashin wrote:
>
> On Sat, Feb 8, 2025 at 6:39 PM Cong Wang wrote:
> >
> > Hi Mike,
> >
> > On Thu, Feb 6, 2025 at 5:28 AM Mike Rapoport wrote:
> > >
> > > From: "Mike Rapoport (Microsoft)"
> > >
> > > Hi,
> > >
> > > This a next version of Alex's "kexec: Allow preservation of ftrace buffers"
> > > series (https://lore.kernel.org/all/20240117144704.602-1-graf@amazon.com),
> > > just to make things simpler instead of ftrace we decided to preserve
> > > "reserve_mem" regions.
> > >
> > > The patches are also available in git:
> > > https://git.kernel.org/rppt/h/kho/v4
> > >
> > >
> > > Kexec today considers itself purely a boot loader: When we enter the new
> > > kernel, any state the previous kernel left behind is irrelevant and the
> > > new kernel reinitializes the system.
> > >
> > > However, there are use cases where this mode of operation is not what we
> > > actually want. In virtualization hosts for example, we want to use kexec
> > > to update the host kernel while virtual machine memory stays untouched.
> > > When we add device assignment to the mix, we also need to ensure that
> > > IOMMU and VFIO states are untouched. If we add PCIe peer to peer DMA, we
> > > need to do the same for the PCI subsystem. If we want to kexec while an
> > > SEV-SNP enabled virtual machine is running, we need to preserve the VM
> > > context pages and physical memory. See "pkernfs: Persisting guest memory
> > > and kernel/device state safely across kexec" Linux Plumbers
> > > Conference 2023 presentation for details:
> > >
> > > https://lpc.events/event/17/contributions/1485/
> > >
> > > To start us on the journey to support all the use cases above, this patch
> > > implements basic infrastructure to allow hand over of kernel state across
> > > kexec (Kexec HandOver, aka KHO). As a really simple example target, we use
> > > memblock's reserve_mem.
> > > With this patch set applied, memory that was reserved using "reserve_mem"
> > > command line options remains intact after kexec and it is guaranteed to
> > > reside at the same physical address.
> >
> > Nice work!
> >
> > One concern there is that using memblock to reserve memory as crashkernel=
> > is not flexible. I worked on kdump years ago and one of the biggest pains
> > of kdump is how much memory should be reserved with crashkernel=. And
> > it is still a pain today.
> >
> > If we reserve more, that would mean more waste for the 1st kernel. If we
> > reserve less, that would induce more OOM for the 2nd kernel.
> >
> > I'd suggest considering using CMA, where the "reserved" memory can be
> > still reusable for other purposes, just that pages can be migrated out of this
> > reserved region on demand, that is, when loading a kexec kernel. Of course,
> > we need to make sure they are not reused by what you want to preserve here,
> > e.g., IOMMU. So you might need additional work to make it work, but still I
> > believe this is the right direction.
> >
> This is exactly what scratch memory is used for. Unlike crashkernel=,
> the entire scratch area is available to user applications as CMA, as
> we know that no kernel-reserved memory will come from that area. This
> doesn't work for crashkernel=, because in some cases, the user pages
> might also need to be preserved in the crash dump. However, if user
> pages are going to be discarded from the crash dump (as is done 99% of
> the time), then it is better to also make it use CMA or ZONE_MOVABLE
> and use only the memory occupied by the crash kernel and do not waste
> any memory at all. We have an internal patch at Google that does this,
> and I think it would be a good improvement for the upstream kernel to
> carry as well.

Good to know CMA is already used; I could not tell from the cover letter.

User-space pages need to be preserved in scenarios like RDMA, which pins
user-space pages for DMA transfer. Since the goal here is also to preserve
hardware state such as RDMA's, I guess the same concern remains.

Thanks!
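
To make the concern concrete, here is a toy Python model. It is not
kernel code and does not claim to describe how the kernel or this
series handles pinned pages. Page, pinned and evacuate() are made-up
names for illustration only. It just shows that a CMA-like region can
only be fully reclaimed if every page in it can be moved elsewhere.

```python
from dataclasses import dataclass


@dataclass
class Page:
    pfn: int
    pinned: bool = False  # hypothetical flag: page is pinned for DMA


def evacuate(region, spare_pfns):
    """Try to move every page out of a CMA-like region.

    Unpinned pages are "migrated" to a spare pfn; pinned pages (or pages
    left over once the spare pfns run out) stay stuck in the region.
    Returns (migrated, stuck).
    """
    spare = list(spare_pfns)
    migrated, stuck = [], []
    for page in region:
        if page.pinned or not spare:
            stuck.append(page)
        else:
            migrated.append(Page(pfn=spare.pop(0)))
    return migrated, stuck


# One of the three pages in the region is pinned.
region = [Page(0x1000), Page(0x1001, pinned=True), Page(0x1002)]
migrated, stuck = evacuate(region, spare_pfns=[0x9000, 0x9001, 0x9002])
print(len(migrated), "migrated,", len(stuck), "stuck")
```

In this model the region cannot be handed over in full as long as the
pinned page stays inside it, which is the case I am asking about.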