From: Pasha Tatashin
Date: Mon, 10 Feb 2025 15:58:00 -0500
Subject: Re: [PATCH v4 05/14] kexec: Add Kexec HandOver (KHO) generation helpers
To: Jason Gunthorpe
Cc: Mike Rapoport, linux-kernel@vger.kernel.org, Alexander Graf,
 Andrew Morton, Andy Lutomirski, Anthony Yznaga, Arnd Bergmann,
 Ashish Kalra, Benjamin Herrenschmidt, Borislav Petkov, Catalin Marinas,
 Dave Hansen, David Woodhouse, Eric Biederman, Ingo Molnar, James Gowans,
 Jonathan Corbet, Krzysztof Kozlowski, Mark Rutland, Paolo Bonzini,
 "H. Peter Anvin", Peter Zijlstra, Pratyush Yadav, Rob Herring,
 Saravana Kannan, Stanislav Kinsburskii, Steven Rostedt, Thomas Gleixner,
 Tom Lendacky, Usama Arif, Will Deacon, devicetree@vger.kernel.org,
 kexec@lists.infradead.org, linux-arm-kernel@lists.infradead.org,
 linux-doc@vger.kernel.org, linux-mm@kvack.org, x86@kernel.org
References: <20250206132754.2596694-1-rppt@kernel.org>
 <20250206132754.2596694-6-rppt@kernel.org>
 <20250210202220.GC3765641@nvidia.com>
In-Reply-To: <20250210202220.GC3765641@nvidia.com>

On Mon, Feb 10, 2025 at 3:22 PM Jason Gunthorpe wrote:
>
> On Thu, Feb 06, 2025 at 03:27:45PM +0200, Mike Rapoport wrote:
> > diff --git a/Documentation/ABI/testing/sysfs-kernel-kho b/Documentation/ABI/testing/sysfs-kernel-kho
> > new file mode 100644
> > index 000000000000..f13b252bc303
> > --- /dev/null
> > +++ b/Documentation/ABI/testing/sysfs-kernel-kho
> > @@ -0,0 +1,53 @@
> > +What:		/sys/kernel/kho/active
> > +Date:		December 2023
> > +Contact:	Alexander Graf
> > +Description:
> > +		Kexec HandOver (KHO) allows Linux to transition the state of
> > +		compatible drivers into the next kexec'ed kernel. To do so,
> > +		device drivers will serialize their current state into a DT.
> > +		While the state is serialized, they are unable to perform
> > +		any modifications to state that was serialized, such as
> > +		handed over memory allocations.
> > +
> > +		When this file contains "1", the system is in the transition
> > +		state. When contains "0", it is not. To switch between the
> > +		two states, echo the respective number into this file.
>
> I don't think this is a great interface for the actual state machine..

In our next proposal we are going to remove this "activate" phase.

>
> > +What:		/sys/kernel/kho/dt_max
> > +Date:		December 2023
> > +Contact:	Alexander Graf
> > +Description:
> > +		KHO needs to allocate a buffer for the DT that gets
> > +		generated before it knows the final size. By default, it
> > +		will allocate 10 MiB for it. You can write to this file
> > +		to modify the size of that allocation.
>
> Seems gross, why can't it use a non-contiguous page list to generate
> the FDT? :\

We will consider some of these ideas in the future version.
I like the idea of using preserved memory to carry a sparse KHO tree,
i.e. an FDT over sparse memory. Maybe use the anchor page to describe how
it should be vmapped into a virtually contiguous tree in the next kernel?

>
> See below for a suggestion..
>
> > +static int kho_serialize(void)
> > +{
> > +	void *fdt = NULL;
> > +	int err = -ENOMEM;
> > +
> > +	fdt = kvmalloc(kho_out.dt_max, GFP_KERNEL);
> > +	if (!fdt)
> > +		goto out;
> > +
> > +	if (fdt_create(fdt, kho_out.dt_max)) {
> > +		err = -EINVAL;
> > +		goto out;
> > +	}
> > +
> > +	err = fdt_finish_reservemap(fdt);
> > +	if (err)
> > +		goto out;
> > +
> > +	err = fdt_begin_node(fdt, "");
> > +	if (err)
> > +		goto out;
> > +
> > +	err = fdt_property_string(fdt, "compatible", "kho-v1");
> > +	if (err)
> > +		goto out;
> > +
> > +	/* Loop through all kho dump functions */
> > +	err = blocking_notifier_call_chain(&kho_out.chain_head, KEXEC_KHO_DUMP, fdt);
> > +	err = notifier_to_errno(err);
>
> I don't see this really working long term. I think we'd like each
> component to be able to serialize at its own pace under userspace
> control.
>
> This design requires that the whole thing be wrapped in a notifier
> callback just so we can make use of the fdt APIs.
>
> It seems like a poor fit to me.
>
> IMHO if you want to keep using FDT I suggest that each serializing
> component (ie driver, ftrace whatever) allocate its own FDT fragment
> from scratch and the main KHO one just link to the memory that holds
> those fragments.
>
> Ie the driver experience would be more like
>
>  kho = kho_start_storage("my_compatible_string,v1", some_kind_of_instance_key);
>
>  fdt...(kho->fdt..)
>
>  kho_finish_storage(kho);
>
> Where this ends up creating a standalone FDT fragment:
>
> /dts-v1/;
> / {
>   compatible = "linux-kho,my_compatible_string,v1";
>   instance = some_kind_of_instance_key;
>   key-value-1 = <..>;
>   key-value-1 = <..>;
> };
>
> And then kho_finish_storage() would remember the phys/length until the
> kexec fdt is produced as the very last step.
>
> This way we could do things like fdbox an iommufd and create the above
> FDT fragment completely separately from any notifier chain and,
> crucially, disconnected from the fdt_create() for the kexec payload.
>
> Further, if you split things like this (it will waste some small
> amount of memory) you can probably get to a point where no single FDT
> is more than 4k. That looks like it would simplify/robustify a lot of
> stuff?
>
> Jason
>
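
The per-component fragment flow suggested in the quoted mail can be
modelled in userspace as a rough sketch. Nothing here is kernel code or
part of the posted series: kho_start_storage() and kho_finish_storage()
are only the names proposed above and their signatures are assumed, while
kho_add_property(), kho_build_index(), struct kho_record and the size
limits are hypothetical and invented for illustration. Plain text stands
in for a real FDT fragment, and a heap pointer stands in for a physical
address.

```c
/* Userspace model only: plain text stands in for a real FDT fragment. */
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define KHO_FRAG_SIZE 4096 /* assumed limit: one 4k page per fragment */
#define KHO_MAX_FRAGS 16   /* arbitrary table size for this sketch */

/* Hypothetical handle returned by kho_start_storage(). */
struct kho_storage {
	char *buf;   /* the component's private fragment */
	size_t used; /* bytes written so far, excluding the NUL */
};

/* What the core remembers per fragment: address and length. */
struct kho_record {
	const char *addr; /* stand-in for a physical address */
	size_t len;
};

static struct kho_record kho_records[KHO_MAX_FRAGS];
static size_t kho_nr_records;

/* Append formatted text to a fragment; fail instead of overflowing. */
static int kho_append(struct kho_storage *kho, const char *fmt, ...)
{
	size_t room = KHO_FRAG_SIZE - kho->used;
	va_list ap;
	int n;

	va_start(ap, fmt);
	n = vsnprintf(kho->buf + kho->used, room, fmt, ap);
	va_end(ap);
	if (n < 0 || (size_t)n >= room) {
		kho->buf[kho->used] = '\0'; /* drop the partial write */
		return -1;
	}
	kho->used += (size_t)n;
	return 0;
}

/* Start a standalone fragment owned by a single component. */
struct kho_storage *kho_start_storage(const char *compatible,
				      unsigned int instance)
{
	struct kho_storage *kho = calloc(1, sizeof(*kho));

	if (!kho)
		return NULL;
	kho->buf = calloc(1, KHO_FRAG_SIZE);
	if (!kho->buf ||
	    kho_append(kho, "compatible = \"linux-kho,%s\";\n", compatible) ||
	    kho_append(kho, "instance = <%u>;\n", instance)) {
		free(kho->buf);
		free(kho);
		return NULL;
	}
	return kho;
}

/* Hypothetical helper: add one string property to the fragment. */
int kho_add_property(struct kho_storage *kho, const char *key, const char *val)
{
	return kho_append(kho, "%s = \"%s\";\n", key, val);
}

/* Record address/length; the buffer stays alive for the final step. */
int kho_finish_storage(struct kho_storage *kho)
{
	if (kho_nr_records == KHO_MAX_FRAGS)
		return -1; /* table full; the caller still owns the handle */
	assert(kho->used < KHO_FRAG_SIZE);
	kho_records[kho_nr_records].addr = kho->buf;
	kho_records[kho_nr_records].len = kho->used;
	kho_nr_records++;
	free(kho); /* only the handle; the fragment itself is preserved */
	return 0;
}

/* Very last step: emit one index line per recorded fragment. */
int kho_build_index(char *out, size_t size)
{
	size_t off = 0;

	for (size_t i = 0; i < kho_nr_records; i++) {
		int n = snprintf(out + off, size - off, "fragment@%zu len=%zu\n",
				 i, kho_records[i].len);
		if (n < 0 || (size_t)n >= size - off)
			return -1;
		off += (size_t)n;
	}
	return (int)kho_nr_records;
}
```

In this model each component builds and finishes its fragment at its own
pace, with no notifier callback involved, and only kho_build_index() has
to run at kexec time. A real implementation would presumably build each
fragment with libfdt and link to preserved pages rather than heap buffers.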