From: Pratyush Yadav
To: Pasha Tatashin
Cc: Pratyush Yadav, jasonmiu@google.com, graf@amazon.com,
 changyuanl@google.com, rppt@kernel.org, dmatlack@google.com,
 rientjes@google.com, corbet@lwn.net, rdunlap@infradead.org,
 ilpo.jarvinen@linux.intel.com, kanie@linux.alibaba.com,
 ojeda@kernel.org, aliceryhl@google.com, masahiroy@kernel.org,
 akpm@linux-foundation.org, tj@kernel.org, yoann.congal@smile.fr,
 mmaurer@google.com, roman.gushchin@linux.dev, chenridong@huawei.com,
 axboe@kernel.dk, mark.rutland@arm.com, jannh@google.com,
 vincent.guittot@linaro.org, hannes@cmpxchg.org,
 dan.j.williams@intel.com, david@redhat.com, joel.granados@kernel.org,
 rostedt@goodmis.org, anna.schumaker@oracle.com, song@kernel.org,
 zhangguopeng@kylinos.cn, linux@weissschuh.net,
 linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
 linux-mm@kvack.org, gregkh@linuxfoundation.org, tglx@linutronix.de,
 mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com,
 x86@kernel.org, hpa@zytor.com, rafael@kernel.org, dakr@kernel.org,
 bartosz.golaszewski@linaro.org, cw00.choi@samsung.com,
 myungjoo.ham@samsung.com, yesanishhere@gmail.com,
 Jonathan.Cameron@huawei.com, quic_zijuhu@quicinc.com,
 aleksander.lobakin@intel.com, ira.weiny@intel.com,
 andriy.shevchenko@linux.intel.com, leon@kernel.org, lukas@wunner.de,
 bhelgaas@google.com, wagi@kernel.org, djeffery@redhat.com,
 stuart.w.hayes@gmail.com, lennart@poettering.net, brauner@kernel.org,
 linux-api@vger.kernel.org, linux-fsdevel@vger.kernel.org,
 saeedm@nvidia.com, ajayachandra@nvidia.com, jgg@nvidia.com,
 parav@nvidia.com, leonro@nvidia.com, witu@nvidia.com, hughd@google.com,
 skhawaja@google.com, chrisl@kernel.org, steven.sistare@oracle.com
Subject: Re: [PATCH v4 00/30] Live Update Orchestrator
In-Reply-To: (Pasha Tatashin's message of "Thu, 9 Oct 2025 19:50:12 -0400")
References: <20250929010321.3462457-1-pasha.tatashin@soleen.com>
Date: Mon, 13 Oct 2025 17:23:13 +0200

On Thu, Oct 09 2025, Pasha Tatashin wrote:

> On Thu, Oct 9, 2025 at 6:58 PM Pratyush Yadav wrote:
>>
>> On Tue, Oct 07 2025, Pasha Tatashin wrote:
>>
>> > On Sun, Sep 28, 2025 at 9:03 PM Pasha Tatashin
>> > wrote:
>> >>
>> [...]
>> > 4. New File-Lifecycle-Bound Global State
>> > ----------------------------------------
>> > A new mechanism for managing global state was proposed, designed to be
>> > tied to the lifecycle of the preserved files themselves. This would
>> > allow a file owner (e.g., the IOMMU subsystem) to save and retrieve
>> > global state that is only relevant when one or more of its FDs are
>> > being managed by LUO.
>>
>> Is this going to replace LUO subsystems? If yes, then why? The global
>> state will likely need to have its own lifecycle just like the FDs, and
>> subsystems are a simple and clean abstraction to control that. I get the
>> idea of only "activating" a subsystem when one or more of its FDs are
>> participating in LUO, but we can do that while keeping subsystems
>> around.
>
> Thanks for the feedback. The FLB Global State is not replacing the LUO
> subsystems. On the contrary, it's a higher-level abstraction that is
> itself implemented as a LUO subsystem. The goal is to provide a
> solution for a pattern that emerged during the PCI and IOMMU
> discussions.

Okay, makes sense then. I thought we were removing the subsystems idea.
I didn't follow the PCI and IOMMU discussions that closely.

Side note: I see a dependency between subsystems forming. For example,
the FLB subsystem probably wants to make sure all its dependent
subsystems (like LUO files) go through their callbacks before getting
its callback. Maybe in the current implementation doing it in any order
works, but in general, if it manages data of other subsystems, it should
be serialized after them.

Same with the hugetlb subsystem for example. On prepare or freeze time,
it would probably be a good idea if the files callbacks finish first. I
would imagine most subsystems would want to go after files. With the
current registration mechanism, the order depends on when the subsystem
is registered, which is hard to control. Maybe we should have a global
list of subsystems and can manually specify the order?
Not sure if that is a good idea, just throwing it out there off the top
of my head.

>
> You can see the WIP implementation here, which shows it registering as
> a subsystem named "luo-fh-states-v1-struct":
> https://github.com/soleen/linux/commit/94e191aab6b355d83633718bc4a1d27dda390001
>
> The existing subsystem API is a low-level tool that provides for the
> preservation of a raw 8-byte handle. It doesn't provide locking, nor
> is it explicitly tied to the lifecycle of any higher-level object like
> a file handler. The new API is designed to solve a more specific
> problem: allowing global components (like IOMMU or PCI) to
> automatically track when resources relevant to them are added to or
> removed from preservation. If HugeTLB requires a subsystem, it can
> still use it, but I suspect it might benefit from FLB Global State as
> well.

Hmm, right. Let me see how I can make use of it.

>
>> Here is how I imagine the proposed API would compare against subsystems
>> with hugetlb as an example (hugetlb support is still WIP, so I'm still
>> not clear on specifics, but this is how I imagine it will work):
>>
>> - Hugetlb subsystem needs to track its huge page pools and which pages
>>   are allocated and free. This is its global state. The pools get
>>   reconstructed after kexec. Post-kexec, the free pages are ready for
>>   allocation from other "regular" files and the pages used in LUO files
>>   are reserved.
>>
>> - Pre-kexec, when a hugetlb FD is preserved, it marks that as preserved
>>   in hugetlb's global data structure tracking this. This is runtime data
>>   (say xarray), and _not_ serialized data. Reason being, there are
>>   likely more FDs to come so no point in wasting time serializing just
>>   yet.
>>
>>   This can look something like:
>>
>>         hugetlb_luo_preserve_folio(folio, ...);
>>
>>   Nice and simple.
>>
>>   Compare this with the new proposed API:
>>
>>         liveupdate_fh_global_state_get(h, &hugetlb_data);
>>         // This will have update serialized state now.
>>         hugetlb_luo_preserve_folio(hugetlb_data, folio, ...);
>>         liveupdate_fh_global_state_put(h);
>>
>>   We do the same thing but in a very complicated way.
>>
>> - When the system-wide preserve happens, the hugetlb subsystem gets a
>>   callback to serialize. It converts its runtime global state to
>>   serialized state since now it knows no more FDs will be added.
>>
>>   With the new API, this doesn't need to be done since each FD prepare
>>   already updates serialized state.
>>
>> - If there are no hugetlb FDs, then the hugetlb subsystem doesn't put
>>   anything in LUO. This is same as new API.
>>
>> - If some hugetlb FDs are not restored after liveupdate and the finish
>>   event is triggered, the subsystem gets its finish() handler called and
>>   it can free things up.
>>
>>   I don't get how that would work with the new API.
>
> The new API isn't more complicated; It codifies the common pattern of
> "create on first use, destroy on last use" into a reusable helper,
> saving each file handler from having to reinvent the same reference
> counting and locking scheme. But, as you point out, subsystems provide
> more control, specifically they handle full creation/free instead of
> relying on file-handlers for that.
>
>> My point is, I see subsystems working perfectly fine here and I don't
>> get how the proposed API is any better.
>>
>> Am I missing something?
>
> No, I don't think you are. Your analysis is correct that this is
> achievable with subsystems. The goal of the new API is to make that
> specific, common use case simpler.

Right. Thanks for clarifying.

>
> Pasha

-- 
Regards,
Pratyush Yadav
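
For readers following the thread, the "create on first use, destroy on
last use" pattern mentioned above can be modelled in userspace C roughly
as below. This is only a sketch under assumptions: struct flb_state,
flb_state_get(), flb_state_put() and the hugetlb_demo_* names are
hypothetical and are not the LUO or liveupdate_fh_global_state_* API
from the patch series, and a pthread mutex stands in for whatever
locking a kernel implementation would use.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical model of file-lifecycle-bound global state (not the LUO API). */
struct flb_state {
	pthread_mutex_t lock;     /* serializes get/put and protects the fields below */
	unsigned int users;       /* number of preserved files using the state */
	void *data;               /* owner's global state, NULL while users == 0 */
	void *(*create)(void);    /* runs when the first file is preserved */
	void (*destroy)(void *);  /* runs when the last file is unpreserved */
};

/* Take a reference, creating the state on first use. Returns NULL on failure. */
void *flb_state_get(struct flb_state *s)
{
	void *data;

	pthread_mutex_lock(&s->lock);
	if (s->users == 0) {
		s->data = s->create();
		if (!s->data) {
			pthread_mutex_unlock(&s->lock);
			return NULL;
		}
	}
	s->users++;
	data = s->data;
	pthread_mutex_unlock(&s->lock);
	return data;
}

/* Drop a reference, destroying the state when the last user goes away. */
void flb_state_put(struct flb_state *s)
{
	pthread_mutex_lock(&s->lock);
	assert(s->users > 0);
	if (--s->users == 0) {
		s->destroy(s->data);
		s->data = NULL;
	}
	pthread_mutex_unlock(&s->lock);
}

/* Hypothetical owner: a counter standing in for hugetlb's global state. */
struct hugetlb_demo_state {
	unsigned long preserved_folios;
};

void *hugetlb_demo_create(void)
{
	return calloc(1, sizeof(struct hugetlb_demo_state));
}

void hugetlb_demo_destroy(void *data)
{
	free(data);
}

struct flb_state hugetlb_demo = {
	.lock = PTHREAD_MUTEX_INITIALIZER,
	.create = hugetlb_demo_create,
	.destroy = hugetlb_demo_destroy,
};
```

In this model a file handler would call flb_state_get(&hugetlb_demo)
when one of its FDs is preserved and flb_state_put(&hugetlb_demo) when
it is unpreserved or finished, so the owner never writes the counting
and locking itself. A plain subsystem, by contrast, would own creation
and teardown directly in its own callbacks, which is the extra control
discussed in the thread.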