References: <20250320024011.2995837-1-pasha.tatashin@soleen.com>
 <20250320024011.2995837-2-pasha.tatashin@soleen.com>
 <20250320144338.GW9311@nvidia.com>
In-Reply-To: <20250320144338.GW9311@nvidia.com>
From: Pasha Tatashin
Date: Thu, 20 Mar 2025 15:00:31 -0400
Subject: Re: [RFC v1 1/3] luo: Live Update Orchestrator
To: Jason Gunthorpe
Cc: changyuanl@google.com, graf@amazon.com, rppt@kernel.org,
 rientjes@google.com, corbet@lwn.net, rdunlap@infradead.org,
 ilpo.jarvinen@linux.intel.com, kanie@linux.alibaba.com, ojeda@kernel.org,
 aliceryhl@google.com, masahiroy@kernel.org, akpm@linux-foundation.org,
 tj@kernel.org, yoann.congal@smile.fr, mmaurer@google.com,
 roman.gushchin@linux.dev, chenridong@huawei.com, axboe@kernel.dk,
 mark.rutland@arm.com, jannh@google.com, vincent.guittot@linaro.org,
 hannes@cmpxchg.org, dan.j.williams@intel.com, david@redhat.com,
 joel.granados@kernel.org, rostedt@goodmis.org, anna.schumaker@oracle.com,
 song@kernel.org, zhangguopeng@kylinos.cn, linux@weissschuh.net,
 linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
 linux-mm@kvack.org, gregkh@linuxfoundation.org, tglx@linutronix.de,
 mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com,
 x86@kernel.org, hpa@zytor.com, rafael@kernel.org, dakr@kernel.org,
 bartosz.golaszewski@linaro.org, cw00.choi@samsung.com,
 myungjoo.ham@samsung.com, yesanishhere@gmail.com,
 Jonathan.Cameron@huawei.com, quic_zijuhu@quicinc.com,
 aleksander.lobakin@intel.com, ira.weiny@intel.com,
 andriy.shevchenko@linux.intel.com, leon@kernel.org, lukas@wunner.de,
 bhelgaas@google.com, wagi@kernel.org, djeffery@redhat.com,
 stuart.w.hayes@gmail.com, jgowans@amazon.com

Hi Jason,

Thank you for your feedback.

> > Features introduced:
> >
> > - Core orchestration logic for managing the live update process.
> > - A state machine (NORMAL, PREPARED, UPDATED, *_FAILED) to track
> >   the progress of live updates.
> > - Notifier chains for subsystems (device layer, interrupts, KVM, IOMMU,
> >   etc.) to register callbacks for different live update events:
> >     - LIVEUPDATE_PREPARE: Prepare for reboot (before blackout).
> >     - LIVEUPDATE_REBOOT: Final serialization before kexec (blackout).
> >     - LIVEUPDATE_FINISH: Cleanup after update (after blackout).
> >     - LIVEUPDATE_CANCEL: Rollback actions on failure or user request.
>
> I still don't think notifier chains are the right way to go about alot
> of this, most if it should be driven off of the file descriptors and
> fdbox, not through notification.
>
> At the very least we should not be adding notifier chains without a
> clear user of them, and I'm not convinced that the iommu driver or
> vfio are those users at the moment.
>
> I feel more like the iommu can be brought into the serialization
> indirectly by putting an iommufd into a fdbox.

We have identified the subsystems that need to participate in Live
Update: KVM, IOMMU, Devices, and Interrupts. We are planning to present
how each of them will integrate with the LUO.

> > - A sysfs interface (/sys/kernel/liveupdate/) for user-space control:
> >     - `prepare`: Initiate preparation (write 1) or reset (write 0).
> >     - `finish`: Finalize update in new kernel (write 1).
> >     - `cancel`: Abort ongoing preparation or reboot (write 1).
> >     - `reset`: Force state back to normal (write 1).
> >     - `state`: Read-only view of the current LUO state.
> >     - `enabled`: Read-only view of whether live update is enabled.

I forgot to update the commit message: there are no `enabled`, `reset`,
or `cancel` files. We only have three files in LUO: `prepare`, `finish`,
and `state`.

>
> I also think we should give up on the sysfs. If fdbox is going forward
> in a char dev direction then I think we should have two char devs
> /dev/kho/serialize and /dev/kho/deserialize and run the whole thing

KHO is a mechanism to preserve kernel memory across reboots. It can be
used independently of live update, for example, to preserve kexec reboot
telemetry, traces, and for other purposes.
The LUO utilizes KHO for memory preservation, but it also orchestrates
the live update process specifically: it provides a generic way for
subsystems and devices to participate, and it handles error recovery,
unclaimed devices, and other live update-specific steps.

That said, I can transition the LUO interface from sysfs to a character
device.

> through that. The concepts shown in the fdbox patches should be merged
> into the kho/serialize char dev as just a general architecture of open
> the char dev, put stuff into it, then finalize and do the kexec.

Some participating subsystems, such as interrupts, do not have a way to
export a file descriptor. It is unclear why we would require this for
kernel-internal state that needs to be preserved for live update; such
state should instead register with the LUO internally.

> It gives you more options to avoid things like notifiers and a very
> clear "session" linked to a FD lifetime that encloses the
> serialization effort. I think that will make error case cleanup easier
> and the whole thing more maintainable. IMHO sysfs is not a great API
> choice for something so complicated.

IMO, the current API and state machine are quite simple (I plan to
present and go through them at one of the Hypervisor Live Update
meetings). However, I am open to changing to a different API, and we can
expose it through a character device.

> Also agree with Greg, I think this needs more thoughtful patch staging
> with actual complete solutions. I think focusing on a progression of
> demonstrable kexec preservation:
>  - A simple KVM and the VM's backing memory in a memfd is perserved
>  - A simple vfio-noiommu doing DMA to a preserved memfd, including not
>    resetting the device (but with no iommu driver)
>  - iommufd

We are working on this. However, each component builds upon the previous
one, so it makes sense to discuss the lower layers early to get
feedback.

Pasha