From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id C2C84C5AD49 for ; Sun, 8 Jun 2025 13:38:20 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 1E4236B008A; Sun, 8 Jun 2025 09:38:20 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 1BB956B008C; Sun, 8 Jun 2025 09:38:20 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 0FACA6B0092; Sun, 8 Jun 2025 09:38:20 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0010.hostedemail.com [216.40.44.10]) by kanga.kvack.org (Postfix) with ESMTP id DF74A6B008A for ; Sun, 8 Jun 2025 09:38:19 -0400 (EDT) Received: from smtpin05.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay06.hostedemail.com (Postfix) with ESMTP id 64A36101BF4 for ; Sun, 8 Jun 2025 13:38:19 +0000 (UTC) X-FDA: 83532337518.05.E3E4F81 Received: from mail-qt1-f176.google.com (mail-qt1-f176.google.com [209.85.160.176]) by imf10.hostedemail.com (Postfix) with ESMTP id 822E0C0007 for ; Sun, 8 Jun 2025 13:38:16 +0000 (UTC) Authentication-Results: imf10.hostedemail.com; dkim=pass header.d=soleen-com.20230601.gappssmtp.com header.s=20230601 header.b=SRSKzyx2; spf=pass (imf10.hostedemail.com: domain of pasha.tatashin@soleen.com designates 209.85.160.176 as permitted sender) smtp.mailfrom=pasha.tatashin@soleen.com; dmarc=pass (policy=none) header.from=soleen.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1749389896; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=TUbBijHBgxotN6xjfy6fLHclw6xV0tsQ2jT2nZZ0+Zg=; b=Fl5CPpbPaz6Ui24sYEtOMW7ejchnMcgbaCs4Jeqa+obx161kbtuK/IJUJ26fjJm/i9RSc+ ptuQXAhefYA7zJk8Vz4lVVpSjkqMGPl79QN+jvUzmY4LWKT13BIq6cEFlr3VWXT6/TP28T ZFMfPPwFBWuF8mWI7auwwvriBGbM3Oo= ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1749389896; a=rsa-sha256; cv=none; b=P9Ke7TyemiGMhvqmRdfdSKP2RxqKX7mnK5xaAJSd4ZmPsQFpfBM6xOoB6ZT0u8vDkBRNgT cxZNGmzBieg6/AHUM9ZUDtXPS3txfZB0nORZUKUCq1rC0P1UYIUeaHMBE4J8REqJCvln/o API7kovDjovmK2p8RQ7B7fCe6dByoHM= ARC-Authentication-Results: i=1; imf10.hostedemail.com; dkim=pass header.d=soleen-com.20230601.gappssmtp.com header.s=20230601 header.b=SRSKzyx2; spf=pass (imf10.hostedemail.com: domain of pasha.tatashin@soleen.com designates 209.85.160.176 as permitted sender) smtp.mailfrom=pasha.tatashin@soleen.com; dmarc=pass (policy=none) header.from=soleen.com Received: by mail-qt1-f176.google.com with SMTP id d75a77b69052e-4772f48f516so48087901cf.1 for ; Sun, 08 Jun 2025 06:38:16 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=soleen-com.20230601.gappssmtp.com; s=20230601; t=1749389895; x=1749994695; darn=kvack.org; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=TUbBijHBgxotN6xjfy6fLHclw6xV0tsQ2jT2nZZ0+Zg=; b=SRSKzyx2JJfSs6rmydXZ1Ld+1H63Qv2r7obf7QUzGXDlULAvFjjj8L8O+ut5OReSnW QjxBZ0KjkiswVxc6CWqeGceeLrDb9vOK6ppfKxrjQLmiC7J+FUqOhFfIHPcf/zAze4SC 0TO8K69gju5NApIP+RnL+bvxX7ZRUOTRWzEikmRJQ5Bb0++abXUFX1QL+l85gmzHcxws ojwXnaRS1Sp4lL84MZjUWv8TmEqMAR8KGpVaKa7t0nghztWJMiXdhmTXYq7/mThb/3eg L8txEagOLU77mbjdqoO7tGyyTdDSyjbjxkXCGZj8LsLx5ZsVt5pJEmE5Scyj0Ckp5LOz DBsg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1749389895; x=1749994695; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=TUbBijHBgxotN6xjfy6fLHclw6xV0tsQ2jT2nZZ0+Zg=; b=ez0nIhRhON1sSjzg1rMZyfpZXAKrsHyzKEeWD96blBbg3gKZz0g227iNuoddoGcHAW CQp8Zd16o0sF8e7HiagXnXY9f2FAoCJpzSiBx7cQS1Z0L7IRccdR9Sv5S98jPHEXmnD6 cJEzwjtuldRQQlyoLeh/SGaY/z7/AohmcNGbqJ3wyLbECRYHCzwp3EfH5r/YOt/e7BS0 E8+azxlV/3FdG2gMCwu5ZTy2sW2WvGGhqWG4+SPTGZEWs9cEJoBqqui3vxxYBr/URUgy T1nbSH79q8QK2q1Jkk7i5YOIxO24UQU8MRhsyJ4gQm6NSHPkbTEKGNYtKyeV40lFCh9i uxoA== X-Forwarded-Encrypted: i=1; AJvYcCUHrT+Bx0UzB6bM97loXNrUwp85dl5pU6ebHxp1uRWrJWGxIfJ5yZ3H3/TS1Mop+FZaKvlyhJgHlg==@kvack.org X-Gm-Message-State: AOJu0YwvwwPiZeG6TsZHKzUP3BJH8i5dBmRP2ad89yQKyXUhggDX4c6D r7Bd2VUBqKBbenwmfVGUpr0bT585M6fBL02qRbvQVqZHu8QqtC+36xSy1YImh75jnIikcuQBfKf A59XSiz55QTk5tFB4iKOJaJjFirV0PqC47QbslGqiow== X-Gm-Gg: ASbGncv8JXTUILJIyFKwkqOm6Wqh7bSIyYlIA2sTn4xFa3hH9eYDbEdPZ4QhLaAG88p 3qPFgJ6E99pZtNcuqxyJxdzWDUqI0hFxU9lDfnngxC+S6PAd99S2Qin2I9ju1AGJ13LAMBTU7jd ttuBdts2w7vksl4Q8S73F/0WLtX3E7mg== X-Google-Smtp-Source: AGHT+IHxkVYmrqTy++9U+zVtJe1k2fuiraSQf431LIwxqL/0g1Ki6gT4p1nw2WMON0OXJE5372Dw/jab+Jv/xTg0BRM= X-Received: by 2002:a05:622a:229b:b0:494:7347:d73f with SMTP id d75a77b69052e-4a5baa62fdbmr152821741cf.11.1749389895512; Sun, 08 Jun 2025 06:38:15 -0700 (PDT) MIME-Version: 1.0 References: <20250515182322.117840-1-pasha.tatashin@soleen.com> <20250515182322.117840-9-pasha.tatashin@soleen.com> In-Reply-To: From: Pasha Tatashin Date: Sun, 8 Jun 2025 09:37:38 -0400 X-Gm-Features: AX0GCFtZNmgBc-IWiXVSdESmqmVzgwWJDO_gK1yQRaVW8V8y6wninwTbeWi6VDs Message-ID: Subject: Re: [RFC v2 08/16] luo: luo_files: add infrastructure for FDs To: Pratyush Yadav Cc: jasonmiu@google.com, graf@amazon.com, changyuanl@google.com, rppt@kernel.org, dmatlack@google.com, rientjes@google.com, corbet@lwn.net, rdunlap@infradead.org, ilpo.jarvinen@linux.intel.com, kanie@linux.alibaba.com, ojeda@kernel.org, aliceryhl@google.com, masahiroy@kernel.org, akpm@linux-foundation.org, tj@kernel.org, yoann.congal@smile.fr, mmaurer@google.com, roman.gushchin@linux.dev, chenridong@huawei.com, axboe@kernel.dk, mark.rutland@arm.com, jannh@google.com, vincent.guittot@linaro.org, hannes@cmpxchg.org, dan.j.williams@intel.com, david@redhat.com, joel.granados@kernel.org, rostedt@goodmis.org, anna.schumaker@oracle.com, song@kernel.org, zhangguopeng@kylinos.cn, linux@weissschuh.net, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, gregkh@linuxfoundation.org, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com, x86@kernel.org, hpa@zytor.com, rafael@kernel.org, dakr@kernel.org, bartosz.golaszewski@linaro.org, cw00.choi@samsung.com, myungjoo.ham@samsung.com, yesanishhere@gmail.com, Jonathan.Cameron@huawei.com, quic_zijuhu@quicinc.com, aleksander.lobakin@intel.com, ira.weiny@intel.com, andriy.shevchenko@linux.intel.com, leon@kernel.org, lukas@wunner.de, bhelgaas@google.com, wagi@kernel.org, djeffery@redhat.com, stuart.w.hayes@gmail.com Content-Type: text/plain; charset="UTF-8" X-Rspamd-Server: rspam06 X-Rspamd-Queue-Id: 822E0C0007 X-Stat-Signature: 4iqmrwi9wkrapdhzm1qiw3ak4k8k9pum X-Rspam-User: X-HE-Tag: 1749389896-481223 X-HE-Meta: U2FsdGVkX1/aSPOWD0+xgCG2AIOeU34nP4ErTNPFSGYATNDCOOX2Sp6sDfWATcTY6rLAVdyKOu8DnXAqkKNyktE7YW1rLaR9LEKhEVQ1TmD+6VAvaqbmL007K4+5qKKhb1VuJcHvZBoR/rsrnHgQTOmnxg9GERb0qerymmt2kexBrY5Fr1IPfhjSL02Kp6fnpLzfzf0fP5Av6gV6mnHeaMBdZjF0DrVxD4mBy3NQWXTowCb4oZoC86NoTw5Dn6cCefs5FxQ2mEwcJDzl6fWk7Hri6w6itiXBsJLCF9BY+GGEEqwlbpyU2oaVHMtwUxZWF6D7jDy8Y7iHXuMnhGj039xY7qncJZLCF7A42FlupxSmjl4O1rHpyrDoNBIw5f6TfLBW8zcWZlK0a9wmpbhiSBkKF5jvL55XTgJCHbGS2zDWmwlQoy05799HKHplNbAxgQxMhBgUYynOG6gh5zaz1Q/RmbAhUjQYS0KjjmLTHhMvi0MvWUej7KHxEk8quv18bJ2yLPL/1fCmCeaNsJ6PMqiOFf1Ynj+Zd4kMciGiYWP7UbrhRnXeTGlueD7pcb8OWwnPnmc2zRF4iuvt8jgRLyuIddrOXVKi/krjvZfQcL7SUH+f//YBds/cG1gCLZVLGKsDAkImSDfisBeHe5e6n4pTy6aQgzmxjj1kE5ULEKbXoJv86bG3YbJXgTC6sloUGAclohJCtstaEnRVZTiGVpcwXep2GQPWNWCYnPFPym82cV5k59M4+NvzMV+1g7jz2nbWnIRAaBTJeVH+sZvn2DCkMp6x/1w2l9ZUQi/p5VAkmOS4OmBasPhL6E2AqSoMbfduARhusNDDhw6ZrucjSr6dpc7tnhIPmsRUU5X/x2hKZuPaknbKn88RQfBS7fE4Q5+ZDtRtSsiecg/Hq/dQ5NuZq91inZqo5g6SKxDrwYJT70oynLq03XqDhGKchiOETAeoJ8ZuHuofEDCmizD i7HTAjeZ TbCp2XXpPjxHpnQ9BHGeDuvhml9EkXG8BxPW6EZc7Ju+lVcCsIyjbnaCDFKBbyUqZ6VHxNLFgd+Mk86CFNgNrKa6I3Tuvk5sTwEUb9Q9CtR5VfNfZu5bkpIJcYJS6jv2X1dKqNpQTS/6Kob5xwQaQkTrj4zx1exzKeAWb8q83Y7hnBvSqinVAqfI9Z+qVEVI9yJLYGQb++i35bGWewgHCujK1Oxf7YVDojzL6sPRq7ysCs/c9URSFJvT2UMYu4JkNDf/ToMY/tfTkalMd/3gaItJJcYHQipbGo8en8FcmFsNlC5twY6tHTj8sgiA7QwJTmAUt5MArdImEnntYjUWyu7/6pSi6yaFbB8MJ71nmxmQ0qLN+jHE5M/IpEg== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: > > + > > +/** > > + * luo_files_startup - Validates the LUO file-descriptors FDT node at startup. > > + * @fdt: Pointer to the LUO FDT blob passed from the previous kernel. > > + * > > + * This __init function checks the existence and validity of the > > + * '/file-descriptors' node in the FDT. This node is considered mandatory. It > > Why is it mandatory? Can't a user just preserve some subsystems, and no > FDs? Yes, that is legal, in that case this node is going to be empty. > > > + * calls panic() if the node is missing, inaccessible, or invalid (e.g., missing > > + * compatible, wrong compatible string), indicating a critical configuration > > + * error for LUO. > > + */ > > +void __init luo_files_startup(void *fdt) > > +{ > > + int ret, node_offset; > > + > > + node_offset = fdt_subnode_offset(fdt, 0, LUO_FILES_NODE_NAME); > > + if (node_offset < 0) > > + panic("Failed to find /file-descriptors node\n"); > > + > > + ret = fdt_node_check_compatible(fdt, node_offset, > > + LUO_FILES_COMPATIBLE); > > + if (ret) { > > + panic("FDT '%s' is incompatible with '%s' [%d]\n", > > + LUO_FILES_NODE_NAME, LUO_FILES_COMPATIBLE, ret); > > + } > > + luo_fdt_in = fdt; > > +} > > + > > +static void luo_files_recreate_luo_files_xa_in(void) > > +{ > > + int parent_node_offset, file_node_offset; > > + const char *node_name, *fdt_compat_str; > > + struct liveupdate_filesystem *fs; > > + struct luo_file *luo_file; > > + const void *data_ptr; > > + int ret = 0; > > + > > + if (luo_files_xa_in_recreated || !luo_fdt_in) > > + return; > > + > > + /* Take write in order to gurantee that we re-create list once */ > > Typo: s/gurantee/guarantee Done, thanks. > > > + down_write(&luo_filesystems_list_rwsem); > > + if (luo_files_xa_in_recreated) > > + goto exit_unlock; > > + > > + parent_node_offset = fdt_subnode_offset(luo_fdt_in, 0, > > + LUO_FILES_NODE_NAME); > > + > > + fdt_for_each_subnode(file_node_offset, luo_fdt_in, parent_node_offset) { > > + bool handler_found = false; > > + u64 token; > > + > > + node_name = fdt_get_name(luo_fdt_in, file_node_offset, NULL); > > + if (!node_name) { > > + panic("Skipping FDT subnode at offset %d: Cannot get name\n", > > + file_node_offset); > > Should failure to parse a specific FD really be a panic? Wouldn't it be > better to continue and let userspace decide if it can live with the FD > missing? This is not safe, the memory might be DMA or owned by a sensetive process, and if we proceed liveupdate reboot without properly handling memory, we can get corruptions, and memory leaks. Therefore, during liveupdate boot if there are exceptions, we should panic. > > + } > > + > > + ret = kstrtou64(node_name, 0, &token); > > + if (ret < 0) { > > + panic("Skipping FDT node '%s': Failed to parse token\n", > > + node_name); > > + } > > + > > + fdt_compat_str = fdt_getprop(luo_fdt_in, file_node_offset, > > + "compatible", NULL); > > + if (!fdt_compat_str) { > > + panic("Skipping FDT node '%s': Missing 'compatible' property\n", > > + node_name); > > + } > > + > > + data_ptr = fdt_getprop(luo_fdt_in, file_node_offset, "data", > > + NULL); > > + if (!data_ptr) { > > + panic("Can't recover property 'data' for FDT node '%s'\n", > > + node_name); > > + } > > + > > + list_for_each_entry(fs, &luo_filesystems_list, list) { > > + if (!strcmp(fs->compatible, fdt_compat_str)) { > > + handler_found = true; > > + break; > > + } > > + } > > + > > + if (!handler_found) { > > + panic("Skipping FDT node '%s': No registered handler for compatible '%s'\n", > > + node_name, fdt_compat_str); > > Thinking out loud here: this means that by the time of first retrieval, > all file systems must be registered. Since this is called from > luo_do_files_finish_calls() or luo_retrieve_file(), it will come from > userspace, so all built in modules would be initialized by then. But > some loadable module might not be. I don't see much of a use case for > loadable modules to participate in LUO, so I don't think it should be a > problem. Yes, in practice I am against supporting liveupdate for loadable modules for FDs and devices; however, if userspace decides to use them, they have to be very careful in terms when data is retrieved, and when they are loaded. > > + } > > + > > + luo_file = kmalloc(sizeof(*luo_file), > > + GFP_KERNEL | __GFP_NOFAIL); > > + luo_file->fs = fs; > > + luo_file->file = NULL; > > + memcpy(&luo_file->private_data, data_ptr, sizeof(u64)); > > Why not make sure data_ptr is exactly sizeof(u64) when we parse it, and > then simply do luo_file->private_data = (u64)*data_ptr ? Because FDT alignment is 4 bytes, we can't simply assign it. > Because if the previous kernel wrote more than a u64 in data, then > something is broken and we should catch that error anyway. > > > + luo_file->reclaimed = false; > > + mutex_init(&luo_file->mutex); > > + luo_file->state = LIVEUPDATE_STATE_UPDATED; > > + ret = xa_err(xa_store(&luo_files_xa_in, token, luo_file, > > + GFP_KERNEL | __GFP_NOFAIL)); > > Should you also check if something is already at token's slot, in case > previous kernel generated wrong tokens or FDT is broken? Good idea, added. > > > + if (ret < 0) { > > + panic("Failed to store luo_file for token %llu in XArray: %d\n", > > + token, ret); > > + } > > + } > > + luo_files_xa_in_recreated = true; > > + > > +exit_unlock: > > + up_write(&luo_filesystems_list_rwsem); > > +} > > + > [...] > > diff --git a/include/linux/liveupdate.h b/include/linux/liveupdate.h > > index 7a130680b5f2..7afe0aac5ce4 100644 > > --- a/include/linux/liveupdate.h > > +++ b/include/linux/liveupdate.h > > @@ -86,6 +86,55 @@ enum liveupdate_state { > > LIVEUPDATE_STATE_UPDATED = 3, > > }; > > > > +/* Forward declaration needed if definition isn't included */ > > +struct file; > > + > > +/** > > + * struct liveupdate_filesystem - Represents a handler for a live-updatable > > + * filesystem/file type. > > + * @prepare: Optional. Saves state for a specific file instance (@file, > > + * @arg) before update, potentially returning value via @data. > > + * Returns 0 on success, negative errno on failure. > > + * @freeze: Optional. Performs final actions just before kernel > > + * transition, potentially reading/updating the handle via > > + * @data. > > + * Returns 0 on success, negative errno on failure. > > + * @cancel: Optional. Cleans up state/resources if update is aborted > > + * after prepare/freeze succeeded, using the @data handle (by > > + * value) from the successful prepare. Returns void. > > + * @finish: Optional. Performs final cleanup in the new kernel using the > > + * preserved @data handle (by value). Returns void. > > + * @retrieve: Retrieve the preserved file. Must be called before finish. > > + * @can_preserve: callback to determine if @file with associated context (@arg) > > + * can be preserved by this handler. > > + * Return bool (true if preservable, false otherwise). > > + * @compatible: The compatibility string (e.g., "memfd-v1", "vfiofd-v1") > > + * that uniquely identifies the filesystem or file type this > > + * handler supports. This is matched against the compatible > > + * string associated with individual &struct liveupdate_file > > + * instances. > > + * @arg: An opaque pointer to implementation-specific context data > > + * associated with this filesystem handler registration. > > + * @list: used for linking this handler instance into a global list of > > + * registered filesystem handlers. > > + * > > + * Modules that want to support live update for specific file types should > > + * register an instance of this structure. LUO uses this registration to > > + * determine if a given file can be preserved and to find the appropriate > > + * operations to manage its state across the update. > > + */ > > +struct liveupdate_filesystem { > > + int (*prepare)(struct file *file, void *arg, u64 *data); > > + int (*freeze)(struct file *file, void *arg, u64 *data); > > + void (*cancel)(struct file *file, void *arg, u64 data); > > + void (*finish)(struct file *file, void *arg, u64 data, bool reclaimed); > > + int (*retrieve)(void *arg, u64 data, struct file **file); > > + bool (*can_preserve)(struct file *file, void *arg); > > + const char *compatible; > > + void *arg; > > What is the use for this arg? I would expect one file type/system to > register one set of handlers. So they can keep their arg in a global in > their code. I don't see why a per-filesystem arg is needed. I think, arg is useful in case we support a subsystem is registered multiple times with some differences: i.e. based on mount point, or file types handling. Let's keep it for now, but if needed, we can remove that in future revisions. > What I do think is needed is a per-file arg. Each callback gets 'data', > which is the serialized data, but there is no place to store runtime > state, like some flags or serialization metadata. Sure, you could make > place for it somewhere in the inode, but I think it would be a lot > cleaner to be able to store it in struct luo_file. > > So perhaps rename private_data in struct luo_file to say > serialized_data, and have a field called "private" that filesystems can > use for their runtime state? I am not against this, but let's make this change when it is actually needed by a registered filesystem. Thanks, Pasha