Date: Mon, 24 Mar 2025 17:21:45 -0700
Message-ID: <20250325002145.982402-1-changyuanl@google.com>
Subject: Re: [PATCH v5 07/16] kexec: add Kexec HandOver (KHO) generation helpers
From: Changyuan Lyu
To: jgg@nvidia.com
Cc: akpm@linux-foundation.org, anthony.yznaga@oracle.com, arnd@arndb.de,
 ashish.kalra@amd.com, benh@kernel.crashing.org, bp@alien8.de,
 catalin.marinas@arm.com, changyuanl@google.com, corbet@lwn.net,
 dave.hansen@linux.intel.com, devicetree@vger.kernel.org,
 dwmw2@infradead.org, ebiederm@xmission.com, graf@amazon.com,
 hpa@zytor.com, jgowans@amazon.com, kexec@lists.infradead.org,
 krzk@kernel.org, linux-arm-kernel@lists.infradead.org,
 linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
 linux-mm@kvack.org, luto@kernel.org, mark.rutland@arm.com,
 mingo@redhat.com, pasha.tatashin@soleen.com, pbonzini@redhat.com,
 peterz@infradead.org, ptyadav@amazon.de, robh+dt@kernel.org,
 robh@kernel.org, rostedt@goodmis.org, rppt@kernel.org,
 saravanak@google.com, skinsburskii@linux.microsoft.com,
 tglx@linutronix.de, thomas.lendacky@amd.com, will@kernel.org,
 x86@kernel.org

Hi Jason,

On Mon, Mar 24, 2025 at 13:28:53 -0300, Jason Gunthorpe wrote:
> [...]
> > > I feel like this patch is premature, it should come later in the
> > > project along with a stronger justification for this approach.
> > >
> > > IMHO keep things simple for this series, just the very basics.
> >
> > The main purpose of using hashtables is to enable KHO users to save
> > data to KHO at any time, not just at the time KHO is activated or
> > finalized through sysfs/debugfs. For example, FDBox can save the data
> > into the KHO tree once a new fd is saved to KHO. Also, using hashtables
> > allows KHO users to add data to KHO concurrently, while with notifiers,
> > KHO users' callbacks are executed serially.
>
> This is why I like the recursive FDT scheme. Each serialization
> operation can open its own FDT, write to it, and then close it
> sequentially within its operation without any worries about
> concurrency.
>
> The top level just aggregates the FDT blobs (which are in preserved
> memory).
>
> To me all this complexity here with the hash table and the copying
> makes no sense compared to that. It is all around slower.
>
> > Regarding the suggestion of recursive FDT, I feel like it is already
> > doable with this patchset, or even with Mike's V4 patch.
>
> Of course it is doable, here we are really talking about what is the
> right, recommended way to use this system. Recursive FDT is a better
> methodology than hash tables.
>
> > just allocates a buffer, serializes all its states to the buffer using
> > libfdt (or even using other binary formats), saves the address of the
> > buffer to KHO's tree, and finally registers the buffer's underlying
> > pages/folios with kho_preserve_folio().
>
> Yes, exactly! I think this is how we should operate this system as a
> paradigm, not a giant FDT, hash table and so on...
>
> [...]
> > To completely remove fdt_max, I am considering the idea in [1]. At the
> > time of kexec_file_load(), we pass the address of an anchor page to
> > the new kernel, and the anchor page will later be filled with the
> > physical addresses of the pages containing the FDT blob. Multiple
> > anchor pages can be linked together. The FDT blob pages can be physically
> > noncontiguous.
>
> Yes, this is basically what I suggested too. I think this is much
> preferred and doesn't require the wacky uapi.
>
> Except I suggested you just really need a single u64 to point to a
> preserved page holding the top level FDT.
>
> With recursive FDT I think we can say that no FDT fragment should
> exceed PAGE_SIZE, and things become much simpler, IMHO.

Thanks for the suggestions! I am a little bit concerned about assuming
that every FDT fragment is smaller than PAGE_SIZE. In case a child FDT is
larger than PAGE_SIZE, I would like to turn the single u64 in the parent
FDT into a u64 list that records all the underlying pages of the child
FDT.
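
As a rough sketch of how a reader of such a u64 list could decode it
(this is not code from this series; the helper name
kho_decode_fdt_pages() and the userspace-style types are made up for
illustration), each address is stored as two big-endian 32-bit cells,
as with any FDT property:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read one big-endian 32-bit FDT cell. */
static uint32_t be32_cell(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/*
 * Hypothetical helper (an assumption, not an existing KHO API): decode a
 * property that holds a list of u64 physical page addresses.
 * Returns the number of addresses written to out, or -1 if len is not a
 * multiple of 8 bytes or out has room for fewer than len / 8 entries.
 */
static int kho_decode_fdt_pages(const uint8_t *prop, size_t len,
				uint64_t *out, size_t max)
{
	size_t i, n = len / 8;

	assert(prop && out);
	if (len % 8 || n > max)
		return -1;

	/* Each u64 is two consecutive cells: high word first, then low. */
	for (i = 0; i < n; i++)
		out[i] = ((uint64_t)be32_cell(prop + 8 * i) << 32) |
			 be32_cell(prop + 8 * i + 4);

	return (int)n;
}
```

For a property encoded as <0x00 0x40003000 0x00 0x40005000>, this
returns 2 and yields the addresses 0x40003000 and 0x40005000. A
single-page child FDT is just the n == 1 case of the same format.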

To be concrete and make sure I understand your suggestions correctly, I
drafted the following design.

Suppose we have 2 KHO users, memblock and gpu@0x2000000000. The KHO FDT
(top level FDT) would look like the following:

    /dts-v1/;

    / {
            compatible = "kho-v1";

            memblock {
                    kho,recursive-fdt = <0x00 0x40001000>;
            };

            gpu@0x2000000000 {
                    kho,recursive-fdt = <0x00 0x40002000>;
            };
    };

kho,recursive-fdt in "memblock" points to a page containing another FDT:

    / {
            compatible = "memblock-v1";

            n1 {
                    compatible = "reserve-mem-v1";
                    size = <0x04 0x00>;
                    start = <0xc06b 0x4000000>;
            };

            n2 {
                    compatible = "reserve-mem-v1";
                    size = <0x04 0x00>;
                    start = <0xc067 0x4000000>;
            };
    };

Similarly, kho,recursive-fdt in "gpu@0x2000000000" points to a page
containing another FDT:

    / {
            compatible = "gpu-v1";
            key1 = "v1";
            key2 = "v2";

            node1 {
                    kho,recursive-fdt = <0x00 0x40003000 0x00 0x40005000>;
            };

            node2 {
                    key3 = "v3";
                    key4 = "v4";
            };
    };

and kho,recursive-fdt in "node1" lists 2 non-contiguous pages backing the
following large FDT fragment:

    / {
            compatible = "gpu-subnode1-v1";

            key5 = "v5";
            key6 = "v6";
            key7 = "v7";
            key8 = "v8";
            ... // many many keys and small values
    };

In this way we assume that most FDT fragments are smaller than 1 page, so
"kho,recursive-fdt" is usually just 1 u64, but we can also handle larger
fragments if that really happens.

I also allow KHO users to add sub nodes in place, instead of forcing them
to create a new FDT fragment for every sub node, if the KHO user is
confident that those sub nodes are small enough to fit in the parent
node's page. In this way we do not need to waste a full page on a small
sub node. An example is the "memblock" node above.

Finally, the KHO top level FDT may also be larger than 1 page. This can be
handled using the anchor-page method discussed in the previous mails.

What do you think?

Best,
Changyuan