Date: Fri, 29 Sep 2023 07:56:37 +0800
From: Baoquan He
To: Stanislav Kinsburskii
Cc: Dave Hansen, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de,
    dave.hansen@linux.intel.com, x86@kernel.org, hpa@zytor.com,
    ebiederm@xmission.com, akpm@linux-foundation.org,
    stanislav.kinsburskii@gmail.com, corbet@lwn.net,
    linux-kernel@vger.kernel.org, kexec@lists.infradead.org,
    linux-mm@kvack.org, kys@microsoft.com, jgowans@amazon.com,
    wei.liu@kernel.org, arnd@arndb.de, gregkh@linuxfoundation.org,
    graf@amazon.de, pbonzini@redhat.com, "Shutemov, Kirill"
Subject: Re: [RFC PATCH v2 0/7] Introduce persistent memory pool

On 09/27/23 at 07:46pm, Stanislav Kinsburskii wrote:
> On Thu, Sep 28, 2023 at 12:16:31PM -0700, Dave Hansen wrote:
> > On 9/27/23 17:38, Stanislav Kinsburskii wrote:
> > > On Thu, Sep 28, 2023 at 11:00:12AM -0700, Dave Hansen wrote:
> > >> On 9/27/23 17:02, Stanislav Kinsburskii wrote:
> > >>> On Thu, Sep 28, 2023 at 10:29:32AM -0700, Dave Hansen wrote:
> > >> ...
> > >>> Well, not exactly. That's something I'd like to have indeed, but from my
> > >>> POV this goal is out of scope of discussion at the moment.
> > >>> Let me try to express it the same way you did above:
> > >>>
> > >>> 1. Boot some kernel
> > >>> 2. Grow the deposited memory a bunch
> > >>> 3. Kexec
> > >>> 4. Kernel panic due to GPF upon accessing the memory deposited to
> > >>>    hypervisor.
> > >>
> > >> I basically consider this a bug in the first kernel. It *can't* kexec
> > >> when it's left RAM in shambles. It doesn't know what features the new
> > >> kernel has and whether this is even safe.
> > >>
> > >
> > > Could you elaborate more on why this is a bug in the first kernel?
> > > Say, kernel memory can be allocated in big physically consecutive
> > > chunks by the first kernel for depositing. The information about these
> > > chunks is then passed to the second kernel via FDT or even command
> > > line, so the second kernel can reserve this region during booting.
> > > What's wrong with this approach?
> >
> > How do you know the second kernel can parse the FDT entry or the
> > command-line you pass to it?
> >
> > >> Can the new kernel even read the new device tree data?
> > >
> > > I'm not sure I understand the question, to be honest.
> > > Why can't it? This series contains code parts for both first and second
> > > kernels.
> >
> > How do you know the second kernel isn't the version *before* this series
> > gets merged?
> >
>
> The answer to both questions above is the following: the feature is deployed
> fleet-wide first, and enabled only upon the next deployment.
> It is worth mentioning that fleet-wide deployments usually don't need to
> support updates to a version older than the previous one.
> Also, since kexec is initiated by user space, it always can be
> enlightened about kernel capabilities and simply not kexec to an
> incompatible kernel version.
> One more bit to mention: in real life this problem exists only
> during the initial transition, as once the upgrade to a kernel with the
> feature has happened, there won't be a revert to a version without it.
>
> > ...
> > >> I still think the only way this will possibly work when kexec'ing both
> > >> old and new kernels is to do it with the memory maps that *all* kernels
> > >> can read.
> > >
> > > Could you elaborate more on this?
> > > The available memory map actually stays the same for both kernels. The
> > > difference here can be in a different list of memory regions to reserve,
> > > when the first kernel allocated and deposited another chunk, and thus
> > > the second kernel needs to reserve this memory as a new region upon
> > > booting.
> >
> > Please take a step back from your implementation for a moment. There
> > are two basic design points that need to be considered.
> >
> > First, *must* "System RAM" (according to the memory map) be persisted
> > across kexec? If no, then there's no problem to solve and we can stop
> > this thread. If yes, then some mechanism must be used to tell the new
> > kernel that the "System RAM" in the memory map is not normal RAM.
> >
> > Second, *if* we agree that some data must be communicated across kexec,
> > then what mechanism should be used? You're arguing for a new mechanism
> > that only new kernels can use. I'm arguing that you should likely reuse
> > an existing mechanism (probably the UEFI/e820 maps) so that *ALL* kernels
> > can consume the information, old and new.
> >
>
> I'd answer yes, "System RAM" must be persisted across kexec.
> Could you elaborate on why there should be a mechanism to tell the
> kernel anything special about the existing "System RAM" in this context?
> Say, one can reserve a CMA region (or a crash kernel region, etc), store
> there some data, and then pass it across kexec. Reserved CMA region will
> still be a part of the "System RAM", won't it?

Well, I haven't gone through the whole discussion thread, nor have I
clearly understood your intention and motivation. But I have to say there
is a misunderstanding here. At least, I was astonished when I read the
above description.

Who said a CMA region or a crash kernel region needs to be passed across
kexec? Think of kexec as a bootloader; in essence it is no different from
any other bootloader. When it jumps to the 2nd kernel, the whole system is
booted up and reconstructed from the system resources. The only difference
with kexec is that it doesn't go through firmware to do the
detecting/testing/init. If the intention is to preserve any state or
region from the 1st kernel, you absolutely got it wrong.

This is not the first time people have wanted to put a burden on kexec
because of a specific scenario; it is not the 2nd time either, nor the
3rd time in the recent 2 years. But I would say: please think about what
a kexec reboot is, what we expect it to do, and whether the problem can be
fixed on its own side.