Message-ID: <760bbb08-83b4-7bb1-822f-2ceba26278a6@intel.com>
Date: Thu, 28 Sep 2023 11:00:12 -0700
From: Dave Hansen
Subject: Re: [RFC PATCH v2 0/7] Introduce persistent memory pool
To: Stanislav Kinsburskii
Cc: Baoquan He, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de,
 dave.hansen@linux.intel.com, x86@kernel.org, hpa@zytor.com,
 ebiederm@xmission.com, akpm@linux-foundation.org,
 stanislav.kinsburskii@gmail.com, corbet@lwn.net,
 linux-kernel@vger.kernel.org, kexec@lists.infradead.org,
 linux-mm@kvack.org, kys@microsoft.com, jgowans@amazon.com,
 wei.liu@kernel.org, arnd@arndb.de, gregkh@linuxfoundation.org,
 graf@amazon.de, pbonzini@redhat.com, "Shutemov, Kirill"
References: <01828.123092517290700465@us-mta-156.us.mimecast.lan>
 <20230927161319.GA19976@skinsburskii.>
 <20230927232548.GA20221@skinsburskii.>
 <20230928000230.GA20259@skinsburskii.>
In-Reply-To: <20230928000230.GA20259@skinsburskii.>

On 9/27/23 17:02, Stanislav Kinsburskii wrote:
> On Thu, Sep 28, 2023 at 10:29:32AM -0700, Dave Hansen wrote:
...
> Well, not exactly. That's something I'd like to have indeed, but from my
> POV this goal is out of scope of discussion at the moment.
> Let me try to express it the same way you did above:
>
> 1. Boot some kernel
> 2. Grow the deposited memory a bunch
> 3. Kexec
> 4. Kernel panic due to GPF upon accessing the memory deposited to
>    hypervisor.

I basically consider this a bug in the first kernel.  It *can't* kexec
when it's left RAM in shambles.
It doesn't know what features the new kernel has and whether this
is even safe.  Can the new kernel even read the new device tree data?

>> Can't the deposited memory just be shrunk before kexec?  Surely there
>> aren't a bunch of pathological things consuming that memory right before
>> kexec, which is basically a reboot.
>
> In general it can. But for this to happen hypervisor needs to release
> this memory. And it can release the memory iff the guests are stopped.
> And stopping the guests during kexec isn't something we want to have in the
> long run.
> Also, even if we stop the guests before kexec, we need to restart them
> after boot meaning we have to deposit the pages once again.
> All this: stopping the guests, withdrawing the pages upon kexec,
> allocating after boot and depositing them again significantly affect
> guests downtime.

Ahh, and you're presumably kexec'ing in the first place because you've
got a bug in the first kernel and you want a second kernel with fewer bugs.

I still think the only way this will possibly work when kexec'ing both
old and new kernels is to do it with the memory maps that *all* kernels
can read.

Can the hypervisor be improved to make this release operation faster?
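
To make the memory-map point concrete, here is a toy model (plain Python,
not kernel code) of what I mean by a map that any kernel can read. It
assumes a hypothetical e820-style list of (start, end, kind) ranges; the
names Range, reserve() and usable_bytes() are made up for illustration
and are not from the patch set. The idea is only that the first kernel
carves the deposited range out of the usable map, so a second kernel that
knows nothing about the deposit never touches it.

```python
from typing import List, NamedTuple


class Range(NamedTuple):
    start: int  # inclusive physical address
    end: int    # exclusive physical address
    kind: str   # "usable" or "reserved" (hypothetical labels)


def reserve(memmap: List[Range], start: int, end: int) -> List[Range]:
    """Return a new map with the usable parts of [start, end) marked reserved."""
    out: List[Range] = []
    for r in memmap:
        # Leave non-usable ranges and non-overlapping ranges untouched.
        if r.kind != "usable" or end <= r.start or start >= r.end:
            out.append(r)
            continue
        # Keep the usable piece below the deposited range, if any.
        if r.start < start:
            out.append(Range(r.start, start, "usable"))
        # Mark the overlapping piece as reserved.
        out.append(Range(max(r.start, start), min(r.end, end), "reserved"))
        # Keep the usable piece above the deposited range, if any.
        if end < r.end:
            out.append(Range(end, r.end, "usable"))
    return out


def usable_bytes(memmap: List[Range]) -> int:
    """Total bytes a kernel reading this map would treat as free RAM."""
    return sum(r.end - r.start for r in memmap if r.kind == "usable")
```

A kernel that walks only the "usable" entries of the resulting map skips
the deposited pages without needing to understand why they are reserved,
which is the property an old kernel needs in order to survive the kexec.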