From: David Hildenbrand
Organization: Red Hat
Date: Thu, 28 Sep 2023 19:29:53 +0200
Subject: Re: [RFC PATCH v2 0/7] Introduce persistent memory pool
To: Baoquan He, Stanislav Kinsburskii
Cc: tglx@linutronix.de, mingo@redhat.com, bp@alien8.de,
 dave.hansen@linux.intel.com, x86@kernel.org, hpa@zytor.com,
 ebiederm@xmission.com, akpm@linux-foundation.org,
 stanislav.kinsburskii@gmail.com, corbet@lwn.net,
 linux-kernel@vger.kernel.org, kexec@lists.infradead.org,
 linux-mm@kvack.org, kys@microsoft.com, jgowans@amazon.com,
 wei.liu@kernel.org, arnd@arndb.de, gregkh@linuxfoundation.org,
 graf@amazon.de, pbonzini@redhat.com
Message-ID: <838c24a6-5866-a800-ba50-0311d4a4f1d2@redhat.com>
References: <01828.123092517290700465@us-mta-156.us.mimecast.lan>
 <58146.123092712145601339@us-mta-73.us.mimecast.lan>

On 28.09.23 12:25, Baoquan He wrote:
> On 09/27/23 at 09:13am, Stanislav Kinsburskii wrote:
>> On Wed, Sep 27, 2023 at 01:44:38PM +0800, Baoquan He wrote:
>>> Hi Stanislav,
>>>
>>> On 09/25/23 at 02:27pm, Stanislav Kinsburskii wrote:
>>>> This patch introduces a memory allocator specifically tailored for
>>>> persistent memory within the kernel. The allocator maintains
>>>> kernel-specific states like DMA passthrough device states, IOMMU state, and
>>>> more across kexec.
>>>
>>> Can you give more details about how this persistent memory pool will be
>>> utilized in an actual scenario? I mean, what problem have you met so that
>>> you have to introduce a persistent memory pool to solve it?
>>>
>>
>> The major reason we have at the moment is that the Linux root partition
>> running on top of the Microsoft hypervisor needs to deposit pages to the
>> hypervisor at runtime, when the hypervisor runs out of memory.
>> "Depositing" here means that Linux passes a set of its PFNs to the
>> hypervisor via hypercall, and the hypervisor then uses these pages for its
>> own needs.
>>
>> Once deposited, these pages can't be accessed by Linux anymore and thus
>> must be preserved in "used" state across kexec, as hypervisor state is
>> unaware of kexec. At the same time, these pages can be withdrawn when
>> unused. Thus, an allocator persistent across kexec looks reasonable for
>> this particular matter.
>
> Thanks for these details.
>
> The deposit and withdraw remind me of the balloon driver, David's virtio-mem,
> and DLPAR on ppc, which can increase or shrink physical memory on the guest
> OS at runtime. Can't the Microsoft hypervisor do a similar thing to reclaim
> or give back the memory from or to the 'Linux root partition' running on
> top of the hypervisor?

virtio-mem was designed with kexec support in mind. You only expose the
initial memory to the second kernel, and that memory can never have such
holes.

That does not apply to memory ballooning implementations, like Hyper-V
dynamic memory. In the virtio-mem paper I have the following:

"In our experiments, Hyper-V VMs crashed reliably when trying to use
kexec under Linux for fast OS reboots with an inflated balloon. Other
memory ballooning mechanisms either have to temporarily deflate the
whole balloon or allow access to inflated memory, which is undesired in
cloud environments."

I remember Xen does something elaborate, whereby they allow access to all
inflated memory during reboot, but limit the total number of pages they
will hand out. IIRC, you then have to work around things like "Windows
initializes all memory with 0s when booting, and cope with that".

So there are ways hypervisors have handled that in the past.

-- 
Cheers,

David / dhildenb
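
For readers following the thread, here is a minimal userspace sketch of the problem being discussed: pages deposited to a hypervisor must stay off-limits to the next kernel after kexec. It is a toy model only. Every name in it (`Hypervisor`, `Kernel`, `kexec`, `conflicts`, the `pool` set) is hypothetical and does not correspond to any kernel, Hyper-V, Xen, or virtio-mem interface; the "persistent pool" is simply a set of PFNs that is assumed to survive the reboot.

```python
from dataclasses import dataclass, field


@dataclass
class Hypervisor:
    """Toy hypervisor: tracks the PFNs the guest has deposited to it."""
    owned: set = field(default_factory=set)


@dataclass
class Kernel:
    """Toy kernel: 'usable' is the set of PFNs it believes it may touch."""
    usable: set

    def deposit(self, hv, pfns, pool):
        """Hand pages to the hypervisor and record them in the pool."""
        pfns = set(pfns)
        if not pfns <= self.usable:
            raise ValueError("cannot deposit pages the kernel does not own")
        self.usable -= pfns
        hv.owned.update(pfns)
        pool.update(pfns)  # assumption: the pool survives kexec

    def withdraw(self, hv, pfns, pool):
        """Take unused pages back from the hypervisor."""
        pfns = set(pfns)
        if not pfns <= hv.owned:
            raise ValueError("cannot withdraw pages that were not deposited")
        hv.owned -= pfns
        pool.difference_update(pfns)
        self.usable |= pfns


def kexec(all_pfns, pool=None):
    """Boot a new toy kernel; without a pool it assumes all memory is free."""
    reserved = set(pool) if pool else set()
    return Kernel(usable=set(all_pfns) - reserved)


def conflicts(kernel, hv):
    """PFNs that both the kernel and the hypervisor think they own."""
    return kernel.usable & hv.owned


if __name__ == "__main__":
    all_pfns = range(16)
    hv, pool = Hypervisor(), set()
    first = Kernel(usable=set(all_pfns))
    first.deposit(hv, [3, 4, 5], pool)

    naive = kexec(all_pfns)        # second kernel knows nothing
    aware = kexec(all_pfns, pool)  # second kernel honours the pool
    print("naive conflicts:", sorted(conflicts(naive, hv)))
    print("aware conflicts:", sorted(conflicts(aware, hv)))
```

In this model the naive second kernel ends up sharing PFNs with the hypervisor, which is the failure mode described for an inflated balloon, while the pool-aware kernel does not. The virtio-mem approach mentioned above avoids the bookkeeping differently, by exposing only the initial memory to the second kernel.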