Subject: Re: [PATCH RFC 0/8] mm: online/offline 4MB chunks controlled by device driver
From: David Hildenbrand
To: Michal Hocko
Cc: linux-mm@kvack.org
Date: Fri, 13 Apr 2018 16:01:43 +0200
Message-ID: <3545ef32-14db-25ab-bf1a-56044402add3@redhat.com>
In-Reply-To: <20180413134414.GS17484@dhcp22.suse.cz>
References: <20180413131632.1413-1-david@redhat.com> <20180413134414.GS17484@dhcp22.suse.cz>

On 13.04.2018 15:44, Michal Hocko wrote:
> [If you choose to not CC the same set of people on all patches - which
> is sometimes a legit thing to do - then please cc them to the cover
> letter at least.]
>
> On Fri 13-04-18 15:16:24, David Hildenbrand wrote:
>> I am right now working on a paravirtualized memory device ("virtio-mem").
>> These devices control a memory region and the amount of memory available
>> via it. Memory will not be indicated via ACPI and friends, the device
>> driver is responsible for it.
>
> How does this compare to other ballooning solutions? And why your driver
> cannot simply use the existing sections and maintain subsections on top?
>

(further down in this mail is a small paragraph about that)

All existing balloon implementations work on all memory available in the
system. Some of them are able to add memory later on (Xen, Hyper-V),
others are not (virtio-balloon). The device-based model proposed here
allows plugging and unplugging memory in a NUMA-aware way at a fine
granularity (e.g. 4MB), while making the implementation in the
hypervisor orders of magnitude easier. We could have multiple
paravirtualized memory devices, e.g. one for each NUMA node.

E.g. when rebooting we don't have to resize any initial system memory
(e820, ACPI ...), but only care about the memory region of this one
device.

Because the device driver adds the memory, we can actually remove the
memory blocks again, freeing up the struct pages.

Also, I tend to not call the solution a balloon driver, rather
"paravirtualized memory". It is something like a balloon, but we are not
going to start fragmenting on a page level. There is more to it, but
this should cover the basics.

"And why your driver cannot simply use the existing sections and
maintain subsections on top?"

Can you elaborate how that is going to work? What I do as of now is
remember for each memory block (basically a section, because I want to
make it as small as possible) which chunks ("subsections") are
online/offline. This works just fine. Is this what you are referring to?

However, when it comes to marking a section finally offline or telling
kdump to not touch offline pages, I need PG_offline. (I had a
prototype where I marked the sections manually offline once I knew
everything in it was offline, but that looked rather hackish.)

Thanks for having a look!

-- 

Thanks,

David / dhildenb
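
The per-memory-block bookkeeping described in the reply above (remembering
which chunks are online/offline, and noticing when a whole block has gone
offline) could be sketched roughly as follows. This is only an illustration,
not the actual virtio-mem code: the struct and function names are
hypothetical, and the 128 MiB block size is an assumption (the mail only
states the 4MB chunk size).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed sizes, for illustration only: 128 MiB block, 4 MiB chunks. */
#define BLOCK_SIZE_MB    128u
#define CHUNK_SIZE_MB    4u
#define CHUNKS_PER_BLOCK (BLOCK_SIZE_MB / CHUNK_SIZE_MB)   /* 32 chunks */

/* Hypothetical per-memory-block state: one bit per 4 MiB chunk. */
struct mem_block_state {
    uint32_t online_chunks;   /* bit i set => chunk i is online */
};

/* Mark one chunk of the block as online. */
static inline void chunk_set_online(struct mem_block_state *mb,
                                    unsigned int chunk)
{
    assert(chunk < CHUNKS_PER_BLOCK);
    mb->online_chunks |= UINT32_C(1) << chunk;
}

/* Mark one chunk of the block as offline. */
static inline void chunk_set_offline(struct mem_block_state *mb,
                                     unsigned int chunk)
{
    assert(chunk < CHUNKS_PER_BLOCK);
    mb->online_chunks &= ~(UINT32_C(1) << chunk);
}

/* Query the state of a single chunk. */
static inline bool chunk_is_online(const struct mem_block_state *mb,
                                   unsigned int chunk)
{
    assert(chunk < CHUNKS_PER_BLOCK);
    return (mb->online_chunks >> chunk) & 1u;
}

/* True once every chunk is offline, i.e. the block could be removed. */
static inline bool block_fully_offline(const struct mem_block_state *mb)
{
    return mb->online_chunks == 0;
}
```

A driver-side tracker like this only answers "is everything in this block
offline?"; it does not by itself tell the core mm or kdump which pages to
skip, which is the gap the mail says PG_offline is meant to fill.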