From: David Hildenbrand <david@redhat.com>
To: Dave Hansen <dave.hansen@linux.intel.com>, linux-mm@kvack.org
Cc: xen-devel@lists.xenproject.org, devel@linuxdriverproject.org,
linux-acpi@vger.kernel.org, linux-sh@vger.kernel.org,
linux-s390@vger.kernel.org, linuxppc-dev@lists.ozlabs.org,
linux-kernel@vger.kernel.org, linux-ia64@vger.kernel.org,
"Tony Luck" <tony.luck@intel.com>,
"Fenghua Yu" <fenghua.yu@intel.com>,
"Benjamin Herrenschmidt" <benh@kernel.crashing.org>,
"Paul Mackerras" <paulus@samba.org>,
"Michael Ellerman" <mpe@ellerman.id.au>,
"Martin Schwidefsky" <schwidefsky@de.ibm.com>,
"Heiko Carstens" <heiko.carstens@de.ibm.com>,
"Yoshinori Sato" <ysato@users.sourceforge.jp>,
"Rich Felker" <dalias@libc.org>,
"Andy Lutomirski" <luto@kernel.org>,
"Peter Zijlstra" <peterz@infradead.org>,
"Thomas Gleixner" <tglx@linutronix.de>,
"Ingo Molnar" <mingo@redhat.com>,
"Borislav Petkov" <bp@alien8.de>,
"H. Peter Anvin" <hpa@zytor.com>,
"Rafael J. Wysocki" <rjw@rjwysocki.net>,
"Len Brown" <lenb@kernel.org>,
"Greg Kroah-Hartman" <gregkh@linuxfoundation.org>,
"K. Y. Srinivasan" <kys@microsoft.com>,
"Haiyang Zhang" <haiyangz@microsoft.com>,
"Stephen Hemminger" <sthemmin@microsoft.com>,
"Boris Ostrovsky" <boris.ostrovsky@oracle.com>,
"Juergen Gross" <jgross@suse.com>,
"Jérôme Glisse" <jglisse@redhat.com>,
"Andrew Morton" <akpm@linux-foundation.org>,
"Mike Rapoport" <rppt@linux.vnet.ibm.com>,
"Dan Williams" <dan.j.williams@intel.com>,
"Stephen Rothwell" <sfr@canb.auug.org.au>,
"Michal Hocko" <mhocko@suse.com>,
"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
"Nicholas Piggin" <npiggin@gmail.com>,
"Jonathan Neuschäfer" <j.neuschaefer@gmx.net>,
"Joe Perches" <joe@perches.com>,
"Michael Neuling" <mikey@neuling.org>,
"Mauricio Faria de Oliveira" <mauricfo@linux.vnet.ibm.com>,
"Balbir Singh" <bsingharora@gmail.com>,
"Rashmica Gupta" <rashmica.g@gmail.com>,
"Pavel Tatashin" <pavel.tatashin@microsoft.com>,
"Rob Herring" <robh@kernel.org>,
"Philippe Ombredanne" <pombredanne@nexb.com>,
"Kate Stewart" <kstewart@linuxfoundation.org>,
"mike.travis@hpe.com" <mike.travis@hpe.com>,
"Joonsoo Kim" <iamjoonsoo.kim@lge.com>,
"Oscar Salvador" <osalvador@suse.de>,
"Mathieu Malaterre" <malat@debian.org>
Subject: Re: [PATCH RFC] mm/memory_hotplug: Introduce memory block types
Date: Mon, 1 Oct 2018 11:13:43 +0200
Message-ID: <147d20c7-2a07-2305-9b44-76fdb735173b@redhat.com>
In-Reply-To: <5dba97a5-5a18-5df1-5493-99987679cf3a@linux.intel.com>
On 28/09/2018 19:02, Dave Hansen wrote:
> It's really nice if these kinds of things are broken up. First, replace
> the old want_memblock parameter, then add the parameter to the
> __add_page() calls.
Definitely, once we agree that it is not nuts, I will split it up for
the next version :)
>
>> +/*
>> + * NONE: No memory block is to be created (e.g. device memory).
>> + * NORMAL: Memory block that represents normal (boot or hotplugged) memory
>> + * (e.g. ACPI DIMMs) that should be onlined either automatically
>> + * (memhp_auto_online) or manually by user space to select a
>> + * specific zone.
>> + * Applicable to memhp_auto_online.
>> + * STANDBY: Memory block that represents standby memory that should only
>> + * be onlined on demand by user space (e.g. standby memory on
>> + * s390x), but never automatically by the kernel.
>> + * Not applicable to memhp_auto_online.
>> + * PARAVIRT: Memory block that represents memory added by
>> + * paravirtualized mechanisms (e.g. hyper-v, xen) that will
>> + * always automatically get onlined. Memory will be unplugged
>> + * using ballooning, not by relying on the MOVABLE ZONE.
>> + * Not applicable to memhp_auto_online.
>> + */
>> +enum {
>> + MEMORY_BLOCK_NONE,
>> + MEMORY_BLOCK_NORMAL,
>> + MEMORY_BLOCK_STANDBY,
>> + MEMORY_BLOCK_PARAVIRT,
>> +};
>
> This does not seem like the best way to expose these.
>
> STANDBY, for instance, seems to be essentially a replacement for a check
> against running on s390 in userspace to implement a _typical_ s390
> policy. It seems rather weird to try to make the userspace policy
> determination easier by telling userspace about the typical s390 policy
> via the kernel.
Now comes the fun part: I am working on another paravirtualized memory
hotplug mechanism for KVM guests, based on virtio ("virtio-mem").
These devices can potentially be used concurrently with
- s390x standby memory
- DIMMs
What should a policy in user space look like when new memory gets added
- on s390x? Not onlining paravirtualized memory is very wrong.
- on e.g. x86? Onlining memory to the MOVABLE zone is very wrong.
So the type of memory is very important here to have in user space.
Relying on checks like "isS390()", "isKVMGuest()" or "isHyperVGuest()"
to decide whether to online memory and how to online memory is wrong.
Only some specific memory types (which I call "normal") are to be
handled by user space.
For the other ones, we know exactly what to do:
- standby? don't online
- paravirt? always online to normal zone
I will add some more details in my reply to Michal.
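
To make the rule above concrete, here is a minimal sketch in plain C. It is
not part of the patch. The enum name `memory_block_type`, the
`online_decision` values and the `decide_onlining()` helper are all
hypothetical names chosen for illustration; only the four MEMORY_BLOCK_*
values and the memhp_auto_online setting come from the proposal.

```c
#include <assert.h>
#include <stdbool.h>

/* The four types from the RFC; the enum tag itself is an assumption. */
enum memory_block_type {
	MEMORY_BLOCK_NONE,
	MEMORY_BLOCK_NORMAL,
	MEMORY_BLOCK_STANDBY,
	MEMORY_BLOCK_PARAVIRT,
};

/* Hypothetical result type: who is responsible for onlining a block. */
enum online_decision {
	ONLINE_ON_DEMAND,	/* stays offline until user space asks */
	ONLINE_BY_KERNEL,	/* onlined automatically by the kernel */
	ONLINE_BY_USER_POLICY,	/* user space decides if and to which zone */
};

/* Hypothetical helper mapping a block type to the onlining policy. */
enum online_decision decide_onlining(enum memory_block_type type,
				     bool memhp_auto_online)
{
	switch (type) {
	case MEMORY_BLOCK_NORMAL:
		/* Only "normal" memory is subject to memhp_auto_online. */
		return memhp_auto_online ? ONLINE_BY_KERNEL
					 : ONLINE_BY_USER_POLICY;
	case MEMORY_BLOCK_STANDBY:
		/* e.g. s390x standby memory: never onlined automatically. */
		return ONLINE_ON_DEMAND;
	case MEMORY_BLOCK_PARAVIRT:
		/* e.g. hyper-v, xen: always onlined, unplug via ballooning. */
		return ONLINE_BY_KERNEL;
	case MEMORY_BLOCK_NONE:
		/* No memory block exists, so there is nothing to online. */
		return ONLINE_ON_DEMAND;
	default:
		assert(0 && "unknown memory block type");
		return ONLINE_ON_DEMAND;
	}
}
```

With a table like this, user space needs no "isS390()"-style checks: it
only acts on blocks for which the answer is ONLINE_BY_USER_POLICY.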
>
> As for the OOM issues, that sounds like something we need to fix by
> refusing to do (or delaying) hot-add operations once we consume too much
> ZONE_NORMAL from memmap[]s rather than trying to indirectly tell
> userspace to hurry thing along.
That is a moving target and doing that automatically is basically
impossible. You can add a lot of memory to the movable zone and
everything is fine. Suddenly a lot of processes are started - boom.
MOVABLE should only ever be used if you expect an unplug. And for
paravirtualized devices, a "typical" unplug does not exist.
>
> So, to my eye, we need:
>
> +enum {
> + MEMORY_BLOCK_NONE,
> + MEMORY_BLOCK_STANDBY, /* the default */
> + MEMORY_BLOCK_AUTO_ONLINE,
> +};
auto-online is strongly misleading, that's why I called it "normal", but
I am open to suggestions. The information about devices that are handled
fully in the kernel ("paravirt") is key for me.
>
> and we can probably collapse NONE into AUTO_ONLINE because userspace
> ends up doing the same thing for both: nothing.
From the external (user space) point of view, yes; internally, no (see
hmm/device memory). In user space, we will never end up with
MEMORY_BLOCK_NONE, because there is no memory block.
>
>> struct memory_block {
>> unsigned long start_section_nr;
>> unsigned long end_section_nr;
>> @@ -34,6 +58,7 @@ struct memory_block {
>> int (*phys_callback)(struct memory_block *);
>> struct device dev;
>> int nid; /* NID for this memory block */
>> + int type; /* type of this memory block */
>> };
>
> Shouldn't we just be creating and using an actual named enum type?
>
That makes sense.
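
A minimal sketch of what a named enum could look like, written as plain
user space C so it compiles on its own. The tag `memory_block_type`, the
trimmed-down struct and the `memory_block_type_name()` helper are
assumptions for illustration, not what a later version of the patch
necessarily uses.

```c
#include <assert.h>

/* Named enum instead of an anonymous one plus "int type". */
enum memory_block_type {
	MEMORY_BLOCK_NONE,
	MEMORY_BLOCK_NORMAL,
	MEMORY_BLOCK_STANDBY,
	MEMORY_BLOCK_PARAVIRT,
};

/* A zero-initialized block would default to "no memory block". */
static_assert(MEMORY_BLOCK_NONE == 0, "MEMORY_BLOCK_NONE must be 0");

/* Trimmed-down stand-in for struct memory_block from the quoted hunk. */
struct memory_block {
	unsigned long start_section_nr;
	unsigned long end_section_nr;
	int nid;			/* NID for this memory block */
	enum memory_block_type type;	/* type of this memory block */
};

/* Hypothetical helper: a printable name, e.g. for a sysfs attribute. */
const char *memory_block_type_name(enum memory_block_type type)
{
	switch (type) {
	case MEMORY_BLOCK_NONE:
		return "none";
	case MEMORY_BLOCK_NORMAL:
		return "normal";
	case MEMORY_BLOCK_STANDBY:
		return "standby";
	case MEMORY_BLOCK_PARAVIRT:
		return "paravirt";
	}
	return "unknown";
}
```

With the named type, a switch over `type` lets the compiler warn about
unhandled values, which a plain int field would not.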
Thanks!
--
Thanks,
David / dhildenb