Date: Thu, 23 Feb 2017 16:09:20 +0100
From: Michal Hocko
Subject: Re: [RFC PATCH] memory-hotplug: Use dev_online for memhp_auto_offline
Message-ID: <20170223150920.GB29056@dhcp22.suse.cz>
References: <20170221172234.8047.33382.stgit@ltcalpine2-lp14.aus.stglabs.ibm.com>
 <878toy1sgd.fsf@vitty.brq.redhat.com>
 <20170223125643.GA29064@dhcp22.suse.cz>
 <87bmttyqxf.fsf@vitty.brq.redhat.com>
In-Reply-To: <87bmttyqxf.fsf@vitty.brq.redhat.com>
To: Vitaly Kuznetsov
Cc: Nathan Fontenot, linux-mm@kvack.org, mpe@ellerman.id.au,
 linuxppc-dev@lists.ozlabs.org, mdroth@linux.vnet.ibm.com

On Thu 23-02-17 14:31:24, Vitaly Kuznetsov wrote:
> Michal Hocko writes:
>
> > On Wed 22-02-17 10:32:34, Vitaly Kuznetsov wrote:
> > [...]
> >> > There is a workaround in that a user could online the memory or have
> >> > a udev rule to online the memory by using the sysfs interface. The
> >> > sysfs interface to online memory goes through device_online() which
> >> > should update the dev->offline flag. I'm not sure that having kernel
> >> > memory hotplug rely on userspace actions is the correct way to go.
> >>
> >> Using udev rule for memory onlining is possible when you disable
> >> memhp_auto_online but in some cases it doesn't work well, e.g. when we
> >> use memory hotplug to address memory pressure the loop through userspace
> >> is really slow and memory consuming, we may hit OOM before we manage to
> >> online newly added memory.
> >
> > How does the in-kernel implementation prevent that?
> >
>
> Onlining memory on hot-plug is much more reliable, e.g. if we were able
> to add it in add_memory_resource() we'll also manage to online it.

How does that differ from initiating online from the users?

> With
> udev rule we may end up adding many blocks and then (as udev is
> asynchronous) failing to online any of them.

Why would it fail?

> In-kernel operation is synchronous.

Which doesn't mean anything as the context is preemptible AFAICS.

> >> In addition to that, systemd/udev folks
> >> continuously refused to add this udev rule to udev calling it stupid as
> >> it actually is an unconditional and redundant ping-pong between kernel
> >> and udev.
> >
> > This is a policy and as such it doesn't belong to the kernel. The whole
> > auto-enable in the kernel is just plain wrong IMHO and we shouldn't have
> > merged it.
>
> I disagree.
>
> First of all it's not a policy, it is a default. We have many other
> defaults in kernel. When I add a network card or a storage, for example,
> I don't need to go anywhere and 'enable' it before I'm able to use
> it from userspace. And for memory (and CPUs) we, for some unknown reason
> opted for something completely different. If someone is plugging new
> memory into a box he probably wants to use it, I don't see much value in
> waiting for a special confirmation from him.

This was not my decision so I can only guess but to me it makes sense.
Both memory and cpus can be physically present and offline which is a
perfectly reasonable state. So having a two-phase physical hotadd is
just built on top of the physical vs. logical distinction. I completely
understand that some use cases will really like to online the whole node
as soon as it appears present.
But an automatic in-kernel implementation has its downsides, e.g. if
this operation fails in the middle you will not know about that unless
you check all the memblocks in sysfs. This is really a poor interface.

> Second, this feature is optional. If you want to keep old behavior just
> don't enable it.

It just adds unnecessary configuration noise as well.

> Third, this solves real world issues. With Hyper-V it is very easy to
> show udev failing on stress.

What is the reason for these failures? Do you have any link handy?

> No other solution to the issue was ever suggested.

You mean like using ballooning for the memory overcommit like other more
reasonable virtualization solutions?
-- 
Michal Hocko
SUSE Labs
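
To illustrate the point about having to check all the memblocks in sysfs
after a partial failure, here is a minimal userspace sketch. It assumes
the usual layout /sys/devices/system/memory/memoryN/state, where the
state file reads "online", "offline" or "going-offline". The function
name blocks_not_online and its root parameter are made up for this
example and are not part of any kernel or udev interface.

```python
from pathlib import Path


def blocks_not_online(root="/sys/devices/system/memory"):
    """Return (block_name, state) pairs for memory blocks that are not online.

    Hypothetical helper: 'root' defaults to the assumed sysfs location and
    can be pointed at another directory for testing.
    """
    base = Path(root)
    if not base.is_dir():
        # No memory hotplug sysfs directory (or a wrong path): nothing to report.
        return []

    found = []
    for block in base.glob("memory[0-9]*"):
        index = block.name[len("memory"):]
        if not index.isdigit():
            continue  # skip anything that is not a memoryN block directory
        try:
            state = (block / "state").read_text().strip()
        except OSError:
            continue  # block vanished or state is unreadable; skip it
        if state != "online":
            found.append((block.name, state))

    # Sort numerically so memory10 comes after memory2.
    return sorted(found, key=lambda item: int(item[0][len("memory"):]))


if __name__ == "__main__":
    for name, state in blocks_not_online():
        print(f"{name}: {state}")
```

Having to walk every block like this after the fact is exactly the
per-block polling that the mail calls a poor interface.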