From: Vitaly Kuznetsov <vkuznets@redhat.com>
To: Michal Hocko <mhocko@kernel.org>
Cc: Nathan Fontenot <nfont@linux.vnet.ibm.com>,
	linux-mm@kvack.org, mpe@ellerman.id.au,
	linuxppc-dev@lists.ozlabs.org, mdroth@linux.vnet.ibm.com,
	kys@microsoft.com
Subject: Re: [RFC PATCH] memory-hotplug: Use dev_online for memhp_auto_offline
Date: Fri, 24 Feb 2017 15:10:29 +0100
Message-ID: <87efyny90q.fsf@vitty.brq.redhat.com>
In-Reply-To: <20170224133714.GH19161@dhcp22.suse.cz> (Michal Hocko's message of "Fri, 24 Feb 2017 14:37:14 +0100")

Michal Hocko <mhocko@kernel.org> writes:

> On Thu 23-02-17 19:14:27, Vitaly Kuznetsov wrote:
>> Michal Hocko <mhocko@kernel.org> writes:
>> 
>> > On Thu 23-02-17 17:36:38, Vitaly Kuznetsov wrote:
>> >> Michal Hocko <mhocko@kernel.org> writes:
>> > [...]
>> >> > Is a grow from 256M -> 128GB really something that happens in real life?
>> >> > Don't get me wrong but to me this sounds quite exaggerated. Hotmem add
>> >> > which is an operation which has to allocate memory has to scale with the
>> >> > currently available memory IMHO.
>> >> 
>> >> With virtual machines this is very real and not exaggerated at
>> >> all. E.g. Hyper-V host can be tuned to automatically add new memory when
>> >> guest is running out of it. Even 100 blocks can represent an issue.
>> >
>> > Do you have any reference to a bug report. I am really curious because
>> > something really smells wrong and it is not clear that the chosen
>> > solution is really the best one.
>> 
>> Unfortunately I'm not aware of any publicly posted bug reports (CC:
>> K. Y. - he may have a reference) but I think I still remember everything
>> correctly. Not sure how deep you want me to go into details though...
>
> As much as possible to understand what was really going on...
>
>> Virtual guests under stress were getting into OOM easily and the OOM
>> killer was even killing the udev process trying to online the
>> memory.
>
> Do you happen to have any OOM report? I am really surprised that udev
> would be an oom victim because that process is really small. Who is
> consuming all the memory then?

It's been a while since I worked on this and unfortunately I don't have
a log. From what I remember, the kernel itself was consuming all the
memory, so *all* processes were victims.

>
> Have you measured how much memory do we need to allocate to add one
> memblock?

No, but measuring it is actually a good idea if we decide to do some
sort of pre-allocation.

I just did a quick (and probably dirty) test: increasing guest memory
from 4G to 8G (32 x 128MB blocks) requires 68MB of memory, so it's
roughly 2MB per block. It's really easy to trigger OOM on small guests.
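
(A back-of-the-envelope check -- assuming the usual 64-byte struct page,
so treat the numbers as approximate:

   4G added / 128MB per block               = 32 blocks
   68MB measured / 32 blocks                ~ 2.1MB per block
   (128MB / 4K) pages x 64 bytes of memmap  = 2MB per block

i.e. the per-block cost is dominated by the struct page array, which is
allocated at add_memory() time, before anything gets onlined.)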

>
>> There was a workaround for the issue added to the hyper-v driver
>> doing memory add:
>> 
>> hv_mem_hot_add(...) {
>> ...
>>  add_memory(....);
>>  wait_for_completion_timeout(..., 5*HZ);
>>  ...
>> }
>
> I can still see 
> 		/*
> 		 * Wait for the memory block to be onlined when memory onlining
> 		 * is done outside of kernel (memhp_auto_online). Since the hot
> 		 * add has succeeded, it is ok to proceed even if the pages in
> 		 * the hot added region have not been "onlined" within the
> 		 * allowed time.
> 		 */
> 		if (dm_device.ha_waiting)
> 			wait_for_completion_timeout(&dm_device.ol_waitevent,
> 						    5*HZ);
>

See 

 dm_device.ha_waiting = !memhp_auto_online;

30 lines above. The workaround is still there for the udev case and it
is still equally bad.
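
To put the two pieces together, the flow in hv_mem_hot_add() is roughly
the following (paraphrased from memory, not a verbatim quote of
drivers/hv/hv_balloon.c):

  dm_device.ha_waiting = !memhp_auto_online;

  nid = memory_add_physaddr_to_nid(PFN_PHYS(start_pfn));
  ret = add_memory(nid, PFN_PHYS(start_pfn), HA_CHUNK << PAGE_SHIFT);
  ...
  /*
   * Wait (up to 5s) for udev to online the block only when the kernel
   * is not auto-onlining; otherwise the memory is usable as soon as
   * add_memory() returns.
   */
  if (dm_device.ha_waiting)
          wait_for_completion_timeout(&dm_device.ol_waitevent, 5*HZ);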

>> the completion was done by observing for the MEM_ONLINE event. This, of
>> course, was slowing things down significantly and waiting for a
>> userspace action in kernel is not a nice thing to have (not speaking
>> about all other memory adding methods which had the same issue). Just
>> removing this wait was leading us to the same OOM as the hypervisor was
>> adding more and more memory and eventually even add_memory() was
>> failing, udev and other processes were killed,...
>
> Yes, I agree that waiting on a user action from the kernel is very far
> from ideal.
>
>> With the feature in place we have new memory available right after we do
>> add_memory(), everything is serialized.
>
> What prevented you from onlining the memory explicitly from
> hv_mem_hot_add path? Why do you need a user visible policy for that at
> all? You could also add a parameter to add_memory that would do the same
> thing. Or am I missing something?

We have different mechanisms for adding memory; I'm aware of at least
three: ACPI, Xen, and Hyper-V. The issue I'm addressing is general
enough that I'm pretty sure I can reproduce it on Xen, for example -
just boot a small guest and try adding tons of memory. Why should we
have different defaults for different technologies?
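
(For reference, with memhp_auto_online the onlining happens from the add
path itself. A simplified sketch of the mm/memory_hotplug.c flow -- not
the exact patch under discussion, which proposes doing the per-block
onlining via dev_online():

  /* add_memory() passes the policy down: */
  ret = add_memory_resource(nid, res, memhp_auto_online);

  /* ... and add_memory_resource() onlines each new block right away: */
  if (online)
          walk_memory_range(PFN_DOWN(start), PFN_UP(start + size - 1),
                            NULL, online_memory_block);

so nothing in userspace has to run between add_memory() returning and
the new memory becoming usable.)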

And, BTW, the link to the previous discussion:
https://groups.google.com/forum/#!msg/linux.kernel/AxvyuQjr4GY/TLC-K0sL_NEJ

-- 
  Vitaly
