linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
To: Michal Hocko <mhocko@suse.cz>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>,
	gregkh@linuxfoundation.org, linux-kernel@vger.kernel.org,
	devel@linuxdriverproject.org, olaf@aepfle.de, apw@canonical.com,
	linux-mm@kvack.org
Subject: Re: [PATCH 2/2] Drivers: hv: balloon: Fix the deadlock issue in the memory hot-add code
Date: Tue, 9 Dec 2014 19:25:50 +0900	[thread overview]
Message-ID: <5486CE2E.4070409@jp.fujitsu.com> (raw)
In-Reply-To: <20141209090843.GA11373@dhcp22.suse.cz>

(2014/12/09 18:08), Michal Hocko wrote:
> On Tue 09-12-14 10:23:51, Yasuaki Ishimatsu wrote:
>> (2014/12/09 0:04), Michal Hocko wrote:
>>> On Fri 05-12-14 16:41:38, K. Y. Srinivasan wrote:
>>>> Andy Whitcroft <apw@canonical.com> initially saw this deadlock. We
>>>> have seen this as well. Here is the original description of the
>>>> problem (and a potential solution) from Andy:
>>>>
>>>> https://lkml.org/lkml/2014/3/14/451
>>>>
>>>> Here is an excerpt from that mail:
>>>>
>>>> "We are seeing machines lockup with what appears to be an ABBA
>>>> deadlock in the memory hotplug system.  These are from the 3.13.6 based Ubuntu kernels.
>>>> The hv_balloon driver is adding memory using add_memory() which takes
>>>> the hotplug lock
>>>
>>> Do you mean mem_hotplug_begin?
>>>
>>
>>>> and then emits a udev event, and then attempts to
>>>> lock the sysfs device.  In response to the udev event udev opens the
>>>> sysfs device and locks it, then attempts to grab the hotplug lock to online the memory.
>>>
>>> Cannot we simply teach online_pages to fail with EBUSY when the memory
>>> hotplug is on the way.  We shouldn't try to online something that is not
>>> initialized yet, no?
>>
>> Yes. Memory online shouldn't try before initializing it. Then memory online
>> should wait for initializing it, not easily fails. Generally, kernel sends
>> memory ONLINE event to userland by kobject_uevent() during initializing memory
>> and udev makes memory online after catching the event. Onlining memory by
>> udev almost run during initializing memory.
>
> I guess this is because the event is sent after a mem section is
> initialized while the overal hotplug operation is still not completed.

right.

>
>> So if memory online easily fails, onlining memory by udev almost
>> fails.
>
> Doesn't udev retry the operation if it gets EBUSY or EAGAIN?

It depend on implementation of udev.rules. So we can retry online/offline
operation in udev.rules.

>
>>> The memory hotplug log is global so we can get
>>> false positives but that should be easier to deal with than exporting
>>> lock_device_hotplug and adding yet another lock dependency.
>>>
>>>> This seems to be inverted nesting in the two cases, leading to the hangs below:
>>>>
>>>> [  240.608612] INFO: task kworker/0:2:861 blocked for more than 120 seconds.
>>>> [  240.608705] INFO: task systemd-udevd:1906 blocked for more than 120 seconds.
>>>>
>>>> I note that the device hotplug locking allows complete retries (via
>>>> ERESTARTSYS) and if we could detect this at the online stage it could
>>>> be used to get us out.
>>>
>>> I am not sure I understand this but it suggests EBUSY above?
>>>
>>>> But before I go down this road I wanted to
>>>> make sure I am reading this right.  Or indeed if the hv_balloon driver
>>>> is just doing this wrong."
>>>>
>>>> This patch is based on the suggestion from
>>>> Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
>>>
>>> This changelog doesn't explain us much. And boy this whole thing is so
>>> convoluted. E.g. I have hard time to see why ACPI hotplug is working
>>> correctly. My trail got lost at acpi_memory_device_add level which is
>>> a callback while acpi_device_hotplug is holding lock_device_hotplug but
>>> then again the rest is hidden by callbacks.
>>
>> Commit 0f1cfe9d0d06 (mm/hotplug: remove stop_machine() from try_offline_node()) said:
>>
>>    ---
>>      lock_device_hotplug() serializes hotplug & online/offline operations.  The
>>      lock is held in common sysfs online/offline interfaces and ACPI hotplug
>>      code paths.
>>
>>      And here are the code paths:
>>
>>      - CPU & Mem online/offline via sysfs online
>>          store_online()->lock_device_hotplug()
>>
>>      - Mem online via sysfs state:
>>          store_mem_state()->lock_device_hotplug()
>>
>>      - ACPI CPU & Mem hot-add:
>>          acpi_scan_bus_device_check()->lock_device_hotplug()
>>
>>      - ACPI CPU & Mem hot-delete:
>>          acpi_scan_hot_remove()->lock_device_hotplug()
>>    ---
>>
>> CPU & Memory online/offline/hotplug are serialized by lock_device_hotplug().
>
> OK, this patch aimed at the complete nodes hotplug. I am not familiar
> with the code enough to tell whether this is really needed but it sounds
> like an overkill when we are interested only in the memory hotplug. Why
> would we need stop_machine or anything for memory that is guaranteed to
> be not used at the time of both online and offline.

As Srinivasan mentioned, there is ABBA issue. So to evade it, lock_device_hotplug()
is needed.

> And again, why cannot we simply make the onlining fail or try_lock and
> retry internally if the event consumer cannot cope with errors?

Did you mean the following Srinivasan's first patch looks good to you?
   https://lkml.org/lkml/2014/12/2/662

Thanks,
Yasuaki Ishimatsu

> And even if that is not possible then do not export lock_device_hotplug
> but export a memory hotplug functions which use it properly so that
> other consumers (xen ballon seem to rely on add_memory as well) do not
> need the same change as well.


>
>>> I cannot seem to find any documentation which would explain all the
>>> locking here.
>>
>> Yes. I need the documentation.
>
> Yes please! This code is incredibly hard to follow and deduce all the
> hidden requirements and dependencies is even harder.
>


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2014-12-09 10:28 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-12-06  0:41 [PATCH 0/2] Drivers: hv: hv_balloon: Fix a deadlock in the hot-add path K. Y. Srinivasan
2014-12-06  0:41 ` [PATCH 1/2] Drivers: base: core: Export functions to lock/unlock device hotplug lock K. Y. Srinivasan
2014-12-06  0:41   ` [PATCH 2/2] Drivers: hv: balloon: Fix the deadlock issue in the memory hot-add code K. Y. Srinivasan
2014-12-08 15:04     ` Michal Hocko
2014-12-09  1:23       ` Yasuaki Ishimatsu
2014-12-09  9:08         ` Michal Hocko
2014-12-09 10:25           ` Yasuaki Ishimatsu [this message]
2014-12-09 10:55             ` Michal Hocko
2014-12-11  0:21               ` KY Srinivasan
2014-12-11 12:58                 ` Michal Hocko
2014-12-14 18:45                   ` KY Srinivasan
2014-12-08  0:07 ` [PATCH 0/2] Drivers: hv: hv_balloon: Fix a deadlock in the hot-add path Yasuaki Ishimatsu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5486CE2E.4070409@jp.fujitsu.com \
    --to=isimatu.yasuaki@jp.fujitsu.com \
    --cc=apw@canonical.com \
    --cc=devel@linuxdriverproject.org \
    --cc=gregkh@linuxfoundation.org \
    --cc=kys@microsoft.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@suse.cz \
    --cc=olaf@aepfle.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox