From: "Rafael J. Wysocki" <rafael@kernel.org>
To: David Hildenbrand <david@redhat.com>
Cc: Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
Linux Memory Management List <linux-mm@kvack.org>,
linuxppc-dev <linuxppc-dev@lists.ozlabs.org>,
ACPI Devel Maling List <linux-acpi@vger.kernel.org>,
devel@linuxdriverproject.org, linux-s390@vger.kernel.org,
xen-devel@lists.xenproject.org,
Andrew Morton <akpm@linux-foundation.org>,
Michal Hocko <mhocko@suse.com>, Vlastimil Babka <vbabka@suse.cz>,
Dan Williams <dan.j.williams@intel.com>,
Pavel Tatashin <pasha.tatashin@oracle.com>,
osalvador@suse.de, Vitaly Kuznetsov <vkuznets@redhat.com>,
Martin Schwidefsky <schwidefsky@de.ibm.com>,
Heiko Carstens <heiko.carstens@de.ibm.com>,
kys@microsoft.com, haiyangz@microsoft.com,
sthemmin@microsoft.com,
Benjamin Herrenschmidt <benh@kernel.crashing.org>,
Paul Mackerras <paulus@samba.org>,
"Rafael J. Wysocki" <rjw@rjwysocki.net>,
Len Brown <lenb@kernel.org>,
Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
David Rientjes <rientjes@google.com>
Subject: Re: [PATCH RFC 2/2] mm/memory_hotplug: fix online/offline_pages called w.o. mem_hotplug_lock
Date: Fri, 17 Aug 2018 10:20:43 +0200 [thread overview]
Message-ID: <CAJZ5v0jpx_XOmkSkMiSxcECxNHGXPGZHtgqRy_Q7iKnf-C55pg@mail.gmail.com> (raw)
In-Reply-To: <20180817075901.4608-3-david@redhat.com>
On Fri, Aug 17, 2018 at 9:59 AM David Hildenbrand <david@redhat.com> wrote:
>
> There seem to be some problems as result of 30467e0b3be ("mm, hotplug:
> fix concurrent memory hot-add deadlock"), which tried to fix a possible
> lock inversion reported and discussed in [1] due to the two locks
> a) device_lock()
> b) mem_hotplug_lock
>
> While add_memory() first takes b), followed by a) during
> bus_probe_device(), onlining of memory from user space first took b),
> followed by a), exposing a possible deadlock.
>
> In [1], and it was decided to not make use of device_hotplug_lock, but
> rather to enforce a locking order. Looking at 1., this order is not always
> satisfied when calling device_online() - essentially we simply don't take
> one of both locks anymore - and fixing this would require us to
> take the mem_hotplug_lock in core driver code (online_store()), which
> sounds wrong.
>
> The problems I spotted related to this:
>
> 1. Memory block device attributes: While .state first calls
> mem_hotplug_begin() and the calls device_online() - which takes
> device_lock() - .online does no longer call mem_hotplug_begin(), so
> effectively calls online_pages() without mem_hotplug_lock. onlining/
> offlining of pages is no longer serialised across different devices.
>
> 2. device_online() should be called under device_hotplug_lock, however
> onlining memory during add_memory() does not take care of that. (I
> didn't follow how strictly this is needed, but there seems to be a
> reason because it is documented at device_online() and
> device_offline()).
>
> In addition, I think there is also something wrong about the locking in
>
> 3. arch/powerpc/platforms/powernv/memtrace.c calls offline_pages()
> (and device_online()) without locks. This was introduced after
> 30467e0b3be. And skimming over the code, I assume it could need some
> more care in regards to locking.
>
> ACPI code already holds the device_hotplug_lock, and as we are
> effectively hotplugging memory block devices, requiring to hold that
> lock does not sound too wrong, although not chosen in [1], as
> "I don't think resolving a locking dependency is appropriate by
> just serializing them with another lock."
> I think this is the cleanest solution.
>
> Requiring add_memory()/add_memory_resource() to be called under
> device_hotplug_lock fixes 2., taking the mem_hotplug_lock in
> online_pages/offline_pages() fixes 1. and 3.
>
> Fixup all callers of add_memory/add_memory_resource to hold the lock if
> not already done.
>
> So this is essentially a revert of 30467e0b3be, implementation of what
> was suggested in [1] by Vitaly, applied to the current tree.
>
> [1] http://driverdev.linuxdriverproject.org/pipermail/ driverdev-devel/
> 2015-February/065324.html
>
> This patch is partly based on a patch by Vitaly Kuznetsov.
>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
> arch/powerpc/platforms/powernv/memtrace.c | 3 ++
> drivers/acpi/acpi_memhotplug.c | 1 +
> drivers/base/memory.c | 18 +++++-----
> drivers/hv/hv_balloon.c | 4 +++
> drivers/s390/char/sclp_cmd.c | 3 ++
> drivers/xen/balloon.c | 3 ++
> mm/memory_hotplug.c | 42 ++++++++++++++++++-----
> 7 files changed, 55 insertions(+), 19 deletions(-)
>
> diff --git a/arch/powerpc/platforms/powernv/memtrace.c b/arch/powerpc/platforms/powernv/memtrace.c
> index 51dc398ae3f7..4c2737a33020 100644
> --- a/arch/powerpc/platforms/powernv/memtrace.c
> +++ b/arch/powerpc/platforms/powernv/memtrace.c
> @@ -206,6 +206,8 @@ static int memtrace_online(void)
> int i, ret = 0;
> struct memtrace_entry *ent;
>
> + /* add_memory() requires device_hotplug_lock */
> + lock_device_hotplug();
> for (i = memtrace_array_nr - 1; i >= 0; i--) {
> ent = &memtrace_array[i];
>
> @@ -244,6 +246,7 @@ static int memtrace_online(void)
> pr_info("Added trace memory back to node %d\n", ent->nid);
> ent->size = ent->start = ent->nid = -1;
> }
> + unlock_device_hotplug();
> if (ret)
> return ret;
>
> diff --git a/drivers/acpi/acpi_memhotplug.c b/drivers/acpi/acpi_memhotplug.c
> index 6b0d3ef7309c..e7a4c7900967 100644
> --- a/drivers/acpi/acpi_memhotplug.c
> +++ b/drivers/acpi/acpi_memhotplug.c
> @@ -228,6 +228,7 @@ static int acpi_memory_enable_device(struct acpi_memory_device *mem_device)
> if (node < 0)
> node = memory_add_physaddr_to_nid(info->start_addr);
>
> + /* we already hold the device_hotplug lock at this point */
> result = add_memory(node, info->start_addr, info->length);
>
> /*
A very minor nit here: I would say "device_hotplug_lock is already
held at this point" in the comment (I sort of don't like to say "we"
in code comments as it is not particularly clear what group of people
is represented by that and the lock is actually called
device_hotplug_lock).
Otherwise the approach is fine by me.
BTW, the reason why device_hotplug_lock is acquired by the ACPI memory
hotplug is because it generally needs to be synchronized with respect
CPU hot-remove and similar. I believe that this may be the case in
non-ACPI setups as well.
Thanks,
Rafael
next prev parent reply other threads:[~2018-08-17 8:20 UTC|newest]
Thread overview: 15+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-08-17 7:58 [PATCH RFC 0/2] mm: " David Hildenbrand
2018-08-17 7:59 ` [PATCH RFC 1/2] drivers/base: export lock_device_hotplug/unlock_device_hotplug David Hildenbrand
2018-08-17 8:41 ` Greg Kroah-Hartman
2018-08-17 8:56 ` David Hildenbrand
2018-08-17 9:03 ` Rafael J. Wysocki
2018-08-17 9:41 ` David Hildenbrand
2018-08-17 10:06 ` Greg Kroah-Hartman
2018-08-17 11:04 ` David Hildenbrand
2018-08-17 11:28 ` Heiko Carstens
2018-08-17 11:56 ` David Hildenbrand
2018-08-17 17:02 ` Greg Kroah-Hartman
2018-08-17 8:57 ` Rafael J. Wysocki
2018-08-17 7:59 ` [PATCH RFC 2/2] mm/memory_hotplug: fix online/offline_pages called w.o. mem_hotplug_lock David Hildenbrand
2018-08-17 8:20 ` Rafael J. Wysocki [this message]
2018-08-17 8:27 ` David Hildenbrand
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=CAJZ5v0jpx_XOmkSkMiSxcECxNHGXPGZHtgqRy_Q7iKnf-C55pg@mail.gmail.com \
--to=rafael@kernel.org \
--cc=akpm@linux-foundation.org \
--cc=benh@kernel.crashing.org \
--cc=dan.j.williams@intel.com \
--cc=david@redhat.com \
--cc=devel@linuxdriverproject.org \
--cc=gregkh@linuxfoundation.org \
--cc=haiyangz@microsoft.com \
--cc=heiko.carstens@de.ibm.com \
--cc=kys@microsoft.com \
--cc=lenb@kernel.org \
--cc=linux-acpi@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=linux-s390@vger.kernel.org \
--cc=linuxppc-dev@lists.ozlabs.org \
--cc=mhocko@suse.com \
--cc=osalvador@suse.de \
--cc=pasha.tatashin@oracle.com \
--cc=paulus@samba.org \
--cc=rientjes@google.com \
--cc=rjw@rjwysocki.net \
--cc=schwidefsky@de.ibm.com \
--cc=sthemmin@microsoft.com \
--cc=vbabka@suse.cz \
--cc=vkuznets@redhat.com \
--cc=xen-devel@lists.xenproject.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox