From: Dan Williams
Date: Fri, 10 Jan 2020 08:42:58 -0800
Subject: Re: [PATCH] mm/memory_hotplug: Fix remove_memory() lockdep splat
To: David Hildenbrand
Cc: Andrew Morton, stable, Vishal Verma, Pavel Tatashin, Michal Hocko, Dave Hansen, Linux MM, Linux Kernel Mailing List
References: <157863061737.2230556.3959730620803366776.stgit@dwillia2-desk3.amr.corp.intel.com>

On Fri, Jan 10, 2020 at 1:10 AM David Hildenbrand wrote:
>
> On 10.01.20 05:30, Dan Williams wrote:
> > The daxctl unit test for the dax_kmem driver currently triggers the
> > lockdep splat below. It results from the fact that
> > remove_memory_block_devices() is invoked under the mem_hotplug_lock()
> > causing lockdep entanglements with cpu_hotplug_lock().
> >
> > The mem_hotplug_lock() is not needed to synchronize the memory block
> > device sysfs interface vs the page online state, that is already handled
> > by lock_device_hotplug(). Specifically lock_device_hotplug()
> > is sufficient to allow try_remove_memory() to check the offline
> > state of the memblocks and be assured that subsequent online attempts
> > will be blocked. The device_online() path checks mem->section_count
> > before allowing any state manipulations and mem->section_count is
> > cleared in remove_memory_block_devices().
> >
> > The add_memory() path does create memblock devices under the lock, but
> > there is no lockdep report on that path, so it is left alone for now.
> >
> > This change is only possible thanks to the recent change that refactored
> > memory block device removal out of arch_remove_memory() (commit
> > 4c4b7f9ba948 mm/memory_hotplug: remove memory block devices before
> > arch_remove_memory()).
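The quoted commit message argues that one lock, lock_device_hotplug(), already orders the offline check and the block removal against concurrent online attempts, because the online path refuses any block whose section_count has been cleared. A minimal userspace sketch of that argument, assuming a pthread mutex stands in for the device hotplug lock; `struct mem_block_model`, `block_online()` and `block_remove()` are hypothetical simplifications, not kernel APIs:

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for the lock taken by lock_device_hotplug(). */
static pthread_mutex_t device_hotplug_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical, heavily simplified memory block (not the kernel struct). */
struct mem_block_model {
	unsigned long section_count; /* cleared when the block device is removed */
	bool online;
};

/* Invariant the lock protects: a removed block is never online. */
static void check_invariant(const struct mem_block_model *mem)
{
	assert(!(mem->section_count == 0 && mem->online));
}

/* Models the online path: refuse once section_count has been cleared. */
static int block_online(struct mem_block_model *mem)
{
	int ret = 0;

	pthread_mutex_lock(&device_hotplug_mutex);
	if (mem->section_count == 0)
		ret = -1; /* block already removed */
	else
		mem->online = true;
	check_invariant(mem);
	pthread_mutex_unlock(&device_hotplug_mutex);
	return ret;
}

/* Models try_remove_memory(): offline check and removal under one lock. */
static int block_remove(struct mem_block_model *mem)
{
	int ret = 0;

	pthread_mutex_lock(&device_hotplug_mutex);
	if (mem->online) {
		ret = -1; /* must be offlined first */
	} else {
		/* remove_memory_block_devices() analog: later online attempts fail */
		mem->section_count = 0;
		/* arch-level teardown would follow here, still under this lock */
	}
	check_invariant(mem);
	pthread_mutex_unlock(&device_hotplug_mutex);
	return ret;
}
```

With a block that starts offline and has section_count = 1, block_remove() returns 0 and a later block_online() returns -1, because the removal cleared section_count under the same mutex. Since both paths serialize on that one lock, no interleaving lets an online attempt land between the offline check and the removal.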
> >
> > ======================================================
> > WARNING: possible circular locking dependency detected
> > 5.5.0-rc3+ #230 Tainted: G OE
> > ------------------------------------------------------
> > lt-daxctl/6459 is trying to acquire lock:
> > ffff99c7f0003510 (kn->count#241){++++}, at: kernfs_remove_by_name_ns+0x41/0x80
> >
> > but task is already holding lock:
> > ffffffffa76a5450 (mem_hotplug_lock.rw_sem){++++}, at: percpu_down_write+0x20/0xe0
> >
> > which lock already depends on the new lock.
> >
> >
> > the existing dependency chain (in reverse order) is:
> >
> > -> #2 (mem_hotplug_lock.rw_sem){++++}:
> >        __lock_acquire+0x39c/0x790
> >        lock_acquire+0xa2/0x1b0
> >        get_online_mems+0x3e/0xb0
> >        kmem_cache_create_usercopy+0x2e/0x260
> >        kmem_cache_create+0x12/0x20
> >        ptlock_cache_init+0x20/0x28
> >        start_kernel+0x243/0x547
> >        secondary_startup_64+0xb6/0xc0
> >
> > -> #1 (cpu_hotplug_lock.rw_sem){++++}:
> >        __lock_acquire+0x39c/0x790
> >        lock_acquire+0xa2/0x1b0
> >        cpus_read_lock+0x3e/0xb0
> >        online_pages+0x37/0x300
> >        memory_subsys_online+0x17d/0x1c0
> >        device_online+0x60/0x80
> >        state_store+0x65/0xd0
> >        kernfs_fop_write+0xcf/0x1c0
> >        vfs_write+0xdb/0x1d0
> >        ksys_write+0x65/0xe0
> >        do_syscall_64+0x5c/0xa0
> >        entry_SYSCALL_64_after_hwframe+0x49/0xbe
> >
> > -> #0 (kn->count#241){++++}:
> >        check_prev_add+0x98/0xa40
> >        validate_chain+0x576/0x860
> >        __lock_acquire+0x39c/0x790
> >        lock_acquire+0xa2/0x1b0
> >        __kernfs_remove+0x25f/0x2e0
> >        kernfs_remove_by_name_ns+0x41/0x80
> >        remove_files.isra.0+0x30/0x70
> >        sysfs_remove_group+0x3d/0x80
> >        sysfs_remove_groups+0x29/0x40
> >        device_remove_attrs+0x39/0x70
> >        device_del+0x16a/0x3f0
> >        device_unregister+0x16/0x60
> >        remove_memory_block_devices+0x82/0xb0
> >        try_remove_memory+0xb5/0x130
> >        remove_memory+0x26/0x40
> >        dev_dax_kmem_remove+0x44/0x6a [kmem]
> >        device_release_driver_internal+0xe4/0x1c0
> >        unbind_store+0xef/0x120
> >        kernfs_fop_write+0xcf/0x1c0
> >        vfs_write+0xdb/0x1d0
> >        ksys_write+0x65/0xe0
> >        do_syscall_64+0x5c/0xa0
> >        entry_SYSCALL_64_after_hwframe+0x49/0xbe
> >
> > other info that might help us debug this:
> >
> > Chain exists of:
> >   kn->count#241 --> cpu_hotplug_lock.rw_sem --> mem_hotplug_lock.rw_sem
> >
> > Possible unsafe locking scenario:
> >
> >        CPU0                    CPU1
> >        ----                    ----
> >   lock(mem_hotplug_lock.rw_sem);
> >                                lock(cpu_hotplug_lock.rw_sem);
> >                                lock(mem_hotplug_lock.rw_sem);
> >   lock(kn->count#241);
> >
> >  *** DEADLOCK ***
> >
> > No fixes tag as this seems to have been a long standing issue that
> > likely predated the addition of kernfs lockdep annotations.
> >
> > Cc:
> > Cc: Vishal Verma
> > Cc: David Hildenbrand
> > Cc: Pavel Tatashin
> > Cc: Michal Hocko
> > Cc: Dave Hansen
> > Signed-off-by: Dan Williams
> > ---
> >  mm/memory_hotplug.c | 12 +++++++++---
> >  1 file changed, 9 insertions(+), 3 deletions(-)
> >
> > diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> > index 55ac23ef11c1..a4e7dadded08 100644
> > --- a/mm/memory_hotplug.c
> > +++ b/mm/memory_hotplug.c
> > @@ -1763,8 +1763,6 @@ static int __ref try_remove_memory(int nid, u64 start, u64 size)
> >
> >          BUG_ON(check_hotplug_memory_range(start, size));
> >
> > -        mem_hotplug_begin();
> > -
> >          /*
> >           * All memory blocks must be offlined before removing memory. Check
> >           * whether all memory blocks in question are offline and return error
> > @@ -1777,9 +1775,17 @@ static int __ref try_remove_memory(int nid, u64 start, u64 size)
> >          /* remove memmap entry */
> >          firmware_map_remove(start, start + size, "System RAM");
> >
> > -        /* remove memory block devices before removing memory */
> > +        /*
> > +         * Remove memory block devices before removing memory, and do
> > +         * not hold the mem_hotplug_lock() over kobject removal
> > +         * operations. lock_device_hotplug() keeps the
> > +         * check_memblock_offlined_cb result valid until the entire
> > +         * removal process is complete.
> > +         */
>
> Maybe shorten that to
>
> /*
>  * Remove memory block devices before removing memory. Protected
>  * by the device_hotplug_lock only.
>  */

Why make someone dig for the reasons this lock is sufficient?

>
> AFAIK, the device hotplug lock is sufficient here. The memory hotplug
> lock / cpu hotplug lock is only needed when calling into arch code
> (especially for PPC). We hold both locks when onlining/offlining memory.
>
> >          remove_memory_block_devices(start, size);
> >
> > +        mem_hotplug_begin();
> > +
> >          arch_remove_memory(nid, start, size, NULL);
> >          memblock_free(start, size);
> >          memblock_remove(start, size);
> >
>
> I'd suggest to do the same in the adding part right away (if easily
> possible) to make it clearer.

Let's let this fix percolate upstream for a bit to make sure there was
no protection the mem_hotplug_begin() was inadvertently providing.

> I properly documented the semantics of
> add_memory_block_devices()/remove_memory_block_devices() already (that
> they need the device hotplug lock).

I see that, but I prefer lockdep_assert_held() in the code rather than
comments. I'll send a patch to fix that up.
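The preference for lockdep_assert_held() over a comment can be illustrated outside the kernel. The sketch below is a userspace analog, assuming a per-thread flag is an acceptable stand-in for lockdep's per-task held-lock bookkeeping; `lock_device_hotplug_model()`, `unlock_device_hotplug_model()`, `device_hotplug_is_held()` and `remove_blocks_model()` are hypothetical names, not kernel functions. A caller that forgets the lock trips an assertion at runtime instead of silently breaking a documented rule:

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for the device hotplug lock. */
static pthread_mutex_t device_hotplug_mutex = PTHREAD_MUTEX_INITIALIZER;

/*
 * Per-thread flag: only the thread that owns the mutex ever sets its copy,
 * so reading it answers "does the calling thread hold the lock?" without
 * racing against other threads.
 */
static _Thread_local bool device_hotplug_held;

static void lock_device_hotplug_model(void)
{
	pthread_mutex_lock(&device_hotplug_mutex);
	device_hotplug_held = true;
}

static void unlock_device_hotplug_model(void)
{
	device_hotplug_held = false;
	pthread_mutex_unlock(&device_hotplug_mutex);
}

/* Query used by the assertion below, in the role of lockdep_assert_held(). */
static bool device_hotplug_is_held(void)
{
	return device_hotplug_held;
}

/* Hypothetical analog of remove_memory_block_devices(): the locking
 * requirement is enforced in code rather than only stated in a comment. */
static void remove_blocks_model(unsigned long *section_count)
{
	assert(device_hotplug_is_held());
	*section_count = 0;
}
```

As with lockdep_assert_held(), which compiles away when lockdep is disabled, this check disappears when NDEBUG is defined, so it costs nothing in builds that do not want it.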