References: <20220128131006.67712-1-michel@lespinasse.org>
 <20220128131006.67712-24-michel@lespinasse.org>
 <20220727073420.GA8985@hu-pkondeti-hyd.qualcomm.com>
In-Reply-To: <20220727073420.GA8985@hu-pkondeti-hyd.qualcomm.com>
From: Suren Baghdasaryan
Date: Wed, 27 Jul 2022 13:30:29 -0700
Subject: Re: [PATCH v2 23/35] mm: add mmu_notifier_lock
To: Pavan Kondeti
Cc: Michel Lespinasse, Linux-MM, LKML, Andrew Morton, kernel-team@fb.com,
 Laurent Dufour, Jerome Glisse, Peter Zijlstra, Michal Hocko,
 Vlastimil Babka, Davidlohr Bueso, Matthew Wilcox, Liam Howlett,
 Rik van Riel, Paul McKenney, Song Liu, Minchan Kim, Joel Fernandes,
 David Rientjes, Axel Rasmussen, Andy Lutomirski

On Wed, Jul 27, 2022 at 12:34 AM Pavan Kondeti wrote:
>
> On Fri, Jan 28, 2022 at 05:09:54AM -0800, Michel Lespinasse wrote:
> > Introduce mmu_notifier_lock as a per-mm percpu_rw_semaphore,
> > as well as the code to initialize and destroy it together with the mm.
> >
> > This lock will be used to prevent races between mmu_notifier_register()
> > and speculative fault handlers that need to fire MMU notifications
> > without holding any of the mmap or rmap locks.
> >
> > Signed-off-by: Michel Lespinasse
> > ---
> >  include/linux/mm_types.h     |  6 +++++-
> >  include/linux/mmu_notifier.h | 27 +++++++++++++++++++++++++--
> >  kernel/fork.c                |  3 ++-
> >  3 files changed, 32 insertions(+), 4 deletions(-)
> >
> > diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> > index 305f05d2a4bc..f77e2dec038d 100644
> > --- a/include/linux/mm_types.h
> > +++ b/include/linux/mm_types.h
> > @@ -462,6 +462,7 @@ struct vm_area_struct {
> >  } __randomize_layout;
> >
> >  struct kioctx_table;
> > +struct percpu_rw_semaphore;
> >  struct mm_struct {
> >          struct {
> >                  struct vm_area_struct *mmap; /* list of VMAs */
> > @@ -608,7 +609,10 @@ struct mm_struct {
> >                  struct file __rcu *exe_file;
> >  #ifdef CONFIG_MMU_NOTIFIER
> >                  struct mmu_notifier_subscriptions *notifier_subscriptions;
> > -#endif
> > +#ifdef CONFIG_SPECULATIVE_PAGE_FAULT
> > +                struct percpu_rw_semaphore *mmu_notifier_lock;
> > +#endif /* CONFIG_SPECULATIVE_PAGE_FAULT */
> > +#endif /* CONFIG_MMU_NOTIFIER */
> >  #if defined(CONFIG_TRANSPARENT_HUGEPAGE) && !USE_SPLIT_PMD_PTLOCKS
> >                  pgtable_t pmd_huge_pte; /* protected by page_table_lock */
> >  #endif
> > diff --git a/include/linux/mmu_notifier.h b/include/linux/mmu_notifier.h
> > index 45fc2c81e370..ace76fe91c0c 100644
> > --- a/include/linux/mmu_notifier.h
> > +++ b/include/linux/mmu_notifier.h
> > @@ -6,6 +6,8 @@
> >  #include
> >  #include
> >  #include
> > +#include
> > +#include
> >  #include
> >  #include
> >
> > @@ -499,15 +501,35 @@ static inline void mmu_notifier_invalidate_range(struct mm_struct *mm,
> >                  __mmu_notifier_invalidate_range(mm, start, end);
> >  }
> >
> > -static inline void mmu_notifier_subscriptions_init(struct mm_struct *mm)
> > +static inline bool mmu_notifier_subscriptions_init(struct mm_struct *mm)
> >  {
> > +#ifdef CONFIG_SPECULATIVE_PAGE_FAULT
> > +        mm->mmu_notifier_lock = kzalloc(sizeof(struct percpu_rw_semaphore), GFP_KERNEL);
> > +        if (!mm->mmu_notifier_lock)
> > +                return false;
> > +        if (percpu_init_rwsem(mm->mmu_notifier_lock)) {
> > +                kfree(mm->mmu_notifier_lock);
> > +                return false;
> > +        }
> > +#endif
> > +
> >          mm->notifier_subscriptions = NULL;
> > +        return true;
> >  }
> >
> >  static inline void mmu_notifier_subscriptions_destroy(struct mm_struct *mm)
> >  {
> >          if (mm_has_notifiers(mm))
> >                  __mmu_notifier_subscriptions_destroy(mm);
> > +
> > +#ifdef CONFIG_SPECULATIVE_PAGE_FAULT
> > +        if (!in_atomic()) {
> > +                percpu_free_rwsem(mm->mmu_notifier_lock);
> > +                kfree(mm->mmu_notifier_lock);
> > +        } else {
> > +                percpu_rwsem_async_destroy(mm->mmu_notifier_lock);
> > +        }
> > +#endif
> >  }
> >
>
> We have received a bug report from our customer running Android GKI kernel
> android-13-5.15 branch where this series is included. As the callstack [1]
> indicates, the non-atomic test itself is not sufficient to free the percpu
> rwsem.
>
> The scenario deduced from the callstack:
>
> - context switch on CPU#0 from 'A' to idle. idle thread took A's mm
>
> - 'A' later ran on another CPU and exited. A's mm still has a reference.
>
> - Now CPU#0 is being hotplugged out. As part of this, idle thread's
> mm is switched (in idle_task_exit()) but its active_mm freeing is
> deferred to finish_cpu() which gets called later from the control processor
> (the thread which initiated the CPU hotplug). Please see the reasoning
> on why mmdrop() is not called in idle_task_exit() at
> commit bf2c59fce4074('sched/core: Fix illegal RCU from offline CPUs')
>
> - Now finish_cpu() tries to call percpu_free_rwsem() directly, since we are
> not in an atomic path but in the hotplug path where cpus_write_lock() is
> held, and this causes the deadlock.
>
> I am not sure if there is a clean way other than freeing the per-cpu
> rwsemaphore asynchronously all the time.

Thanks for reporting this issue, Pavan. I think your suggestion of
doing unconditional async destruction of mmu_notifier_lock would be
fine here.
percpu_rwsem_async_destroy adds a bit of overhead to schedule that
work, but I don't think the exit path is performance-critical enough
to suffer from that. Michel, WDYT?

>
> [1]
>
> -001|context_switch(inline)
> -001|__schedule()
> -002|__preempt_count_sub(inline)
> -002|schedule()
> -003|_raw_spin_unlock_irq(inline)
> -003|spin_unlock_irq(inline)
> -003|percpu_rwsem_wait()
> -004|__preempt_count_add(inline)
> -004|__percpu_down_read()
> -005|percpu_down_read(inline)
> -005|cpus_read_lock() // trying to get cpu_hotplug_lock again
> -006|rcu_barrier()
> -007|rcu_sync_dtor()
> -008|mmu_notifier_subscriptions_destroy(inline)
> -008|__mmdrop()
> -009|mmdrop(inline)
> -009|finish_cpu()
> -010|cpuhp_invoke_callback()
> -011|cpuhp_invoke_callback_range(inline)
> -011|cpuhp_down_callbacks()
> -012|_cpu_down() // acquired cpu_hotplug_lock (write lock)
>
> Thanks,
> Pavan
>
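
The deadlock in [1] and the proposed unconditional async destruction can be
modeled in a few lines of userspace C. This is only a sketch under stated
assumptions, not kernel code: fake_sem, hotplug_write_held, sync_destroy(),
async_destroy() and run_deferred() are hypothetical names standing in for the
per-mm percpu_rw_semaphore, a write-held cpu_hotplug_lock, percpu_free_rwsem(),
percpu_rwsem_async_destroy() and the deferred worker. Where the real kernel
would block forever, the model returns false.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in: true while _cpu_down() holds cpu_hotplug_lock for write. */
bool hotplug_write_held;

/* Hypothetical stand-in for the per-mm percpu_rw_semaphore. */
struct fake_sem {
    struct fake_sem *next; /* link in the deferred-destroy list */
};

struct fake_sem *pending; /* objects queued for deferred teardown */
int freed_count;          /* objects actually torn down so far */

/*
 * Models the synchronous path. In the report, teardown reaches
 * rcu_barrier() -> cpus_read_lock(), which can never succeed while the
 * caller itself holds the write side. The model reports that case
 * instead of hanging.
 */
bool sync_destroy(struct fake_sem *sem)
{
    if (hotplug_write_held)
        return false; /* would deadlock in the real kernel */
    free(sem);
    freed_count++;
    return true;
}

/* Models unconditional async destruction: only queue, never block. */
void async_destroy(struct fake_sem *sem)
{
    sem->next = pending;
    pending = sem;
}

/*
 * Models the worker that runs later. It tears down queued objects and
 * returns how many it freed; if the write side is still held it stops
 * and leaves the remaining objects queued.
 */
int run_deferred(void)
{
    int done = 0;

    while (pending) {
        struct fake_sem *sem = pending;

        pending = sem->next;
        if (!sync_destroy(sem)) {
            /* Still locked: put the object back and retry later. */
            sem->next = pending;
            pending = sem;
            break;
        }
        done++;
    }
    return done;
}
```

In this model the caller in the hotplug path only ever calls async_destroy(),
so it cannot wait on a lock it already holds; the cost is the extra deferred
step on the exit path.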