Date: Wed, 27 Jul 2022 13:04:20 +0530
From: Pavan Kondeti
To: Michel Lespinasse
CC: Linux-MM, Andrew Morton, Laurent Dufour, Jerome Glisse, Peter Zijlstra, Michal Hocko, Vlastimil Babka, Davidlohr Bueso, Matthew Wilcox, Liam Howlett, Rik van Riel, Paul McKenney, Song Liu, Suren Baghdasaryan, Minchan Kim, Joel Fernandes, David Rientjes, Axel Rasmussen, Andy Lutomirski
Subject: Re: [PATCH v2 23/35] mm: add mmu_notifier_lock
Message-ID: <20220727073420.GA8985@hu-pkondeti-hyd.qualcomm.com>
References: <20220128131006.67712-1-michel@lespinasse.org> <20220128131006.67712-24-michel@lespinasse.org>
In-Reply-To: <20220128131006.67712-24-michel@lespinasse.org>

On Fri, Jan 28, 2022 at 05:09:54AM -0800, Michel Lespinasse wrote:
> Introduce mmu_notifier_lock as a per-mm percpu_rw_semaphore,
> as well as the code to initialize and destroy it together with the mm.
>
> This lock will be used to prevent races between mmu_notifier_register()
> and speculative fault handlers that need to fire MMU notifications
> without holding any of the mmap or rmap locks.
>
> Signed-off-by: Michel Lespinasse
> ---
>  include/linux/mm_types.h     |  6 +++++-
>  include/linux/mmu_notifier.h | 27 +++++++++++++++++++++++++--
>  kernel/fork.c                |  3 ++-
>  3 files changed, 32 insertions(+), 4 deletions(-)
>
> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> index 305f05d2a4bc..f77e2dec038d 100644
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -462,6 +462,7 @@ struct vm_area_struct {
>  } __randomize_layout;
>
>  struct kioctx_table;
> +struct percpu_rw_semaphore;
>  struct mm_struct {
>  	struct {
>  		struct vm_area_struct *mmap;		/* list of VMAs */
> @@ -608,7 +609,10 @@ struct mm_struct {
>  		struct file __rcu *exe_file;
>  #ifdef CONFIG_MMU_NOTIFIER
>  		struct mmu_notifier_subscriptions *notifier_subscriptions;
> -#endif
> +#ifdef CONFIG_SPECULATIVE_PAGE_FAULT
> +		struct percpu_rw_semaphore *mmu_notifier_lock;
> +#endif /* CONFIG_SPECULATIVE_PAGE_FAULT */
> +#endif /* CONFIG_MMU_NOTIFIER */
>  #if defined(CONFIG_TRANSPARENT_HUGEPAGE) && !USE_SPLIT_PMD_PTLOCKS
>  		pgtable_t pmd_huge_pte; /* protected by page_table_lock */
>  #endif
> diff --git a/include/linux/mmu_notifier.h b/include/linux/mmu_notifier.h
> index 45fc2c81e370..ace76fe91c0c 100644
> --- a/include/linux/mmu_notifier.h
> +++ b/include/linux/mmu_notifier.h
> @@ -6,6 +6,8 @@
>  #include
>  #include
>  #include
> +#include
> +#include
>  #include
>  #include
>
> @@ -499,15 +501,35 @@ static inline void mmu_notifier_invalidate_range(struct mm_struct *mm,
>  		__mmu_notifier_invalidate_range(mm, start, end);
>  }
>
> -static inline void mmu_notifier_subscriptions_init(struct mm_struct *mm)
> +static inline bool mmu_notifier_subscriptions_init(struct mm_struct *mm)
>  {
> +#ifdef CONFIG_SPECULATIVE_PAGE_FAULT
> +	mm->mmu_notifier_lock = kzalloc(sizeof(struct percpu_rw_semaphore), GFP_KERNEL);
> +	if (!mm->mmu_notifier_lock)
> +		return false;
> +	if (percpu_init_rwsem(mm->mmu_notifier_lock)) {
> +		kfree(mm->mmu_notifier_lock);
> +		return false;
> +	}
> +#endif
> +
>  	mm->notifier_subscriptions = NULL;
> +	return true;
>  }
>
>  static inline void mmu_notifier_subscriptions_destroy(struct mm_struct *mm)
>  {
>  	if (mm_has_notifiers(mm))
>  		__mmu_notifier_subscriptions_destroy(mm);
> +
> +#ifdef CONFIG_SPECULATIVE_PAGE_FAULT
> +	if (!in_atomic()) {
> +		percpu_free_rwsem(mm->mmu_notifier_lock);
> +		kfree(mm->mmu_notifier_lock);
> +	} else {
> +		percpu_rwsem_async_destroy(mm->mmu_notifier_lock);
> +	}
> +#endif
>  }
>

We have received a bug report from a customer running the Android GKI
kernel, android-13-5.15 branch, which includes this series. As the call
stack [1] indicates, the !in_atomic() test by itself is not sufficient to
decide that the percpu rwsem can be freed synchronously.

The scenario deduced from the call stack:

- A context switch happens on CPU#0 from 'A' to idle. The idle thread
  took A's mm.

- 'A' later ran on another CPU and exited. A's mm still has a reference.

- Now CPU#0 is being hotplugged out. As part of this, the idle thread's mm
  is switched (in idle_task_exit()), but freeing its active_mm is deferred
  to finish_cpu(), which is called later from the control processor (the
  thread that initiated the CPU hotplug). For the reasoning on why mmdrop()
  is not called in idle_task_exit(), please see commit bf2c59fce4074
  ('sched/core: Fix illegal RCU from offline CPUs').

- finish_cpu() then calls percpu_free_rwsem() directly, since we are not
  in an atomic context. However, we are in the hotplug path with
  cpus_write_lock() held, so the cpus_read_lock() taken via rcu_barrier()
  deadlocks.

I am not sure there is a clean way to handle this other than always
freeing the percpu rwsem asynchronously.
[1]

-001|context_switch(inline)
-001|__schedule()
-002|__preempt_count_sub(inline)
-002|schedule()
-003|_raw_spin_unlock_irq(inline)
-003|spin_unlock_irq(inline)
-003|percpu_rwsem_wait()
-004|__preempt_count_add(inline)
-004|__percpu_down_read()
-005|percpu_down_read(inline)
-005|cpus_read_lock() // trying to get cpu_hotplug_lock again
-006|rcu_barrier()
-007|rcu_sync_dtor()
-008|mmu_notifier_subscriptions_destroy(inline)
-008|__mmdrop()
-009|mmdrop(inline)
-009|finish_cpu()
-010|cpuhp_invoke_callback()
-011|cpuhp_invoke_callback_range(inline)
-011|cpuhp_down_callbacks()
-012|_cpu_down() // acquired cpu_hotplug_lock (write lock)

Thanks,
Pavan