Date: Thu, 1 Aug 2024 14:43:59 +0100
From: Will Deacon
To: David Hildenbrand
Cc: Dev Jain, akpm@linux-foundation.org, willy@infradead.org,
	ryan.roberts@arm.com, anshuman.khandual@arm.com,
	catalin.marinas@arm.com, cl@gentwo.org, vbabka@suse.cz,
	mhocko@suse.com, apopple@nvidia.com, osalvador@suse.de,
	baolin.wang@linux.alibaba.com, dave.hansen@linux.intel.com,
	baohua@kernel.org, ioworker0@gmail.com, gshan@redhat.com,
	mark.rutland@arm.com, kirill.shutemov@linux.intel.com,
	hughd@google.com, aneesh.kumar@kernel.org,
	yang@os.amperecomputing.com, peterx@redhat.com, broonie@kernel.org,
	linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org
Subject: Re: Race condition observed between page migration and page fault
	handling on arm64 machines
Message-ID: <20240801134358.GB4794@willie-the-truck>
References: <20240801081657.1386743-1-dev.jain@arm.com>
	<3b82e195-5871-4880-9ce5-d01bb751f471@redhat.com>
	<92df0ee1-d3c9-41e2-834c-284127ae2c4c@arm.com>
	<19902a48-c59b-4e3b-afc5-e792506c2fd6@redhat.com>
	<6486a2b1-45ef-44b6-bd84-d402fc121373@redhat.com>
In-Reply-To: <6486a2b1-45ef-44b6-bd84-d402fc121373@redhat.com>

On Thu, Aug 01, 2024 at 03:26:57PM +0200, David Hildenbrand wrote:
> On 01.08.24 15:13, David Hildenbrand wrote:
> > > > > To dampen the tradeoff, we could do this in shmem_fault() instead? But
> > > > > then, this would mean that we do this in all
> > > > >
> > > > > kinds of vma->vm_ops->fault, only when we discover another reference
> > > > > count race condition :) Doing this in do_fault()
> > > > >
> > > > > should solve this once and for all. In fact, do_pte_missing() may call
> > > > > do_anonymous_page() or do_fault(), and I just
> > > > >
> > > > > noticed that the former already checks this using vmf_pte_changed().
> > > >
> > > > What I am still missing is why this is (a) arm64 only; and (b) if this
> > > > is something we should really worry about. There are other reasons
> > > > (e.g., speculative references) why migration could temporarily fail,
> > > > does it happen that often that it is really something we have to worry
> > > > about?
> > >
> > >
> > > (a) See discussion at [1]; I guess it passes on x86, which is quite
> > > strange since the race is clearly arch-independent.
> >
> > Yes, I think this is what we have to understand. Is the race simply less
> > likely to trigger on x86?
> >
> > I would assume that it would trigger on any arch.
> >
> > I just ran it on a x86 VM with 2 NUMA nodes and it also seems to work here.
> >
> > Is this maybe related to deferred flushing? Such that the other CPU will
> > by accident just observe the !pte_none a little less likely?
> >
> > But arm64 also usually defers flushes, right? At least unless
> > ARM64_WORKAROUND_REPEAT_TLBI is around. With that we never do deferred
> > flushes.
>
> Bingo!
>
> diff --git a/mm/rmap.c b/mm/rmap.c
> index e51ed44f8b53..ce94b810586b 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -718,10 +718,7 @@ static void set_tlb_ubc_flush_pending(struct mm_struct
> *mm, pte_t pteval,
>   */
>  static bool should_defer_flush(struct mm_struct *mm, enum ttu_flags flags)
>  {
> -       if (!(flags & TTU_BATCH_FLUSH))
> -               return false;
> -
> -       return arch_tlbbatch_should_defer(mm);
> +       return false;
>  }
>
>
> On x86:
>
> # ./migration
> TAP version 13
> 1..1
> # Starting 1 tests from 1 test cases.
> #  RUN           migration.shared_anon ...
> Didn't migrate 1 pages
> # migration.c:170:shared_anon:Expected migrate(ptr, self->n1, self->n2) (-2)
> == 0 (0)
> # shared_anon: Test terminated by assertion
> #          FAIL  migration.shared_anon
> not ok 1 migration.shared_anon
>
>
> It fails all of the time!

Nice work! I suppose that makes sense as, with the eager TLB invalidation,
the window between the other CPU faulting and the migration entry being
written is fairly wide.

Not sure about a fix though :/ It feels a bit overkill to add a new
invalid pte encoding just for this.

Will
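
For reference, the decision that the quoted hack short-circuits can be
sketched in plain userspace C. This is only a toy model under stated
assumptions: every toy_ name, the flag value, and the boolean standing in
for arch_tlbbatch_should_defer() are made up for illustration and are not
the kernel's definitions. It shows only that the original logic defers the
flush when batching is requested and the architecture agrees, while the
hacked version always flushes eagerly.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy flag, NOT the kernel's enum ttu_flags; the value is an assumption. */
enum toy_ttu_flags {
	TOY_TTU_BATCH_FLUSH = 1 << 0,
};

/* Hypothetical stand-in for arch_tlbbatch_should_defer(mm). */
static bool toy_arch_should_defer(bool arch_wants_batching)
{
	return arch_wants_batching;
}

/* Mirrors the logic removed by the quoted diff. */
static bool toy_should_defer_flush(unsigned int flags, bool arch_wants_batching)
{
	if (!(flags & TOY_TTU_BATCH_FLUSH))
		return false;

	return toy_arch_should_defer(arch_wants_batching);
}

/* Mirrors the logic after the quoted diff: never defer the flush. */
static bool toy_should_defer_flush_patched(unsigned int flags,
					   bool arch_wants_batching)
{
	(void)flags;
	(void)arch_wants_batching;
	return false;
}

/* Walk the four input combinations and compare the two variants. */
static void toy_self_check(void)
{
	/* Original: defer only when batching is requested and the arch agrees. */
	assert(toy_should_defer_flush(TOY_TTU_BATCH_FLUSH, true));
	assert(!toy_should_defer_flush(TOY_TTU_BATCH_FLUSH, false));
	assert(!toy_should_defer_flush(0, true));
	assert(!toy_should_defer_flush(0, false));

	/* Patched: the flush is eager regardless of the inputs. */
	assert(!toy_should_defer_flush_patched(TOY_TTU_BATCH_FLUSH, true));
	assert(!toy_should_defer_flush_patched(0, false));
}
```

Calling toy_self_check() from a main() exercises both variants. The only
input where they disagree is batching requested on an architecture that
wants to defer, which is the case the experiment above turns off.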