From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id BEF9FC54E4A for ; Thu, 7 Mar 2024 20:42:23 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 4E8786B02AB; Thu, 7 Mar 2024 15:42:23 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 497FF6B02AC; Thu, 7 Mar 2024 15:42:23 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 35F896B02AD; Thu, 7 Mar 2024 15:42:23 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0011.hostedemail.com [216.40.44.11]) by kanga.kvack.org (Postfix) with ESMTP id 2799B6B02AB for ; Thu, 7 Mar 2024 15:42:23 -0500 (EST) Received: from smtpin01.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay04.hostedemail.com (Postfix) with ESMTP id EC9D41A0721 for ; Thu, 7 Mar 2024 20:42:22 +0000 (UTC) X-FDA: 81871415724.01.BBAF29D Received: from mail-pl1-f202.google.com (mail-pl1-f202.google.com [209.85.214.202]) by imf04.hostedemail.com (Postfix) with ESMTP id 3830740009 for ; Thu, 7 Mar 2024 20:42:20 +0000 (UTC) Authentication-Results: imf04.hostedemail.com; dkim=pass header.d=google.com header.s=20230601 header.b=Fb5QokxR; dmarc=pass (policy=reject) header.from=google.com; spf=pass (imf04.hostedemail.com: domain of 3rCbqZQoKCNcRHLKR3AF769HH9E7.5HFEBGNQ-FFDO35D.HK9@flex--yosryahmed.bounces.google.com designates 209.85.214.202 as permitted sender) smtp.mailfrom=3rCbqZQoKCNcRHLKR3AF769HH9E7.5HFEBGNQ-FFDO35D.HK9@flex--yosryahmed.bounces.google.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1709844141; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=RQdNtN78xJnCksQQbRN+GLBXoPr5M5Y4bjQ8v3llxL0=; b=bVtJYKeJyywKXjTULLpO8xMdrK2Qc6fkdN4QowtnJRSHB+/o1gYN4kHoSk0RWUAKEEfBb5 qdCwYKrdWWL5jG2lCgUn2zAq0Uq/uYFGyHoi5X3sUhQkcQ0L9IpoU7DDNrnt+VjWYLg1NV 0eLBARBa5nmjICNn4NfZMp0f0ArEJtc= ARC-Authentication-Results: i=1; imf04.hostedemail.com; dkim=pass header.d=google.com header.s=20230601 header.b=Fb5QokxR; dmarc=pass (policy=reject) header.from=google.com; spf=pass (imf04.hostedemail.com: domain of 3rCbqZQoKCNcRHLKR3AF769HH9E7.5HFEBGNQ-FFDO35D.HK9@flex--yosryahmed.bounces.google.com designates 209.85.214.202 as permitted sender) smtp.mailfrom=3rCbqZQoKCNcRHLKR3AF769HH9E7.5HFEBGNQ-FFDO35D.HK9@flex--yosryahmed.bounces.google.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1709844141; a=rsa-sha256; cv=none; b=t+hIS3bZo7vCf/zFauNCpLZJ+ba0HFwcxUcoZV3b96jvhJrr3zVNnUgHMgUN7y+eZWP/BN IA8gQRJnQW5TAFvQjKKXOajRYdmqz4VCILOoEXNE05pLVDmmFNVD6Ty4hufabpuZ+YGOGJ wXj1eFqGKYpjggzEDmwDoP/NJ86a/u4= Received: by mail-pl1-f202.google.com with SMTP id d9443c01a7336-1dc8e1a6011so11348315ad.2 for ; Thu, 07 Mar 2024 12:42:20 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1709844140; x=1710448940; darn=kvack.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=RQdNtN78xJnCksQQbRN+GLBXoPr5M5Y4bjQ8v3llxL0=; b=Fb5QokxRcwQloCER7i8niboDvnByTasP33a07bSjAY/LBkdJBzej6IP4iFT6h2cbF4 2nAcDwSans+gSNf3GRhgSA+qx+R7207o/hAmuYT9sCyD3NApDNOAXkHeDC2T3xoC/rDm D3uaqlWQT0MUTXV/LHD9UwEpXgpkGJX7mZo6JCOvaV6i+IR4BSaOt5LSPMm39f4amZmH mL08Sf0UYyaIrL5uCkledQMXAdmm0zwNnALUo2i/QUcl6N2BsVxjPA9lY8aWAllZTmqu yw/bZSmftdOAvss9sUIUncekgCs+JFywHlW9KMC6VojeNEwbeX7ol+Iqm6lYAZLJqfGQ 2p3g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1709844140; x=1710448940; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=RQdNtN78xJnCksQQbRN+GLBXoPr5M5Y4bjQ8v3llxL0=; b=sqM92Wgewtp8pYFSjselaQ9MM0QBIn549FCzPuyIoeh4Eds9NuedwQ/JyvMduJPxex Tcr6Dd9OrUQGvVNExTJeS/6oK3Com94mB+DZ0fS7j2mOOXBuhO+YcLSYNqabZrQbKk0q CnfJJehibT/EP5WyhFJ3gTKkN21QLDiKanIETQwWQcTNSIP51nlcScl7Yp8Xkpc2kaM/ ePh/SUGmdmL+1jfWTCY8MdvGaylGQ+O/RT86RIriTMallcu8En3jbAHl0OT/y/0TES7i JQOHXDWwOfkllKCDdZz3urstYchTKFuxhiuCbj3aXD/Fr2zyOrqp1DdqbU36MCscwSR2 lwnA== X-Forwarded-Encrypted: i=1; AJvYcCV9fs0sp7IxOZd3hunO7ZYPtcmc8+sKDIYMra+7Z9RdLKb++Yj5hMOqZhq4ujvUWztzUKaKK0K7ZYWqzODEAASv0oI= X-Gm-Message-State: AOJu0YyHaJx8EyNrsbeOEcGy4GqJn2+2Rc01a0g0yNbNWuFiaG7OD2zG K/+rGYX7F+MZWoPubrEbHQRIYfobTQr5DVCCwErcORUZNrjmSLVKYB5uJjL8wlFmiJM7JLuVkyt JaERlnJSjYV9oK7xFxQ== X-Google-Smtp-Source: AGHT+IHE8t4oN+uf/OQDciimdaaIli2dbKCyNod+F1ML8w/S26BvIeQRNR8rcjQpok9SEW4/QlY+jBrUrmk9QumG X-Received: from yosry.c.googlers.com ([fda3:e722:ac3:cc00:20:ed76:c0a8:29b4]) (user=yosryahmed job=sendgmr) by 2002:a17:902:cec8:b0:1dc:df25:76d4 with SMTP id d8-20020a170902cec800b001dcdf2576d4mr493481plg.11.1709844140048; Thu, 07 Mar 2024 12:42:20 -0800 (PST) Date: Thu, 7 Mar 2024 20:42:18 +0000 In-Reply-To: Mime-Version: 1.0 References: <20240307133916.3782068-1-yosryahmed@google.com> <20240307133916.3782068-2-yosryahmed@google.com> Message-ID: Subject: Re: [RFC PATCH 1/3] x86/mm: fix LAM cr3 mask inconsistency during context switch From: Yosry Ahmed To: Dave Hansen Cc: Andrew Morton , Thomas Gleixner , Ingo Molnar , Borislav Petkov , Peter Zijlstra , Andy Lutomirski , "Kirill A. Shutemov" , x86@kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org Content-Type: text/plain; charset="us-ascii" X-Rspam-User: X-Stat-Signature: xdq4ojd39e8byd7sbq7xxt8dghkpasnk X-Rspamd-Server: rspam07 X-Rspamd-Queue-Id: 3830740009 X-HE-Tag: 1709844140-980346 X-HE-Meta: U2FsdGVkX18Z+KGQ14gYdwZ3K2pB+CoFS5DJb+B2Qy/a20zxZfrIgEHitIgQfCPX6QlpZ9eqfJBgKs7430E2UMuThn6u1S6tSMjH+Jl98HrQPSkctlcHZTc6qD5q3WP/9gXgjgslDeSGjJ+jbGpvlJvGBhRIYd3soPvgmEQOXj/zq15PKk4lY/afOWV6XlNlh4SxDTpFT1AzVutVwt1XY4bdqBMNg/kHMgv3jwXd4UfqiVlpfZFWlkVPEsT3wpLmLHwQyUI3XhR4RrQ9b5L8u/PYG7CrufurLby3GwazMF0HWKzy2x7r6PA7Pv3qZkeQ1Ry5BrLCpB2ijmY9SyqEkBbllagV5w4SaEdT/N7wZcxP6xKt6T4G1FkCt+nU9/8CDGCRnNOPkiLeqxOat+aWwnKSKFbNnvbKboR3u+8L6m1J+C5KaH31i72ToBEwuZV8UItqc6V3HLy4fG/0RWqRaWFzaeuu8oVeAEpHDU8TE2gnWKiJOn83Wg4H5JVYAVdyAUsdHWM2qR5TQF5jao6pUjNL83EgoYhLLkecwjrvZeX0W+Xb8LiXLxSGdun0EVIqNn1aHhRgfrKAa8iNgYdHPtyO/aSFNk8VBZBk+i3fdA37R2oZR8gj8nS8bh/eVaSB3P3CF+F6qjGxtzX9XMvAORWFrIudDGdhbM4mu44+LcUsiwxGRIpkkfiR4JLDYDMZBi4eRrBVenLL0wQmhe9MpifWmfwDIautN3WTLNxKbFMChY5cr8I3sD/+xQmajPDIHZx9A6y8KRmvmbSEka3FMpqEPeQuuNuVcdv1MQbYenRazN4nTsdjyREfXlZ/ZRxbFF8qwk9KN7YkpjlzNdMsH9A3WPnR4kxkGL0ixv0freVWGU8G8ndlb2z+ysJBUPw2g7okInb9avtoMwRE/mP4w3gceGy2Xu+Ps0C87nemquQcStkN4mjPHzY2ZzBG6z7h3Uhw8UWaemy1JKKlo5W HTllQtI7 DZgAncnHLwQfHTmTi+uqMm+f86E88h5G6y2NeHtkCHj+2DUxQj+bgcqLjJhZHBxrWiJfdCZOZ8DPLcsb0zPFkrBPmYL7JfTZKpFSXAVDeEw2n81iTjtJLslT5bMkDkIXi8faqy6tagyNmB9rdLxTliL2QdB3sMif0CdbRIltKI3d9P4X/as+qJi2FMSUhuL9ue3QSq1u/h57kUEVrHp0pNT7SccY71fjQqpK3gJTOHtoM74PK2hL3PVL++jy19KZqhdKU41oqtjknLQ3pHEjdiNK61WAJPSbL04LMlZ/v/HpcrM5tnWz316NTnjSXMWN0WdbnXzIUjXoQWcBJpEkEq6syDgZdcuDADkOHoBi98IgW5ZN4rb8DxZiokujyRXNA3GKiOkW38+cXAP7bDFc9nd6ycaYvWG7+44vTA/igNX2LR+6eVJSrBDvu96j5FJkb1SvvxNuxjajvZqHc39vAX8dfusgqgrQAfdTp95B1ccMs3Dl7N6dC2phdLign/lzVx3gTx2BQvg/VSIE5G+9HXkSyXeiJ6UoRqnoPdXkCAWT7Npcim4H1jZfUXRnkWpHra7uZGXx8MYdxsRZCHvEeFmedQ/luvXCLp7d6KlKT0Fu8ZkpZBqPDwehVR8OIoPiWVVXuoy3ZsQsuD6zfKeSbJ7eNWWWBVHJGIH/mykGNRnJQXVuRPZHTpykvsE0gKREX4kfPjrWGbHM5sahvYa9oPRyBw9PfZXAQV08r X-Bogosity: Ham, tests=bogofilter, spamicity=0.000116, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Thu, Mar 07, 2024 at 09:36:58AM -0800, Dave Hansen wrote: > I know we all have different rules, but any time you could spend absorbing: > > https://www.kernel.org/doc/html/next/process/maintainer-tip.html Thanks for the quick review and tips. I didn't know this existed, I will take a look before respinning. > > would be appreciated, especially: > > > The condensed patch description in the subject line should start with > > a uppercase letter and should be written in imperative tone. > > > On 3/7/24 05:39, Yosry Ahmed wrote: > > In switch_mm_irqs_off(), we read the 'mm->context.lam_cr3_mask' into > > 'new_lam', which is later passed to load_new_mm_cr3(). However, there is > > a call to set_tlbstate_lam_mode() in between which will read > > 'mm->context.lam_cr3_mask' again and set 'cpu_tlbstate.lam' accordingly. > > If we race with another thread updating 'mm->context.lam_cr3_mask', the > > value in 'cpu_tlbstate.lam' could end up being different from CR3. > > Your description is fine (modulo the we's). But I slightly reworded it > to make it more plainly readable: > > LAM can only be enabled when a process is single-threaded. But _kernel_ > threads can temporarily use a single-threaded process's mm. That means > that a context-switching kernel thread can race and observe the mm's LAM > metadata (mm->context.lam_cr3_mask) change. > > The context switch code does two logical things with that metadata: > populate CR3 and populate 'cpu_tlbstate.lam'. If it hits this race, > 'cpu_tlbstate.lam' and CR3 can end up out of sync. > > This de-synchronization is currently harmless. But it is confusing and > might lead to warnings or real bugs. Thanks a lot! I will adopt your version moving forward :) > > -- > > > Fix the problem by updating set_tlbstate_lam_mode() to return the LAM > > mask that was set to 'cpu_tlbstate.lam', and use that mask in > > switch_mm_irqs_off() when writing CR3. Use READ_ONCE to make sure we > > read the mask once and use it consistenly. > > Spell checking is also appreciated. I did run checkpatch. Did it miss something? > > ... > > -static inline void set_tlbstate_lam_mode(struct mm_struct *mm) > > +static inline unsigned long set_tlbstate_lam_mode(struct mm_struct *mm) > > { > > - this_cpu_write(cpu_tlbstate.lam, > > - mm->context.lam_cr3_mask >> X86_CR3_LAM_U57_BIT); > > + unsigned long lam = READ_ONCE(mm->context.lam_cr3_mask); > > + > > + this_cpu_write(cpu_tlbstate.lam, lam >> X86_CR3_LAM_U57_BIT); > > this_cpu_write(tlbstate_untag_mask, mm->context.untag_mask); > > + return lam; > > } > > The comments about races need to be _here_ so that the purpose of the > READ_ONCE() is clear. > > It would also be nice to call out the rule that this can only > meaningfully be called once per context switch. I wanted the comments in switch_mm_irqs_off() where the races actually matter, but I guess I can make the comment more generic and specify that the return value is used to write CR3 so we READ_ONCE keeps CR3 and tlbstate.lam consistent. > > > @@ -633,7 +628,12 @@ void switch_mm_irqs_off(struct mm_struct *unused, struct mm_struct *next, > > barrier(); > > } > > > > - set_tlbstate_lam_mode(next); > > + /* > > + * Even if we are not actually switching mm's, another thread could have > > + * updated mm->context.lam_cr3_mask. Make sure tlbstate_lam_cr3_mask() > > + * and the loaded CR3 use the up-to-date mask. > > + */ > > I kinda dislike how the comment talks about the details of what > set_tlbstate_lam_mode() does. It would be much better to put the meat > of this comment at the set_tlbstate_lam_mode() definition. Agreed. I will move most comments to set_tlbstate_lam_mode(). > > > + new_lam = set_tlbstate_lam_mode(next); > > if (need_flush) { > > this_cpu_write(cpu_tlbstate.ctxs[new_asid].ctx_id, next->context.ctx_id); > > this_cpu_write(cpu_tlbstate.ctxs[new_asid].tlb_gen, next_tlb_gen); > > This is less a complaint about your change and more of the existing > code, but I wish it was more obvious that set_tlbstate_lam_mode() is > logically shuffling data (once) from 'next' into the tlbstate. > > The naming makes it sound like it is modifying the tlbstate of 'next'. We can update the function name to make it more verbose, maybe something like update_cpu_tlbstate_lam()? We can also try to put "return" somewhere in the name to imply that it returns the LAM mask it sets, but I can't make that look pretty. > > But I don't have any particularly brilliant ideas to fix it either. > Maybe just: > > /* new_lam is effectively cpu_tlbstate.lam */ > > > @@ -705,7 +705,6 @@ void initialize_tlbstate_and_flush(void) > > > > /* LAM expected to be disabled */ > > WARN_ON(cr3 & (X86_CR3_LAM_U48 | X86_CR3_LAM_U57)); > > - WARN_ON(mm_lam_cr3_mask(mm)); > > > > /* > > * Assert that CR4.PCIDE is set if needed. (CR4.PCIDE initialization > > @@ -724,7 +723,7 @@ void initialize_tlbstate_and_flush(void) > > this_cpu_write(cpu_tlbstate.next_asid, 1); > > this_cpu_write(cpu_tlbstate.ctxs[0].ctx_id, mm->context.ctx_id); > > this_cpu_write(cpu_tlbstate.ctxs[0].tlb_gen, tlb_gen); > > - set_tlbstate_lam_mode(mm); > > + WARN_ON(set_tlbstate_lam_mode(mm)); > > The "set_" naming bugs me in both of the sites that get modified here. > I'd be with a new name that fits better, if we can think of one. Is it because it's not clear we are updating cpu_tlbstate (in which case I think update_cpu_tlbstate_lam() is an improvement), or is it because the function returns a value now? If the latter we can put "return" in the name somewhere, or keep the function void and pass in an output parameter.