Date: Fri, 8 Mar 2024 08:09:21 +0000
References: <20240307133916.3782068-1-yosryahmed@google.com>
 <20240307133916.3782068-3-yosryahmed@google.com>
 <83b019e2-1b84-491a-b0b9-beb02e45d80c@intel.com>
Subject: Re: [RFC PATCH 2/3] x86/mm: make sure LAM is up-to-date during context switching
From: Yosry Ahmed
To: Dave Hansen
Cc: Andrew Morton, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
 Peter Zijlstra, Andy Lutomirski, "Kirill A. Shutemov", x86@kernel.org,
 linux-mm@kvack.org, linux-kernel@vger.kernel.org

> I came up with a kernel patch that I *think* may reproduce the problem
> with enough iterations.
> Userspace only needs to enable LAM, so I think
> the selftest can be enough to trigger it.
>
> However, there is no hardware with LAM at my disposal, and IIUC I cannot
> use QEMU without KVM to run a kernel with LAM. I was planning to do more
> testing before sending a non-RFC version, but apparently I cannot do
> any testing beyond building at this point (including reproducing) :/
>
> Let me know how you want to proceed. I can send a non-RFC v1 based on
> the feedback I got on the RFC, but it will only be build tested.
>
> For the record, here is the diff that I *think* may reproduce the bug:

Okay, I was actually able to run _some_ testing with the diff below on
_a kernel_, and I hit the BUG_ON pretty quickly. If I did things
correctly, this BUG_ON means that even though we have an outdated LAM in
our CR3, we will not update CR3 because the TLB is up-to-date.

I can work on a v1 now with the IPI approach that Andy suggested. A
small kink is that we may still hit the BUG_ON with that fix, but in
that case it should be fine to not write CR3 because once we re-enable
interrupts we will receive the IPI and fix it. IOW, the diff below will
still BUG with the proposed fix, but it should be okay.

One thing I am not clear about with the IPI approach, if we use
mm_cpumask() to limit the IPI scope, we need to make sure that we read
mm_lam_cr3_mask() *after* we update the cpumask in switch_mm_irqs_off(),
which makes me think we'll need a barrier (and Andy said we want to
avoid those in this path). But looking at the code I see:

	/*
	 * Start remote flushes and then read tlb_gen.
	 */
	if (next != &init_mm)
		cpumask_set_cpu(cpu, mm_cpumask(next));
	next_tlb_gen = atomic64_read(&next->context.tlb_gen);

This code doesn't have a barrier. How do we make sure the read actually
happens after the write?

If no barrier is needed there, then I think we can similarly just read
the LAM mask after cpumask_set_cpu().
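
To make the ordering question concrete, here is a minimal userspace
sketch of the pattern. It is only a model under stated assumptions, not
kernel code: `cpumask` and `lam_mask` are hypothetical stand-ins for
mm_cpumask() and the mm's LAM CR3 mask, and C11 sequentially consistent
atomics stand in for the kernel primitives. In particular it assumes the
bit set behaves as a fully ordered read-modify-write, which is not
something the snippet above establishes. The property of interest is
that the two sides can never both miss each other: either the switching
CPU reads the new LAM mask, or the enabling side sees that CPU in the
mask and would send it an IPI.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for mm_cpumask(mm) and the mm's LAM CR3 mask. */
static atomic_ulong cpumask;
static atomic_ulong lam_mask;

static bool switcher_saw_lam;	/* context-switch side read the new mask */
static bool enabler_saw_cpu;	/* LAM-enable side saw the CPU in the mask */

/* Models the context-switch side: publish the CPU, then read the LAM mask. */
static void *switcher(void *arg)
{
	(void)arg;
	atomic_fetch_or(&cpumask, 1UL);	/* assumed fully ordered RMW */
	switcher_saw_lam = atomic_load(&lam_mask) != 0;
	return NULL;
}

/* Models the LAM-enable side: publish the mask, then pick IPI targets. */
static void *enabler(void *arg)
{
	(void)arg;
	atomic_store(&lam_mask, 1UL);	/* seq_cst store */
	enabler_saw_cpu = atomic_load(&cpumask) != 0;
	return NULL;
}

/*
 * Runs the race `trials` times. Returns the number of trials in which
 * both sides missed each other, or -1 if a thread could not be created.
 */
static int count_missed_updates(int trials)
{
	int missed = 0;

	assert(trials > 0);
	for (int i = 0; i < trials; i++) {
		pthread_t a, b;

		atomic_store(&cpumask, 0UL);
		atomic_store(&lam_mask, 0UL);
		if (pthread_create(&a, NULL, switcher, NULL) != 0)
			return -1;
		if (pthread_create(&b, NULL, enabler, NULL) != 0) {
			pthread_join(a, NULL);
			return -1;
		}
		pthread_join(a, NULL);
		pthread_join(b, NULL);
		if (!switcher_saw_lam && !enabler_saw_cpu)
			missed++;
	}
	return missed;
}
```

With sequentially consistent operations on both sides,
count_missed_updates() should always return 0. If the bit set and the
following read were only relaxed, the "both miss" outcome would be
allowed, which is exactly the case I am worried about for the LAM mask.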
>
> diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
> index 33b268747bb7b..c37a8c26a3c21 100644
> --- a/arch/x86/kernel/process_64.c
> +++ b/arch/x86/kernel/process_64.c
> @@ -750,8 +750,25 @@ static long prctl_map_vdso(const struct vdso_image *image, unsigned long addr)
> 
>  #define LAM_U57_BITS 6
> 
> +static int kthread_fn(void *_mm)
> +{
> +	struct mm_struct *mm = _mm;
> +
> +	/*
> +	 * Wait for LAM to be enabled then schedule. Hopefully we will context
> +	 * switch directly into the task that enabled LAM due to CPU pinning.
> +	 */
> +	kthread_use_mm(mm);
> +	while (!test_bit(MM_CONTEXT_LOCK_LAM, &mm->context.flags));
> +	schedule();
> +	return 0;
> +}
> +
>  static int prctl_enable_tagged_addr(struct mm_struct *mm, unsigned long nr_bits)
>  {
> +	struct task_struct *kthread_task;
> +	int kthread_cpu;
> +
>  	if (!cpu_feature_enabled(X86_FEATURE_LAM))
>  		return -ENODEV;
> 
> @@ -782,10 +799,22 @@ static int prctl_enable_tagged_addr(struct mm_struct *mm, unsigned long nr_bits)
>  		return -EINVAL;
>  	}
> 
> +	/* Pin the task to the current CPU */
> +	set_cpus_allowed_ptr(current, cpumask_of(smp_processor_id()));
> +
> +	/* Run a kthread on another CPU and wait for it to start */
> +	kthread_cpu = cpumask_next_wrap(smp_processor_id(), cpu_online_mask, 0, false),
> +	kthread_task = kthread_run_on_cpu(kthread_fn, mm, kthread_cpu, "lam_repro_kthread");
> +	while (!task_is_running(kthread_task));
> +
>  	write_cr3(__read_cr3() | mm->context.lam_cr3_mask);
>  	set_tlbstate_lam_mode(mm);
>  	set_bit(MM_CONTEXT_LOCK_LAM, &mm->context.flags);
> 
> +	/* Move the task to the kthread CPU */
> +	set_cpus_allowed_ptr(current, cpumask_of(kthread_cpu));
> +
>  	mmap_write_unlock(mm);
> 
>  	return 0;
> diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
> index 51f9f56941058..3afb53f1a1901 100644
> --- a/arch/x86/mm/tlb.c
> +++ b/arch/x86/mm/tlb.c
> @@ -593,7 +593,7 @@ void switch_mm_irqs_off(struct mm_struct *unused, struct mm_struct *next,
>  		next_tlb_gen = atomic64_read(&next->context.tlb_gen);
>  		if (this_cpu_read(cpu_tlbstate.ctxs[prev_asid].tlb_gen) ==
>  				next_tlb_gen)
> -			return;
> +			BUG_ON(new_lam != tlbstate_lam_cr3_mask());
> 
>  		/*
>  		 * TLB contents went out of date while we were in lazy
>