Date: Sat, 05 Dec 2020 18:00:00 +1000
From: Nicholas Piggin
Subject: Re: [PATCH 2/8] x86: use exit_lazy_tlb rather than membarrier_mm_sync_core_before_usermode
To: Andy Lutomirski
Cc: Anton Blanchard, Arnd Bergmann, linux-arch, LKML, Linux-MM, linuxppc-dev, Mathieu Desnoyers, Peter Zijlstra, X86 ML
References: <20201128160141.1003903-1-npiggin@gmail.com> <20201128160141.1003903-3-npiggin@gmail.com> <1606876327.dyrhkih2kh.astroid@bobo.none>
Message-Id: <1607152918.fkgmomgfw9.astroid@bobo.none>

Excerpts from Andy Lutomirski's message of December 3, 2020 3:09 pm:
> On Tue, Dec 1, 2020 at 6:50 PM Nicholas Piggin wrote:
>>
>> Excerpts from Andy Lutomirski's message of November 29, 2020 3:55 am:
>> > On Sat, Nov 28, 2020 at 8:02 AM Nicholas Piggin wrote:
>> >>
>> >> And get rid of the generic sync_core_before_usermode facility. This is
>> >> functionally a no-op in the core scheduler code, but it also catches
>> >>
>> >> This helper is the wrong way around I think. The idea that membarrier
>> >> state requires a core sync before returning to user is the easy one
>> >> that does not need hiding behind membarrier calls. The gap in core
>> >> synchronization due to x86's sysret/sysexit and lazy tlb mode is the
>> >> tricky detail that is better put in x86 lazy tlb code.
>> >>
>> >> Consider if an arch did not synchronize core in switch_mm either, then
>> >> membarrier_mm_sync_core_before_usermode would be in the wrong place,
>> >> but arch specific mmu context functions would still be the right place.
>> >> There is also an exit_lazy_tlb case that is not covered by this call, which
>> >> could be a bug (a kthread uses the membarrier process's mm, then context
>> >> switches back to the process without a switch_mm or lazy mm switch).
>> >>
>> >> This makes lazy tlb code a bit more modular.
>> >
>> > I have a couple of membarrier fixes that I want to send out today or
>> > tomorrow, and they might eliminate the need for this patch. Let me
>> > think about this a little bit. I'll cc you. The existing code is way
>> > too subtle and the comments are far too confusing for me to be quickly
>> > confident about any of my conclusions :)
>> >
>>
>> Thanks for the heads-up. I'll have to have a better look through them,
>> but I don't know that it eliminates the need for this entirely, although
>> it might close some gaps and make this not a bug fix. The problem here
>> is that x86 code wanted something to be called when a lazy mm is unlazied,
>> but it missed some spots, and also the core scheduler doesn't need to
>> know about those x86 details if it has this generic call that annotates
>> the lazy handling better.
>
> I'll send v3 tomorrow. They add more sync_core_before_usermode() callers.
>
> Having looked at your patches a bunch and the membarrier code a bunch,
> I don't think I like the approach of pushing this logic out into new
> core functions called by arch code. Right now, even with my
> membarrier patches applied, understanding how (for example) the x86
> switch_mm_irqs_off() plus the scheduler code provides the barriers
> that membarrier needs is quite complicated, and it's not clear to me
> that the code is even correct. At the very least I'm pretty sure that
> the x86 comments are misleading.
>
> With your patches, someone trying to
> audit the code would have to follow core code calling into arch code
> and back out into core code to figure out what's going on. I think
> the result is worse.

Sorry I missed this; rather than reply to the later version, I'll
answer here where you have the bigger comment.

I disagree. Until now, nobody following the code noticed that the mm
gets un-lazied in other cases, because that was not very clear from
the code (it was only described indirectly, in non-standard
terminology, in the arch support document).

In other words, membarrier needs a special sync to deal with the case
when a kthread takes the mm. exit_lazy_tlb gives membarrier code the
exact hook that it wants from the core scheduler code.

>
> I wrote this incomplete patch which takes the opposite approach (sorry
> for whitespace damage):

That said, if you want to move the code entirely into the x86 arch,
from exit_lazy_tlb to switch_mm_irqs_off, it's trivial and touches no
core code after my series :) and I would have no problem with doing
that.

I suspect it might actually be more readable to go the other way and
pull most of the real_prev == next membarrier code into exit_lazy_tlb
instead, but I could be wrong; I don't know exactly how the x86 lazy
state correlates with core lazy tlb state.
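
For illustration, a rough sketch of the exit_lazy_tlb shape I have in
mind -- hypothetical only, the hook signature and the membarrier state
test here are my assumptions, not the exact code in either series:

/*
 * Hypothetical sketch, not the code from the series: called when the
 * CPU stops using @mm as its lazy mm and switches to task @tsk
 * without going through switch_mm().
 */
static inline void exit_lazy_tlb(struct mm_struct *mm,
                                 struct task_struct *tsk)
{
        /* Only a return to usermode in @mm needs the core sync. */
        if (!tsk || tsk->mm != mm)
                return;

        /*
         * Nothing on this path serializes the core, so if @mm is
         * registered for SYNC_CORE membarrier, make sure the return
         * to usermode executes a core-serializing instruction.
         */
        if (unlikely(atomic_read(&mm->membarrier_state) &
                     MEMBARRIER_STATE_PRIVATE_EXPEDITED_SYNC_CORE))
                sync_core_before_usermode();
}

The point is that the core scheduler only has to call the hook in the
right place; whether anything actually needs to happen there stays an
arch/membarrier detail.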

Thanks,
Nick

>
> commit 928b5c60e93f475934892d6e0b357ebf0a2bf87d
> Author: Andy Lutomirski
> Date:   Wed Dec 2 17:24:02 2020 -0800
>
>     [WIP] x86/mm: Handle unlazying membarrier core sync in the arch code
>
>     The core scheduler isn't a great place for
>     membarrier_mm_sync_core_before_usermode() -- the core scheduler
>     doesn't actually know whether we are lazy.  With the old code, if a
>     CPU is running a membarrier-registered task, goes idle, gets unlazied
>     via a TLB shootdown IPI, and switches back to the
>     membarrier-registered task, it will do an unnecessary core sync.
>
>     Conveniently, x86 is the only architecture that does anything in this
>     hook, so we can just move the code.
>
>     XXX: actually delete the old code.
>
>     Signed-off-by: Andy Lutomirski
>
> diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
> index 3338a1feccf9..e27300fc865b 100644
> --- a/arch/x86/mm/tlb.c
> +++ b/arch/x86/mm/tlb.c
> @@ -496,6 +496,8 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
>                  * from one thread in a process to another thread in the same
>                  * process. No TLB flush required.
>                  */
> +
> +               // XXX: why is this okay wrt membarrier?
>                 if (!was_lazy)
>                         return;
>
> @@ -508,12 +510,24 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
>                 smp_mb();
>                 next_tlb_gen = atomic64_read(&next->context.tlb_gen);
>                 if (this_cpu_read(cpu_tlbstate.ctxs[prev_asid].tlb_gen) ==
> -                               next_tlb_gen)
> +                               next_tlb_gen) {
> +                       /*
> +                        * We're reactivating an mm, and membarrier might
> +                        * need to serialize.  Tell membarrier.
> +                        */
> +
> +                       // XXX: I can't understand the logic in
> +                       // membarrier_mm_sync_core_before_usermode().  What's
> +                       // the mm check for?
> +                       membarrier_mm_sync_core_before_usermode(next);
>                         return;
> +               }
>
>                 /*
>                  * TLB contents went out of date while we were in lazy
>                  * mode. Fall through to the TLB switching code below.
> +                * No need for an explicit membarrier invocation -- the CR3
> +                * write will serialize.
>                  */
>                 new_asid = prev_asid;
>                 need_flush = true;
>
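
For reference, the existing generic helper looks roughly like the
sketch below; this is from memory, not a verbatim quote, so check the
tree before relying on it:

static inline void membarrier_mm_sync_core_before_usermode(struct mm_struct *mm)
{
        /* Skip the sync if this CPU is not about to return to @mm. */
        if (current->mm != mm)
                return;
        if (likely(!(atomic_read(&mm->membarrier_state) &
                     MEMBARRIER_STATE_PRIVATE_EXPEDITED_SYNC_CORE)))
                return;
        sync_core_before_usermode();
}

If that is accurate, the mm check your XXX asks about just skips the
sync when the CPU is not actually about to return to usermode in that
mm, e.g. when it is currently running a kthread.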