From mboxrd@z Thu Jan  1 00:00:00 1970
References: <20200710015646.2020871-1-npiggin@gmail.com>
 <20200710015646.2020871-5-npiggin@gmail.com>
In-Reply-To: <20200710015646.2020871-5-npiggin@gmail.com>
From: Andy Lutomirski
Date: Fri, 10 Jul 2020 10:04:54 -0700
Subject: Re: [RFC PATCH 4/7] x86: use exit_lazy_tlb rather than membarrier_mm_sync_core_before_usermode
To: Nicholas Piggin
Cc: linux-arch, X86 ML, Mathieu Desnoyers, Arnd Bergmann, Peter Zijlstra,
 LKML, linuxppc-dev, Linux-MM, Anton Blanchard

On Thu, Jul 9, 2020 at 6:57 PM Nicholas Piggin wrote:
>
> And get rid of the generic sync_core_before_usermode facility.
>
> This helper is the wrong way around, I think. The idea that membarrier
> state requires a core sync before returning to user is the easy one
> that does not need hiding behind membarrier calls. The gap in core
> synchronization due to x86's sysret/sysexit and lazy tlb mode is the
> tricky detail that is better put in x86 lazy tlb code.
>
> Consider if an arch did not synchronize core in switch_mm either; then
> membarrier_mm_sync_core_before_usermode would be in the wrong place,
> but arch-specific mmu context functions would still be the right place.
> There is also an exit_lazy_tlb case that is not covered by this call,
> which could be a bug (a kthread uses the membarrier process's mm, then
> context switches back to the process without an mm switch or lazy mm
> switch).
>
> This makes lazy tlb code a bit more modular.

The mm-switching and TLB-management code has often had the regrettable
property that it gets wired up in a way that seems to work at the time
but doesn't have clear semantics, and I'm a bit concerned that this
patch is in that category.  If I'm understanding right, you're trying
to enforce the property that exiting lazy TLB mode will promise to
sync the core eventually.  But this has all kinds of odd properties:

 - Why is exit_lazy_tlb() getting called at all in the relevant cases?
When is it permissible to call it?  I look at your new code and see:

> +/*
> + * Ensure that a core serializing instruction is issued before returning
> + * to user-mode, if a SYNC_CORE was requested. x86 implements return to
> + * user-space through sysexit, sysret, and sysretq, which are not core
> + * serializing.
> + *
> + * See the membarrier comment in finish_task_switch as to why this is done
> + * in exit_lazy_tlb.
> + */
> +#define exit_lazy_tlb exit_lazy_tlb
> +static inline void exit_lazy_tlb(struct mm_struct *mm, struct task_struct *tsk)
> +{
> +	/* Switching mm is serializing with write_cr3 */
> +	if (tsk->mm != mm)
> +		return;

And my brain says WTF?  Surely you meant something like:

if (WARN_ON_ONCE(tsk->mm != mm)) {
	/* egads, what even happened?  how do we try to recover well
	 * enough to get a crash logged at least? */
}

So this needs actual documentation, preferably in comments near the
function, of what the preconditions are and what this mm parameter is.
Once that's done, then we could consider whether it's appropriate to
have this function promise to sync the core under some conditions.
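To make that concrete, the kind of thing I'd want to see is roughly the
sketch below.  To be clear, this is my guess at the intended contract,
not the patch's stated semantics; the mm->membarrier_state check with
MEMBARRIER_STATE_PRIVATE_EXPEDITED_SYNC_CORE is the existing mechanism
the generic helper used, and whether this is the right place for it is
exactly the open question:

/*
 * exit_lazy_tlb(): called when this CPU stops using @mm as its lazy
 * (borrowed) mm.  @tsk is the task we are about to run.
 *
 * Presumed contract (which is what the patch should spell out):
 *  - If @tsk->mm != @mm, switch_mm() will write CR3, which is core
 *    serializing, so there is nothing to do here.
 *  - If @tsk->mm == @mm, we are handing the CPU back to a user of
 *    @mm with no CR3 write, so a pending membarrier SYNC_CORE
 *    request must be honored before returning to usermode.
 */
#define exit_lazy_tlb exit_lazy_tlb
static inline void exit_lazy_tlb(struct mm_struct *mm,
				 struct task_struct *tsk)
{
	if (tsk->mm != mm)
		return;	/* write_cr3() will serialize */

	if (!(atomic_read(&mm->membarrier_state) &
	      MEMBARRIER_STATE_PRIVATE_EXPEDITED_SYNC_CORE))
		return;

	sync_core_before_usermode();
}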
 - This whole structure seems to rely on the idea that switching mm
syncs something.  I periodically ask chip vendors for non-serializing
mm switches.  Specifically, in my dream world, we have totally
separate user and kernel page tables.  Changing out the user tables
doesn't serialize or even create a fence.  Instead it creates the
minimum required pipeline hazard such that user memory accesses and
switches to usermode will make sure they access memory through the
correct page tables.  I haven't convinced a chip vendor yet, but there
are quite a few hundred cycles to be saved here.

With this in mind, I see the fencing aspects of the TLB handling code
as somewhat of an accident.  I'm fine with documenting them and using
them to optimize other paths, but I think it should be explicit.  For
example:

/*
 * Also does a full barrier?  (Or a sync_core()-style barrier.)
 * However, if you rely on this, you must document it in a comment
 * where you call this function.
 */
void switch_mm_irqs_off()
{
}

This is kind of like how we strongly encourage anyone using smp_?mb()
to document what they are fencing against.

Also, as it stands, I can easily see in_irq() ceasing to promise to
serialize.  There are older kernels for which it does not promise to
serialize.  And I have plans to make it stop serializing in the
nearish future.

--Andy
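P.S. For readers without the membarrier background, the userspace
sequence below is the guarantee being haggled over.  The membarrier(2)
commands are the real API (glibc has no wrapper, hence the syscall()
shim); the cross-modifying-code scenario is just an illustration:

#include <linux/membarrier.h>
#include <sys/syscall.h>
#include <unistd.h>

static int membarrier(int cmd, unsigned int flags)
{
	return syscall(__NR_membarrier, cmd, flags);
}

int main(void)
{
	/* One-time registration, before relying on SYNC_CORE. */
	if (membarrier(MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED_SYNC_CORE, 0))
		return 1;

	/* ... a JIT thread rewrites a code page here ... */

	/*
	 * After this returns, every thread of the process is
	 * guaranteed to execute a core serializing instruction
	 * before it next runs user code -- including a thread on a
	 * CPU that was lazily borrowing our mm for a kthread and
	 * switches back to us without a CR3 write.  That last case
	 * is the one the patch moves into exit_lazy_tlb().
	 */
	if (membarrier(MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE, 0))
		return 1;

	/* ... now safe to publish pointers into the new code ... */
	return 0;
}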