Date: Sun, 06 Dec 2020 09:14:57 +1000
From: Nicholas Piggin
Subject: Re: [PATCH 2/8] x86: use exit_lazy_tlb rather than membarrier_mm_sync_core_before_usermode
To: Andy Lutomirski
Cc: Anton Blanchard, Arnd Bergmann, linux-arch, LKML, Linux-MM, linuxppc-dev, Andy Lutomirski, Mathieu Desnoyers, Peter Zijlstra, X86 ML
References: <1607152918.fkgmomgfw9.astroid@bobo.none> <116A6B40-C77B-4B6A-897B-18342CD62CEC@amacapital.net>
In-Reply-To: <116A6B40-C77B-4B6A-897B-18342CD62CEC@amacapital.net>
Message-Id: <1607209402.fogfsh8ov4.astroid@bobo.none>

Excerpts from Andy Lutomirski's message of December 6, 2020 2:11 am:
> 
>> On Dec 5, 2020, at 12:00 AM, Nicholas Piggin wrote:
>> 
>> I disagree. Until now nobody following it noticed that the mm gets
>> un-lazied in other cases, because that was not too clear from the
>> code (only indirectly, using non-standard terminology, in the arch
>> support document).
> 
>> In other words, membarrier needs a special sync to deal with the case
>> when a kthread takes the mm.
> 
> I don't think this is actually true. Somehow the x86 oddities about
> CR3 writes leaked too much into the membarrier core code and comments.
> (I doubt this is x86 specific. The actual x86 specific part seems to
> be that we can return to user mode without syncing the instruction
> stream.)
> 
> As far as I can tell, membarrier doesn't care at all about laziness.
> Membarrier cares about rq->curr->mm.
> The fact that a cpu can switch its actual loaded mm without
> scheduling at all (on x86 at least) is entirely beside the point
> except insofar as it has an effect on whether a subsequent
> switch_mm() call serializes.

Core membarrier itself doesn't care about laziness, which is why the
membarrier flush should go in exit_lazy_tlb() or other x86 specific
code (at least until more architectures do the same thing and we move
it into generic code). I just meant that this non-serialising return,
as documented in the membarrier arch enablement doc, is what gives
rise to the lazy tlb requirement.

If an mm was lazy tlb for a kernel thread and then it becomes unlazy,
and if switch_mm is serialising but return to user is not, then you
need a serialising instruction somewhere before return to user. unlazy
is the logical place to add that, because the lazy tlb mm (i.e.,
switching to a kernel thread and back without switching mm) is what
opens the hole.

> If we notify membarrier about x86's asynchronous CR3 writes, then
> membarrier needs to understand what to do with them, which results
> in an unmaintainable mess in membarrier *and* in the x86 code.

How do you mean? exit_lazy_tlb is the opposite: the core scheduler
notifying arch code about when an mm becomes not-lazy, with nothing to
do with membarrier at all. It's a convenient hook to do your
un-lazying. I guess you could also do it by checking things in
switch_mm and keeping state in arch code, but I don't think that's
necessarily the best place to put it.

So membarrier code is unchanged (it cares that the serialise is done
at un-lazy time), core code is simpler (no knowledge of this
membarrier quirk, and since it already knows about lazy-tlb the calls
actually improve the documentation), and x86 code I would argue
becomes nicer (or no real difference at worst) because you can move
some exit lazy tlb handling to that specific call rather than
decipher it from switch_mm.
> 
> I'm currently trying to document how membarrier actually works, and
> hopefully this will result in untangling membarrier from mmdrop()
> and such.

That would be nice.

> 
> A silly part of this is that x86 already has a high quality
> implementation of most of membarrier(): flush_tlb_mm(). If you flush
> an mm's TLB, we carefully propagate the flush to all threads, with
> attention to memory ordering. We can't use this directly as an
> arch-specific implementation of membarrier because it has the
> annoying side effect of flushing the TLB and because upcoming
> hardware might be able to flush without guaranteeing a core sync.
> (Upcoming means Zen 3, but the Zen 3 implementation is sadly not
> usable by Linux.)
> 

A hardware broadcast TLB flush, you mean? What makes it unusable by
Linux, out of curiosity?