Date: Thu, 16 Jul 2020 14:15:00 +1000
From: Nicholas Piggin
Subject: Re: [RFC PATCH 4/7] x86: use exit_lazy_tlb rather than membarrier_mm_sync_core_before_usermode
To: Mathieu Desnoyers
Cc: Anton Blanchard, Arnd Bergmann, linux-arch, linux-kernel, linux-mm, linuxppc-dev, Andy Lutomirski, Peter Zijlstra, x86
Message-Id: <1594868476.6k5kvx8684.astroid@bobo.none>
In-Reply-To: <284592761.9860.1594649601492.JavaMail.zimbra@efficios.com>
References: <20200710015646.2020871-1-npiggin@gmail.com> <20200710015646.2020871-5-npiggin@gmail.com> <1594613902.1wzayj0p15.astroid@bobo.none> <1594647408.wmrazhwjzb.astroid@bobo.none> <284592761.9860.1594649601492.JavaMail.zimbra@efficios.com>

Excerpts from Mathieu Desnoyers's message of July 14, 2020 12:13 am:
> ----- On Jul 13, 2020, at 9:47 AM, Nicholas Piggin npiggin@gmail.com wrote:
> 
>> Excerpts from Nicholas Piggin's message of July 13, 2020 2:45 pm:
>>> Excerpts from Andy Lutomirski's message of July 11, 2020 3:04 am:
>>>> Also, as it stands, I can easily see in_irq() ceasing to promise to
>>>> serialize. There are older kernels for which it does not promise to
>>>> serialize. And I have plans to make it stop serializing in the
>>>> nearish future.
>>> 
>>> You mean x86's return from interrupt? Sounds fun... you'll know where to
>>> update the membarrier sync code, at least :)
>> 
>> Oh, I should actually say Mathieu recently clarified a return from
>> interrupt doesn't fundamentally need to serialize in order to support
>> membarrier sync core.
> 
> Clarification to your statement:
> 
> Return from interrupt to kernel code does not need to be context
> serializing as long as the kernel serializes before returning to
> user-space.
> 
> However, return from interrupt to user-space needs to be context
> serializing.

Hmm, I'm not sure it's enough even with the sync in exit_lazy_tlb in
the right places.

A kernel thread does use_mm(), then it blocks, and the user process with
the same mm runs on that CPU and calls into the kernel, then blocks; the
kernel thread runs again; another CPU issues a membarrier, which does not
IPI this one because it is running a kthread; the kthread then switches
back to the user process (still without having called unuse_mm()), and
the user process returns from the syscall without having executed a core
synchronising instruction.

The cause of the problem is you want to avoid IPI'ing kthreads. Why?
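For reference, the skip happens when the expedited membarrier path builds
its IPI mask. Something along these lines -- a simplified sketch from
memory rather than the exact kernel/sched/membarrier.c code, with the
locking, the ipi_mb() callback and the mask handling all approximated:

	/*
	 * Simplified sketch of the expedited membarrier CPU selection,
	 * not the exact kernel source.  The PF_KTHREAD test is the point:
	 * a CPU whose rq->curr is a kernel thread gets no IPI, which is
	 * the window described above.
	 */
	int cpu;
	cpumask_var_t tmpmask;	/* assume allocated with zalloc_cpumask_var() */

	cpus_read_lock();
	rcu_read_lock();
	for_each_online_cpu(cpu) {
		struct task_struct *p;

		if (cpu == raw_smp_processor_id())
			continue;	/* the caller fences itself */

		p = rcu_dereference(cpu_rq(cpu)->curr);
		if (!p || (p->flags & PF_KTHREAD))
			continue;	/* running a kthread: no IPI sent */

		__cpumask_set_cpu(cpu, tmpmask);
	}
	rcu_read_unlock();

	/* ipi_mb() is assumed to issue smp_mb() on each target CPU */
	smp_call_function_many(tmpmask, ipi_mb, NULL, 1);
	cpus_read_unlock();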
I'm guessing it really only matters as an optimisation in the case of
idle threads.

The idle thread is easy (well, easier) because it won't use_mm(), so you
could check for rq->curr == rq->idle in your loop (via a suitable sched
accessor function).

But... I don't really like this subtlety in the scheduler for all this
(the scheduler still needs the barriers when switching out of idle). Can
it be improved somehow?

Let me forget the x86 core sync problem for now (that _may_ be a bit
harder), and step back and look at what we're doing. The memory barrier
case would actually suffer from the same problem as core sync, because in
the same situation it has no implicit mmdrop in the scheduler switch code
either.

So what are we doing with membarrier? We want any activity, caused by the
set of CPUs/threads specified, that can be observed by this thread before
calling membarrier to be appropriately fenced from activity that can be
observed to happen after the call returns.

       CPU0                      CPU1
                             1. user stuff
a. membarrier()              2. enter kernel
b. read rq->curr             3. rq->curr switched to kthread
c. is kthread, skip IPI      4. switch_to kthread
d. return to user            5. rq->curr switched to user thread
                             6. switch_to user thread
                             7. exit kernel
                             8. more user stuff

As far as I can see, the problem is that CPU1 might reorder step 5 and
step 8, which is why you have the mmdrop of the lazy mm act as a mb after
step 6.

But why? The membarrier call only cares that there is a full barrier
between 1 and 8, right? Which it will get from the previous context
switch to the kthread.

I must say the memory barrier comments in membarrier could be improved a
bit (unless I'm missing where the main comment is). It's fine to know
which barriers pair with one another, but we need to know which exact
memory accesses are being ordered:

   /*
    * Matches memory barriers around rq->curr modification in
    * scheduler.
    */

Sure, but it doesn't say what else is being ordered. I think it's just
the user memory accesses, but it would be nice to make that a bit more
explicit (a sketch of the kind of comment I mean is appended after the
sign-off). If we had such comments then we might know this case is safe.

I think the funny powerpc barrier is a similar case of this. If we ever
see remote_rq->curr->flags & PF_KTHREAD, then we _know_ that CPU has
issued, or will issue, a memory barrier between its runs of user code.

So AFAIKS all this membarrier stuff in kernel/sched/core.c could just go
away. Except on x86, because a thread switch doesn't imply a core sync,
so CPU1 may never issue a core sync instruction between 1 and 8, in the
way that a context switch must be a full mb.

Before getting to x86 -- am I right, or way off track here?

Thanks,
Nick
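PS: purely as an illustration of the "more explicit" comment I'm asking
for above (the wording and placement here are my own sketch, not anything
that exists in the tree), the barrier before the rq->curr scan could
spell out the accesses being ordered:

	/*
	 * Hypothetical expansion of the existing comment -- illustrative
	 * wording only:
	 *
	 * Order the membarrier caller's user-space accesses prior to this
	 * system call against the loads of each remote rq->curr below.
	 * Pairs with the full barrier implied by the context switch that
	 * publishes rq->curr: if we observe a task there, we also observe
	 * the user-space accesses it performed before being switched in;
	 * if we observe the previous task, the switch itself provides the
	 * barrier before the next task runs user code.
	 */
	smp_mb();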