linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Andy Lutomirski <luto@kernel.org>
To: Andrew Morton <akpm@linux-foundation.org>, Linux-MM <linux-mm@kvack.org>
Cc: Nicholas Piggin <npiggin@gmail.com>,
	Anton Blanchard <anton@ozlabs.org>,
	Benjamin Herrenschmidt <benh@kernel.crashing.org>,
	Paul Mackerras <paulus@ozlabs.org>,
	Randy Dunlap <rdunlap@infradead.org>,
	linux-arch <linux-arch@vger.kernel.org>,
	x86@kernel.org, Rik van Riel <riel@surriel.com>,
	Dave Hansen <dave.hansen@intel.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Nadav Amit <nadav.amit@gmail.com>,
	Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
	Andy Lutomirski <luto@kernel.org>
Subject: [PATCH 01/23] membarrier: Document why membarrier() works
Date: Sat,  8 Jan 2022 08:43:46 -0800	[thread overview]
Message-ID: <d64b6651fe8799481c6204e43b17f81010018345.1641659630.git.luto@kernel.org> (raw)
In-Reply-To: <cover.1641659630.git.luto@kernel.org>

We had a nice comment at the top of membarrier.c explaining why membarrier
worked in a handful of scenarios, but that consisted more of a list of
things not to forget than an actual description of the algorithm and why it
should be expected to work.

Add a comment explaining my understanding of the algorithm.  This exposes a
couple of implementation issues that I will hopefully fix up in subsequent
patches.

Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 kernel/sched/membarrier.c | 60 +++++++++++++++++++++++++++++++++++++--
 1 file changed, 58 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/membarrier.c b/kernel/sched/membarrier.c
index b5add64d9698..30e964b9689d 100644
--- a/kernel/sched/membarrier.c
+++ b/kernel/sched/membarrier.c
@@ -7,8 +7,64 @@
 #include "sched.h"
 
 /*
- * For documentation purposes, here are some membarrier ordering
- * scenarios to keep in mind:
+ * The basic principle behind the regular memory barrier mode of
+ * membarrier() is as follows.  membarrier() is called in one thread.  Tt
+ * iterates over all CPUs, and, for each CPU, it either sends an IPI to
+ * that CPU or it does not. If it sends an IPI, then we have the
+ * following sequence of events:
+ *
+ * 1. membarrier() does smp_mb().
+ * 2. membarrier() does a store (the IPI request payload) that is observed by
+ *    the target CPU.
+ * 3. The target CPU does smp_mb().
+ * 4. The target CPU does a store (the completion indication) that is observed
+ *    by membarrier()'s wait-for-IPIs-to-finish request.
+ * 5. membarrier() does smp_mb().
+ *
+ * So all pre-membarrier() local accesses are visible after the IPI on the
+ * target CPU and all pre-IPI remote accesses are visible after
+ * membarrier(). IOW membarrier() has synchronized both ways with the target
+ * CPU.
+ *
+ * (This has the caveat that membarrier() does not interrupt the CPU that it's
+ * running on at the time it sends the IPIs. However, if that is the CPU on
+ * which membarrier() starts and/or finishes, membarrier() does smp_mb() and,
+ * if not, then the scheduler's migration of membarrier() is a full barrier.)
+ *
+ * membarrier() skips sending an IPI only if membarrier() sees
+ * cpu_rq(cpu)->curr->mm != target mm.  The sequence of events is:
+ *
+ *           membarrier()            |          target CPU
+ * ---------------------------------------------------------------------
+ *                                   | 1. smp_mb()
+ *                                   | 2. set rq->curr->mm = other_mm
+ *                                   |    (by writing to ->curr or to ->mm)
+ * 3. smp_mb()                       |
+ * 4. read rq->curr->mm == other_mm  |
+ * 5. smp_mb()                       |
+ *                                   | 6. rq->curr->mm = target_mm
+ *                                   |    (by writing to ->curr or to ->mm)
+ *                                   | 7. smp_mb()
+ *                                   |
+ *
+ * All memory accesses on the target CPU prior to scheduling are visible
+ * to membarrier()'s caller after membarrier() returns due to steps 1, 2, 4
+ * and 5.
+ *
+ * All memory accesses by membarrier()'s caller prior to membarrier() are
+ * visible to the target CPU after scheduling due to steps 3, 4, 6, and 7.
+ *
+ * Note that, tasks can change their ->mm, e.g. via kthread_use_mm().  So
+ * tasks that switch their ->mm must follow the same rules as the scheduler
+ * changing rq->curr, and the membarrier() code needs to do both dereferences
+ * carefully.
+ *
+ * GLOBAL_EXPEDITED support works the same way except that all references
+ * to rq->curr->mm are replaced with references to rq->membarrier_state.
+ *
+ *
+ * Specific examples of how this produces the documented properties of
+ * membarrier():
  *
  * A) Userspace thread execution after IPI vs membarrier's memory
  *    barrier before sending the IPI
-- 
2.33.1



  reply	other threads:[~2022-01-08 16:44 UTC|newest]

Thread overview: 67+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-01-08 16:43 [PATCH 00/23] mm, sched: Rework lazy mm handling Andy Lutomirski
2022-01-08 16:43 ` Andy Lutomirski [this message]
2022-01-12 15:30   ` [PATCH 01/23] membarrier: Document why membarrier() works Mathieu Desnoyers
2022-01-08 16:43 ` [PATCH 02/23] x86/mm: Handle unlazying membarrier core sync in the arch code Andy Lutomirski
2022-01-12 15:40   ` Mathieu Desnoyers
2022-01-08 16:43 ` [PATCH 03/23] membarrier: Remove membarrier_arch_switch_mm() prototype in core code Andy Lutomirski
2022-01-08 16:43 ` [PATCH 04/23] membarrier: Make the post-switch-mm barrier explicit Andy Lutomirski
2022-01-12 15:52   ` Mathieu Desnoyers
2022-01-08 16:43 ` [PATCH 06/23] powerpc/membarrier: Remove special barrier on mm switch Andy Lutomirski
2022-01-10  8:42   ` Christophe Leroy
2022-01-12 15:57   ` Mathieu Desnoyers
2022-01-08 16:43 ` [PATCH 07/23] membarrier: Rewrite sync_core_before_usermode() and improve documentation Andy Lutomirski
2022-01-12 16:11   ` Mathieu Desnoyers
2022-01-08 16:43 ` [PATCH 08/23] membarrier: Remove redundant clear of mm->membarrier_state in exec_mmap() Andy Lutomirski
2022-01-12 16:13   ` Mathieu Desnoyers
2022-01-08 16:43 ` [PATCH 09/23] membarrier: Fix incorrect barrier positions during exec and kthread_use_mm() Andy Lutomirski
2022-01-12 16:30   ` Mathieu Desnoyers
2022-01-12 17:08     ` Mathieu Desnoyers
2022-01-08 16:43 ` [PATCH 10/23] x86/events, x86/insn-eval: Remove incorrect active_mm references Andy Lutomirski
2022-01-08 16:43 ` [PATCH 11/23] sched/scs: Initialize shadow stack on idle thread bringup, not shutdown Andy Lutomirski
2022-01-10 22:06   ` Sami Tolvanen
2022-01-08 16:43 ` [PATCH 12/23] Rework "sched/core: Fix illegal RCU from offline CPUs" Andy Lutomirski
2022-01-08 16:43 ` [PATCH 13/23] exec: Remove unnecessary vmacache_seqnum clear in exec_mmap() Andy Lutomirski
2022-01-08 16:43 ` [PATCH 14/23] sched, exec: Factor current mm changes out from exec Andy Lutomirski
2022-01-08 16:44 ` [PATCH 15/23] kthread: Switch to __change_current_mm() Andy Lutomirski
2022-01-08 16:44 ` [PATCH 16/23] sched: Use lightweight hazard pointers to grab lazy mms Andy Lutomirski
2022-01-08 19:22   ` Linus Torvalds
2022-01-08 22:04     ` Andy Lutomirski
2022-01-09  0:27       ` Linus Torvalds
2022-01-09  0:53       ` Linus Torvalds
2022-01-09  3:58         ` Andy Lutomirski
2022-01-09  4:38           ` Linus Torvalds
2022-01-09 20:19             ` Andy Lutomirski
2022-01-09 20:48               ` Linus Torvalds
2022-01-09 21:51                 ` Linus Torvalds
2022-01-10  0:52                   ` Andy Lutomirski
2022-01-10  2:36                     ` Rik van Riel
2022-01-10  3:51                       ` Linus Torvalds
     [not found]                   ` <1641790309.2vqc26hwm3.astroid@bobo.none>
     [not found]                     ` <1641791321.kvkq0n8kbq.astroid@bobo.none>
2022-01-10 17:19                       ` Linus Torvalds
2022-01-10 20:52                     ` Andy Lutomirski
2022-01-11  3:10                       ` Nicholas Piggin
2022-01-11 15:39                         ` Andy Lutomirski
2022-01-11 22:48                           ` Nicholas Piggin
2022-01-11 10:39                 ` Will Deacon
2022-01-11 15:22                   ` Andy Lutomirski
2022-01-09  5:56   ` Nadav Amit
2022-01-09  6:48     ` Linus Torvalds
2022-01-09  8:49       ` Nadav Amit
2022-01-09 19:10         ` Linus Torvalds
2022-01-09 19:52           ` Andy Lutomirski
2022-01-09 20:00             ` Linus Torvalds
2022-01-09 20:34             ` Nadav Amit
2022-01-09 20:48               ` Andy Lutomirski
2022-01-09 19:22         ` Rik van Riel
2022-01-09 19:34           ` Nadav Amit
2022-01-09 19:37             ` Rik van Riel
2022-01-09 19:51               ` Nadav Amit
2022-01-09 19:54                 ` Linus Torvalds
2022-01-08 16:44 ` [PATCH 17/23] x86/mm: Make use/unuse_temporary_mm() non-static Andy Lutomirski
2022-01-08 16:44 ` [PATCH 18/23] x86/mm: Allow temporary mms when IRQs are on Andy Lutomirski
2022-01-08 16:44 ` [PATCH 19/23] x86/efi: Make efi_enter/leave_mm use the temporary_mm machinery Andy Lutomirski
2022-01-10 13:13   ` Ard Biesheuvel
2022-01-08 16:44 ` [PATCH 20/23] x86/mm: Remove leave_mm() in favor of unlazy_mm_irqs_off() Andy Lutomirski
2022-01-08 16:44 ` [PATCH 21/23] x86/mm: Use unlazy_mm_irqs_off() in TLB flush IPIs Andy Lutomirski
2022-01-08 16:44 ` [PATCH 22/23] x86/mm: Optimize for_each_possible_lazymm_cpu() Andy Lutomirski
2022-01-08 16:44 ` [PATCH 23/23] x86/mm: Opt in to IRQs-off activate_mm() Andy Lutomirski
     [not found] ` <e6e7c11c38a3880e56fb7dfff4fa67090d831a3b.1641659630.git.luto@kernel.org>
2022-01-12 15:55   ` [PATCH 05/23] membarrier, kthread: Use _ONCE accessors for task->mm Mathieu Desnoyers

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=d64b6651fe8799481c6204e43b17f81010018345.1641659630.git.luto@kernel.org \
    --to=luto@kernel.org \
    --cc=akpm@linux-foundation.org \
    --cc=anton@ozlabs.org \
    --cc=benh@kernel.crashing.org \
    --cc=dave.hansen@intel.com \
    --cc=linux-arch@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mathieu.desnoyers@efficios.com \
    --cc=nadav.amit@gmail.com \
    --cc=npiggin@gmail.com \
    --cc=paulus@ozlabs.org \
    --cc=peterz@infradead.org \
    --cc=rdunlap@infradead.org \
    --cc=riel@surriel.com \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox