From: Andrew Morton <akpm@linux-foundation.org>
To: 1vier1@web.de, akpm@linux-foundation.org, dbueso@suse.de,
linux-mm@kvack.org, longman@redhat.com, manfred@colorfullife.com,
mm-commits@vger.kernel.org, peterz@infradead.org,
torvalds@linux-foundation.org, will.deacon@arm.com
Subject: [patch 16/67] ipc/mqueue.c: update/document memory barriers
Date: Mon, 03 Feb 2020 17:34:36 -0800 [thread overview]
Message-ID: <20200204013436.V8KUTPHdg%akpm@linux-foundation.org> (raw)
In-Reply-To: <20200203173311.6269a8be06a05e5a4aa08a93@linux-foundation.org>
From: Manfred Spraul <manfred@colorfullife.com>
Subject: ipc/mqueue.c: update/document memory barriers
Update and document memory barriers for mqueue.c:
- ewp->state is read without any locks, thus READ_ONCE is required.
- add smp_aquire__after_ctrl_dep() after the READ_ONCE, we need
acquire semantics if the value is STATE_READY.
- use wake_q_add_safe()
- document why __set_current_state() may be used:
Reading task->state cannot happen before the wake_q_add() call,
which happens while holding info->lock. Thus the spin_unlock()
is the RELEASE, and the spin_lock() is the ACQUIRE.
For completeness: there is also a 3 CPU scenario, if the to be woken
up task is already on another wake_q.
Then:
- CPU1: spin_unlock() of the task that goes to sleep is the RELEASE
- CPU2: the spin_lock() of the waker is the ACQUIRE
- CPU2: smp_mb__before_atomic inside wake_q_add() is the RELEASE
- CPU3: smp_mb__after_spinlock() inside try_to_wake_up() is the ACQUIRE
Link: http://lkml.kernel.org/r/20191020123305.14715-4-manfred@colorfullife.com
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Reviewed-by: Davidlohr Bueso <dbueso@suse.de>
Cc: Waiman Long <longman@redhat.com>
Cc: <1vier1@web.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
ipc/mqueue.c | 92 +++++++++++++++++++++++++++++++++++++++++--------
1 file changed, 78 insertions(+), 14 deletions(-)
--- a/ipc/mqueue.c~ipc-mqueuec-update-document-memory-barriers
+++ a/ipc/mqueue.c
@@ -63,6 +63,66 @@ struct posix_msg_tree_node {
int priority;
};
+/*
+ * Locking:
+ *
+ * Accesses to a message queue are synchronized by acquiring info->lock.
+ *
+ * There are two notable exceptions:
+ * - The actual wakeup of a sleeping task is performed using the wake_q
+ * framework. info->lock is already released when wake_up_q is called.
+ * - The exit codepaths after sleeping check ext_wait_queue->state without
+ * any locks. If it is STATE_READY, then the syscall is completed without
+ * acquiring info->lock.
+ *
+ * MQ_BARRIER:
+ * To achieve proper release/acquire memory barrier pairing, the state is set to
+ * STATE_READY with smp_store_release(), and it is read with READ_ONCE followed
+ * by smp_acquire__after_ctrl_dep(). In addition, wake_q_add_safe() is used.
+ *
+ * This prevents the following races:
+ *
+ * 1) With the simple wake_q_add(), the task could be gone already before
+ * the increase of the reference happens
+ * Thread A
+ * Thread B
+ * WRITE_ONCE(wait.state, STATE_NONE);
+ * schedule_hrtimeout()
+ * wake_q_add(A)
+ * if (cmpxchg()) // success
+ * ->state = STATE_READY (reordered)
+ * <timeout returns>
+ * if (wait.state == STATE_READY) return;
+ * sysret to user space
+ * sys_exit()
+ * get_task_struct() // UaF
+ *
+ * Solution: Use wake_q_add_safe() and perform the get_task_struct() before
+ * the smp_store_release() that does ->state = STATE_READY.
+ *
+ * 2) Without proper _release/_acquire barriers, the woken up task
+ * could read stale data
+ *
+ * Thread A
+ * Thread B
+ * do_mq_timedreceive
+ * WRITE_ONCE(wait.state, STATE_NONE);
+ * schedule_hrtimeout()
+ * state = STATE_READY;
+ * <timeout returns>
+ * if (wait.state == STATE_READY) return;
+ * msg_ptr = wait.msg; // Access to stale data!
+ * receiver->msg = message; (reordered)
+ *
+ * Solution: use _release and _acquire barriers.
+ *
+ * 3) There is intentionally no barrier when setting current->state
+ * to TASK_INTERRUPTIBLE: spin_unlock(&info->lock) provides the
+ * release memory barrier, and the wakeup is triggered when holding
+ * info->lock, i.e. spin_lock(&info->lock) provided a pairing
+ * acquire memory barrier.
+ */
+
struct ext_wait_queue { /* queue of sleeping tasks */
struct task_struct *task;
struct list_head list;
@@ -646,18 +706,23 @@ static int wq_sleep(struct mqueue_inode_
wq_add(info, sr, ewp);
for (;;) {
+ /* memory barrier not required, we hold info->lock */
__set_current_state(TASK_INTERRUPTIBLE);
spin_unlock(&info->lock);
time = schedule_hrtimeout_range_clock(timeout, 0,
HRTIMER_MODE_ABS, CLOCK_REALTIME);
- if (ewp->state == STATE_READY) {
+ if (READ_ONCE(ewp->state) == STATE_READY) {
+ /* see MQ_BARRIER for purpose/pairing */
+ smp_acquire__after_ctrl_dep();
retval = 0;
goto out;
}
spin_lock(&info->lock);
- if (ewp->state == STATE_READY) {
+
+ /* we hold info->lock, so no memory barrier required */
+ if (READ_ONCE(ewp->state) == STATE_READY) {
retval = 0;
goto out_unlock;
}
@@ -923,16 +988,11 @@ static inline void __pipelined_op(struct
struct ext_wait_queue *this)
{
list_del(&this->list);
- wake_q_add(wake_q, this->task);
- /*
- * Rely on the implicit cmpxchg barrier from wake_q_add such
- * that we can ensure that updating receiver->state is the last
- * write operation: As once set, the receiver can continue,
- * and if we don't have the reference count from the wake_q,
- * yet, at that point we can later have a use-after-free
- * condition and bogus wakeup.
- */
- this->state = STATE_READY;
+ get_task_struct(this->task);
+
+ /* see MQ_BARRIER for purpose/pairing */
+ smp_store_release(&this->state, STATE_READY);
+ wake_q_add_safe(wake_q, this->task);
}
/* pipelined_send() - send a message directly to the task waiting in
@@ -1049,7 +1109,9 @@ static int do_mq_timedsend(mqd_t mqdes,
} else {
wait.task = current;
wait.msg = (void *) msg_ptr;
- wait.state = STATE_NONE;
+
+ /* memory barrier not required, we hold info->lock */
+ WRITE_ONCE(wait.state, STATE_NONE);
ret = wq_sleep(info, SEND, timeout, &wait);
/*
* wq_sleep must be called with info->lock held, and
@@ -1152,7 +1214,9 @@ static int do_mq_timedreceive(mqd_t mqde
ret = -EAGAIN;
} else {
wait.task = current;
- wait.state = STATE_NONE;
+
+ /* memory barrier not required, we hold info->lock */
+ WRITE_ONCE(wait.state, STATE_NONE);
ret = wq_sleep(info, RECV, timeout, &wait);
msg_ptr = wait.msg;
}
_
next prev parent reply other threads:[~2020-02-04 1:34 UTC|newest]
Thread overview: 95+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-02-04 1:33 incoming Andrew Morton
2020-02-04 1:33 ` [patch 01/67] ocfs2: fix oops when writing cloned file Andrew Morton
2020-02-04 1:33 ` [patch 02/67] mm/page_alloc.c: fix uninitialized memmaps on a partially populated last section Andrew Morton
2020-02-04 1:33 ` [patch 03/67] fs/proc/page.c: allow inspection of last section and fix end detection Andrew Morton
2020-02-04 1:33 ` [patch 04/67] mm/page_alloc.c: initialize memmap of unavailable memory directly Andrew Morton
2020-02-04 1:33 ` [patch 05/67] mm/page_alloc: fix and rework pfn handling in memmap_init_zone() Andrew Morton
2020-02-04 2:45 ` Linus Torvalds
2020-02-04 1:34 ` [patch 06/67] mm: factor out next_present_section_nr() Andrew Morton
2020-02-04 3:04 ` Linus Torvalds
2020-02-04 4:29 ` Andrew Morton
2020-02-04 8:22 ` David Hildenbrand
2020-02-04 1:34 ` [patch 07/67] mm/memmap_init: update variable name in memmap_init_zone Andrew Morton
2020-02-04 1:34 ` [patch 08/67] mm/memory_hotplug: poison memmap in remove_pfn_range_from_zone() Andrew Morton
2020-02-04 1:34 ` [patch 09/67] mm/memory_hotplug: we always have a zone in find_(smallest|biggest)_section_pfn Andrew Morton
2020-02-04 1:34 ` [patch 10/67] mm/memory_hotplug: don't check for "all holes" in shrink_zone_span() Andrew Morton
2020-02-04 1:34 ` [patch 11/67] mm/memory_hotplug: drop local variables " Andrew Morton
2020-02-04 1:34 ` [patch 12/67] mm/memory_hotplug: cleanup __remove_pages() Andrew Morton
2020-02-04 1:34 ` [patch 13/67] mm/memory_hotplug: drop valid_start/valid_end from test_pages_in_a_zone() Andrew Morton
2020-02-04 1:34 ` [patch 14/67] smp_mb__{before,after}_atomic(): update Documentation Andrew Morton
2020-02-04 1:34 ` [patch 15/67] ipc/mqueue.c: remove duplicated code Andrew Morton
2020-02-04 1:34 ` Andrew Morton [this message]
2020-02-04 1:34 ` [patch 17/67] ipc/msg.c: update and document memory barriers Andrew Morton
2020-02-04 1:34 ` [patch 18/67] ipc/sem.c: document and update " Andrew Morton
2020-02-04 1:34 ` [patch 19/67] ipc/msg.c: consolidate all xxxctl_down() functions Andrew Morton
2020-02-04 1:34 ` [patch 20/67] drivers/block/null_blk_main.c: fix layout Andrew Morton
2020-02-04 1:34 ` [patch 21/67] drivers/block/null_blk_main.c: fix uninitialized var warnings Andrew Morton
2020-02-04 1:34 ` [patch 22/67] pinctrl: fix pxa2xx.c build warnings Andrew Morton
2020-02-04 1:34 ` [patch 23/67] mm: remove __krealloc Andrew Morton
2020-02-04 1:35 ` [patch 24/67] mm: add generic p?d_leaf() macros Andrew Morton
2020-02-04 1:35 ` [patch 25/67] arc: mm: add p?d_leaf() definitions Andrew Morton
2020-02-04 1:35 ` [patch 26/67] arm: " Andrew Morton
2020-02-04 1:35 ` [patch 27/67] arm64: " Andrew Morton
2020-02-04 1:35 ` [patch 28/67] mips: " Andrew Morton
2020-02-04 1:35 ` [patch 29/67] powerpc: " Andrew Morton
2020-02-04 1:35 ` [patch 30/67] riscv: " Andrew Morton
2020-02-04 1:35 ` [patch 31/67] s390: " Andrew Morton
2020-02-04 1:35 ` [patch 32/67] sparc: " Andrew Morton
2020-02-04 1:35 ` [patch 33/67] x86: " Andrew Morton
2020-02-04 1:35 ` [patch 34/67] mm: pagewalk: add p4d_entry() and pgd_entry() Andrew Morton
2020-02-04 1:35 ` [patch 35/67] mm: pagewalk: allow walking without vma Andrew Morton
2020-02-04 1:35 ` [patch 36/67] mm: pagewalk: don't lock PTEs for walk_page_range_novma() Andrew Morton
2020-02-04 1:35 ` [patch 37/67] mm: pagewalk: fix termination condition in walk_pte_range() Andrew Morton
2020-02-04 1:36 ` [patch 38/67] mm: pagewalk: add 'depth' parameter to pte_hole Andrew Morton
2020-02-04 1:36 ` [patch 39/67] x86: mm: point to struct seq_file from struct pg_state Andrew Morton
2020-02-04 1:36 ` [patch 40/67] x86: mm+efi: convert ptdump_walk_pgd_level() to take a mm_struct Andrew Morton
2020-02-04 1:36 ` [patch 41/67] x86: mm: convert ptdump_walk_pgd_level_debugfs() to take an mm_struct Andrew Morton
2020-02-04 1:36 ` [patch 42/67] mm: add generic ptdump Andrew Morton
2020-02-04 1:36 ` [patch 43/67] x86: mm: convert dump_pagetables to use walk_page_range Andrew Morton
2020-02-04 1:36 ` [patch 44/67] arm64: mm: convert mm/dump.c to use walk_page_range() Andrew Morton
2020-02-04 1:36 ` [patch 45/67] arm64: mm: display non-present entries in ptdump Andrew Morton
2020-02-04 1:36 ` [patch 46/67] mm: ptdump: reduce level numbers by 1 in note_page() Andrew Morton
2020-02-04 1:36 ` [patch 47/67] x86: mm: avoid allocating struct mm_struct on the stack Andrew Morton
2020-02-04 1:36 ` [patch 48/67] powerpc/mmu_gather: enable RCU_TABLE_FREE even for !SMP case Andrew Morton
2020-02-04 1:36 ` [patch 49/67] mm/mmu_gather: invalidate TLB correctly on batch allocation failure and flush Andrew Morton
2020-02-04 1:36 ` [patch 50/67] asm-generic/tlb: avoid potential double flush Andrew Morton
2020-02-04 1:36 ` [patch 51/67] asm-gemeric/tlb: remove stray function declarations Andrew Morton
2020-02-04 1:36 ` [patch 52/67] asm-generic/tlb: add missing CONFIG symbol Andrew Morton
2020-02-04 1:37 ` [patch 53/67] asm-generic/tlb: rename HAVE_RCU_TABLE_FREE Andrew Morton
2020-02-04 1:37 ` [patch 54/67] asm-generic/tlb: rename HAVE_MMU_GATHER_PAGE_SIZE Andrew Morton
2020-02-04 1:37 ` [patch 55/67] asm-generic/tlb: rename HAVE_MMU_GATHER_NO_GATHER Andrew Morton
2020-02-04 1:37 ` [patch 56/67] asm-generic/tlb: provide MMU_GATHER_TABLE_FREE Andrew Morton
2020-02-04 1:37 ` [patch 57/67] proc: decouple proc from VFS with "struct proc_ops" Andrew Morton
2020-02-04 1:37 ` [patch 58/67] proc: convert everything to " Andrew Morton
2020-02-04 1:37 ` [patch 59/67] lib/string: add strnchrnul() Andrew Morton
2020-02-04 1:37 ` [patch 60/67] bitops: more BITS_TO_* macros Andrew Morton
2020-02-04 1:37 ` [patch 61/67] lib: add test for bitmap_parse() Andrew Morton
2020-02-04 1:37 ` [patch 62/67] lib: make bitmap_parse_user a wrapper on bitmap_parse Andrew Morton
2020-02-04 1:37 ` [patch 63/67] lib: rework bitmap_parse() Andrew Morton
2020-02-04 1:37 ` [patch 64/67] lib: new testcases for bitmap_parse{_user} Andrew Morton
2020-02-04 1:37 ` [patch 65/67] include/linux/cpumask.h: don't calculate length of the input string Andrew Morton
2020-02-04 1:37 ` [patch 66/67] treewide: remove redundant IS_ERR() before error code check Andrew Morton
2020-02-04 1:37 ` [patch 67/67] ARM: dma-api: fix max_pfn off-by-one error in __dma_supported() Andrew Morton
2020-02-04 2:27 ` incoming Linus Torvalds
2020-02-04 2:46 ` incoming Andrew Morton
2020-02-04 3:11 ` incoming Linus Torvalds
2020-02-14 6:26 ` mmotm 2020-02-13-22-26 uploaded Andrew Morton
2020-02-14 16:29 ` mmotm 2020-02-13-22-26 uploaded (mm/hugetlb.c) Randy Dunlap
2020-02-14 17:18 ` Mike Kravetz
2020-02-14 20:51 ` Mina Almasry
[not found] ` <20200214204544.231482-1-almasrymina@google.com>
2020-02-14 21:00 ` [PATCH] hugetlb: fix CONFIG_CGROUP_HUGETLB ifdefs Mina Almasry
2020-02-15 1:17 ` Randy Dunlap
2020-02-15 1:56 ` Randy Dunlap
2020-02-16 20:40 ` Mina Almasry
2020-02-16 21:03 ` Mina Almasry
2020-02-17 2:48 ` Randy Dunlap
2020-02-17 2:57 ` [PATCH] hugetlb: fix <linux/hugetlb_cgroup.h> structs Randy Dunlap
2020-02-17 3:53 ` [PATCH] hugetlb: fix CONFIG_CGROUP_HUGETLB ifdefs Stephen Rothwell
2020-02-14 16:49 ` mmotm 2020-02-13-22-26 uploaded (mm/migrate.c, hugetlb_cgroup.h) Randy Dunlap
2020-02-25 3:53 ` mmotm 2020-02-24-19-53 uploaded Andrew Morton
2020-02-25 6:16 ` mmotm 2020-02-24-19-53 uploaded (init/main.c: initrd*) Randy Dunlap
2020-02-25 6:18 ` Randy Dunlap
2020-02-25 6:21 ` Randy Dunlap
2020-02-25 16:41 ` mmotm 2020-02-24-19-53 uploaded (drivers/platform/x86/intel_pmc_core.c) Randy Dunlap
2020-02-25 17:01 ` mmotm 2020-02-24-19-53 uploaded (objtool warning) Randy Dunlap
2020-02-27 21:52 ` Josh Poimboeuf
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20200204013436.V8KUTPHdg%akpm@linux-foundation.org \
--to=akpm@linux-foundation.org \
--cc=1vier1@web.de \
--cc=dbueso@suse.de \
--cc=linux-mm@kvack.org \
--cc=longman@redhat.com \
--cc=manfred@colorfullife.com \
--cc=mm-commits@vger.kernel.org \
--cc=peterz@infradead.org \
--cc=torvalds@linux-foundation.org \
--cc=will.deacon@arm.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox