From: Jann Horn
Date: Thu, 27 Jul 2023 16:39:34 +0200
Subject: Re: [PATCH 0/2] fix vma->anon_vma check for per-VMA locking; fix anon_vma memory ordering
To: paulmck@kernel.org
Cc: Andrew Morton, Linus Torvalds, Peter Zijlstra, Suren Baghdasaryan,
 Matthew Wilcox, linux-kernel@vger.kernel.org, linux-mm@kvack.org,
 Alan Stern, Andrea Parri, Will Deacon, Boqun Feng, Nicholas Piggin,
 David Howells, Jade Alglave, Luc Maranget, Akira Yokosawa,
 Daniel Lustig, Joel Fernandes
References: <20230726214103.3261108-1-jannh@google.com> <31df93bd-4862-432c-8135-5595ffd2bd43@paulmck-laptop>
In-Reply-To: <31df93bd-4862-432c-8135-5595ffd2bd43@paulmck-laptop>

On Thu, Jul 27, 2023 at 1:19 AM Paul E. McKenney wrote:
>
> On Wed, Jul 26, 2023 at 11:41:01PM +0200, Jann Horn wrote:
> > Hi!
> >
> > Patch 1 here is a straightforward fix for a race in per-VMA locking code
> > that can lead to use-after-free; I hope we can get this one into
> > mainline and stable quickly.
> >
> > Patch 2 is a fix for what I believe is a longstanding memory ordering
> > issue in how vma->anon_vma is used across the MM subsystem; I expect
> > that this one will have to go through a few iterations of review and
> > potentially rewrites, because memory ordering is tricky.
> > (If someone else wants to take over patch 2, I would be very happy.)
> >
> > These patches don't really belong together all that much, I'm just
> > sending them as a series because they'd otherwise conflict.
> >
> > I am CCing:
> >
> >  - Suren because patch 1 touches his code
> >  - Matthew Wilcox because he is also currently working on per-VMA
> >    locking stuff
> >  - all the maintainers/reviewers for the Kernel Memory Consistency Model
> >    so they can help figure out the READ_ONCE() vs smp_load_acquire()
> >    thing
>
> READ_ONCE() has weaker ordering properties than smp_load_acquire().
>
> For example, given a pointer gp:
>
>	p = whichever(gp);
>	a = 1;
>	r1 = p->b;
>	if ((uintptr_t)p & 0x1)
>		WRITE_ONCE(b, 1);
>	WRITE_ONCE(c, 1);
>
> Leaving aside the "&" needed by smp_load_acquire(), if "whichever" is
> "READ_ONCE", then the load from p->b and the WRITE_ONCE() to "b" are
> ordered after the load from gp (the former due to an address dependency
> and the latter due to a (fragile) control dependency). The compiler
> is within its rights to reorder the store to "a" to precede the load
> from gp. The compiler is forbidden from reordering the store to "c"
> with the load from gp (because both are volatile accesses), but the CPU
> is completely within its rights to do this reordering.
>
> But if "whichever" is "smp_load_acquire()", all four of the subsequent
> memory accesses are ordered after the load from gp.
>
> Similarly, for WRITE_ONCE() and smp_store_release():
>
>	p = READ_ONCE(gp);
>	r1 = READ_ONCE(gi);
>	r2 = READ_ONCE(gj);
>	a = 1;
>	WRITE_ONCE(b, 1);
>	if (r1 & 0x1)
>		whichever(p->q, r2);
>
> Again leaving aside the "&" needed by smp_store_release(), if "whichever"
> is WRITE_ONCE(), then the load from gp, the load from gi, and the load
> from gj are all ordered before the store to p->q (by address dependency,
> control dependency, and data dependency, respectively). The store to "a"
> can be reordered with the store to p->q by the compiler. The store to
> "b" cannot be reordered with the store to p->q by the compiler (again,
> both are volatile), but the CPU is free to reorder them, especially when
> whichever() is implemented as a conditional store.
>
> But if "whichever" is "smp_store_release()", all five of the earlier
> memory accesses are ordered before the store to p->q.
>
> Does that help, or am I missing the point of your question?

My main question is how permissible/ugly you think the following use
of READ_ONCE() would be, and whether you think it ought to be an
smp_load_acquire() instead.

Assume that we are holding some kind of lock that ensures that the
only possible concurrent update to "vma->anon_vma" is that it changes
from a NULL pointer to a non-NULL pointer (using smp_store_release()).


if (READ_ONCE(vma->anon_vma) != NULL) {
  // we now know that vma->anon_vma cannot change anymore

  // access the same memory location again with a plain load
  struct anon_vma *a = vma->anon_vma;

  // this needs to be address-dependency-ordered against one of
  // the loads from vma->anon_vma
  struct anon_vma *root = a->root;
}


Is this fine? If it is not fine just because the compiler might
reorder the plain load of vma->anon_vma before the READ_ONCE() load,
would it be fine after adding a barrier() directly after the
READ_ONCE()?
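
For comparison, here is a rough userspace sketch of what the acquire
variant of the snippet above could look like. It uses C11 atomics as
stand-ins for smp_store_release()/smp_load_acquire(); the struct
layouts and the helper names publish_anon_vma() and read_root() are
made up for illustration and are not the kernel's definitions.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures; only the fields
 * needed for this illustration are modelled. */
struct anon_vma {
	struct anon_vma *root;
};

struct vma {
	_Atomic(struct anon_vma *) anon_vma;	/* NULL until published */
};

/* Writer side: fully initialise the object, then publish the pointer
 * with release semantics (userspace analogue of smp_store_release()). */
static void publish_anon_vma(struct vma *vma, struct anon_vma *av)
{
	assert(av != NULL);
	av->root = av;	/* assumption: av is its own root */
	atomic_store_explicit(&vma->anon_vma, av, memory_order_release);
}

/* Reader side: a single acquire load (analogue of smp_load_acquire()),
 * then use the loaded pointer instead of re-reading vma->anon_vma. */
static struct anon_vma *read_root(struct vma *vma)
{
	struct anon_vma *a =
		atomic_load_explicit(&vma->anon_vma, memory_order_acquire);

	if (a == NULL)
		return NULL;	/* nothing published yet */
	return a->root;		/* ordered after the acquire load */
}
```

In this form the load of a->root is ordered after the pointer load by
the acquire itself, and the pointer is only loaded once, so there is
no second plain load whose ordering would need a separate argument.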

I initially suggested using READ_ONCE() for this, and then Linus and I
tried to reason it out, and Linus suggested (if I understood him
correctly) that you could make the ugly argument that this works
because loads from the same location will not be reordered by the
hardware. So on anything other than alpha, we'd still have the
required address-dependency ordering because that happens for all
loads, even plain loads, while on alpha, the READ_ONCE() includes a
memory barrier. But that argument is weirdly reliant on
architecture-specific implementation details.

The other option is to replace the READ_ONCE() with a
smp_load_acquire(), at which point it becomes a lot simpler to show
that the code is correct.


> Thanx, Paul
>
> > - people involved in the previous discussion on the security list
> >
> >
> > Jann Horn (2):
> >   mm: lock_vma_under_rcu() must check vma->anon_vma under vma lock
> >   mm: Fix anon_vma memory ordering
> >
> >  include/linux/rmap.h | 15 ++++++++++++++-
> >  mm/huge_memory.c     |  4 +++-
> >  mm/khugepaged.c      |  2 +-
> >  mm/ksm.c             | 16 +++++++++++-----
> >  mm/memory.c          | 32 ++++++++++++++++++++------------
> >  mm/mmap.c            | 13 ++++++++++---
> >  mm/rmap.c            |  6 ++++--
> >  mm/swapfile.c        |  3 ++-
> >  8 files changed, 65 insertions(+), 26 deletions(-)
> >
> >
> > base-commit: 20ea1e7d13c1b544fe67c4a8dc3943bb1ab33e6f
> > --
> > 2.41.0.487.g6d72f3e995-goog
> >