Subject: Re: [PATCH 0/2] fix vma->anon_vma check for per-VMA locking; fix anon_vma memory ordering
From: Nadav Amit <nadav.amit@gmail.com>
Date: Thu, 27 Jul 2023 12:05:53 -0700
To: Will Deacon
Cc: Jann Horn, "Paul E. McKenney", Andrew Morton, Linus Torvalds,
	Peter Zijlstra, Suren Baghdasaryan, Matthew Wilcox,
	Linux Kernel Mailing List, linux-mm, Alan Stern, Andrea Parri,
	Boqun Feng, Nicholas Piggin, David Howells, Jade Alglave,
	Luc Maranget, Akira Yokosawa, Daniel Lustig, Joel Fernandes
Message-Id: <8EA729DD-F1CE-4C6F-A074-147A6A1BBCE0@gmail.com>
In-Reply-To: <20230727145747.GB19940@willie-the-truck>
References: <20230726214103.3261108-1-jannh@google.com> <31df93bd-4862-432c-8135-5595ffd2bd43@paulmck-laptop> <20230727145747.GB19940@willie-the-truck>
> On Jul 27, 2023, at 7:57 AM, Will Deacon wrote:
> 
> On Thu, Jul 27, 2023 at 04:39:34PM +0200, Jann Horn wrote:
>> On Thu, Jul 27, 2023 at 1:19 AM Paul E. McKenney wrote:
>>> 
>>> On Wed, Jul 26, 2023 at 11:41:01PM +0200, Jann Horn wrote:
>>>> Hi!
>>>> 
>>>> Patch 1 here is a straightforward fix for a race in per-VMA locking
>>>> code that can lead to use-after-free; I hope we can get this one into
>>>> mainline and stable quickly.
>>>> 
>>>> Patch 2 is a fix for what I believe is a longstanding memory ordering
>>>> issue in how vma->anon_vma is used across the MM subsystem; I expect
>>>> that this one will have to go through a few iterations of review and
>>>> potentially rewrites, because memory ordering is tricky.
>>>> (If someone else wants to take over patch 2, I would be very happy.)
>>>> 
>>>> These patches don't really belong together all that much, I'm just
>>>> sending them as a series because they'd otherwise conflict.
>>>> 
>>>> I am CCing:
>>>> 
>>>> - Suren because patch 1 touches his code
>>>> - Matthew Wilcox because he is also currently working on per-VMA
>>>>   locking stuff
>>>> - all the maintainers/reviewers for the Kernel Memory Consistency Model
>>>>   so they can help figure out the READ_ONCE() vs smp_load_acquire()
>>>>   thing
>>> 
>>> READ_ONCE() has weaker ordering properties than smp_load_acquire().
>>> 
>>> For example, given a pointer gp:
>>> 
>>> 	p = whichever(gp);
>>> 	a = 1;
>>> 	r1 = p->b;
>>> 	if ((uintptr_t)p & 0x1)
>>> 		WRITE_ONCE(b, 1);
>>> 	WRITE_ONCE(c, 1);
>>> 
>>> Leaving aside the "&" needed by smp_load_acquire(), if "whichever" is
>>> "READ_ONCE", then the load from p->b and the WRITE_ONCE() to "b" are
>>> ordered after the load from gp (the former due to an address dependency
>>> and the latter due to a (fragile) control dependency). The compiler
>>> is within its rights to reorder the store to "a" to precede the load
>>> from gp. The compiler is forbidden from reordering the store to "c"
>>> with the load from gp (because both are volatile accesses), but the CPU
>>> is completely within its rights to do this reordering.
>>> 
>>> But if "whichever" is "smp_load_acquire()", all four of the subsequent
>>> memory accesses are ordered after the load from gp.
>>> 
>>> Similarly, for WRITE_ONCE() and smp_store_release():
>>> 
>>> 	p = READ_ONCE(gp);
>>> 	r1 = READ_ONCE(gi);
>>> 	r2 = READ_ONCE(gj);
>>> 	a = 1;
>>> 	WRITE_ONCE(b, 1);
>>> 	if (r1 & 0x1)
>>> 		whichever(p->q, r2);
>>> 
>>> Again leaving aside the "&" needed by smp_store_release(), if "whichever"
>>> is WRITE_ONCE(), then the load from gp, the load from gi, and the load
>>> from gj are all ordered before the store to p->q (by address dependency,
>>> control dependency, and data dependency, respectively). The store to "a"
>>> can be reordered with the store to p->q by the compiler. The store to
>>> "b" cannot be reordered with the store to p->q by the compiler (again,
>>> both are volatile), but the CPU is free to reorder them, especially when
>>> whichever() is implemented as a conditional store.
>>> 
>>> But if "whichever" is "smp_store_release()", all five of the earlier
>>> memory accesses are ordered before the store to p->q.
>>> 
>>> Does that help, or am I missing the point of your question?
>> 
>> My main question is how permissible/ugly you think the following use
>> of READ_ONCE() would be, and whether you think it ought to be an
>> smp_load_acquire() instead.
>> 
>> Assume that we are holding some kind of lock that ensures that the
>> only possible concurrent update to "vma->anon_vma" is that it changes
>> from a NULL pointer to a non-NULL pointer (using smp_store_release()).
>> 
>> 	if (READ_ONCE(vma->anon_vma) != NULL) {
>> 		// we now know that vma->anon_vma cannot change anymore
>> 
>> 		// access the same memory location again with a plain load
>> 		struct anon_vma *a = vma->anon_vma;
>> 
>> 		// this needs to be address-dependency-ordered against one of
>> 		// the loads from vma->anon_vma
>> 		struct anon_vma *root = a->root;
>> 	}
>> 
>> Is this fine? If it is not fine just because the compiler might
>> reorder the plain load of vma->anon_vma before the READ_ONCE() load,
>> would it be fine after adding a barrier() directly after the
>> READ_ONCE()?
> 
> I'm _very_ wary of mixing READ_ONCE() and plain loads to the same variable,
> as I've run into cases where you have sequences such as:
> 
> 	// Assume *ptr is initially 0 and somebody else writes it to 1
> 	// concurrently
> 
> 	foo = *ptr;
> 	bar = READ_ONCE(*ptr);
> 	baz = *ptr;
> 
> and you can get foo == baz == 0 but bar == 1 because the compiler only
> ends up reading from memory twice.
> 
> That was the root cause behind f069faba6887 ("arm64: mm: Use READ_ONCE
> when dereferencing pointer to pte table"), which was very unpleasant to
> debug.

Interesting. I wonder if you considered adding to READ_ONCE() something
like:

	asm volatile("" : "+g" (x));

So later loads (such as baz = *ptr) would reload the updated value.