From: Jann Horn
Date: Fri, 9 Aug 2024 00:16:09 +0200
Subject: Re: [RFC 1/1] mm: introduce mmap_lock_speculation_{start|end}
To: Andrii Nakryiko
Cc: Suren Baghdasaryan, akpm@linux-foundation.org, peterz@infradead.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Matthew Wilcox, Vlastimil Babka, Michal Hocko
References: <20240807182325.2585582-1-surenb@google.com>

On Fri, Aug 9, 2024 at 12:05 AM Andrii Nakryiko wrote:
> On Thu, Aug 8, 2024 at 2:43 PM Jann Horn wrote:
> >
> > On Thu, Aug 8, 2024 at 11:11 PM Andrii Nakryiko
> > wrote:
> > > On Thu, Aug 8, 2024 at 2:02 PM Suren Baghdasaryan wrote:
> > > >
> > > > On Thu, Aug 8, 2024 at 8:19 PM Andrii Nakryiko
> > > > wrote:
> > > > >
> > > > > On Wed, Aug 7, 2024 at 11:23 AM Suren Baghdasaryan wrote:
> > > > > >
> > > > > > Add helper functions to speculatively perform operations without
> > > > > > read-locking mmap_lock, expecting that mmap_lock will not be
> > > > > > write-locked and mm is not modified from under us.
> > > > > >
> > > > > > Signed-off-by: Suren Baghdasaryan
> > > > > > Suggested-by: Peter Zijlstra
> > > > > > Cc: Andrii Nakryiko
> > > > > > ---
> > > > >
> > > > > This change makes sense and makes mm's seq a bit more useful and
> > > > > meaningful. I've also tested it locally with uprobe stress-test, and
> > > > > it seems to work great, I haven't run into any problems with a
> > > > > multi-hour stress test run so far. Thanks!
> > > >
> > > > Thanks for testing and feel free to include this patch into your set.
> > >
> > > Will do!
> > >
> > > >
> > > > I've been thinking about this some more and there is a very unlikely
> > > > corner case if between mmap_lock_speculation_start() and
> > > > mmap_lock_speculation_end() mmap_lock is write-locked/unlocked so many
> > > > times that mm->mm_lock_seq (int) overflows and just happens to reach
> > > > the same value as we recorded in mmap_lock_speculation_start(). This
> > > > would generate a false positive, which would show up as if the
> > > > mmap_lock was never touched. Such overflows are possible for vm_lock
> > > > as well (see: https://elixir.bootlin.com/linux/v6.10.3/source/include/linux/mm_types.h#L688)
> > > > but they are not critical because a false result would simply lead to
> > > > a retry under mmap_lock. However, for your case this would be a
> > > > critical issue. This is an extremely low probability scenario but
> > > > should we still try to handle it?
> > > >
> > >
> > > No, I think it's fine.
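The corner case Suren describes comes down to modular arithmetic on a 32-bit counter. Below is a minimal C sketch. It assumes that each write-lock/unlock cycle advances the count by exactly one, and it ignores the case where a writer currently holds the lock. struct seq_model, spec_start(), spec_end(), write_cycles() and demo_false_positive() are hypothetical names, loosely modeled on the roles of mmap_lock_speculation_start()/mmap_lock_speculation_end(). They are not the code in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model of a 32-bit sequence count such as mm->mm_lock_seq.
 * Assumption: each write-lock/unlock cycle advances the count by exactly
 * one. This is not kernel code, only the arithmetic the check relies on.
 */
struct seq_model {
	uint32_t seq;
};

/* Role of the speculation start: remember the current count. */
static uint32_t spec_start(const struct seq_model *m)
{
	return m->seq;
}

/* Role of the speculation end: succeed only if the count is unchanged. */
static bool spec_end(const struct seq_model *m, uint32_t start)
{
	return m->seq == start;
}

/* Apply n write-lock/unlock cycles; uint32_t arithmetic wraps mod 2^32. */
static void write_cycles(struct seq_model *m, uint64_t n)
{
	m->seq += (uint32_t)n;	/* same result as n single increments */
}

/* One cycle is detected, but 2^32 cycles wrap around and go unnoticed. */
static void demo_false_positive(void)
{
	struct seq_model m = { .seq = 12345 };
	uint32_t start = spec_start(&m);

	write_cycles(&m, 1);
	assert(!spec_end(&m, start));	/* change detected */

	write_cycles(&m, (UINT64_C(1) << 32) - 1);
	assert(spec_end(&m, start));	/* 2^32 cycles total: false positive */
}
```

demo_false_positive() shows the failure mode. After 2^32 cycles the counter is back at its recorded value, so spec_end() reports success even though the mm changed in between.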
> >
> > Modern computers don't take *that* long to count to 2^32, even when
> > every step involves one or more syscalls. I've seen bugs where, for
> > example, a 32-bit refcount is not decremented where it should be,
> > making it possible to overflow the refcount with 2^32 operations of
> > some kind, and those have taken something like 3 hours to trigger in
> > one case (https://bugs.chromium.org/p/project-zero/issues/detail?id=2478),
> > 14 hours in another case. There are even cases where, if you have
> > enough RAM, you can create 2^32 legitimate references to an object and
> > overflow a refcount that way
> > (https://bugs.chromium.org/p/project-zero/issues/detail?id=809 if you
> > had more than 32 GiB of RAM, taking only 25 minutes to overflow the
> > 32-bit counter, and that is with every step allocating memory).
> > So I'd expect 2^32 simple operations that take the mmap lock for
> > writing to be faster than 25 minutes on a modern desktop machine.
> >
> > So for a reader of some kind of 32-bit sequence count, if it is
> > conceivably possible for the reader to take at least a couple of
> > minutes or so between the sequence count reads (also counting time
> > during which the reader is preempted or something like that), there
> > could be a problem. At that point in the analysis, if you wanted to
> > know whether it's actually exploitable, I guess you'd have to look at
> > what kind of context you're running in, and what kinds of events can
> > interrupt/preempt you (like whether someone can send a sufficiently
> > dense flood of IPIs to completely prevent you from making forward
> > progress, like in https://www.vusec.net/projects/ghostrace/), and for
> > how long those things can delay you (maybe including what the pessimal
> > scheduler behavior looks like if you're in preemptible context, or how
> > long clock interrupts can take to execute when processing a giant pile
> > of epoll watches), and so on...
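The numbers above are easy to turn into per-operation budgets. Below is a back-of-the-envelope sketch in C. seconds_to_wrap() and ns_per_op_for() are hypothetical helpers. The 100 ns cost per write-lock/unlock cycle used in the usage note is an illustrative assumption, not a measurement.

```c
#include <assert.h>
#include <stdint.h>

/* Wall-clock seconds needed for 2^bits operations at ns_per_op each. */
static double seconds_to_wrap(unsigned bits, double ns_per_op)
{
	assert(bits < 64 && ns_per_op > 0.0);
	return (double)(UINT64_C(1) << bits) * ns_per_op * 1e-9;
}

/* Average cost per operation implied by 2^bits operations in `seconds`. */
static double ns_per_op_for(unsigned bits, double seconds)
{
	assert(bits < 64 && seconds > 0.0);
	return seconds * 1e9 / (double)(UINT64_C(1) << bits);
}
```

Some worked values:

- ns_per_op_for(32, 25 * 60) is about 349 ns. That is the average cost per step implied by the 25-minute refcount overflow.
- The 3-hour case works out to about 2.5 µs per step.
- If one write-lock/unlock cycle costs an assumed 100 ns, seconds_to_wrap(32, 100.0) gives about 429 seconds, a little over 7 minutes. Under that assumption, this is the order of reader delay that would matter.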
> >
>
> And here we are talking about *lockless* *speculative* VMA usage that
> will last what, at most on the order of a few microseconds?

Are you talking about time spent in task context, time spent while the
task is on the CPU (including time in interrupt context), or wall-clock
time?

https://www.vusec.net/projects/ghostrace/ is pretty amazing: when you
look at the paper https://download.vusec.net/papers/ghostrace_sec24.pdf,
you can see in Figure 4 how they managed to turn a race window that's 8
instructions wide into a window they can stretch "indefinitely", and
they didn't even have to reschedule to pull it off. If I understand
correctly, they stretched the race window to something like 35 seconds
and could have stretched it even wider if they had wanted to? (And
yes, Linux fixed the specific trick they used for doing that, but it
still shows that this kind of thing is possible in principle.)
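The difference between a thread's own CPU time and wall-clock time can be seen even from userspace. Below is a small POSIX sketch, assuming Linux with CLOCK_THREAD_CPUTIME_ID support. The hypothetical measure_offcpu() helper sleeps instead of being preempted. It does not model interrupts, IPI floods, or the in-kernel case. It only shows that a span can cost its thread almost no CPU time while a lot of wall-clock time passes.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <time.h>

/* Convert a struct timespec to seconds. */
static double ts_to_s(const struct timespec *ts)
{
	return (double)ts->tv_sec + (double)ts->tv_nsec * 1e-9;
}

/*
 * Stay off the CPU for about `ms` milliseconds (nanosleep() stands in for
 * preemption here) and report the elapsed wall-clock time (CLOCK_MONOTONIC)
 * and the CPU time consumed by the calling thread (CLOCK_THREAD_CPUTIME_ID).
 */
static void measure_offcpu(long ms, double *wall_s, double *cpu_s)
{
	struct timespec w0, w1, c0, c1;
	struct timespec req = { .tv_sec = ms / 1000,
				.tv_nsec = (ms % 1000) * 1000000L };

	assert(ms >= 0);
	clock_gettime(CLOCK_MONOTONIC, &w0);
	clock_gettime(CLOCK_THREAD_CPUTIME_ID, &c0);
	/* Resume with the remaining time if a signal interrupts the sleep. */
	while (nanosleep(&req, &req) == -1 && errno == EINTR)
		;
	clock_gettime(CLOCK_THREAD_CPUTIME_ID, &c1);
	clock_gettime(CLOCK_MONOTONIC, &w1);

	*wall_s = ts_to_s(&w1) - ts_to_s(&w0);
	*cpu_s = ts_to_s(&c1) - ts_to_s(&c0);
}
```

Calling measure_offcpu(100, &wall, &cpu) should report roughly 0.1 s of wall-clock time, while the thread's CPU time stays far below that. A "few microseconds" of task time says little about how much wall-clock time a writer gets to run.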