From: Jann Horn
Date: Thu, 8 Aug 2024 23:42:33 +0200
Subject: Re: [RFC 1/1] mm: introduce mmap_lock_speculation_{start|end}
To: Andrii Nakryiko
Cc: Suren Baghdasaryan, akpm@linux-foundation.org, peterz@infradead.org,
 linux-kernel@vger.kernel.org, linux-mm@kvack.org, Matthew Wilcox,
 Vlastimil Babka, Michal Hocko
References: <20240807182325.2585582-1-surenb@google.com>

On Thu, Aug 8, 2024 at 11:11 PM Andrii Nakryiko wrote:
> On Thu, Aug 8, 2024 at 2:02 PM Suren Baghdasaryan wrote:
> >
> > On Thu, Aug 8, 2024 at 8:19 PM Andrii Nakryiko
> > wrote:
> > >
> > > On Wed, Aug 7,
> > > 2024 at 11:23 AM Suren Baghdasaryan wrote:
> > > >
> > > > Add helper functions to speculatively perform operations without
> > > > read-locking mmap_lock, expecting that mmap_lock will not be
> > > > write-locked and mm is not modified from under us.
> > > >
> > > > Signed-off-by: Suren Baghdasaryan
> > > > Suggested-by: Peter Zijlstra
> > > > Cc: Andrii Nakryiko
> > > > ---
> > >
> > > This change makes sense and makes mm's seq a bit more useful and
> > > meaningful. I've also tested it locally with uprobe stress-test, and
> > > it seems to work great, I haven't run into any problems with a
> > > multi-hour stress test run so far. Thanks!
> >
> > Thanks for testing and feel free to include this patch into your set.
> Will do!
> >
> > I've been thinking about this some more and there is a very unlikely
> > corner case if between mmap_lock_speculation_start() and
> > mmap_lock_speculation_end() mmap_lock is write-locked/unlocked so many
> > times that mm->mm_lock_seq (int) overflows and just happen to reach
> > the same value as we recorded in mmap_lock_speculation_start(). This
> > would generate a false positive, which would show up as if the
> > mmap_lock was never touched. Such overflows are possible for vm_lock
> > as well (see: https://elixir.bootlin.com/linux/v6.10.3/source/include/linux/mm_types.h#L688)
> > but they are not critical because a false result would simply lead to
> > a retry under mmap_lock. However for your case this would be a
> > critical issue. This is an extremely low probability scenario but
> > should we still try to handle it?
> >
>
> No, I think it's fine.

Modern computers don't take *that* long to count to 2^32, even when
every step involves one or more syscalls.
I've seen bugs where, for example, a 32-bit refcount is not decremented
where it should be, making it possible to overflow the refcount with
2^32 operations of some kind, and those have taken something like 3
hours to trigger in one case
(https://bugs.chromium.org/p/project-zero/issues/detail?id=2478),
14 hours in another case. Or even cases where, if you have enough RAM,
you can create 2^32 legitimate references to an object and overflow a
refcount that way
(https://bugs.chromium.org/p/project-zero/issues/detail?id=809
if you had more than 32 GiB of RAM, taking only 25 minutes to overflow
the 32-bit counter - and that is with every step allocating memory).
So I'd expect 2^32 simple operations that take the mmap lock for
writing to be faster than 25 minutes on a modern desktop machine.

So for a reader of some kinda 32-bit sequence count, if it is
conceivably possible for the reader to take at least maybe a couple
minutes or so between the sequence count reads (also counting time
during which the reader is preempted or something like that), there
could be a problem. At that point in the analysis, if you wanted to
know whether it's actually exploitable, I guess you'd have to look at
what kinda context you're running in, and what kinda events can
interrupt/preempt you (like whether someone can send a sufficiently
dense flood of IPIs to completely prevent you from making forward
progress, like in https://www.vusec.net/projects/ghostrace/), and for
how long those things can delay you (maybe including what the pessimal
scheduler behavior looks like if you're in preemptible context, or how
long clock interrupts can take to execute when processing a giant pile
of epoll watches), and so on...

> Similar problems could happen with refcount_t,
> for instance (it has a logic to have a sticky "has overflown" state,
> which I believe relies on the fact that we'll never be able to
> increment refcount 2bln+ times in between some resetting logic).
> Anyways, I think it's utterly unrealistic and should be considered
> impossible.

IIRC refcount_t protects against this even in theoretical, fairly
pessimal scenarios, because the maximum number of tasks you can have
on Linux is smaller than the number of refcount decrements you'd have
to do in parallel to bring a pinned refcount back down to 0.

I know this is a weakness of seqcount_t (though last time I checked I
couldn't find any examples where it seemed like you could actually
abuse this).

But if you want a counter, and something bad would happen if the
counter wraps, and you don't have a really strong guarantee that the
counter won't wrap, I think it's more robust to make it 64-bit. (Or an
unsigned long and hope there aren't too many people who still run
32-bit kernels on anything important... though that's not very
pretty.)
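
To make the wraparound concern concrete, here is a rough userspace
sketch. It is not kernel code and not the implementation of
mmap_lock_speculation_{start,end}(). Every name in it (struct model_mm,
seq32(), spec_end32(), and so on) is made up for illustration, and it
assumes, as a simplification, exactly one counter increment per
write-lock cycle. It models a 32-bit counter as the low 32 bits of a
64-bit one, so both widths can be compared against the same history of
writes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Userspace model only. All names here are hypothetical; this is NOT
 * the kernel's mmap_lock_speculation_{start,end}() implementation.
 */
struct model_mm {
	uint64_t writes;	/* total write-lock cycles so far */
};

/* What a 32-bit sequence counter would show: the low 32 bits. */
static inline uint32_t seq32(const struct model_mm *mm)
{
	return (uint32_t)mm->writes;
}

/* What a 64-bit sequence counter would show. */
static inline uint64_t seq64(const struct model_mm *mm)
{
	return mm->writes;
}

/* Simulate n write-lock cycles, assuming one increment per cycle. */
static inline void model_write_cycles(struct model_mm *mm, uint64_t n)
{
	mm->writes += n;
}

/* Reader-side checks: true means "nothing changed since the snapshot". */
static inline bool spec_end32(const struct model_mm *mm, uint32_t snap)
{
	return seq32(mm) == snap;
}

static inline bool spec_end64(const struct model_mm *mm, uint64_t snap)
{
	return seq64(mm) == snap;
}

/*
 * Seconds needed for `count` increments, extrapolated linearly from a
 * run that needed `secs_per_2_32` seconds for 2^32 increments.
 */
static inline double secs_for_increments(double count, double secs_per_2_32)
{
	assert(secs_per_2_32 > 0.0);
	return count / 4294967296.0 * secs_per_2_32;
}
```

In this model, a reader that snapshots both counters and is then
delayed for exactly 2^32 write cycles sees spec_end32() report "no
change", while spec_end64() still notices the writes. Plugging the
25-minute figure (1500 seconds per 2^32 increments) into
secs_for_increments() for 2^64 increments gives about 6.4e12 seconds,
which is roughly 200,000 years, under the assumption that the rate
scales linearly.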