From: Andrii Nakryiko
Date: Thu, 8 Aug 2024 15:05:05 -0700
Subject: Re: [RFC 1/1] mm: introduce mmap_lock_speculation_{start|end}
To: Jann Horn
Cc: Suren Baghdasaryan, akpm@linux-foundation.org, peterz@infradead.org,
 linux-kernel@vger.kernel.org, linux-mm@kvack.org, Matthew Wilcox,
 Vlastimil Babka, Michal Hocko
References: <20240807182325.2585582-1-surenb@google.com>

On Thu, Aug 8, 2024 at 2:43 PM Jann Horn wrote:
>
> On Thu, Aug 8, 2024 at 11:11 PM Andrii Nakryiko wrote:
> > On Thu, Aug 8,
> > 2024 at 2:02 PM Suren Baghdasaryan wrote:
> > >
> > > On Thu, Aug 8, 2024 at 8:19 PM Andrii Nakryiko wrote:
> > > >
> > > > On Wed, Aug 7, 2024 at 11:23 AM Suren Baghdasaryan wrote:
> > > > >
> > > > > Add helper functions to speculatively perform operations without
> > > > > read-locking mmap_lock, expecting that mmap_lock will not be
> > > > > write-locked and mm is not modified from under us.
> > > > >
> > > > > Signed-off-by: Suren Baghdasaryan
> > > > > Suggested-by: Peter Zijlstra
> > > > > Cc: Andrii Nakryiko
> > > > > ---
> > > >
> > > > This change makes sense and makes mm's seq a bit more useful and
> > > > meaningful. I've also tested it locally with uprobe stress-test, and
> > > > it seems to work great, I haven't run into any problems with a
> > > > multi-hour stress test run so far. Thanks!
> > >
> > > Thanks for testing and feel free to include this patch into your set.
> >
> > Will do!
> >
> > >
> > > I've been thinking about this some more and there is a very unlikely
> > > corner case if between mmap_lock_speculation_start() and
> > > mmap_lock_speculation_end() mmap_lock is write-locked/unlocked so many
> > > times that mm->mm_lock_seq (int) overflows and just happen to reach
> > > the same value as we recorded in mmap_lock_speculation_start(). This
> > > would generate a false positive, which would show up as if the
> > > mmap_lock was never touched. Such overflows are possible for vm_lock
> > > as well (see: https://elixir.bootlin.com/linux/v6.10.3/source/include/linux/mm_types.h#L688)
> > > but they are not critical because a false result would simply lead to
> > > a retry under mmap_lock. However for your case this would be a
> > > critical issue. This is an extremely low probability scenario but
> > > should we still try to handle it?
> > >
> >
> > No, I think it's fine.
>
> Modern computers don't take *that* long to count to 2^32, even when
> every step involves one or more syscalls.
> I've seen bugs where, for
> example, a 32-bit refcount is not decremented where it should, making
> it possible to overflow the refcount with 2^32 operations of some
> kind, and those have taken something like 3 hours to trigger in one
> case (https://bugs.chromium.org/p/project-zero/issues/detail?id=2478),
> 14 hours in another case. Or even cases where, if you have enough RAM,
> you can create 2^32 legitimate references to an object and overflow a
> refcount that way
> (https://bugs.chromium.org/p/project-zero/issues/detail?id=809 if you
> had more than 32 GiB of RAM, taking only 25 minutes to overflow the
> 32-bit counter - and that is with every step allocating memory).
> So I'd expect 2^32 simple operations that take the mmap lock for
> writing to be faster than 25 minutes on a modern desktop machine.
>
> So for a reader of some kinda 32-bit sequence count, if it is
> conceivably possible for the reader to take at least maybe a couple
> minutes or so between the sequence count reads (also counting time
> during which the reader is preempted or something like that), there
> could be a problem. At that point in the analysis, if you wanted to
> know whether it's actually exploitable, I guess you'd have to look at
> what kinda context you're running in, and what kinda events can
> interrupt/preempt you (like whether someone can send a sufficiently
> dense flood of IPIs to completely prevent you making forward progress,
> like in https://www.vusec.net/projects/ghostrace/), and for how long
> those things can delay you (maybe including what the pessimal
> scheduler behavior looks like if you're in preemptible context, or how
> long clock interrupts can take to execute when processing a giant pile
> of epoll watches), and so on...
>

And here we are talking about *lockless* *speculative* VMA usage that
will last what, at most on the order of a few microseconds?
So I stand by "can never happen", because if it does, your system is so
overloaded that something like this uprobe issue is your least concern.

> > Similar problems could happen with refcount_t,
> > for instance (it has a logic to have a sticky "has overflown" state,
> > which I believe relies on the fact that we'll never be able to
> > increment refcount 2bln+ times in between some resetting logic).
> > Anyways, I think it's utterly unrealistic and should be considered
> > impossible.
>
> IIRC refcount_t protects against this even in theoretical, fairly
> pessimal scenarios, because the maximum number of tasks you can have
> on Linux is smaller than the number of refcount decrements you'd have
> to do in parallel to bring a pinned refcount back down to 0.
>
> I know this is a weakness of seqcount_t (though last time I checked I
> couldn't find any examples where it seemed like you could actually
> abuse this).
>
> But if you want a counter, and something bad would happen if the
> counter wraps, and you don't have a really strong guarantee that the
> counter won't wrap, I think it's more robust to make it 64-bit. (Or an
> unsigned long and hope there aren't too many people who still run
> 32-bit kernels on anything important... though that's not very
> pretty.)
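
For anyone who wants to see the wraparound case from the quoted discussion
in concrete form, below is a small userspace sketch. It is not the code from
the patch. It assumes a toy sequence count that is bumped once on write-lock
and once on unlock, and it uses a deliberately tiny 8-bit counter as a
stand-in for the 32-bit mm_lock_seq so that the wrap is cheap to reach. All
names (toy_mm, toy_write_cycle, toy_narrow_check_passes,
toy_wide_check_passes) are made up for illustration, and the single-threaded
model ignores memory ordering entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Toy userspace model, NOT the kernel implementation. Both counters are
 * bumped once on "write-lock" and once on "unlock", so an even value means
 * no writer is active. The 8-bit counter stands in for a 32-bit one so that
 * wraparound is cheap to reach.
 */
struct toy_mm {
	uint8_t seq_narrow;	/* wraps after 256 increments */
	uint64_t seq_wide;	/* the "make it 64-bit" alternative */
};

/* One full writer cycle: lock (count becomes odd), unlock (even again). */
void toy_write_cycle(struct toy_mm *mm)
{
	assert((mm->seq_wide & 1) == 0);	/* no writer already active */
	mm->seq_narrow++;
	mm->seq_wide++;
	mm->seq_narrow++;
	mm->seq_wide++;
}

/*
 * Sample the narrow counter, let `cycles` writer cycles happen, then
 * re-check. Returns true if the reader would treat its speculation as valid.
 */
bool toy_narrow_check_passes(unsigned int cycles)
{
	struct toy_mm mm = { 0 };
	uint8_t seq = mm.seq_narrow;		/* speculation start */

	if (seq & 1)
		return false;			/* writer active, do not speculate */
	for (unsigned int i = 0; i < cycles; i++)
		toy_write_cycle(&mm);
	return seq == mm.seq_narrow;		/* speculation end */
}

/* Same experiment against the 64-bit counter. */
bool toy_wide_check_passes(unsigned int cycles)
{
	struct toy_mm mm = { 0 };
	uint64_t seq = mm.seq_wide;		/* speculation start */

	if (seq & 1)
		return false;			/* writer active, do not speculate */
	for (unsigned int i = 0; i < cycles; i++)
		toy_write_cycle(&mm);
	return seq == mm.seq_wide;		/* speculation end */
}
```

In this model, toy_narrow_check_passes(128) passes even though 128 writer
cycles (256 increments, one full wrap of the 8-bit counter) happened in
between, which is the false positive being discussed. The same 128 cycles
are caught by toy_wide_check_passes(128), and a single cycle is caught by
both counters.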