From: Andrii Nakryiko
Date: Thu, 15 Aug 2024 13:17:05 -0700
Subject: Re: [PATCH RFC v3 13/13] uprobes: add speculative lockless VMA to inode resolution
To: Jann Horn
Cc: Suren Baghdasaryan, Christian Brauner, Mateusz Guzik, Andrii Nakryiko,
 linux-trace-kernel@vger.kernel.org, peterz@infradead.org, oleg@redhat.com,
 rostedt@goodmis.org, mhiramat@kernel.org, bpf@vger.kernel.org,
 linux-kernel@vger.kernel.org, jolsa@kernel.org, paulmck@kernel.org,
 willy@infradead.org, akpm@linux-foundation.org, linux-mm@kvack.org
References: <20240813042917.506057-1-andrii@kernel.org>
 <20240813042917.506057-14-andrii@kernel.org>
 <7byqni7pmnufzjj73eqee2hvpk47tzgwot32gez3lb2u5lucs2@5m7dvjrvtmv2>

On Thu, Aug 15, 2024 at 11:58 AM Jann Horn wrote:
>
> +brauner for "struct file" lifetime
>
> On Thu, Aug 15, 2024 at 7:45 PM Suren Baghdasaryan wrote:
> > On Thu, Aug 15, 2024 at 9:47 AM Andrii Nakryiko
> > wrote:
> > >
> > > On Thu, Aug 15, 2024 at 6:44 AM Mateusz Guzik wrote:
> > > >
> > > > On Tue, Aug 13, 2024 at 08:36:03AM
> > > > -0700, Suren Baghdasaryan wrote:
> > > > > On Mon, Aug 12, 2024 at 11:18 PM Mateusz Guzik wrote:
> > > > > >
> > > > > > On Mon, Aug 12, 2024 at 09:29:17PM -0700, Andrii Nakryiko wrote:
> > > > > > > Now that files_cachep is SLAB_TYPESAFE_BY_RCU, we can safely access
> > > > > > > vma->vm_file->f_inode lockless only under rcu_read_lock() protection,
> > > > > > > attempting uprobe look up speculatively.
>
> Stupid question: Is this uprobe stuff actually such a hot codepath
> that it makes sense to optimize it to be faster than the page fault
> path?

Not a stupid question, but yes, generally speaking uprobe performance is
critical for a bunch of tracing use cases. And having independent
threads implicitly contending with each other just because of uprobe's
internal implementation detail (while conceptually there should be no
dependencies for triggering uprobe from multiple parallel threads) is a
big surprise to users and affects production use cases beyond just
uprobe-handling BPF logic overhead ("useful overhead") they assume.

>
> (Sidenote: I find it kinda interesting that this is sort of going back
> in the direction of the old Speculative Page Faults design.)
>
> > > > > > > We rely on newly added mmap_lock_speculation_{start,end}() helpers to
> > > > > > > validate that mm_struct stays intact for entire duration of this
> > > > > > > speculation. If not, we fall back to mmap_lock-protected lookup.
> > > > > > >
> > > > > > > This allows to avoid contention on mmap_lock in absolutely majority of
> > > > > > > cases, nicely improving uprobe/uretprobe scalability.
> > > > > > >
> > > > > > [...]
> > Note: up_write(&vma->vm_lock->lock) in the vma_start_write() is not
> > enough because it's one-way permeable (it's a "RELEASE operation") and
> > later vma->vm_file store (or any other VMA modification) can move
> > before our vma->vm_lock_seq store.
> >
> > This makes vma_start_write() heavier but again, it's write-locking, so
> > should not be considered a fast path.
> > With this change we can use the code suggested by Andrii in
> > https://lore.kernel.org/all/CAEf4BzZeLg0WsYw2M7KFy0+APrPaPVBY7FbawB9vjcA2+6k69Q@mail.gmail.com/
> > with an additional smp_rmb():
> >
> > rcu_read_lock()
> > vma = find_vma(...)
> > if (!vma) /* bail */
>
> And maybe add some comments like:
>
> /*
>  * Load the current VMA lock sequence - we will detect if anyone concurrently
>  * locks the VMA after this point.
>  * Pairs with smp_wmb() in vma_start_write().
>  */
> > vm_lock_seq = smp_load_acquire(&vma->vm_lock_seq);
> /*
>  * Now we just have to detect if the VMA is already locked with its current
>  * sequence count.
>  *
>  * The following load is ordered against the vm_lock_seq load above (using
>  * smp_load_acquire() for the load above), and pairs with implicit memory
>  * ordering between the mm_lock_seq write in mmap_write_unlock() and the
>  * vm_lock_seq write in the next vma_start_write() after that (which can only
>  * occur after an mmap_write_lock()).
>  */
> > mm_lock_seq = smp_load_acquire(&vma->mm->mm_lock_seq);
> > /* I think vm_lock has to be acquired first to avoid the race */
> > if (mm_lock_seq == vm_lock_seq)
> >         /* bail, vma is write-locked */
> > ... perform uprobe lookup logic based on vma->vm_file->f_inode ...
> /*
>  * Order the speculative accesses above against the following vm_lock_seq
>  * recheck.
>  */
> > smp_rmb();
> > if (vma->vm_lock_seq != vm_lock_seq)
>

thanks, will incorporate these comments into the next revision

> (As I said on the other thread: Since this now relies on
> vma->vm_lock_seq not wrapping back to the same value for correctness,
> I'd like to see vma->vm_lock_seq being at least an "unsigned long", or
> even better, an atomic64_t... though I realize we don't currently do
> that for seqlocks either.)
>
> >         /* bail, VMA might have changed */
> >
> > The smp_rmb() is needed so that vma->vm_lock_seq load does not get
> > reordered and moved up before speculation.
> >
> > I'm CC'ing Jann since he understands memory barriers way better than
> > me and will keep me honest.
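
For anyone following along, the sample/speculate/recheck sequence quoted
above can be modelled in userspace C11 roughly as below. This is only a
sketch under stated assumptions, not the kernel code: every toy_* name is
hypothetical, the fields merely borrow the names used in this thread, an
atomic_long stands in for vma->vm_file->f_inode, and C11 release/acquire
fences stand in for smp_wmb()/smp_rmb(). The reader is split into a
"begin" and a "validate" step so that a racing writer can be simulated
from a single thread.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Toy stand-ins (assumptions, not kernel types or kernel APIs). */
struct toy_mm {
	atomic_int mm_lock_seq;		/* bumped when the write side unlocks */
};

struct toy_vma {
	struct toy_mm *mm;
	atomic_int vm_lock_seq;		/* == mm_lock_seq while write-locked */
	atomic_long inode;		/* stands in for vma->vm_file->f_inode */
};

static void toy_init(struct toy_mm *mm, struct toy_vma *vma, long inode)
{
	atomic_init(&mm->mm_lock_seq, 0);
	vma->mm = mm;
	atomic_init(&vma->vm_lock_seq, -1);	/* toy choice: starts unlocked */
	atomic_init(&vma->inode, inode);
}

/* Writer: mark the VMA locked; the fence keeps later stores ordered after it. */
static void toy_vma_start_write(struct toy_vma *vma)
{
	int seq = atomic_load_explicit(&vma->mm->mm_lock_seq, memory_order_relaxed);

	atomic_store_explicit(&vma->vm_lock_seq, seq, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);	/* plays the smp_wmb() role */
}

/* Writer: bumping mm_lock_seq unlocks every VMA locked under the old value. */
static void toy_mmap_write_unlock(struct toy_mm *mm)
{
	atomic_fetch_add_explicit(&mm->mm_lock_seq, 1, memory_order_release);
}

/* Reader step 1: sample both counters; fail if the VMA is write-locked. */
static bool toy_spec_begin(struct toy_vma *vma, int *seq)
{
	*seq = atomic_load_explicit(&vma->vm_lock_seq, memory_order_acquire);
	return *seq != atomic_load_explicit(&vma->mm->mm_lock_seq,
					    memory_order_acquire);
}

/* Reader step 2: fence (the smp_rmb() role), then recheck vm_lock_seq. */
static bool toy_spec_validate(struct toy_vma *vma, int seq)
{
	atomic_thread_fence(memory_order_acquire);
	return atomic_load_explicit(&vma->vm_lock_seq, memory_order_relaxed) == seq;
}

/* Speculative lookup; a false return means "take the locked slow path". */
static bool toy_lookup_inode(struct toy_vma *vma, long *out)
{
	int seq;
	long inode;

	if (!toy_spec_begin(vma, &seq))
		return false;
	inode = atomic_load_explicit(&vma->inode, memory_order_relaxed);
	if (!toy_spec_validate(vma, seq))
		return false;
	*out = inode;
	return true;
}

/* Single-threaded walk through the three interesting cases. */
static void toy_selfcheck(void)
{
	struct toy_mm mm;
	struct toy_vma vma;
	long ino = 0;
	int seq;

	toy_init(&mm, &vma, 42);
	assert(toy_lookup_inode(&vma, &ino) && ino == 42);	/* no writer */

	assert(toy_spec_begin(&vma, &seq));	/* reader starts speculating */
	toy_vma_start_write(&vma);		/* writer races in */
	assert(!toy_spec_validate(&vma, seq));	/* recheck catches it */
	assert(!toy_lookup_inode(&vma, &ino));	/* still write-locked */

	atomic_store_explicit(&vma.inode, 7, memory_order_relaxed);
	toy_mmap_write_unlock(&mm);
	assert(toy_lookup_inode(&vma, &ino) && ino == 7);
}
```

Calling toy_selfcheck() from a main() exercises the uncontended case, a
writer arriving between the two reader steps, and the state after the
write side unlocks. The int counters here can wrap, which is the same
concern raised above about vma->vm_lock_seq; a wider type would be the
obvious toy-level mitigation.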