From: Andrii Nakryiko
Date: Thu, 1 May 2025 15:10:13 -0700
Subject: Re: [PATCH v3 8/8] mm/maps: execute PROCMAP_QUERY ioctl under RCU
To: Jann Horn
Cc: Suren Baghdasaryan, akpm@linux-foundation.org, Liam.Howlett@oracle.com,
 lorenzo.stoakes@oracle.com, david@redhat.com, vbabka@suse.cz,
 peterx@redhat.com, hannes@cmpxchg.org, mhocko@kernel.org,
 paulmck@kernel.org, shuah@kernel.org, adobriyan@gmail.com,
 brauner@kernel.org, josef@toxicpanda.com, yebin10@huawei.com,
 linux@weissschuh.net, willy@infradead.org, osalvador@suse.de,
 andrii@kernel.org, ryan.roberts@arm.com, christophe.leroy@csgroup.eu,
 linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
 linux-mm@kvack.org, linux-kselftest@vger.kernel.org
References: <20250418174959.1431962-1-surenb@google.com> <20250418174959.1431962-9-surenb@google.com>

On Tue, Apr 29, 2025 at 10:25 AM Jann Horn wrote:
>
> On Tue, Apr 29, 2025 at 7:15 PM Suren Baghdasaryan wrote:
> > On Tue, Apr 29, 2025 at 8:56 AM Jann Horn wrote:
> > > On Wed, Apr 23, 2025 at 12:54 AM Andrii Nakryiko wrote:
> > > > On Fri, Apr 18, 2025 at 10:50 AM Suren Baghdasaryan wrote:
> > > > > Utilize speculative vma lookup to find and snapshot a vma without
> > > > > taking mmap_lock during PROCMAP_QUERY ioctl execution. Concurrent
> > > > > address space modifications are detected and the lookup is retried.
> > > > > While we take the mmap_lock for reading during such contention, we
> > > > > do that momentarily only to record new mm_wr_seq counter.
> > > >
> > > > PROCMAP_QUERY is an even more obvious candidate for fully lockless
> > > > speculation, IMO (because it's more obvious that vma's use is
> > > > localized to do_procmap_query(), instead of being spread across
> > > > m_start/m_next and m_show as with seq_file approach). We do
> > > > rcu_read_lock(), mmap_lock_speculate_try_begin(), query for VMA (no
> > > > mmap_read_lock), use that VMA to produce (speculative) output, and
> > > > then validate that VMA or mm_struct didn't change with
> > > > mmap_lock_speculate_retry(). If it did - retry, if not - we are done.
> > > > No need for vma_copy and any gets/puts, no?
> > >
> > > I really strongly dislike this "fully lockless" approach because it
> > > means we get data races all over the place, and it gets hard to reason
> > > about what happens especially if we do anything other than reading
> > > plain data from the VMA. When reading the implementation of
> > > do_procmap_query(), at basically every memory read you'd have to think
> > > twice as hard to figure out which fields can be concurrently updated
> > > elsewhere and whether the subsequent sequence count recheck can
> > > recover from the resulting badness.
> > >
> > > Just as one example, I think get_vma_name() could (depending on
> > > compiler optimizations) crash with a NULL deref if the VMA's ->vm_ops
> > > pointer is concurrently changed to &vma_dummy_vm_ops by vma_close()
> > > between "if (vma->vm_ops && vma->vm_ops->name)" and
> > > "vma->vm_ops->name(vma)". And I think this illustrates how the "fully
> > > lockless" approach creates more implicit assumptions about the
> > > behavior of core MM code, which could be broken by future changes to
> > > MM code.
> >
> > Yeah, I'll need to re-evaluate such an approach after your review. I
> > like having get_stable_vma() to obtain a completely stable version of
> > the vma in a localized place and then stop worrying about possible
> > races. If implemented correctly, would that be enough to address your
> > concern, Jann?
>
> Yes, I think a stable local snapshot of the VMA (where tricky data
> races are limited to the VMA snapshotting code) is a good tradeoff.

I'm not sure I agree with VMA snapshot being better either, tbh. It is
error-prone to have a byte-by-byte local copy of VMA (which isn't really
that VMA anymore), and passing it into ops callbacks (which expect
"real" VMA)... Who guarantees that this won't backfire, depending on
vm_ops implementations? And constantly copying 176+ bytes just to access
a few fields out of it is a bit unfortunate...

Also taking mmap_read_lock() sort of defeats the point of "RCU-only
access". It's still locking/unlocking and bouncing cache lines between
writer and reader frequently. How slow is per-VMA formatting? If we take
mmap_read_lock, format VMA information into a buffer under this lock,
and drop the mmap_read_lock, would it really be that much slower
compared to what Suren is doing in this patch set? And if not, that
would be so much simpler compared to this semi-locked/semi-RCU way that
is added in this patch set, no?

But I do agree that vma->vm_ops->name access is hard to do in a
completely lockless way reliably.
But also, how frequently do VMAs have custom names/anon_vma_name? What
if we detect that VMA has some "fancy" functionality (like this custom
name thing), and just fall back to mmap_read_lock-protected logic, which
needs to be supported as a fallback even for lockless approach?

This way we can process most (typical) VMAs completely locklessly, while
not adding any extra assumptions for all the potentially complicated
data pieces. WDYT?
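
To make the "speculate for plain VMAs, fall back to the lock for fancy
ones" idea concrete, here is a minimal userspace sketch. It is not kernel
code and does not use the real mmap_lock or VMA APIs. Every name in it
(toy_mm, toy_vma, toy_query, and so on) is hypothetical: an atomic_flag
spinlock stands in for mmap_lock, a plain counter stands in for the mm
write sequence count, and has_custom_name stands in for features such as
vm_ops->name or anon_vma_name. All fields are C11 atomics so that the
racy reads are well defined in this toy.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a VMA; atomics keep concurrent reads defined. */
struct toy_vma {
	atomic_ulong start, end;
	atomic_bool has_custom_name;	/* stand-in for vm_ops->name / anon_vma_name */
};

struct toy_mm {
	atomic_flag lock;	/* stand-in for mmap_lock (a simple spinlock) */
	atomic_uint seq;	/* stand-in for the mm write sequence; odd = writer active */
	struct toy_vma vma;
};

struct toy_result { unsigned long start, end; };

static void toy_lock(struct toy_mm *mm)   { while (atomic_flag_test_and_set(&mm->lock)) ; }
static void toy_unlock(struct toy_mm *mm) { atomic_flag_clear(&mm->lock); }

static void toy_mm_init(struct toy_mm *mm, unsigned long start,
			unsigned long end, bool fancy)
{
	atomic_flag_clear(&mm->lock);
	atomic_init(&mm->seq, 0);
	atomic_init(&mm->vma.start, start);
	atomic_init(&mm->vma.end, end);
	atomic_init(&mm->vma.has_custom_name, fancy);
}

/* Writer: bump the sequence before and after modifying the VMA. */
static void toy_update(struct toy_mm *mm, unsigned long start,
		       unsigned long end, bool fancy)
{
	assert(start < end);
	toy_lock(mm);
	atomic_fetch_add(&mm->seq, 1);	/* odd: readers must not trust what they see */
	atomic_store(&mm->vma.start, start);
	atomic_store(&mm->vma.end, end);
	atomic_store(&mm->vma.has_custom_name, fancy);
	atomic_fetch_add(&mm->seq, 1);	/* even again: update complete */
	toy_unlock(mm);
}

/* Returns 0 if answered locklessly, 1 if the locked fallback was used. */
static int toy_query(struct toy_mm *mm, struct toy_result *out)
{
	for (int tries = 0; tries < 3; tries++) {
		unsigned int seq = atomic_load(&mm->seq);

		if (seq & 1)
			continue;	/* writer in progress, try again */
		if (atomic_load(&mm->vma.has_custom_name))
			break;		/* "fancy" VMA: do not speculate */
		out->start = atomic_load(&mm->vma.start);
		out->end = atomic_load(&mm->vma.end);
		if (atomic_load(&mm->seq) == seq)
			return 0;	/* nothing changed, output is valid */
	}

	/* Fallback: read everything under the lock. */
	toy_lock(mm);
	out->start = atomic_load(&mm->vma.start);
	out->end = atomic_load(&mm->vma.end);
	toy_unlock(mm);
	return 1;
}
```

In this sketch a caller would initialize a toy_mm with toy_mm_init() and
call toy_query(); the return value says which path produced the result.
The "fancy" check happens inside the speculative window, so a stale read
of has_custom_name is caught by the sequence recheck in the same way as a
stale start or end.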