From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 221BFC369C2 for ; Tue, 22 Apr 2025 22:49:09 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id E3F406B0022; Tue, 22 Apr 2025 18:49:07 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id DEE386B0023; Tue, 22 Apr 2025 18:49:07 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id CDC926B0024; Tue, 22 Apr 2025 18:49:07 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0015.hostedemail.com [216.40.44.15]) by kanga.kvack.org (Postfix) with ESMTP id B15AC6B0022 for ; Tue, 22 Apr 2025 18:49:07 -0400 (EDT) Received: from smtpin27.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay07.hostedemail.com (Postfix) with ESMTP id EE9451621A5 for ; Tue, 22 Apr 2025 22:49:07 +0000 (UTC) X-FDA: 83363171934.27.0829D6A Received: from mail-pg1-f179.google.com (mail-pg1-f179.google.com [209.85.215.179]) by imf20.hostedemail.com (Postfix) with ESMTP id 16B8B1C0007 for ; Tue, 22 Apr 2025 22:49:05 +0000 (UTC) Authentication-Results: imf20.hostedemail.com; dkim=pass header.d=gmail.com header.s=20230601 header.b="XMoY/ksF"; dmarc=pass (policy=none) header.from=gmail.com; spf=pass (imf20.hostedemail.com: domain of andrii.nakryiko@gmail.com designates 209.85.215.179 as permitted sender) smtp.mailfrom=andrii.nakryiko@gmail.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1745362146; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=b87Z6hdoHcNORRcYRXz71HmcfEiLW2krrwaT/icFFxI=; b=cyk3sxURZGSnfP0s6UKg7E8f/hmSVatWWt5IbVewcWNCcUwHhcobVhnSnmjkKL6kYTOYsE Y0pLucY5qYogzV1pfRn7ekAZudTUI3KoCKy13CHbrrCubBWAYP9ZHuds8O+Z3bZhbbHtYe aTgBQySncIBYLyHQUl/+7RTLBcHQJi4= ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1745362146; a=rsa-sha256; cv=none; b=KM8R5+10jMrDFDpKh7xK0yb4Dr/LCG2dwLYIwdVz4m128kwxXrcoM+tWIpEUUJbi6cOiAR 52wIDmt0uZmpoXM65KeE3Hc+bK6q2hTua6mNk6vjh6fn06E5s+BRqcm3qPK1o2YWELZ0dJ P/52F0ues28ZTiqRWh7T90x7jytDbtU= ARC-Authentication-Results: i=1; imf20.hostedemail.com; dkim=pass header.d=gmail.com header.s=20230601 header.b="XMoY/ksF"; dmarc=pass (policy=none) header.from=gmail.com; spf=pass (imf20.hostedemail.com: domain of andrii.nakryiko@gmail.com designates 209.85.215.179 as permitted sender) smtp.mailfrom=andrii.nakryiko@gmail.com Received: by mail-pg1-f179.google.com with SMTP id 41be03b00d2f7-af9925bbeb7so3767117a12.3 for ; Tue, 22 Apr 2025 15:49:05 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1745362145; x=1745966945; darn=kvack.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=b87Z6hdoHcNORRcYRXz71HmcfEiLW2krrwaT/icFFxI=; b=XMoY/ksFtbjFGCXsGo7NyefsB6MqZWa6UvCsEL/+86bO5cWRTdAaeG0noE7jTEtWO7 3IHe5mda+XW3Z18rOcPyHPhRJwcG5OnbkufjmL9F5oKkKGWo3k04ElgA4fIeZKdXgiwZ 7l4znNbp81w1dXumtHrGX9LN4u32q5DtEbp+qCV/5XkMKOfFh4G3SDHI2ahtCDu58ksG rmJMeA15TjJTUbSfsT159bXC4WX2tMQEJjghQID4IuGugozz+BBS2oTPpIyRYv1SZ26o FtZGIRiSbzf0bgX+nBOjD+usYRUm4rDN8rraU/3pEC5EkXaV1e4mVkF3iVwai4WZOOLB NzIg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1745362145; x=1745966945; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=b87Z6hdoHcNORRcYRXz71HmcfEiLW2krrwaT/icFFxI=; b=DfoPH42nytuwlsU5zNoMayfDcBeikvT+3+jIOwWacHkP/FlVTLKT/p/sJzAsKxtASx MRN3dse+eksjAGHYuhlM+vYhVSCp+18B4ddxDSM8odFvPpays59ymQShvRWGYHy8N4o0 lC9YOWayK+RmF2gHsJZlOQoOb8A2TfsAn9l1Yr1V8iu31b0MO7xPx7iLrQU94GOu6+74 nl1xUuu1vE96EBkEUprhT/fKEqSq+GZ6NMOmsybPoso+oP7JloM6x0VBSwKZVG5Lb87G Wwrgj6It3Id6jsaO1pHJSL8Kw30bZSLVaQYn6JWazWk0NQ/l6eyTSr5n/5rPhrwphB31 ZqYg== X-Forwarded-Encrypted: i=1; AJvYcCVFUbfRn1HcZNDymfaWkHUxvXb7HAQ4XA2xnI00kWa3r24xuAVtq7NRq9VD15mGtwziy78h9os/wA==@kvack.org X-Gm-Message-State: AOJu0YxlunFDhvBn8IfeNahO0IW/v8zNBDqLciCvLmtGwEQyvzdO2m/i g3WMn3LydljkdJg9GEFMZc6+fFxYmuYQJKOmD+1KUiYOnR+e3ik7dIRtlixjqzSfpa0ldluZ8ep UcmH6SFtCET1eWy2jhY+Fv0ldzfA= X-Gm-Gg: ASbGncv8h5mKRJFsANOb5Sz/7qsm8R16K2QtFXK9RTHHEzre/MQk6OYJGbzLoC8SnAd IEpIx5WOqdN3ABtoxeicELmUvoFtcAZ1Jku2kOJMOMYVEbwV6bIsznbF5WC0W+/Inm2C4aOM5iy GSVP2ZHn6290BizjzKNEHjWlKXHu1yi1hsiqWokg== X-Google-Smtp-Source: AGHT+IEaXVqyjeo1USfmvtnc/v5tnImw/+t4sFUD76JPWeBRBPX+uZBJiLyql0KdJsA4I9vGgTzuEP00QGXPqvZHShw= X-Received: by 2002:a17:90b:4ecf:b0:2ee:693e:ed7a with SMTP id 98e67ed59e1d1-3087bccc3a5mr24129038a91.35.1745362144838; Tue, 22 Apr 2025 15:49:04 -0700 (PDT) MIME-Version: 1.0 References: <20250418174959.1431962-1-surenb@google.com> <20250418174959.1431962-8-surenb@google.com> In-Reply-To: <20250418174959.1431962-8-surenb@google.com> From: Andrii Nakryiko Date: Tue, 22 Apr 2025 15:48:52 -0700 X-Gm-Features: ATxdqUG4dz3OGN13sHC4CHKb-O7jfQGss6KMOy-5UBi2V9AZVLsJCVhdcja1Qzc Message-ID: Subject: Re: [PATCH v3 7/8] mm/maps: read proc/pid/maps under RCU To: Suren Baghdasaryan Cc: akpm@linux-foundation.org, Liam.Howlett@oracle.com, lorenzo.stoakes@oracle.com, david@redhat.com, vbabka@suse.cz, peterx@redhat.com, jannh@google.com, hannes@cmpxchg.org, mhocko@kernel.org, paulmck@kernel.org, shuah@kernel.org, adobriyan@gmail.com, brauner@kernel.org, josef@toxicpanda.com, yebin10@huawei.com, linux@weissschuh.net, willy@infradead.org, osalvador@suse.de, andrii@kernel.org, ryan.roberts@arm.com, christophe.leroy@csgroup.eu, linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, linux-kselftest@vger.kernel.org Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Rspamd-Server: rspam12 X-Rspamd-Queue-Id: 16B8B1C0007 X-Rspam-User: X-Stat-Signature: o1nwrwkghbuxqs1593g7n8ax6iyd3915 X-HE-Tag: 1745362145-930041 X-HE-Meta: U2FsdGVkX1/EZPUNqFccp2fHz0fcZkp9aYWgH9CxL/iTK6CsK0up11HZN20A366P90gTskHRj5Pu9nd9DY/hVHMHEAQE8ZG4NtHLXGZgEhkgaRau9Xx0Ms2L94Dr88++WZ64tOMPBcVac8h1IPMdapPiwYWTT/8zARCMFaVMtAEDIPtbNAGdoE4b11lGcHFgAjUiyJF0YNsG5xoZqMtQNaDdRfN36XSE6CgLz62Pdkmj24G5MTviGH/1m2mDYVGnUbzHdrez6gB9ujzTlH+MTe3RB6/p9m2JPvRGaTZ7hR6b4ClWfjWU9Fg78YvA0F5VhdjkJfa9I7Opyk9ChpviSiqvRg2fXFPVB95au/klmnla6EyuXcaR4GFxmUNl7cqbWoQHr5XzmMP/AkoisjA+8KGLvUSpUUqGBFl31E9zGrQMQ4unEAOOKpAav2zR7wZq5tQ5urfC56eO6VHGDRqpEOqwk4Nb1uW1zWP0Zkgl6FAnsc0DBRl6oSKlVlxCMzVYF3kgBVCZZXAXXCrn9bls6D3zq5UJmYFkK+IDSIFwUigCnbp9j3l87qeBSeNSar74V162V/PnsvME8NnMI3Vn/S90YwAoQ8nrQbszEYZsHk4SE3Bk7dYWHARJgvOnpM5C0XlGlCkzgGbqFWUFK/5xBYHfCOYCbkrCHhm1nnBd6m2rtiXl2ZFfaS2x+BcScgJMtVDkn+qiY4XXpij/m5AD2yRStI5VuEueqPSo1QDH9zbXBO4YtOR9h2tdRBFc5tHYFZUtyhDNQO4L05n/Banthjc+afR4qt7K32kAuaqFw46vd2UqB/V7B/EFw6TdpgPHYwgdAC+uVEijR4hjfKw3HH5fI313R17sOkboR9F3j1uOoGTxA9znxxCZjh5Swj4X/EQXk75BSBibLt0teUjplSxXjTeBjfxJ+xGDACW5lUUw1Bxm4+HgFpWTPKqMFCR/dfA9KH8hglpPX466xr2 NX4bqiaA FiwyPLgCQGjpo9mJtvaA+bcNs+w/+qDe+ipeJV60+me3IlST/UfHFQTv13rcCc0O4g+9FZnioYQ3G2skduerMElop6pkw3rpzl90fZUE81QzCSPAEJLyzNCMYcSmnFM18ZdpfH5twtIQLRAS/BtNj4Cjg6tmsa0wOImqiQrQI8FJ+kaXWgmhtKqswDpxJf4ShqZcmpPOekGhPW3tFvsTrg/+L4XCihyOpPd0Qd2lcpPFKZiAod+kUo3ltCvj5EeOFiSEQOKpRn+24JgxgHWTe/E9lbz+VafI2+5uWBIO/CDcovCcc0uOUOgiJfkkYsLDUgZf4NlHXsiVIKG3o7izpst8v47TXrj5ekqAKouVoC9lS7KgxRJ9N8/leag== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Fri, Apr 18, 2025 at 10:50=E2=80=AFAM Suren Baghdasaryan wrote: > > With maple_tree supporting vma tree traversal under RCU and vma and > its important members being RCU-safe, /proc/pid/maps can be read under > RCU and without the need to read-lock mmap_lock. However vma content > can change from under us, therefore we make a copy of the vma and we > pin pointer fields used when generating the output (currently only > vm_file and anon_name). Afterwards we check for concurrent address > space modifications, wait for them to end and retry. While we take > the mmap_lock for reading during such contention, we do that momentarily > only to record new mm_wr_seq counter. This change is designed to reduce This is probably a stupid question, but why do we need to take a lock just to record this counter? uprobes get away without taking mmap_lock even for reads, and still record this seq counter. And then detect whether there were any modifications in between. Why does this change need more heavy-weight mmap_read_lock to do speculative reads? > mmap_lock contention and prevent a process reading /proc/pid/maps files > (often a low priority task, such as monitoring/data collection services) > from blocking address space updates. > Note that this change has a userspace visible disadvantage: it allows > for sub-page data tearing as opposed to the previous mechanism where > data tearing could happen only between pages of generated output data. > Since current userspace considers data tearing between pages to be > acceptable, we assume is will be able to handle sub-page data tearing > as well. > > Signed-off-by: Suren Baghdasaryan > --- > fs/proc/internal.h | 6 ++ > fs/proc/task_mmu.c | 170 ++++++++++++++++++++++++++++++++++---- > include/linux/mm_inline.h | 18 ++++ > 3 files changed, 177 insertions(+), 17 deletions(-) > > diff --git a/fs/proc/internal.h b/fs/proc/internal.h > index 96122e91c645..6e1169c1f4df 100644 > --- a/fs/proc/internal.h > +++ b/fs/proc/internal.h > @@ -379,6 +379,12 @@ struct proc_maps_private { > struct task_struct *task; > struct mm_struct *mm; > struct vma_iterator iter; > + bool mmap_locked; > + loff_t last_pos; > +#ifdef CONFIG_PER_VMA_LOCK > + unsigned int mm_wr_seq; > + struct vm_area_struct vma_copy; > +#endif > #ifdef CONFIG_NUMA > struct mempolicy *task_mempolicy; > #endif > diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c > index b9e4fbbdf6e6..f9d50a61167c 100644 > --- a/fs/proc/task_mmu.c > +++ b/fs/proc/task_mmu.c > @@ -127,13 +127,130 @@ static void release_task_mempolicy(struct proc_map= s_private *priv) > } > #endif > > -static struct vm_area_struct *proc_get_vma(struct proc_maps_private *pri= v, > - loff_t *ppos) > +#ifdef CONFIG_PER_VMA_LOCK > + > +static const struct seq_operations proc_pid_maps_op; > + > +/* > + * Take VMA snapshot and pin vm_file and anon_name as they are used by > + * show_map_vma. > + */ > +static int get_vma_snapshot(struct proc_maps_private *priv, struct vm_ar= ea_struct *vma) > +{ > + struct vm_area_struct *copy =3D &priv->vma_copy; > + int ret =3D -EAGAIN; > + > + memcpy(copy, vma, sizeof(*vma)); > + if (copy->vm_file && !get_file_rcu(©->vm_file)) > + goto out; > + > + if (!anon_vma_name_get_if_valid(copy)) > + goto put_file; Given vm_file and anon_vma_name are both RCU-protected, if we take rcu_read_lock() at m_start/m_next before fetching VMA, we shouldn't even need getting/putting them, no? I feel like I'm missing some important limitations, but I don't think they are spelled out explicitly... > + > + if (!mmap_lock_speculate_retry(priv->mm, priv->mm_wr_seq)) > + return 0; > + > + /* Address space got modified, vma might be stale. Re-lock and re= try. */ > + rcu_read_unlock(); Another question I have is whether we really need to copy vma into priv->vma_copy to have a stable snapshot? Can't we just speculate like with uprobes under assumption that data doesn't change. And once we are done producing text output, confirm that speculation was successful, and if not - retry? We'd need an API for seq_file to rollback to previous good known location for that, but that seems straightforward enough to do, no? Just remember buffer position before speculation, write data, check for no mm modifications, and if something changed, rollback seq file to last position. > + ret =3D mmap_read_lock_killable(priv->mm); > + if (!ret) { > + /* mmap_lock_speculate_try_begin() succeeds when holding = mmap_read_lock */ > + mmap_lock_speculate_try_begin(priv->mm, &priv->mm_wr_seq)= ; > + mmap_read_unlock(priv->mm); > + ret =3D -EAGAIN; > + } > + > + rcu_read_lock(); > + > + anon_vma_name_put_if_valid(copy); > +put_file: > + if (copy->vm_file) > + fput(copy->vm_file); > +out: > + return ret; > +} > + > +static void put_vma_snapshot(struct proc_maps_private *priv) > +{ > + struct vm_area_struct *vma =3D &priv->vma_copy; > + > + anon_vma_name_put_if_valid(vma); > + if (vma->vm_file) > + fput(vma->vm_file); > +} > + [...]