From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 31BCBC3ABA5 for ; Tue, 29 Apr 2025 17:21:08 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 9B5996B0006; Tue, 29 Apr 2025 13:21:05 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 966316B0007; Tue, 29 Apr 2025 13:21:05 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 82E026B000A; Tue, 29 Apr 2025 13:21:05 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0016.hostedemail.com [216.40.44.16]) by kanga.kvack.org (Postfix) with ESMTP id 654DF6B0006 for ; Tue, 29 Apr 2025 13:21:05 -0400 (EDT) Received: from smtpin18.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay02.hostedemail.com (Postfix) with ESMTP id EF45D120C4E for ; Tue, 29 Apr 2025 17:21:06 +0000 (UTC) X-FDA: 83387746932.18.85C4488 Received: from mail-ed1-f51.google.com (mail-ed1-f51.google.com [209.85.208.51]) by imf12.hostedemail.com (Postfix) with ESMTP id 044E240002 for ; Tue, 29 Apr 2025 17:21:04 +0000 (UTC) Authentication-Results: imf12.hostedemail.com; dkim=pass header.d=google.com header.s=20230601 header.b=XALEcbgT; dmarc=pass (policy=reject) header.from=google.com; spf=pass (imf12.hostedemail.com: domain of jannh@google.com designates 209.85.208.51 as permitted sender) smtp.mailfrom=jannh@google.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1745947265; a=rsa-sha256; cv=none; b=DhIUe1JxECIGkWOd6xA4T6VB+5QalAcUoPZuZ8otk39CYBAwdZ6X18jPhIl2HqE8OrJ0p3 se9QAMf2nbfVk+mNtbgz6KPi75OMn6EgVN5xD9diaBUM7liwHeQTqoqmV4n0oqPeHObFFR v2RD+UxTMTxuvFPMLta49wpo9Ho60PQ= ARC-Authentication-Results: i=1; imf12.hostedemail.com; dkim=pass header.d=google.com header.s=20230601 header.b=XALEcbgT; dmarc=pass (policy=reject) header.from=google.com; spf=pass (imf12.hostedemail.com: domain of jannh@google.com designates 209.85.208.51 as permitted sender) smtp.mailfrom=jannh@google.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1745947265; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=lZzRObcdHr7PQq7bfSd5UhNrj/W+U/lLynlTJ+ckjFI=; b=UmEABqEri3uP66Je/sLSYfjxblCDFzkGW0cySRCfiGFVx3C6sfZ0/jHTF0txrkksLH3g1c UBpw8O7n8PFTl6QnQpvo81aXVK2Lf80cJD8zevw2b0+SLnOLS1g+i8ty4m3gbUK26fKOx2 jnyscUrQL1hA78EbOi3FR6al1QzPnwA= Received: by mail-ed1-f51.google.com with SMTP id 4fb4d7f45d1cf-5f438523d6fso3431a12.1 for ; Tue, 29 Apr 2025 10:21:04 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1745947263; x=1746552063; darn=kvack.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=lZzRObcdHr7PQq7bfSd5UhNrj/W+U/lLynlTJ+ckjFI=; b=XALEcbgT3lxxPKz7IPg7tvs657Apjesl7mKaRJ8EoZIY3Osqkf23JtPU6cJ7Dvrkdq oh/NGHLgFqrO3mOmjYmHM0PyuxsuW0QmzVAroEvfk/mUgHtLc2nnrIumT98Kbos2jpSz HTd8pFoSlf8i+s7TLlpQ/VtcDz7JCiPCQhLrtN+Igkc5HYN/mAZYyOAJs5UyFbKh2/xy L0+HRDOx6OJlwYwJb0GgIkx61GNIiJiQP9Py99ejpbqMbUL8zaG8gFP7/0dSPix//Cfc TY9q6uC81lKAiujVvwofCmLdgAALPwi1tXwNPzrnvb9wNyDUb/q2lNaeMujPTfqlHEhv Vnhg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1745947263; x=1746552063; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=lZzRObcdHr7PQq7bfSd5UhNrj/W+U/lLynlTJ+ckjFI=; b=DtRDiuixv+ixpSkikZyJOqOTKw/mR+B+BPELc6EVFHgHPsLS8nBbw5WIUwS1s4uT/C 1Lif5Q4oGvWhx8cpkL+sqvI0+fkbbHUVabMkFfjoZ06CL5lGcKUD8Fsr05Cv8dyn+bYx Kqtbxek/1eUIUzLGFokymXjHSYBnYNe8JuiZS5KkdGeFPgWwilRw2Akv2DXgjETDryFM 4bBnJlv0sUsq+ib56LdxnnPcRVGdhl9U5RIFYHH7kjBp4zxQuZ5K2wvCdTs75COGXUMd o6Xb2YRWPb+UTmJ40mmvdwK4SinRspjRU2nLFiCi9NtWLQxkGV3+YAnlEori6oUlEcrb UIww== X-Forwarded-Encrypted: i=1; AJvYcCUmYI/AJcqfqvdS5UC4+0dn4atRNU4i0KCIjSp/R5iS2iLNhF4qtGiWXhWn3umBP5JfP5gsiuaP2Q==@kvack.org X-Gm-Message-State: AOJu0Yw5qAivpaZE9m0mdXvEOh/LELbWOnzBV8bBygtlJ++e5QpYT7RW J3QZi6KEWa8x+csLWxeOErcU8dyiSjO1V9jmooTtd+thJtDQf7hjffYoSHC+8AxByO2Ro6hLYzx cj1hXIx2PC84mANn7/Q6ha6omRWkIctEhp6Gh X-Gm-Gg: ASbGncvvmuvjsdWc7dzxIxE4EZGl8s2c456vbpXVW91DgUWlnH2aRUFUVc6hiQXJG8f /cswVqmaIkTQtl8IQSH5r5U4AaIZkHQUaL5AQDyhCJn9pPjfCAaJZ/vXLQCB1BIQYEDokQ1Ui2B Aje/J1h89mV0U51iTAwZRoxp5V+nyVjJVDpefucf3/u8+D9VSbeQ== X-Google-Smtp-Source: AGHT+IHXiqqyy83qaPZfB+md16M9rNTjn8jbT8RqafBiRfGVmWl6YPMy5nKNP8fe8EziQdI0GNykmhELD881HGMp61Y= X-Received: by 2002:a05:6402:4008:b0:5e4:afad:9a83 with SMTP id 4fb4d7f45d1cf-5f839346e96mr181777a12.2.1745947263207; Tue, 29 Apr 2025 10:21:03 -0700 (PDT) MIME-Version: 1.0 References: <20250418174959.1431962-1-surenb@google.com> <20250418174959.1431962-8-surenb@google.com> In-Reply-To: From: Jann Horn Date: Tue, 29 Apr 2025 19:20:26 +0200 X-Gm-Features: ATxdqUFzen8Bz6q3XA569qY44VOL0_qRNo7JRo7XEkTEhlP9Kc_3Dw1bzLR5XTc Message-ID: Subject: Re: [PATCH v3 7/8] mm/maps: read proc/pid/maps under RCU To: Suren Baghdasaryan Cc: Al Viro , brauner@kernel.org, linux-fsdevel@vger.kernel.org, akpm@linux-foundation.org, Liam.Howlett@oracle.com, lorenzo.stoakes@oracle.com, david@redhat.com, vbabka@suse.cz, peterx@redhat.com, hannes@cmpxchg.org, mhocko@kernel.org, paulmck@kernel.org, shuah@kernel.org, adobriyan@gmail.com, josef@toxicpanda.com, yebin10@huawei.com, linux@weissschuh.net, willy@infradead.org, osalvador@suse.de, andrii@kernel.org, ryan.roberts@arm.com, christophe.leroy@csgroup.eu, linux-kernel@vger.kernel.org, linux-mm@kvack.org, linux-kselftest@vger.kernel.org Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Rspam-User: X-Rspamd-Queue-Id: 044E240002 X-Rspamd-Server: rspam04 X-Stat-Signature: crx4q1dj8tiwbi8e4y4pdeu3wdst9zom X-HE-Tag: 1745947264-219579 X-HE-Meta: U2FsdGVkX1+v/WTWRhghPREORrzlm0BA/RjfEQr4pRBIbzPhnBF7HjtG+CUr2E1SZCKJmgYZx53/wUhjOhyaMT5z/av1cl0VFGSrI/FjH6v7+HjfdNskZ0uJP4+fU9RbfVRRSrLHmW5rdXLAD6V7Ykpn8I0IHxk20uzlDpH0Xc7iwXZYZqRn+/mrND7dQ/pkdS4rjfbcY9MEXdRodk+RRWXvwtb+97+uc0b4s0nNSX7FAv7VNdMwB7y8ZBew1igNdT33JhVIj3FFQOlrg9t1s+R/JDLN0pguZMd/mihFk9NRhw9BnTsLqqRvFnmDrOseVl8LDWgm0u+jXPOZFbsmZjXUsmBSL0EnNOPTLNAhIiiuqmiFTbECwbHY3YIQDACvoQY6dV66CqGXd/29ckaKlCyP2ih17n+ZkXHFHtYkk2BCrovvsbpKy3wEAUTaQYOfVXIwzcnbF95tHTCEPVpO6beyOKHFaqKKQZQaSyWulJnj55zrTKCPKtHRa4JNPQYLpqD45TVGZnXv8xZejxrRkVk8VGswe+tX3iagJCoAYH+JpRUTMmiPQsihPyC2+5s1Pfsa5NHpEMCHnHsYLeGFHnkPO7rIohAZeLlaezx45oSYiRqTQQBt8YHuRHc5XfoTa8R33VmcgULT7U4WHagFPSUY697ePLSzxu6l/chFo5eTlUdGOheOeg7dVhNKkPFrFWqBimFM5+D1KqT58bNfIkPzhqdkhkAQWVJj1h/U5Yty6/jXCWEqRbIwE4oQLKuhQKkWDbRLGg/hopukfcOF4NGrPBXNtRfh+YbuYVyCDI6I8OTLoKYsly0rQFiYqnHmpkRdsD0JEA1uBJEQFdJRbUY7qwvCY67b4StQDPT8qaTX7aA8LTwsjJfJJGum9P0bETPHHX4PdjlLomRJuUKfJW2lcjsmQgehM6xuYe9PUeoP48sVTrwgEzktVUIheQU6AmDWdOy2K1HOoj7gICE Um6H7bKm vc0aRoZvbX9WyiiruI+Lu/+BSYjxQ0yKpvUAsyW3Q6wxNJAXOQ+08EzQbSrKu6LvoCZiWtvR0R466OK40/0NmPZG//x1SY7uSjLcTAuDt3PJgmG1TyFluZCBWN1b2ScvHu3hNdrr7lwagklDFhjdU6DDwAQNvpbccaBpobgrrSVZ0VEUH2B02M3LCZlrOlQHPbbXtwPCbopLhpWepSsrV+WlsQvt9fXzRUokz X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: Hi! (I just noticed that I incorrectly assumed that VMAs use kfree_rcu (not SLAB_TYPESAFE_BY_RCU) when I wrote my review of this, somehow I forgot all about that...) On Tue, Apr 29, 2025 at 7:09=E2=80=AFPM Suren Baghdasaryan wrote: > On Tue, Apr 29, 2025 at 8:40=E2=80=AFAM Jann Horn wrot= e: > > On Fri, Apr 18, 2025 at 7:50=E2=80=AFPM Suren Baghdasaryan wrote: > > > With maple_tree supporting vma tree traversal under RCU and vma and > > > its important members being RCU-safe, /proc/pid/maps can be read unde= r > > > RCU and without the need to read-lock mmap_lock. However vma content > > > can change from under us, therefore we make a copy of the vma and we > > > pin pointer fields used when generating the output (currently only > > > vm_file and anon_name). Afterwards we check for concurrent address > > > space modifications, wait for them to end and retry. While we take > > > the mmap_lock for reading during such contention, we do that momentar= ily > > > only to record new mm_wr_seq counter. This change is designed to redu= ce > > > mmap_lock contention and prevent a process reading /proc/pid/maps fil= es > > > (often a low priority task, such as monitoring/data collection servic= es) > > > from blocking address space updates. > > [...] > > > diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c > > > index b9e4fbbdf6e6..f9d50a61167c 100644 > > > --- a/fs/proc/task_mmu.c > > > +++ b/fs/proc/task_mmu.c > > [...] > > > +/* > > > + * Take VMA snapshot and pin vm_file and anon_name as they are used = by > > > + * show_map_vma. > > > + */ > > > +static int get_vma_snapshot(struct proc_maps_private *priv, struct v= m_area_struct *vma) > > > +{ > > > + struct vm_area_struct *copy =3D &priv->vma_copy; > > > + int ret =3D -EAGAIN; > > > + > > > + memcpy(copy, vma, sizeof(*vma)); > > > + if (copy->vm_file && !get_file_rcu(©->vm_file)) > > > + goto out; > > > > I think this uses get_file_rcu() in a different way than intended. > > > > As I understand it, get_file_rcu() is supposed to be called on a > > pointer which always points to a file with a non-zero refcount (except > > when it is NULL). That's why it takes a file** instead of a file* - if > > it observes a zero refcount, it assumes that the pointer must have > > been updated in the meantime, and retries. Calling get_file_rcu() on a > > pointer that points to a file with zero refcount, which I think can > > happen with this patch, will cause an endless loop. > > (Just as background: For other usecases, get_file_rcu() is supposed to > > still behave nicely and not spuriously return NULL when the file* is > > concurrently updated to point to another file*; that's what that loop > > is for.) > > Ah, I see. I wasn't aware of this subtlety. I think this is fixable by > checking the return value of get_file_rcu() and retrying speculation > if it changed. I think you could probably still end up looping endlessly in get_file_rcu()= . > > (If my understanding is correct, maybe we should document that more > > explicitly...) > > Good point. I'll add comments for get_file_rcu() as a separate patch. > > > > > Also, I think you are introducing an implicit assumption that > > remove_vma() does not NULL out the ->vm_file pointer (because that > > could cause tearing and could theoretically lead to a torn pointer > > being accessed here). > > > > One alternative might be to change the paths that drop references to > > vma->vm_file (search for vma_close to find them) such that they first > > NULL out ->vm_file with a WRITE_ONCE() and do the fput() after that, > > maybe with a new helper like this: > > > > static void vma_fput(struct vm_area_struct *vma) > > { > > struct file *file =3D vma->vm_file; > > > > if (file) { > > WRITE_ONCE(vma->vm_file, NULL); > > fput(file); > > } > > } > > > > Then on the lockless lookup path you could use get_file_rcu() on the > > ->vm_file pointer _of the original VMA_, and store the returned file* > > into copy->vm_file. > > Ack. Except for storing the return value of get_file_rcu(). I think > once we detect that get_file_rcu() returns a different file we should > bail out and retry. The change in file is an indication that the vma > got changed from under us, so whatever we have is stale. What does "different file" mean here - what file* would you compare the returned one against?