From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id DA48EC369DC for ; Tue, 29 Apr 2025 18:04:32 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 3D1336B000C; Tue, 29 Apr 2025 14:04:31 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 3594E6B000D; Tue, 29 Apr 2025 14:04:31 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 1D5886B000E; Tue, 29 Apr 2025 14:04:31 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0010.hostedemail.com [216.40.44.10]) by kanga.kvack.org (Postfix) with ESMTP id EF1636B000C for ; Tue, 29 Apr 2025 14:04:30 -0400 (EDT) Received: from smtpin03.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay04.hostedemail.com (Postfix) with ESMTP id B075A1A0906 for ; Tue, 29 Apr 2025 18:04:31 +0000 (UTC) X-FDA: 83387856342.03.16112AA Received: from mail-qt1-f182.google.com (mail-qt1-f182.google.com [209.85.160.182]) by imf02.hostedemail.com (Postfix) with ESMTP id ABECC80014 for ; Tue, 29 Apr 2025 18:04:29 +0000 (UTC) Authentication-Results: imf02.hostedemail.com; dkim=pass header.d=google.com header.s=20230601 header.b=KIRFiGrB; spf=pass (imf02.hostedemail.com: domain of surenb@google.com designates 209.85.160.182 as permitted sender) smtp.mailfrom=surenb@google.com; dmarc=pass (policy=reject) header.from=google.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1745949869; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=bGFBOIYiVFMA2gosF4fAzdDOnh60Gs1cEA/UxI1BGcI=; b=j77n227SpU4q2AdjMa9D/Xr+SawSDERQTZ1WkFKnz55gl0wgyVYY1er+uZQ6rF9pNUm2Y/ RJPKowbWGu3VyfYlxyr1ZQXut5nqzMrIs25AE0SZrKtwTKFCZ64bSKY8vieFKwvGgoVS6R 3XtYsxQ/CGXssNB1fKonFdJr/wZ+DVA= ARC-Authentication-Results: i=1; imf02.hostedemail.com; dkim=pass header.d=google.com header.s=20230601 header.b=KIRFiGrB; spf=pass (imf02.hostedemail.com: domain of surenb@google.com designates 209.85.160.182 as permitted sender) smtp.mailfrom=surenb@google.com; dmarc=pass (policy=reject) header.from=google.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1745949869; a=rsa-sha256; cv=none; b=t6OkGU8ZFtWOo4IrKOo+C9l2nweL4BRFJfBHGeGoEo6S/SsuRnVdwE0EqLlndhn+4Yp6Mm 9y/2QYPR+pBzu4vArKGAOstmZm7ghS9/kRU4PxUJuBkuhAQkARddfM+DS3PCMjR+Bl6qU9 kuq6Rga2/1dzYFqXp/1Cku7WvTHMxDo= Received: by mail-qt1-f182.google.com with SMTP id d75a77b69052e-47666573242so381301cf.0 for ; Tue, 29 Apr 2025 11:04:29 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1745949869; x=1746554669; darn=kvack.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=bGFBOIYiVFMA2gosF4fAzdDOnh60Gs1cEA/UxI1BGcI=; b=KIRFiGrBqbWJo14Emnp07V2w+LloXr0b3cHAA6/w9VjVNx4HFC5M5KYrCnw30JQcHu v6zRP886gbMjws0DU5nj7VxzQ20Pxyg8TpneCz2P+XKkFS8NpgteuFhrKGMlcpaGyK58 7SkA4aJCY6L6pJA1bVBVfEjWtCvqupfaasjrr55/zjsnxzXpx0tIidUg+ifKKXVjnGBc PKlEcNStfzUoP5u+bTpNSFuSoNLWz8V557pXd3pAudwSoPI+VUxEQOBI/RAxBaB3iQ1s fDUKr5+6OU1IP2cChCttvi277eH932F2qWHiRZ0/OhnM5VBurfqKjgfdQa3lLqHrON1N Uz0w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1745949869; x=1746554669; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=bGFBOIYiVFMA2gosF4fAzdDOnh60Gs1cEA/UxI1BGcI=; b=piIjiYLXhWVwri+N3GEsUfS4VDg+fVhMC3Rie3R7Vk/iyhwqJ4ysQIyQDU0Lo9AGvY Ey2KqeXkEri33ygzCFhu9y6mrERpXS9FVAqS82uYlPSj4256cJiV91DdqwVWxBloxDaq 4kyd2/4d/2E8MPLZ+sE5d0nqicIWoqvNeBSsClz9sSw9U64hvA4VWVTOsefR8Qb1lOvU eZdGHGzKG9nl4hgsB0b3ytyMe2yp9rpaJincJFMOVTx3T6tMMJFXCfssrHVk75B4EWAi l2WTlfpF9F8TuWn1totxFs3RcQL3mIAt3oF0XZI0kgyIm8EPWsjvAAvVYUUVmR7+HMFa 7oqA== X-Forwarded-Encrypted: i=1; AJvYcCXb903Q32L/KUDYvMzj8HfVKydeyg1rdWab05Lu2qD33RTqTVXCqpwhiMLEZOzAveOhVTZUjXvDbA==@kvack.org X-Gm-Message-State: AOJu0YxJVh0lidlAy6FNz0wbKwwmkRsSP6LouCgB/trhcPkol7rtPiui mi0X8hVZZFESXL/XVzpAAuuCqlrWQ5m+dkgrXxTX6h7OuzeK9asO4+rhmQ5diwnbEezw68elxMI 9tJL7POEimp8ttWPxKTW0sLrZu3JmkzZ8EU0+ X-Gm-Gg: ASbGncu62rNgqfh911q4o814FfHCOh4CsT11EwScHrnLrZgFOTc2tuutt7IyqdLp7Z3 6NI8n1bS724opZ3xIkO1OkPzJujbe1bSyMNLXEHzKrV3r94lfMwaeR93uEb2wAlj6bY+dDPqJeV 2eBUmCD8YqhC9RPfZpIbHXmV/QqHoIqnoaJ09xgCy9JbLe7RWYP+PE X-Google-Smtp-Source: AGHT+IGTHdof2vv8tE6oOakqNkRD6qkaYwcU2GMDuR1dry0vZWexciL9KgeFFNdap4sTR6sGZPImbJQwIIPq3LvxJ3k= X-Received: by 2002:a05:622a:8e:b0:472:538:b795 with SMTP id d75a77b69052e-489bb19045fmr245541cf.22.1745949868234; Tue, 29 Apr 2025 11:04:28 -0700 (PDT) MIME-Version: 1.0 References: <20250418174959.1431962-1-surenb@google.com> <20250418174959.1431962-8-surenb@google.com> In-Reply-To: From: Suren Baghdasaryan Date: Tue, 29 Apr 2025 11:04:15 -0700 X-Gm-Features: ATxdqUEFK_c8BfZNtNDlAtU3sfwm14-hJ5pe2ZNDycq9_J7ILXRelmx3Box4ufQ Message-ID: Subject: Re: [PATCH v3 7/8] mm/maps: read proc/pid/maps under RCU To: Jann Horn Cc: Al Viro , brauner@kernel.org, linux-fsdevel@vger.kernel.org, akpm@linux-foundation.org, Liam.Howlett@oracle.com, lorenzo.stoakes@oracle.com, david@redhat.com, vbabka@suse.cz, peterx@redhat.com, hannes@cmpxchg.org, mhocko@kernel.org, paulmck@kernel.org, shuah@kernel.org, adobriyan@gmail.com, josef@toxicpanda.com, yebin10@huawei.com, linux@weissschuh.net, willy@infradead.org, osalvador@suse.de, andrii@kernel.org, ryan.roberts@arm.com, christophe.leroy@csgroup.eu, linux-kernel@vger.kernel.org, linux-mm@kvack.org, linux-kselftest@vger.kernel.org Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Stat-Signature: sqqdcoqx9mqb34en4j6jihrhbfdkkcjw X-Rspamd-Server: rspam01 X-Rspamd-Queue-Id: ABECC80014 X-Rspam-User: X-HE-Tag: 1745949869-255105 X-HE-Meta: U2FsdGVkX18hooJ82UASewi/U6xioDDWYpsVHB2t8oUQ0bINgLwBxpIZbMDadQw60zySLRGvkCeqWy5bPI+BMZzMup36ZFklad2pAH3WUn97p1Sj/VPgB8cQdg2/M7l7xqelQaiif1uwDW/pWuW4MHOTTbhw26TIkDm+ui3NVXVXsRbKkARMYptDEjJJgWWkC/nwwWmsfsjxDfZ+RAFnkv6NLQ4uGuc5QLNcgDXOu3hG3q/dOWAwPqJz1xgdw+Usr+XPVr31taMOXcUd8RsXdSobga7822y8pYYrafOrx6Sz4MERIZHZXOAlzfUl+g7A71/jMZUTEyQJEtg8dTEiJs9vPlPtkhwZz4mkB8ByxqnQeeqyk+6X91EFECJWMWo11Oi3RwhlsOCLPC+FKxlNI6eMYEkdmTwtKGGQR6jFJyhuggcj0PO5RLBI+WTzvk/6QjrOwrc1WSdq/4T4PHCjmnkC3lUVTvKbc7Gi5ilFyXt1LUwp7rOCU6mINQ+d2p/ZoFleqZsIsvWI9m4oryyrvWuBz5PSFEZIef2iDVowCBBqZxDExQF+q3a6jlhW6lnBwk6+49up8pW4H+2BiQGo7M+zRnLY1H5PXhZ0cR0GVkRIg+XdZEvTyTaui48N6H/S4slpEKelZ1ApKRianQ/8zEpZVymVGlXIykIJvOKqvUrgHQ0ActBZucbUCg7eWqd97VbEadwuAJGkmhz/UtcfQyLhacTEmOOfmW8jMOnO9GMGAwkElA54kUbI8eV8fcYNBIQALViVaLaNqofZkAXqHX6hC/WdSiIvhl8QiMf3U6vydmleOtEQCcXB6CAfJz6ALQLca+Jw++g2YwGXGyoNSKMbaoKqZzhjsKM8/upUbndUjlnQTpp9CQbuGg3fQ0oQoYTDAV2rGUlC6IlKN1M7wNNF1ZILyAtmH0FzHLSusrizSyHihuHLColR5mxn8ixjMAhVCGF6Yh1rtsm5s5P YnZ0zWjO gnGjUp5t0j1vTFVL6mD6VqWto3fGRWDpH3NIWcOvd/mk/t4fwfXF9ci4XFCRxohEBwXrqyWXuUmL/p759cjHTKuDVECXGsHaDOs4iF8uzwGSCN87OS6OvavovITdwdGFSDZ8DNFbgI6ec+Zzw2f2m9xeSS67yhtsIJP7vvafb8Cxy+CubZ729O5TS5ZrlsvoZE94cuzDgwB+vhWVssrz375+vklUwyg/VYE8n X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Tue, Apr 29, 2025 at 10:21=E2=80=AFAM Jann Horn wrote= : > > Hi! > > (I just noticed that I incorrectly assumed that VMAs use kfree_rcu > (not SLAB_TYPESAFE_BY_RCU) when I wrote my review of this, somehow I > forgot all about that...) Does this fact affect your previous comments? Just want to make sure I'm not missing something... > > On Tue, Apr 29, 2025 at 7:09=E2=80=AFPM Suren Baghdasaryan wrote: > > On Tue, Apr 29, 2025 at 8:40=E2=80=AFAM Jann Horn wr= ote: > > > On Fri, Apr 18, 2025 at 7:50=E2=80=AFPM Suren Baghdasaryan wrote: > > > > With maple_tree supporting vma tree traversal under RCU and vma and > > > > its important members being RCU-safe, /proc/pid/maps can be read un= der > > > > RCU and without the need to read-lock mmap_lock. However vma conten= t > > > > can change from under us, therefore we make a copy of the vma and w= e > > > > pin pointer fields used when generating the output (currently only > > > > vm_file and anon_name). Afterwards we check for concurrent address > > > > space modifications, wait for them to end and retry. While we take > > > > the mmap_lock for reading during such contention, we do that moment= arily > > > > only to record new mm_wr_seq counter. This change is designed to re= duce > > > > mmap_lock contention and prevent a process reading /proc/pid/maps f= iles > > > > (often a low priority task, such as monitoring/data collection serv= ices) > > > > from blocking address space updates. > > > [...] > > > > diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c > > > > index b9e4fbbdf6e6..f9d50a61167c 100644 > > > > --- a/fs/proc/task_mmu.c > > > > +++ b/fs/proc/task_mmu.c > > > [...] > > > > +/* > > > > + * Take VMA snapshot and pin vm_file and anon_name as they are use= d by > > > > + * show_map_vma. > > > > + */ > > > > +static int get_vma_snapshot(struct proc_maps_private *priv, struct= vm_area_struct *vma) > > > > +{ > > > > + struct vm_area_struct *copy =3D &priv->vma_copy; > > > > + int ret =3D -EAGAIN; > > > > + > > > > + memcpy(copy, vma, sizeof(*vma)); > > > > + if (copy->vm_file && !get_file_rcu(©->vm_file)) > > > > + goto out; > > > > > > I think this uses get_file_rcu() in a different way than intended. > > > > > > As I understand it, get_file_rcu() is supposed to be called on a > > > pointer which always points to a file with a non-zero refcount (excep= t > > > when it is NULL). That's why it takes a file** instead of a file* - i= f > > > it observes a zero refcount, it assumes that the pointer must have > > > been updated in the meantime, and retries. Calling get_file_rcu() on = a > > > pointer that points to a file with zero refcount, which I think can > > > happen with this patch, will cause an endless loop. > > > (Just as background: For other usecases, get_file_rcu() is supposed t= o > > > still behave nicely and not spuriously return NULL when the file* is > > > concurrently updated to point to another file*; that's what that loop > > > is for.) > > > > Ah, I see. I wasn't aware of this subtlety. I think this is fixable by > > checking the return value of get_file_rcu() and retrying speculation > > if it changed. > > I think you could probably still end up looping endlessly in get_file_rcu= (). By "retrying speculation" I meant it in the sense of get_vma_snapshot() retry when it takes the mmap_read_lock and then does mmap_lock_speculate_try_begin to restart speculation. I'm also thinking about Liam's concern of guaranteeing forward progress for the reader. Thinking maybe I should not drop mmap_read_lock immediately on contention but generate some output (one vma or one page worth of vmas) before dropping mmap_read_lock and proceeding with speculation. > > > > (If my understanding is correct, maybe we should document that more > > > explicitly...) > > > > Good point. I'll add comments for get_file_rcu() as a separate patch. > > > > > > > > Also, I think you are introducing an implicit assumption that > > > remove_vma() does not NULL out the ->vm_file pointer (because that > > > could cause tearing and could theoretically lead to a torn pointer > > > being accessed here). > > > > > > One alternative might be to change the paths that drop references to > > > vma->vm_file (search for vma_close to find them) such that they first > > > NULL out ->vm_file with a WRITE_ONCE() and do the fput() after that, > > > maybe with a new helper like this: > > > > > > static void vma_fput(struct vm_area_struct *vma) > > > { > > > struct file *file =3D vma->vm_file; > > > > > > if (file) { > > > WRITE_ONCE(vma->vm_file, NULL); > > > fput(file); > > > } > > > } > > > > > > Then on the lockless lookup path you could use get_file_rcu() on the > > > ->vm_file pointer _of the original VMA_, and store the returned file* > > > into copy->vm_file. > > > > Ack. Except for storing the return value of get_file_rcu(). I think > > once we detect that get_file_rcu() returns a different file we should > > bail out and retry. The change in file is an indication that the vma > > got changed from under us, so whatever we have is stale. > > What does "different file" mean here - what file* would you compare > the returned one against? Inside get_vma_snapshot() I would pass the original vma->vm_file to get_file_rcu() and check if it returns the same value. If the value got changed we jump to /* Address space got modified, vma might be stale. Re-lock and retry. */ section. That should work, right?