From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id BC4F1C369DC for ; Tue, 29 Apr 2025 15:40:06 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id EBB636B0007; Tue, 29 Apr 2025 11:40:04 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id E69C16B000A; Tue, 29 Apr 2025 11:40:04 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id D0C446B000C; Tue, 29 Apr 2025 11:40:04 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0014.hostedemail.com [216.40.44.14]) by kanga.kvack.org (Postfix) with ESMTP id B2A1D6B0007 for ; Tue, 29 Apr 2025 11:40:04 -0400 (EDT) Received: from smtpin13.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay03.hostedemail.com (Postfix) with ESMTP id ACB57B8A6E for ; Tue, 29 Apr 2025 15:40:05 +0000 (UTC) X-FDA: 83387492370.13.440DA1B Received: from mail-ed1-f45.google.com (mail-ed1-f45.google.com [209.85.208.45]) by imf17.hostedemail.com (Postfix) with ESMTP id A398E40009 for ; Tue, 29 Apr 2025 15:40:03 +0000 (UTC) Authentication-Results: imf17.hostedemail.com; dkim=pass header.d=google.com header.s=20230601 header.b=qafrKw9N; dmarc=pass (policy=reject) header.from=google.com; spf=pass (imf17.hostedemail.com: domain of jannh@google.com designates 209.85.208.45 as permitted sender) smtp.mailfrom=jannh@google.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1745941203; a=rsa-sha256; cv=none; b=X/H/rRrFRyugi0SWN3hCn8G2PfhnzhIxDOMO2Qq5YkeHojs1k95JJQo8Y7Qjy535fTKagj Pxqs/ADjclDrDdtvVMsdg6Zt2q+9twNbSujGxJH40msx9oy1OymgMm4hXyckvoZjubUzs0 3N3NUrSV0O5A1tjd82CtBnSOqW4XTR8= ARC-Authentication-Results: i=1; imf17.hostedemail.com; dkim=pass header.d=google.com header.s=20230601 header.b=qafrKw9N; dmarc=pass (policy=reject) header.from=google.com; spf=pass (imf17.hostedemail.com: domain of jannh@google.com designates 209.85.208.45 as permitted sender) smtp.mailfrom=jannh@google.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1745941203; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=nrLjz0SO4NLZaGlZgmJ5ANLBwSUMQ+H6anEYSpKzKcw=; b=CfQvHY6efbG5mJTS0TnNV0bONMKv8+7rMnDtCOLNp0dbg88eIyiSHwqdoWMJsYCqiSbI2w etTxvnSARYU1D7qmD8OQq2OgLnmeOPZ3dlClurJgmkSKQCfT0AaEhpkz5tZw3TMIP1OP3w B52hOMfujdEKrNy66wBM7RqqnZKbcUQ= Received: by mail-ed1-f45.google.com with SMTP id 4fb4d7f45d1cf-5dbfc122b82so9152a12.0 for ; Tue, 29 Apr 2025 08:40:03 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1745941202; x=1746546002; darn=kvack.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=nrLjz0SO4NLZaGlZgmJ5ANLBwSUMQ+H6anEYSpKzKcw=; b=qafrKw9N27WsdR46t+3PlcxdRt+OYHaci9ndc+3RtK83nusNDpmIpx/dpRijBoGyn2 VjZg3jr1nLjbuj9ymK1alC5O3fodDgLiOuWEX+u1amk553YFVXAAYNCTcaDex1WgPvjy FGhhCkTdkAAiZEv7DmBn+FP3xlSiPT+3pZ7TQt93sSQQszQwRCznGmGvXMQttZo3gCNP 6/q8QpyeQXysMdiUVo+G1P25MOOrflsHiQSLGN5ec9GohGjY2HWBUa+oRlSE20AAtMdi 0Aji2Pgo6NZV2+LorcmTNtccMsvsRLRxW787X59UdETHzsxWgw8SafO2mOMsStSZ8QHj fGyw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1745941202; x=1746546002; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=nrLjz0SO4NLZaGlZgmJ5ANLBwSUMQ+H6anEYSpKzKcw=; b=ZT+IFomG5XOo3gIk/9um2wTnWGoEwR8SkdrncXeVw3nkgLvNDUI9OgBbR8vdTLb8e5 MH8BvH+6QAlZEz7X63RPgFRhvDQP+HqcQjnJNi/yYwtL8j/vbUPdsIytA1/F3QlNyRy9 aTOcX+jLrsIAlM/pJzpAYI1YRgg2qlcsqu7UTJD7qT/gYPAuahQWivLcgA0tGtmePqEl Bxi3YQfKsQ7gJMZNgbubFF0P0RInjX37GuLzRka+jpJ3fyRw7lkCTqD4A3qaL0OVPlOW +Mbqf2XWi1OoSZQ40jNsctgUm7r+8RLheQFrlCnr1RutRNl5usV8kVL3uuuzAfTGQDna y51A== X-Forwarded-Encrypted: i=1; AJvYcCW7eaA7GaPmMntgnS1KyWWAzoCx0QNik1SwmuTqQQemVNsPSiqv4PVstuz3kncgNmy1gNUnR/OFpw==@kvack.org X-Gm-Message-State: AOJu0YyZuYQcKplCOIGxOEN07t7Z++Ky98IXOaelsFKdsaE5zzlToi2/ 0am+rQWrYKHGBIQ8ai1MoM0u1MmQMcZhGlq3q5wNbR7WM3F6sAPJDH7HsTVRobN8FaMlq4PGqlD cqEDG/lsLoCNIGbcGaeuqek7FiyMmtsSJ/sti X-Gm-Gg: ASbGncuZC+1Gqj0gG3/OhuxIoRoWxFGy/sNd44GqIJmEPMI7bgJps33FZKoXxeuK19b WQM3067ZLX3e7dYvKEUdGWIj8E+58iUOBKEwF+u5FD/+FX7vobUUvac4IrOQ0HnJx2M6+eKH3x5 yf61NephxrbFwNeimnrekSRcY3s599QDLD2rUikolnNKZM5PVeGg== X-Google-Smtp-Source: AGHT+IGTmRrDj58386rjJZziDq63lizkFMzj1lu/PM65c5dMEelMGMumc3qFR5YFmPkjXr20u694i69kdm2X64XfN50= X-Received: by 2002:a50:8713:0:b0:5de:bcd9:4aa with SMTP id 4fb4d7f45d1cf-5f83c1b5a2bmr96410a12.3.1745941201725; Tue, 29 Apr 2025 08:40:01 -0700 (PDT) MIME-Version: 1.0 References: <20250418174959.1431962-1-surenb@google.com> <20250418174959.1431962-8-surenb@google.com> In-Reply-To: <20250418174959.1431962-8-surenb@google.com> From: Jann Horn Date: Tue, 29 Apr 2025 17:39:25 +0200 X-Gm-Features: ATxdqUE5eaCYmwGd2V3bpxMHfmJQqwTFsFKw7F4dJMhdfNXiMK8VuoXSkdfNDiE Message-ID: Subject: Re: [PATCH v3 7/8] mm/maps: read proc/pid/maps under RCU To: Suren Baghdasaryan , Al Viro , brauner@kernel.org, linux-fsdevel@vger.kernel.org Cc: akpm@linux-foundation.org, Liam.Howlett@oracle.com, lorenzo.stoakes@oracle.com, david@redhat.com, vbabka@suse.cz, peterx@redhat.com, hannes@cmpxchg.org, mhocko@kernel.org, paulmck@kernel.org, shuah@kernel.org, adobriyan@gmail.com, josef@toxicpanda.com, yebin10@huawei.com, linux@weissschuh.net, willy@infradead.org, osalvador@suse.de, andrii@kernel.org, ryan.roberts@arm.com, christophe.leroy@csgroup.eu, linux-kernel@vger.kernel.org, linux-mm@kvack.org, linux-kselftest@vger.kernel.org Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Rspamd-Queue-Id: A398E40009 X-Rspam-User: X-Rspamd-Server: rspam07 X-Stat-Signature: m7i86ofbrkb4manmijj3rq7ca77e5az5 X-HE-Tag: 1745941203-504245 X-HE-Meta: U2FsdGVkX1+mkme45/gTZ4cWhDZRBEplExn/6OTrwgntXHvPuI+V3hywVwFrlERWnIRpXKfJifdBVEn9Wwdb8Yz8pSss3DqgxnkyYgeOWK6IWkHmeitIJS2ILqs7F+rXL56jpeplPjHlpUPJyQxKJdgb7adgXOPy4OHFopzvNa61izqfdYU3caL5nhhagkOuFJ6IL81w4QgarpiMI9XXgmvUWmiZ9hjDtyLIyTIclBYuNmf7zDsIguCYeBxnOYianzFYGYI6XoRAsVPNcQWIx2bsxnAhcXMyK+lGZJ+5F17URmDe40Q74JaqRPDZ47vVYcQP5cW4lKw9KCDC0L/0Q0T71bxdevRW+eYmNskieQH1WZ9JQbHrAHIeJC0Glgvs4qVCXn+STmxOFz5xHkFGjnQzadOM5pVuf5/X548DqcDNddojEhHwvAwYgLC4VyuNSxXo0yGdPAgqRcudQ5FpFICdX8vW4Yn3LouVsvCj2s6jJmSJgo1VA83lQ9WU8DiFuardU7/H/rVeMJY1hKwgM0ZzYSZF/gKd/L19LCpZ+akjIk46xIWyhWQbzREOTzjTkQxGBqX1k6z3VDN7BKKRKs/ZlrCzFbApRJ8zSAKiMSKKhIAArZ5VTqbbY+H4UnQcJR69ml8vBx2fKm+XbKNgc9RAvLa6/wnw5USwAB6omU68p6LFDXiaqOxDzLjwqjMN11PLi6VMu2mT9i9CzwqpVfB7O/xt5+AgNcX5tPQJshHAD/u2Xh1H3XRBAzSGCt6uL3gmiu/F0o8oDULRrOfXRRL+0B8jtsT6x3TRv6g66nj/2RbYDFcVGagEZ3Jku6WwN5tRy8nXfP3d0IirvaneFG+ut1gAyDJ/o/0wz/5+0T/a5PSMfyzFenh3uXzh5+zkTUV5bs/oO2LlW501Cd6CkSdn3P81UCM0jUuNMWx5sx/ond7/UuLiDaXHVIkwcN5Iyh6k2B9PYV/AW4jDgS3 mV4bYbiu NX1/WdSu4rFEdDp4Ifryq+M14EZhxQiYgf/vvxdUrOrHg0J2xKw5vh6zXToXIpHcWz3mIrHjc3xANz+T0Ckuum8cFuNYcRCJObw0ad/NXVHGitg7WIHhL87r0TejN/LxHkANoepiOzqFz9aS1ff4XxXmnHrGfUnacifvpqXYdaj+llSQXsG2ZDFm+wEeLjKXzZAfWHHg6sBiS0nGPsLtAQOPeEVjLfqMi4Bm2 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Fri, Apr 18, 2025 at 7:50=E2=80=AFPM Suren Baghdasaryan wrote: > With maple_tree supporting vma tree traversal under RCU and vma and > its important members being RCU-safe, /proc/pid/maps can be read under > RCU and without the need to read-lock mmap_lock. However vma content > can change from under us, therefore we make a copy of the vma and we > pin pointer fields used when generating the output (currently only > vm_file and anon_name). Afterwards we check for concurrent address > space modifications, wait for them to end and retry. While we take > the mmap_lock for reading during such contention, we do that momentarily > only to record new mm_wr_seq counter. This change is designed to reduce > mmap_lock contention and prevent a process reading /proc/pid/maps files > (often a low priority task, such as monitoring/data collection services) > from blocking address space updates. [...] > diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c > index b9e4fbbdf6e6..f9d50a61167c 100644 > --- a/fs/proc/task_mmu.c > +++ b/fs/proc/task_mmu.c [...] > +/* > + * Take VMA snapshot and pin vm_file and anon_name as they are used by > + * show_map_vma. > + */ > +static int get_vma_snapshot(struct proc_maps_private *priv, struct vm_ar= ea_struct *vma) > +{ > + struct vm_area_struct *copy =3D &priv->vma_copy; > + int ret =3D -EAGAIN; > + > + memcpy(copy, vma, sizeof(*vma)); > + if (copy->vm_file && !get_file_rcu(©->vm_file)) > + goto out; I think this uses get_file_rcu() in a different way than intended. As I understand it, get_file_rcu() is supposed to be called on a pointer which always points to a file with a non-zero refcount (except when it is NULL). That's why it takes a file** instead of a file* - if it observes a zero refcount, it assumes that the pointer must have been updated in the meantime, and retries. Calling get_file_rcu() on a pointer that points to a file with zero refcount, which I think can happen with this patch, will cause an endless loop. (Just as background: For other usecases, get_file_rcu() is supposed to still behave nicely and not spuriously return NULL when the file* is concurrently updated to point to another file*; that's what that loop is for.) (If my understanding is correct, maybe we should document that more explicitly...) Also, I think you are introducing an implicit assumption that remove_vma() does not NULL out the ->vm_file pointer (because that could cause tearing and could theoretically lead to a torn pointer being accessed here). One alternative might be to change the paths that drop references to vma->vm_file (search for vma_close to find them) such that they first NULL out ->vm_file with a WRITE_ONCE() and do the fput() after that, maybe with a new helper like this: static void vma_fput(struct vm_area_struct *vma) { struct file *file =3D vma->vm_file; if (file) { WRITE_ONCE(vma->vm_file, NULL); fput(file); } } Then on the lockless lookup path you could use get_file_rcu() on the ->vm_file pointer _of the original VMA_, and store the returned file* into copy->vm_file. > + if (!anon_vma_name_get_if_valid(copy)) > + goto put_file; > + > + if (!mmap_lock_speculate_retry(priv->mm, priv->mm_wr_seq)) > + return 0; We only check for concurrent updates at this point, so up to here, anything we read from "copy" could contain torn pointers (both because memcpy() is not guaranteed to copy pointers atomically and because the updates to the original VMA are not done with WRITE_ONCE()). That probably means that something like the preceding anon_vma_name_get_if_valid() could crash on an access to a torn pointer. Please either do another mmap_lock_speculate_retry() check directly after the memcpy(), or ensure nothing before this point reads from "copy". > + /* Address space got modified, vma might be stale. Re-lock and re= try. */ > + rcu_read_unlock(); > + ret =3D mmap_read_lock_killable(priv->mm); > + if (!ret) { > + /* mmap_lock_speculate_try_begin() succeeds when holding = mmap_read_lock */ > + mmap_lock_speculate_try_begin(priv->mm, &priv->mm_wr_seq)= ; > + mmap_read_unlock(priv->mm); > + ret =3D -EAGAIN; > + } > + > + rcu_read_lock(); > + > + anon_vma_name_put_if_valid(copy); > +put_file: > + if (copy->vm_file) > + fput(copy->vm_file); > +out: > + return ret; > +} [...] > @@ -266,39 +399,41 @@ static void get_vma_name(struct vm_area_struct *vma= , > } else { > *path =3D file_user_path(vma->vm_file); > } > - return; > + goto out; > } > > if (vma->vm_ops && vma->vm_ops->name) { > *name =3D vma->vm_ops->name(vma); This seems to me like a big, subtle change of semantics. After this change, vm_ops->name() will no longer receive a real VMA; and in particular, I think the .name implementation special_mapping_name used in special_mapping_vmops will have a UAF because it relies on vma->vm_private_data pointing to a live object. I think you'll need to fall back to using the mmap lock and the real VMA if you see a non-NULL vma->vm_ops->name pointer. > if (*name) > - return; > + goto out; > } > > *name =3D arch_vma_name(vma); > if (*name) > - return; > + goto out; > > if (!vma->vm_mm) { > *name =3D "[vdso]"; > - return; > + goto out; > } > > if (vma_is_initial_heap(vma)) { > *name =3D "[heap]"; > - return; > + goto out; > } > > if (vma_is_initial_stack(vma)) { > *name =3D "[stack]"; > - return; > + goto out; > } > > if (anon_name) { > *name_fmt =3D "[anon:%s]"; > *name =3D anon_name->name; > - return; > } > +out: > + if (anon_name && !mmap_locked) > + anon_vma_name_put(anon_name); Isn't this refcount drop too early, causing UAF read? We drop the reference on the anon_name here, but (on some paths) we're about to return anon_name->name to the caller through *name, and the caller will read from it. Ah, but I guess it's actually fine because the refcount increment was unnecessary in the first place, because the vma pointer actually points to a copy of the original VMA, and the copy has its own refcounted reference to the anon_name thanks to get_vma_snapshot()? It might be helpful to have some comments documenting which VMA pointers can point to copies.