From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 976F6C3ABA5 for ; Tue, 29 Apr 2025 17:09:19 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 3977A6B0005; Tue, 29 Apr 2025 13:09:18 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 3475D6B0007; Tue, 29 Apr 2025 13:09:18 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 1EB946B0008; Tue, 29 Apr 2025 13:09:18 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0017.hostedemail.com [216.40.44.17]) by kanga.kvack.org (Postfix) with ESMTP id F222E6B0005 for ; Tue, 29 Apr 2025 13:09:17 -0400 (EDT) Received: from smtpin11.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay01.hostedemail.com (Postfix) with ESMTP id 6B4391CF574 for ; Tue, 29 Apr 2025 17:09:18 +0000 (UTC) X-FDA: 83387717196.11.4D63150 Received: from mail-qt1-f171.google.com (mail-qt1-f171.google.com [209.85.160.171]) by imf08.hostedemail.com (Postfix) with ESMTP id 7D65D160008 for ; Tue, 29 Apr 2025 17:09:16 +0000 (UTC) Authentication-Results: imf08.hostedemail.com; dkim=pass header.d=google.com header.s=20230601 header.b=jYpNRJP4; dmarc=pass (policy=reject) header.from=google.com; spf=pass (imf08.hostedemail.com: domain of surenb@google.com designates 209.85.160.171 as permitted sender) smtp.mailfrom=surenb@google.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1745946556; a=rsa-sha256; cv=none; b=DR1k2tajZjdPqN52vcmYjESv4L3m+nOu1jdbmasen9XTY2FCCCmMsBNqUCO4rmnyWX4r3o fcLgJf53AMCF6j/7fuGzob5VCknGQ70feaOkCC12ha75rix1bNUrFaDiOErGtg2GQrKY2m nfZB+M0L/FTQGQirwZrFOkNPY9PgvRI= ARC-Authentication-Results: i=1; imf08.hostedemail.com; dkim=pass header.d=google.com header.s=20230601 header.b=jYpNRJP4; dmarc=pass (policy=reject) header.from=google.com; spf=pass (imf08.hostedemail.com: domain of surenb@google.com designates 209.85.160.171 as permitted sender) smtp.mailfrom=surenb@google.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1745946556; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=3wN1T3GMxNDHasA1P1pHfr57BfT4ryZUjRwhWQiFJ1U=; b=1oGKxRRpkOgd8hmzDLU5j0oq3KcKEpJ6Yoi9dQuubS1k6KzTd84cDRDG6TyzqmQS0Xv2Wq LsjPw71aDvi6Po4iMR3ndiClht+HaO78x9Bppk/mphPDz6j6FuQvKhzZg6X2SziCasGKaL 1s1pua3LAYAs3YRUFAMau2vLt3rP+rs= Received: by mail-qt1-f171.google.com with SMTP id d75a77b69052e-47681dba807so48081cf.1 for ; Tue, 29 Apr 2025 10:09:16 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1745946555; x=1746551355; darn=kvack.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=3wN1T3GMxNDHasA1P1pHfr57BfT4ryZUjRwhWQiFJ1U=; b=jYpNRJP49XIOW4x2iBbPpeJVfE7tux5V5j0T0c3G0g8ucYClfKN4NazwODbjEsRdOW cWDFNq4oOTV4BBzR4apY86gPWXeLDirVohJEhLm5v5HepNdPVs9JzQNQPZ6VipkccydS I1ULwHaSLL7ILwEdntfzk/4nk9+kGCyo10AJJ53WQ0pJJXgFcbk6nO63hGrbZlRxAC3z xY9dOwxmCLEyxgFmFkQtU7R09uY2xq2fMZ7KcWwaJFD2PgD+VTyR5sPSeNlp9DPGxt8F QvTK2JqMu68evTDz58dRjfZrKu7ieTfXNBh+pJNLeGdBDMnfL231ycxOkXDJi8+VUMtP YJAA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1745946555; x=1746551355; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=3wN1T3GMxNDHasA1P1pHfr57BfT4ryZUjRwhWQiFJ1U=; b=Q3ZoJFRuCzFind0rSMnIM9F37dnsHeZlrOuhxp70CZrO44mAZiQtwWughcC+KnuloW BcXna0sz3MYgsNgMStFprTBPEEvaUj6rom3NCYJYImt7jQX2ozEeTv4s7SSRSBKhJrI9 mk6oi//uSEwv3BUd0UTJzJUQUPWMnJNS5Q3UrT2HMfvqsutkxsr+eNhU8mq+COOs/631 Bv6rHWe+Dq938eQjfbvwHDoUFR3MlTS9D4CXe5+xi6ni9WxoY47YPMcBI+yn+ks1qm9z S4eB+0TAhm+ROiGEsOQSDbWZ9itAtiancV4R/G80OWW3I+LTs66s3zMFf4xA+Zrn2v0H zcHA== X-Forwarded-Encrypted: i=1; AJvYcCVKml0Us3dBb9jx46ajom0/Y2+ElUlnm+KYPu7xfdN5/4p3S+/ZWKsECqBoIgXG3Fsj60SNZkOpHg==@kvack.org X-Gm-Message-State: AOJu0Yxc/IwK9xfqEdIJizUeMfKRYRzd2EO7tFb8leAV/BoL3jfuTXGh lTvq0REZwwEHGMVGeOnnFjUzGSeVcF/M9MmbFZP1HxrfZjB4kM9p4pLvmjB0duJU7j6fpczyvop yGsmhXWGJQ+WN7rsel5Lt+tVeUMjMqEhLFiUn X-Gm-Gg: ASbGnctB6hGTiqsMiXsrhDyO5Sps9SGS3wcdhkyD8Lj7LR3//JyCWuTFB7XPkomJr54 sFCkIvTV+LuIuDvH3Y3X8+I40kZv9oKPuHR2JraLs1eGNMgG949cErU4saw2H1NEc0lQAtYMMMd /7/6uoq5PA7guYiVfu+fCelmHvGEq6pBwSW3UE9ZGMczWjDyQw08kD/xUSQQS0+LM= X-Google-Smtp-Source: AGHT+IESKitBJO3sZv32xrVdF494ZNH19atUoJKC8GwVG1r2WedLX5sPmIE31lJgcwtfPpfMmDhKELfsvLaZi4LwOy0= X-Received: by 2002:ac8:7f4c:0:b0:47d:d252:4873 with SMTP id d75a77b69052e-488a5dedc42mr4952881cf.11.1745946555202; Tue, 29 Apr 2025 10:09:15 -0700 (PDT) MIME-Version: 1.0 References: <20250418174959.1431962-1-surenb@google.com> <20250418174959.1431962-8-surenb@google.com> In-Reply-To: From: Suren Baghdasaryan Date: Tue, 29 Apr 2025 10:09:04 -0700 X-Gm-Features: ATxdqUHykEeEaRUDfVFPrHYAFtv8-ek3ih5pTaKrNuCjWMlHoN1VLkz4YajbEFo Message-ID: Subject: Re: [PATCH v3 7/8] mm/maps: read proc/pid/maps under RCU To: Jann Horn Cc: Al Viro , brauner@kernel.org, linux-fsdevel@vger.kernel.org, akpm@linux-foundation.org, Liam.Howlett@oracle.com, lorenzo.stoakes@oracle.com, david@redhat.com, vbabka@suse.cz, peterx@redhat.com, hannes@cmpxchg.org, mhocko@kernel.org, paulmck@kernel.org, shuah@kernel.org, adobriyan@gmail.com, josef@toxicpanda.com, yebin10@huawei.com, linux@weissschuh.net, willy@infradead.org, osalvador@suse.de, andrii@kernel.org, ryan.roberts@arm.com, christophe.leroy@csgroup.eu, linux-kernel@vger.kernel.org, linux-mm@kvack.org, linux-kselftest@vger.kernel.org Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Rspam-User: X-Rspamd-Server: rspam02 X-Rspamd-Queue-Id: 7D65D160008 X-Stat-Signature: ejndxaynctwpakgsu4xxu4w9c1yne6ge X-HE-Tag: 1745946556-57372 X-HE-Meta: U2FsdGVkX190fjA+lm9f95lmupwyRx3WcLtMq3blywQPMcmNVGlFefhZ2j9dWOU1d3OlRbOPRkFSxjplTHNo17d2e9XaNQHtDK2ew7YGp6W6mv0DCgGodTUmuCHcrox+MvYdnADyAEZxywODrSDP4TZht6WOOR8u/e7WE8hlrvCyzNXyMCJHlJ75yD0Iu/QFYNqzHwdg6yYwa7Q7No8fd5fxUT9CzeR6PUpV80Ax7ecOJ5QlKnGX7xT4Q3fUGXQfIwnOGK2pEcOqHzB7VGgMRPCI8OowzpLisd03qGd0rLog1rK5RqvcxZy4gJgr92HBGjzifyrUxqwx+N1BOUHs1vPxm09BdhqHWzucG/ge3FkaE9s7opzeT+xwd8ClqxgLz20vMvVgch1H1hsdKew5z5cPeAIhVqBJ/D08ydsU47KqwvDPtgcNUjBPCiEIkhzFkNF5EGBCiL6Y4x9gm0v9WEfxA7GlG+IaS3s58NyyUKeksDkj4uWzRJW7d/ZHku3/wxmrdG4d9rNUPxHYWrGGgh1zmqhRwxSyOcZNGb1bEfSWO2vViu25nHhMhsAx9tiCDQVctbZkIziEsHpY+SR1dvhNRQn8AZ//k53JMxSERpi4nZJavfZyZYDDxmUNJXGBXSVv5EYBisnTg21JoiYqZKj2zPf9LHKAuAy7Zvr+EVD3dzScM6fKFYG0GKEDwvdyzi8FagLRfOjG3WlLl6BZ/ryrnGq38xAPRBsn0jUDw4SZoQYfxh7kLldDAPvRlVVi0vDmY6Fhy/JYUj0SiRJx2r2PCNFusBd4yYgy7OmSXLoL0zjRjKIqzcV4435FqY0ErQkLIEeXGIg7aCmC1fKZadzdmXudKb7a6LPxkf//vNKrY0e6avgtqqNr5XdHsEyqM/vWKMYscgYw46EixuIrGJ67HNSMy9+K3fBKEscL2zWT8aBF/Mwxa8aFurcBCfv38L13I47+gSIIwnwRrYw rrMMUJZj lsv18x5y60PkdaOKXZLTNfefOk0N0j6Xm3Odcq2mL3RpZu6ryxzvEeAYHIBOX+bpB8yYyG2zVqJwNCiFEMtAuYZJjOiZMiAzsUCSUfF/7sZvaEyxk86WdcHKfvmbAGUwcHxNkYLSufVRQnQwtA7E5aDXgENWXNFRoJSzeUKHyzMxJaOB2rvfY9IgjD6Y5IW0JesBZSJ36sGehKvI2ZCxrJQOsqfQEkBIMIV6W X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Tue, Apr 29, 2025 at 8:40=E2=80=AFAM Jann Horn wrote: > > On Fri, Apr 18, 2025 at 7:50=E2=80=AFPM Suren Baghdasaryan wrote: > > With maple_tree supporting vma tree traversal under RCU and vma and > > its important members being RCU-safe, /proc/pid/maps can be read under > > RCU and without the need to read-lock mmap_lock. However vma content > > can change from under us, therefore we make a copy of the vma and we > > pin pointer fields used when generating the output (currently only > > vm_file and anon_name). Afterwards we check for concurrent address > > space modifications, wait for them to end and retry. While we take > > the mmap_lock for reading during such contention, we do that momentaril= y > > only to record new mm_wr_seq counter. This change is designed to reduce > > mmap_lock contention and prevent a process reading /proc/pid/maps files > > (often a low priority task, such as monitoring/data collection services= ) > > from blocking address space updates. > [...] > > diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c > > index b9e4fbbdf6e6..f9d50a61167c 100644 > > --- a/fs/proc/task_mmu.c > > +++ b/fs/proc/task_mmu.c > [...] > > +/* > > + * Take VMA snapshot and pin vm_file and anon_name as they are used by > > + * show_map_vma. > > + */ > > +static int get_vma_snapshot(struct proc_maps_private *priv, struct vm_= area_struct *vma) > > +{ > > + struct vm_area_struct *copy =3D &priv->vma_copy; > > + int ret =3D -EAGAIN; > > + > > + memcpy(copy, vma, sizeof(*vma)); > > + if (copy->vm_file && !get_file_rcu(©->vm_file)) > > + goto out; > > I think this uses get_file_rcu() in a different way than intended. > > As I understand it, get_file_rcu() is supposed to be called on a > pointer which always points to a file with a non-zero refcount (except > when it is NULL). That's why it takes a file** instead of a file* - if > it observes a zero refcount, it assumes that the pointer must have > been updated in the meantime, and retries. Calling get_file_rcu() on a > pointer that points to a file with zero refcount, which I think can > happen with this patch, will cause an endless loop. > (Just as background: For other usecases, get_file_rcu() is supposed to > still behave nicely and not spuriously return NULL when the file* is > concurrently updated to point to another file*; that's what that loop > is for.) Ah, I see. I wasn't aware of this subtlety. I think this is fixable by checking the return value of get_file_rcu() and retrying speculation if it changed. > (If my understanding is correct, maybe we should document that more > explicitly...) Good point. I'll add comments for get_file_rcu() as a separate patch. > > Also, I think you are introducing an implicit assumption that > remove_vma() does not NULL out the ->vm_file pointer (because that > could cause tearing and could theoretically lead to a torn pointer > being accessed here). > > One alternative might be to change the paths that drop references to > vma->vm_file (search for vma_close to find them) such that they first > NULL out ->vm_file with a WRITE_ONCE() and do the fput() after that, > maybe with a new helper like this: > > static void vma_fput(struct vm_area_struct *vma) > { > struct file *file =3D vma->vm_file; > > if (file) { > WRITE_ONCE(vma->vm_file, NULL); > fput(file); > } > } > > Then on the lockless lookup path you could use get_file_rcu() on the > ->vm_file pointer _of the original VMA_, and store the returned file* > into copy->vm_file. Ack. Except for storing the return value of get_file_rcu(). I think once we detect that get_file_rcu() returns a different file we should bail out and retry. The change in file is an indication that the vma got changed from under us, so whatever we have is stale. > > > + if (!anon_vma_name_get_if_valid(copy)) > > + goto put_file; > > + > > + if (!mmap_lock_speculate_retry(priv->mm, priv->mm_wr_seq)) > > + return 0; > > We only check for concurrent updates at this point, so up to here, > anything we read from "copy" could contain torn pointers (both because > memcpy() is not guaranteed to copy pointers atomically and because the > updates to the original VMA are not done with WRITE_ONCE()). > That probably means that something like the preceding > anon_vma_name_get_if_valid() could crash on an access to a torn > pointer. > Please either do another mmap_lock_speculate_retry() check directly > after the memcpy(), or ensure nothing before this point reads from > "copy". Ack. I'll add mmap_lock_speculate_retry() check right after memcpy(). > > > + /* Address space got modified, vma might be stale. Re-lock and = retry. */ > > + rcu_read_unlock(); > > + ret =3D mmap_read_lock_killable(priv->mm); > > + if (!ret) { > > + /* mmap_lock_speculate_try_begin() succeeds when holdin= g mmap_read_lock */ > > + mmap_lock_speculate_try_begin(priv->mm, &priv->mm_wr_se= q); > > + mmap_read_unlock(priv->mm); > > + ret =3D -EAGAIN; > > + } > > + > > + rcu_read_lock(); > > + > > + anon_vma_name_put_if_valid(copy); > > +put_file: > > + if (copy->vm_file) > > + fput(copy->vm_file); > > +out: > > + return ret; > > +} > [...] > > @@ -266,39 +399,41 @@ static void get_vma_name(struct vm_area_struct *v= ma, > > } else { > > *path =3D file_user_path(vma->vm_file); > > } > > - return; > > + goto out; > > } > > > > if (vma->vm_ops && vma->vm_ops->name) { > > *name =3D vma->vm_ops->name(vma); > > This seems to me like a big, subtle change of semantics. After this > change, vm_ops->name() will no longer receive a real VMA; and in > particular, I think the .name implementation special_mapping_name used > in special_mapping_vmops will have a UAF because it relies on > vma->vm_private_data pointing to a live object. Ah, I see. IOW, vma->vm_private_data might change from under us and I don't detect that. Moreover this is just an example and .name() might depend on other things. > > I think you'll need to fall back to using the mmap lock and the real > VMA if you see a non-NULL vma->vm_ops->name pointer. Yeah, either that or obtain the name and make a copy during get_vma_snapshot() using original vma. Will need to check which way is better. > > > if (*name) > > - return; > > + goto out; > > } > > > > *name =3D arch_vma_name(vma); > > if (*name) > > - return; > > + goto out; > > > > if (!vma->vm_mm) { > > *name =3D "[vdso]"; > > - return; > > + goto out; > > } > > > > if (vma_is_initial_heap(vma)) { > > *name =3D "[heap]"; > > - return; > > + goto out; > > } > > > > if (vma_is_initial_stack(vma)) { > > *name =3D "[stack]"; > > - return; > > + goto out; > > } > > > > if (anon_name) { > > *name_fmt =3D "[anon:%s]"; > > *name =3D anon_name->name; > > - return; > > } > > +out: > > + if (anon_name && !mmap_locked) > > + anon_vma_name_put(anon_name); > > Isn't this refcount drop too early, causing UAF read? We drop the > reference on the anon_name here, but (on some paths) we're about to > return anon_name->name to the caller through *name, and the caller > will read from it. > > Ah, but I guess it's actually fine because the refcount increment was > unnecessary in the first place, because the vma pointer actually > points to a copy of the original VMA, and the copy has its own > refcounted reference to the anon_name thanks to get_vma_snapshot()? > It might be helpful to have some comments documenting which VMA > pointers can point to copies. If I follow Andrii's suggestion and avoid vma copying I'll need to implement careful handling of pointers and allow get_vma_name() to fail indicating that vma changed from under us. Let me see if this is still doable in the light of your findings. Thanks for the insightful review and welcome back!