Date: Thu, 5 Dec 2024 00:18:19 +0000
From: Wei Yang
To: Lorenzo Stoakes
Cc: Andrew Morton, "Liam R . Howlett", Vlastimil Babka, Jann Horn,
	Eric Biederman, Kees Cook, Alexander Viro, Christian Brauner,
	Jan Kara, linux-mm@kvack.org, linux-fsdevel@vger.kernel.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH 3/5] mm: abstract get_arg_page() stack expansion and mmap read lock
Message-ID: <20241205001819.derfguaft7oummr6@master>
Reply-To: Wei Yang
References: <5295d1c70c58e6aa63d14be68d4e1de9fa1c8e6d.1733248985.git.lorenzo.stoakes@oracle.com>
In-Reply-To: <5295d1c70c58e6aa63d14be68d4e1de9fa1c8e6d.1733248985.git.lorenzo.stoakes@oracle.com>
User-Agent: NeoMutt/20170113 (1.7.2)

On Tue, Dec 03, 2024 at 06:05:10PM +0000, Lorenzo Stoakes wrote:
>Right now fs/exec.c invokes expand_downwards(), an otherwise internal
>implementation detail of the VMA logic in order to ensure that an arg page
>can be obtained by get_user_pages_remote().
>
>In order to be able to move the stack expansion logic into mm/vma.c in
>order to make it available to userland testing we need to find an

It looks like the second "in order" is unnecessary. I'm not a native speaker,
so this is just my personal impression.

>alternative approach here.
>
>We do so by providing the mmap_read_lock_maybe_expand() function which also
>helpfully documents what get_arg_page() is doing here and adds an
>additional check against VM_GROWSDOWN to make explicit that the stack
>expansion logic is only invoked when the VMA is indeed a downward-growing
>stack.
>
>This allows expand_downwards() to become a static function.
>
>Importantly, the VMA referenced by mmap_read_maybe_expand() must NOT be
>currently user-visible in any way, that is place within an rmap or VMA
>tree. It must be a newly allocated VMA.
>
>This is the case when exec invokes this function.
>
>Signed-off-by: Lorenzo Stoakes
>---
> fs/exec.c          | 14 +++---------
> include/linux/mm.h |  5 ++---
> mm/mmap.c          | 54 +++++++++++++++++++++++++++++++++++++++++++++-
> 3 files changed, 58 insertions(+), 15 deletions(-)
>
>diff --git a/fs/exec.c b/fs/exec.c
>index 98cb7ba9983c..1e1f79c514de 100644
>--- a/fs/exec.c
>+++ b/fs/exec.c
>@@ -205,18 +205,10 @@ static struct page *get_arg_page(struct linux_binprm *bprm, unsigned long pos,
> 	/*
> 	 * Avoid relying on expanding the stack down in GUP (which
> 	 * does not work for STACK_GROWSUP anyway), and just do it
>-	 * by hand ahead of time.
>+	 * ahead of time.
> 	 */
>-	if (write && pos < vma->vm_start) {
>-		mmap_write_lock(mm);
>-		ret = expand_downwards(vma, pos);
>-		if (unlikely(ret < 0)) {
>-			mmap_write_unlock(mm);
>-			return NULL;
>-		}
>-		mmap_write_downgrade(mm);
>-	} else
>-		mmap_read_lock(mm);
>+	if (!mmap_read_lock_maybe_expand(mm, vma, pos, write))
>+		return NULL;
>
> 	/*
> 	 * We are doing an exec(). 'current' is the process
>diff --git a/include/linux/mm.h b/include/linux/mm.h
>index 4eb8e62d5c67..48312a934454 100644
>--- a/include/linux/mm.h
>+++ b/include/linux/mm.h
>@@ -3313,6 +3313,8 @@ extern int __vm_enough_memory(struct mm_struct *mm, long pages, int cap_sys_admi
> extern int insert_vm_struct(struct mm_struct *, struct vm_area_struct *);
> extern void exit_mmap(struct mm_struct *);
> int relocate_vma_down(struct vm_area_struct *vma, unsigned long shift);
>+bool mmap_read_lock_maybe_expand(struct mm_struct *mm, struct vm_area_struct *vma,
>+				 unsigned long addr, bool write);
>
> static inline int check_data_rlimit(unsigned long rlim,
> 				    unsigned long new,
>@@ -3426,9 +3428,6 @@ extern unsigned long stack_guard_gap;
> int expand_stack_locked(struct vm_area_struct *vma, unsigned long address);
> struct vm_area_struct *expand_stack(struct mm_struct * mm, unsigned long addr);
>
>-/* CONFIG_STACK_GROWSUP still needs to grow downwards at some places */
>-int expand_downwards(struct vm_area_struct *vma, unsigned long address);
>-
> /* Look up the first VMA which satisfies addr < vm_end, NULL if none. */
> extern struct vm_area_struct * find_vma(struct mm_struct * mm, unsigned long addr);
> extern struct vm_area_struct * find_vma_prev(struct mm_struct * mm, unsigned long addr,
>diff --git a/mm/mmap.c b/mm/mmap.c
>index f053de1d6fae..4df38d3717ff 100644
>--- a/mm/mmap.c
>+++ b/mm/mmap.c
>@@ -1009,7 +1009,7 @@ static int expand_upwards(struct vm_area_struct *vma, unsigned long address)
>  * vma is the first one with address < vma->vm_start. Have to extend vma.
>  * mmap_lock held for writing.
>  */
>-int expand_downwards(struct vm_area_struct *vma, unsigned long address)
>+static int expand_downwards(struct vm_area_struct *vma, unsigned long address)
> {
> 	struct mm_struct *mm = vma->vm_mm;
> 	struct vm_area_struct *prev;
>@@ -1940,3 +1940,55 @@ int relocate_vma_down(struct vm_area_struct *vma, unsigned long shift)
> 	/* Shrink the vma to just the new range */
> 	return vma_shrink(&vmi, vma, new_start, new_end, vma->vm_pgoff);
> }
>+
>+#ifdef CONFIG_MMU
>+/*
>+ * Obtain a read lock on mm->mmap_lock, if the specified address is below the
>+ * start of the VMA, the intent is to perform a write, and it is a
>+ * downward-growing stack, then attempt to expand the stack to contain it.
>+ *
>+ * This function is intended only for obtaining an argument page from an ELF
>+ * image, and is almost certainly NOT what you want to use for any other
>+ * purpose.
>+ *
>+ * IMPORTANT - VMA fields are accessed without an mmap lock being held, so the
>+ * VMA referenced must not be linked in any user-visible tree, i.e. it must be a
>+ * new VMA being mapped.
>+ *
>+ * The function assumes that addr is either contained within the VMA or below
>+ * it, and makes no attempt to validate this value beyond that.
>+ *
>+ * Returns true if the read lock was obtained and a stack was perhaps expanded,
>+ * false if the stack expansion failed.
>+ *
>+ * On stack expansion the function temporarily acquires an mmap write lock
>+ * before downgrading it.
>+ */
>+bool mmap_read_lock_maybe_expand(struct mm_struct *mm,
>+				 struct vm_area_struct *new_vma,
>+				 unsigned long addr, bool write)
>+{
>+	if (!write || addr >= new_vma->vm_start) {
>+		mmap_read_lock(mm);
>+		return true;
>+	}
>+
>+	if (!(new_vma->vm_flags & VM_GROWSDOWN))
>+		return false;
>+

expand_downwards() already performs this check. Maybe doing it in one place
is enough?

>+	mmap_write_lock(mm);
>+	if (expand_downwards(new_vma, addr)) {
>+		mmap_write_unlock(mm);
>+		return false;
>+	}
>+
>+	mmap_write_downgrade(mm);
>+	return true;
>+}
>+#else
>+bool mmap_read_lock_maybe_expand(struct mm_struct *mm, struct vm_area_struct *vma,
>+				 unsigned long addr, bool write)
>+{
>+	return false;
>+}
>+#endif
>--
>2.47.1
>

-- 
Wei Yang
Help you, Help me