From: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
To: Wei Yang <richard.weiyang@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
"Liam R . Howlett" <Liam.Howlett@oracle.com>,
Vlastimil Babka <vbabka@suse.cz>, Jann Horn <jannh@google.com>,
Eric Biederman <ebiederm@xmission.com>,
Kees Cook <kees@kernel.org>,
Alexander Viro <viro@zeniv.linux.org.uk>,
Christian Brauner <brauner@kernel.org>, Jan Kara <jack@suse.cz>,
linux-mm@kvack.org, linux-fsdevel@vger.kernel.org,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH 3/5] mm: abstract get_arg_page() stack expansion and mmap read lock
Date: Thu, 5 Dec 2024 07:06:07 +0000
Message-ID: <853b7230-1939-465c-ad78-88400765b424@lucifer.local>
In-Reply-To: <20241205001819.derfguaft7oummr6@master>

On Thu, Dec 05, 2024 at 12:18:19AM +0000, Wei Yang wrote:
> On Tue, Dec 03, 2024 at 06:05:10PM +0000, Lorenzo Stoakes wrote:
> >Right now fs/exec.c invokes expand_downwards(), an otherwise internal
> >implementation detail of the VMA logic in order to ensure that an arg page
> >can be obtained by get_user_pages_remote().
> >
> >In order to be able to move the stack expansion logic into mm/vma.c in
> >order to make it available to userland testing we need to find an
>
> Looks the second "in order" is not necessary.
>
> Not a native speaker, just my personal feeling.
>
> >alternative approach here.

Sorry, missed this one.

You're right, this is clunky (I imagine non-native speakers often spot this
kind of thing more readily than native speakers, who can more easily overlook
a clunky turn of phrase).

The second 'in order to' should really just be 'to', but I'm not sure this is
important enough to take pains to address; I'll fix it if a respin is
otherwise needed.

> >
> >We do so by providing the mmap_read_lock_maybe_expand() function which also
> >helpfully documents what get_arg_page() is doing here and adds an
> >additional check against VM_GROWSDOWN to make explicit that the stack
> >expansion logic is only invoked when the VMA is indeed a downward-growing
> >stack.
> >
> >This allows expand_downwards() to become a static function.
> >
> >Importantly, the VMA referenced by mmap_read_lock_maybe_expand() must NOT be
> >currently user-visible in any way, that is, placed within an rmap or VMA
> >tree. It must be a newly allocated VMA.
> >
> >This is the case when exec invokes this function.
> >
> >Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> >---
> > fs/exec.c | 14 +++---------
> > include/linux/mm.h | 5 ++---
> > mm/mmap.c | 54 +++++++++++++++++++++++++++++++++++++++++++++-
> > 3 files changed, 58 insertions(+), 15 deletions(-)
> >
> >diff --git a/fs/exec.c b/fs/exec.c
> >index 98cb7ba9983c..1e1f79c514de 100644
> >--- a/fs/exec.c
> >+++ b/fs/exec.c
> >@@ -205,18 +205,10 @@ static struct page *get_arg_page(struct linux_binprm *bprm, unsigned long pos,
> > 	/*
> > 	 * Avoid relying on expanding the stack down in GUP (which
> > 	 * does not work for STACK_GROWSUP anyway), and just do it
> >-	 * by hand ahead of time.
> >+	 * ahead of time.
> > 	 */
> >-	if (write && pos < vma->vm_start) {
> >-		mmap_write_lock(mm);
> >-		ret = expand_downwards(vma, pos);
> >-		if (unlikely(ret < 0)) {
> >-			mmap_write_unlock(mm);
> >-			return NULL;
> >-		}
> >-		mmap_write_downgrade(mm);
> >-	} else
> >-		mmap_read_lock(mm);
> >+	if (!mmap_read_lock_maybe_expand(mm, vma, pos, write))
> >+		return NULL;
> >
> > 	/*
> > 	 * We are doing an exec(). 'current' is the process
> >diff --git a/include/linux/mm.h b/include/linux/mm.h
> >index 4eb8e62d5c67..48312a934454 100644
> >--- a/include/linux/mm.h
> >+++ b/include/linux/mm.h
> >@@ -3313,6 +3313,8 @@ extern int __vm_enough_memory(struct mm_struct *mm, long pages, int cap_sys_admi
> > extern int insert_vm_struct(struct mm_struct *, struct vm_area_struct *);
> > extern void exit_mmap(struct mm_struct *);
> > int relocate_vma_down(struct vm_area_struct *vma, unsigned long shift);
> >+bool mmap_read_lock_maybe_expand(struct mm_struct *mm, struct vm_area_struct *vma,
> >+ unsigned long addr, bool write);
> >
> > static inline int check_data_rlimit(unsigned long rlim,
> > unsigned long new,
> >@@ -3426,9 +3428,6 @@ extern unsigned long stack_guard_gap;
> > int expand_stack_locked(struct vm_area_struct *vma, unsigned long address);
> > struct vm_area_struct *expand_stack(struct mm_struct * mm, unsigned long addr);
> >
> >-/* CONFIG_STACK_GROWSUP still needs to grow downwards at some places */
> >-int expand_downwards(struct vm_area_struct *vma, unsigned long address);
> >-
> > /* Look up the first VMA which satisfies addr < vm_end, NULL if none. */
> > extern struct vm_area_struct * find_vma(struct mm_struct * mm, unsigned long addr);
> > extern struct vm_area_struct * find_vma_prev(struct mm_struct * mm, unsigned long addr,
> >diff --git a/mm/mmap.c b/mm/mmap.c
> >index f053de1d6fae..4df38d3717ff 100644
> >--- a/mm/mmap.c
> >+++ b/mm/mmap.c
> >@@ -1009,7 +1009,7 @@ static int expand_upwards(struct vm_area_struct *vma, unsigned long address)
> > * vma is the first one with address < vma->vm_start. Have to extend vma.
> > * mmap_lock held for writing.
> > */
> >-int expand_downwards(struct vm_area_struct *vma, unsigned long address)
> >+static int expand_downwards(struct vm_area_struct *vma, unsigned long address)
> > {
> > 	struct mm_struct *mm = vma->vm_mm;
> > 	struct vm_area_struct *prev;
> >@@ -1940,3 +1940,55 @@ int relocate_vma_down(struct vm_area_struct *vma, unsigned long shift)
> > 	/* Shrink the vma to just the new range */
> > 	return vma_shrink(&vmi, vma, new_start, new_end, vma->vm_pgoff);
> > }
> >+
> >+#ifdef CONFIG_MMU
> >+/*
> >+ * Obtain a read lock on mm->mmap_lock, if the specified address is below the
> >+ * start of the VMA, the intent is to perform a write, and it is a
> >+ * downward-growing stack, then attempt to expand the stack to contain it.
> >+ *
> >+ * This function is intended only for obtaining an argument page from an ELF
> >+ * image, and is almost certainly NOT what you want to use for any other
> >+ * purpose.
> >+ *
> >+ * IMPORTANT - VMA fields are accessed without an mmap lock being held, so the
> >+ * VMA referenced must not be linked in any user-visible tree, i.e. it must be a
> >+ * new VMA being mapped.
> >+ *
> >+ * The function assumes that addr is either contained within the VMA or below
> >+ * it, and makes no attempt to validate this value beyond that.
> >+ *
> >+ * Returns true if the read lock was obtained and a stack was perhaps expanded,
> >+ * false if the stack expansion failed.
> >+ *
> >+ * On stack expansion the function temporarily acquires an mmap write lock
> >+ * before downgrading it.
> >+ */
> >+bool mmap_read_lock_maybe_expand(struct mm_struct *mm,
> >+				 struct vm_area_struct *new_vma,
> >+				 unsigned long addr, bool write)
> >+{
> >+	if (!write || addr >= new_vma->vm_start) {
> >+		mmap_read_lock(mm);
> >+		return true;
> >+	}
> >+
> >+	if (!(new_vma->vm_flags & VM_GROWSDOWN))
> >+		return false;
> >+
>
> In expand_downwards() we have this checked.
>
> Maybe we just leave this done in one place is enough?
>
> >+	mmap_write_lock(mm);
> >+	if (expand_downwards(new_vma, addr)) {
> >+		mmap_write_unlock(mm);
> >+		return false;
> >+	}
> >+
> >+	mmap_write_downgrade(mm);
> >+	return true;
> >+}
> >+#else
> >+bool mmap_read_lock_maybe_expand(struct mm_struct *mm, struct vm_area_struct *vma,
> >+				 unsigned long addr, bool write)
> >+{
> >+	return false;
> >+}
> >+#endif
> >--
> >2.47.1
> >
>
> --
> Wei Yang
> Help you, Help me
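
For anyone following along, the locking decision in the quoted patch can be
sketched in userland C. This is only a toy model under stated assumptions:
every toy_ name, the lowest_allowed field and the lock-state enum are
hypothetical stand-ins invented for illustration, not kernel definitions, and
the real expand_downwards() performs far more checks than the single bound
used here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for kernel types and flags; not the real ones. */
#define TOY_VM_GROWSDOWN 0x1UL

enum toy_lock_state { TOY_UNLOCKED, TOY_READ_LOCKED, TOY_WRITE_LOCKED };

struct toy_mm {
	enum toy_lock_state lock;	/* models the state of mm->mmap_lock */
};

struct toy_vma {
	unsigned long vm_start;
	unsigned long vm_end;
	unsigned long vm_flags;
	unsigned long lowest_allowed;	/* assumed bound standing in for real limits */
};

/*
 * Toy expansion: lower vm_start to addr. Like the real function (as noted
 * in review), it rejects a VMA that is not a downward-growing stack.
 * Returns 0 on success, -1 on failure.
 */
static int toy_expand_downwards(struct toy_vma *vma, unsigned long addr)
{
	assert(addr < vma->vm_start);	/* caller only expands below the VMA */

	if (!(vma->vm_flags & TOY_VM_GROWSDOWN))
		return -1;
	if (addr < vma->lowest_allowed)
		return -1;

	vma->vm_start = addr;
	return 0;
}

/*
 * Mirrors the control flow of mmap_read_lock_maybe_expand(): on success the
 * lock is held for reading, on failure no lock is held at all.
 */
static bool toy_read_lock_maybe_expand(struct toy_mm *mm,
				       struct toy_vma *new_vma,
				       unsigned long addr, bool write)
{
	/* Nothing to expand: just take the read lock. */
	if (!write || addr >= new_vma->vm_start) {
		mm->lock = TOY_READ_LOCKED;
		return true;
	}

	/* Explicit check that this is a downward-growing stack. */
	if (!(new_vma->vm_flags & TOY_VM_GROWSDOWN))
		return false;

	mm->lock = TOY_WRITE_LOCKED;
	if (toy_expand_downwards(new_vma, addr)) {
		mm->lock = TOY_UNLOCKED;
		return false;
	}

	/* Models mmap_write_downgrade(): write lock becomes a read lock. */
	mm->lock = TOY_READ_LOCKED;
	return true;
}
```

The point of the sketch is the invariant the caller relies on: a true return
always means the read lock is held, whether or not an expansion happened,
and a false return always means nothing is held, so get_arg_page() can
simply return NULL.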