From: Jann Horn <jannh@google.com>
To: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	"Liam R . Howlett" <Liam.Howlett@oracle.com>,
	 Vlastimil Babka <vbabka@suse.cz>,
	Eric Biederman <ebiederm@xmission.com>,
	Kees Cook <kees@kernel.org>,
	 Alexander Viro <viro@zeniv.linux.org.uk>,
	Christian Brauner <brauner@kernel.org>, Jan Kara <jack@suse.cz>,
	 linux-mm@kvack.org, linux-fsdevel@vger.kernel.org,
	 linux-kernel@vger.kernel.org
Subject: Re: [PATCH 3/5] mm: abstract get_arg_page() stack expansion and mmap read lock
Date: Tue, 10 Dec 2024 18:14:53 +0100
Message-ID: <CAG48ez12K25yNWaAXqMnC8tfpTQFOwzvPsyE7r8N1NM9wqfzzw@mail.gmail.com>
In-Reply-To: <5295d1c70c58e6aa63d14be68d4e1de9fa1c8e6d.1733248985.git.lorenzo.stoakes@oracle.com>

On Tue, Dec 3, 2024 at 7:05 PM Lorenzo Stoakes
<lorenzo.stoakes@oracle.com> wrote:
> Right now fs/exec.c invokes expand_downwards(), an otherwise internal
> implementation detail of the VMA logic, in order to ensure that an arg page
> can be obtained by get_user_pages_remote().
>
> To move the stack expansion logic into mm/vma.c, and thereby make it
> available to userland testing, we need to find an alternative approach
> here.
>
> We do so by providing the mmap_read_lock_maybe_expand() function which also
> helpfully documents what get_arg_page() is doing here and adds an
> additional check against VM_GROWSDOWN to make explicit that the stack
> expansion logic is only invoked when the VMA is indeed a downward-growing
> stack.
>
> This allows expand_downwards() to become a static function.
>
> Importantly, the VMA referenced by mmap_read_lock_maybe_expand() must NOT
> be currently user-visible in any way, that is, placed within an rmap or VMA
> tree. It must be a newly allocated VMA.
>
> This is the case when exec invokes this function.
>
> Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> ---
>  fs/exec.c          | 14 +++---------
>  include/linux/mm.h |  5 ++---
>  mm/mmap.c          | 54 +++++++++++++++++++++++++++++++++++++++++++++-
>  3 files changed, 58 insertions(+), 15 deletions(-)
>
> diff --git a/fs/exec.c b/fs/exec.c
> index 98cb7ba9983c..1e1f79c514de 100644
> --- a/fs/exec.c
> +++ b/fs/exec.c
> @@ -205,18 +205,10 @@ static struct page *get_arg_page(struct linux_binprm *bprm, unsigned long pos,
>         /*
>          * Avoid relying on expanding the stack down in GUP (which
>          * does not work for STACK_GROWSUP anyway), and just do it
> -        * by hand ahead of time.
> +        * ahead of time.
>          */
> -       if (write && pos < vma->vm_start) {
> -               mmap_write_lock(mm);
> -               ret = expand_downwards(vma, pos);
> -               if (unlikely(ret < 0)) {
> -                       mmap_write_unlock(mm);
> -                       return NULL;
> -               }
> -               mmap_write_downgrade(mm);
> -       } else
> -               mmap_read_lock(mm);
> +       if (!mmap_read_lock_maybe_expand(mm, vma, pos, write))
> +               return NULL;
[...]
> +/*
> + * Obtain a read lock on mm->mmap_lock, if the specified address is below the
> + * start of the VMA, the intent is to perform a write, and it is a
> + * downward-growing stack, then attempt to expand the stack to contain it.
> + *
> + * This function is intended only for obtaining an argument page from an ELF
> + * image, and is almost certainly NOT what you want to use for any other
> + * purpose.
> + *
> + * IMPORTANT - VMA fields are accessed without an mmap lock being held, so the
> + * VMA referenced must not be linked in any user-visible tree, i.e. it must be a
> + * new VMA being mapped.
> + *
> + * The function assumes that addr is either contained within the VMA or below
> + * it, and makes no attempt to validate this value beyond that.
> + *
> + * Returns true if the read lock was obtained and a stack was perhaps expanded,
> + * false if the stack expansion failed.
> + *
> + * On stack expansion the function temporarily acquires an mmap write lock
> + * before downgrading it.
> + */
> +bool mmap_read_lock_maybe_expand(struct mm_struct *mm,
> +                                struct vm_area_struct *new_vma,
> +                                unsigned long addr, bool write)
> +{
> +       if (!write || addr >= new_vma->vm_start) {
> +               mmap_read_lock(mm);
> +               return true;
> +       }
> +
> +       if (!(new_vma->vm_flags & VM_GROWSDOWN))
> +               return false;
> +
> +       mmap_write_lock(mm);
> +       if (expand_downwards(new_vma, addr)) {
> +               mmap_write_unlock(mm);
> +               return false;
> +       }
> +
> +       mmap_write_downgrade(mm);
> +       return true;
> +}

Random thought: For write==1, this looks a bit like
lock_mm_and_find_vma(mm, addr, NULL), which needs similar stack
expansion logic for handling userspace faults. But it's for a
sufficiently different situation that maybe it makes sense to keep it
like you did it, as a separate function...
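
For anyone following along, the lock state that the quoted helper leaves
behind in each case can be sketched as a userspace toy model. Everything
below is an assumption made for illustration: mock_mm, mock_vma, lock_state,
min_start and the mock_* functions are hypothetical stand-ins, not kernel
types or APIs, and the mock expansion neither page-aligns the address nor
applies any of the real limits. It only mirrors the control flow of the
quoted function, not lock_mm_and_find_vma().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the state of mm->mmap_lock. */
enum lock_state { UNLOCKED, READ_LOCKED, WRITE_LOCKED };

struct mock_mm {
	enum lock_state lock;
};

struct mock_vma {
	unsigned long start;
	unsigned long end;
	bool grows_down;         /* stands in for VM_GROWSDOWN */
	unsigned long min_start; /* hypothetical lower bound for expansion */
};

/* Toy expansion: returns 0 on success, nonzero on failure. */
static int mock_expand_downwards(struct mock_vma *vma, unsigned long addr)
{
	if (addr < vma->min_start)
		return -1;
	vma->start = addr;
	return 0;
}

/* Mirrors the branches of the quoted mmap_read_lock_maybe_expand(). */
static bool mock_read_lock_maybe_expand(struct mock_mm *mm,
					struct mock_vma *vma,
					unsigned long addr, bool write)
{
	/* No expansion needed: plain read lock. */
	if (!write || addr >= vma->start) {
		mm->lock = READ_LOCKED;
		return true;
	}

	/* Not a downward-growing stack: fail with no lock held. */
	if (!vma->grows_down)
		return false;

	mm->lock = WRITE_LOCKED;
	if (mock_expand_downwards(vma, addr)) {
		mm->lock = UNLOCKED;
		return false;
	}

	/* Models mmap_write_downgrade(): write lock becomes a read lock. */
	mm->lock = READ_LOCKED;
	return true;
}

/* Walks the four outcomes; each failure leaves the lock released. */
static void mock_check_cases(void)
{
	struct mock_mm mm = { UNLOCKED };
	struct mock_vma vma = { 0x7000, 0x8000, true, 0x4000 };

	/* Write below the VMA start: expand, end up read-locked. */
	assert(mock_read_lock_maybe_expand(&mm, &vma, 0x6000, true));
	assert(mm.lock == READ_LOCKED && vma.start == 0x6000);

	/* Expansion fails: no lock held on return. */
	mm.lock = UNLOCKED;
	assert(!mock_read_lock_maybe_expand(&mm, &vma, 0x1000, true));
	assert(mm.lock == UNLOCKED);

	/* Not a growsdown VMA: fail before taking any lock. */
	vma.grows_down = false;
	assert(!mock_read_lock_maybe_expand(&mm, &vma, 0x5000, true));
	assert(mm.lock == UNLOCKED);

	/* Read access: read lock only, VMA untouched. */
	assert(mock_read_lock_maybe_expand(&mm, &vma, 0x5000, false));
	assert(mm.lock == READ_LOCKED && vma.start == 0x6000);
}
```

Calling mock_check_cases() from a main() exercises all four paths. The point
of the model is the caller-visible contract: a true return always means a
read lock is held, and a false return always means no lock is held.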

