From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 2EEBDE7717F for ; Tue, 10 Dec 2024 17:15:35 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 9B2B36B009D; Tue, 10 Dec 2024 12:15:34 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 961D88D000B; Tue, 10 Dec 2024 12:15:34 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 7B64F6B0112; Tue, 10 Dec 2024 12:15:34 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0017.hostedemail.com [216.40.44.17]) by kanga.kvack.org (Postfix) with ESMTP id 58C848D000B for ; Tue, 10 Dec 2024 12:15:34 -0500 (EST) Received: from smtpin20.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay01.hostedemail.com (Postfix) with ESMTP id CA0891C7790 for ; Tue, 10 Dec 2024 17:15:33 +0000 (UTC) X-FDA: 82879699392.20.890AD3E Received: from mail-ed1-f47.google.com (mail-ed1-f47.google.com [209.85.208.47]) by imf15.hostedemail.com (Postfix) with ESMTP id 14E40A0008 for ; Tue, 10 Dec 2024 17:15:08 +0000 (UTC) Authentication-Results: imf15.hostedemail.com; dkim=pass header.d=google.com header.s=20230601 header.b=nwUBGkO1; spf=pass (imf15.hostedemail.com: domain of jannh@google.com designates 209.85.208.47 as permitted sender) smtp.mailfrom=jannh@google.com; dmarc=pass (policy=reject) header.from=google.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1733850908; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=1j7LZ3gidxJvgG/m4KFvrvy2HeW92l/T3DtHZWGqx18=; b=cC53QaxixZ+2DfuauShJViPCle8m7IB+0ZSDYHvuN52URudGOPxQ4OqCifQ7cbHO47os19 6TPlmL5HuuNuoE+O7j9Rd/IxwZHff08uJA2PCBrRSSbhcX38iXIotYnn1ZRpXneWULqKeL OJVWul4GyXAEDEHL88KIEczzXqlP8hY= ARC-Authentication-Results: i=1; imf15.hostedemail.com; dkim=pass header.d=google.com header.s=20230601 header.b=nwUBGkO1; spf=pass (imf15.hostedemail.com: domain of jannh@google.com designates 209.85.208.47 as permitted sender) smtp.mailfrom=jannh@google.com; dmarc=pass (policy=reject) header.from=google.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1733850908; a=rsa-sha256; cv=none; b=ys4qK/ARtpi7mWJQum614PMiQHFDBi5Ug7FQwlWvQUOBnMgpBkB50lVwFeLn/GGQ/Z00Vi iGeX2Sz3MLX2lcA10Qul+Yguki9CgwBvfAghF2l/zj+YW1RHDrBAHpOuxPZAXl1tIGjj7T l8yOpgq1fCBFA3x9/VEA2nKqBCvdAvU= Received: by mail-ed1-f47.google.com with SMTP id 4fb4d7f45d1cf-5d3c2135f61so9018a12.1 for ; Tue, 10 Dec 2024 09:15:31 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1733850930; x=1734455730; darn=kvack.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=1j7LZ3gidxJvgG/m4KFvrvy2HeW92l/T3DtHZWGqx18=; b=nwUBGkO1ZlgwiQDNsCNkd6gL2JhHnw2vHWM4YvxRS7riT4QS74q1RckAN8WaUlqmZQ rSCzUyDgejglBzXE7PwixmdKF1GC5uOIVhoQKZjJ8IgqNVSaGvWq7gsd4Cm7p4cWzHJR qLEoI3mCESDIxo1yOjw14w6QkON9GlhrtQ4MRq6okpf0Xsvu4WhFA3weE3pD6XIrz31i B4d9UVU3ejk503mHcwh+QyU+mTYKL5dLDdrNiMeRHv50Gh4BaDf00Fsi5VT0hbhFgvFh 1iVnRiUWvUxnAPflxn9CO2Ztttp8hoMVVVowUNz5G01/1bjY9D4dvu1DIo5x70UKnHGi u/HQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1733850930; x=1734455730; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=1j7LZ3gidxJvgG/m4KFvrvy2HeW92l/T3DtHZWGqx18=; b=Ghom2IA7P3vii6TdaAGLwmCdEv4XFQI+sB0ubXm0nLdWYQYcWosor6VLaWMOYKBiVn 8OiM9nytej9EpRm5lRMOMFHyE6YCxD7AahjvRl8MPOUSTnQ/FgiyzktTuM6SFHoIEmgZ vORWOIMJJbHW6kiSjO/R5HQzDdkkqu8VmAi3/WMmtDTiCE/Nh1B+Ii0etV2cyGNfYLes ibXlgR8L2PztEQAhPJAVGvWxZkqlDsXt3dJkAa2YHmROVh/NW+mVFxdt83SGkfuM6NCh WK0XbRnvUt1oSYC72MCr0jKKf9NqZzI15Qsb+gRzL73A1Fz4lT7WwL4KKW6cH6l5QfDX qwsw== X-Forwarded-Encrypted: i=1; AJvYcCUZZuKY4a8QnjakomP75KAf08Sp22ucaUHrdb6YqF4BBugE4GKTcdb56YRvCJ+/4Dwi1dQCBDVf+g==@kvack.org X-Gm-Message-State: AOJu0YzMOhM6QGxe6KVVaqgDAUSf8+1FUnK3io6RcbM3zB1kh2uLAauo L+ZfDULtzjR4EDs8ys6e9iZTfMOq8Yg/JuGIM/z0HqUYUqOnHH7woUIh0hI5ZAPq3XjlfGbD7LZ nSvm8bLqHTBmZmhv6b8Isw/qZGoI8lQL6htW8 X-Gm-Gg: ASbGnct0mAhwZanBSzc2ryjLpGzsiy1lKLyNXZheC5sJFDxn+TvYRbp9IuiuRITp1e+ QMyHOl62ODSYZfSNE9ijVgMjdZ1+8QrqQXFewfGOJqbms6+eHJIckD4RWSK08ws4= X-Google-Smtp-Source: AGHT+IHT16w6MB/cAWIgtFbiSkDVdo7XP1VoYlvCUhWiCeYhHyzKjFoNyCYw0JcgMIyZtC6modyN7XVZ+Xm+NXJ7OGE= X-Received: by 2002:a50:c2d1:0:b0:5d0:d935:457b with SMTP id 4fb4d7f45d1cf-5d41ed02050mr123811a12.0.1733850929585; Tue, 10 Dec 2024 09:15:29 -0800 (PST) MIME-Version: 1.0 References: <5295d1c70c58e6aa63d14be68d4e1de9fa1c8e6d.1733248985.git.lorenzo.stoakes@oracle.com> In-Reply-To: <5295d1c70c58e6aa63d14be68d4e1de9fa1c8e6d.1733248985.git.lorenzo.stoakes@oracle.com> From: Jann Horn Date: Tue, 10 Dec 2024 18:14:53 +0100 Message-ID: Subject: Re: [PATCH 3/5] mm: abstract get_arg_page() stack expansion and mmap read lock To: Lorenzo Stoakes Cc: Andrew Morton , "Liam R . Howlett" , Vlastimil Babka , Eric Biederman , Kees Cook , Alexander Viro , Christian Brauner , Jan Kara , linux-mm@kvack.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Rspamd-Server: rspam06 X-Rspamd-Queue-Id: 14E40A0008 X-Rspam-User: X-Stat-Signature: ie43tf7z1nhyg3gaqtxhzani8dbfmzn1 X-HE-Tag: 1733850908-421689 X-HE-Meta: U2FsdGVkX19cHwgsA7/pW4peGqiIYkNLv2VtKTK+SRyXEeQnn8TSIxcHr9z8N00WZjzt+O8u+WAAeHVmOqCVEi6oNT3WX5KMuv1ghCfDmhws3Abim0lpgYItkGZV709sm1XcJQSCwSqYgqpZRbW4jJ+Rov/lr3e+kK+kx8FEgFodxw0RkZUlDy7FAf+liGoCKrac3KxrWViBI219obQ3rktSxkENfOauE38uAOwjlq5WtNGgypA5gNPiDDuInburZ3A2OSoq5EYQUtHwnW1EQgJeXFUVwpzZAG3vqVjJrdsEgj/wnVRLQtzwELePwmjfPfTI97UYg3PAaaaoo0aDRP1VNu4ZOR1wWxLwZR2SKLAe/uKIdkU8oUGElBMnFFxtUzegn3Ur/x919BMPFWE0fzt8EygBGMQ+JId5f2JDV0mwPciuP2HE39IGAcAySS6g9Kpn/7LN93JfbNd2FNylGWBvlnXi4INHvCWCCSfamF/fnYgDKzwmqwltJaMSg6IoWcyqwh+R0WdkmKca0xDaF7XsLsycl8ugCsKDXtNakf49/wlJHFjpGykCVDX7KMKSrqsbdQDQSdnjqyFNH1a9m7VpoGx8DZDlKhBrxkxjbvC4U7t/ukRbBSt3MGvi8esdc1ct8iQkyZTpnkLwgUxkJVcEbPkSZ5LmWN7IwU91e0qu1elXOpOpctsjMwVOYRJhYMk5mkeMhlfuQt8Q3yxowuYjHCSaW+OweI8DcNT2oAeddPhBJJ2K5IILklgoY5ZNAE94tYFk9zVK9sYLB7jPHV+JNHxqf/PL1TJVxixM9ta1/gXrbPvnJ7lZAFf8usB4zadXb/iCyOgfdyT4jeS+N4Q6QxtGKXke4x+zm0nKCKRNJeDNU5kgXpRYTxuh12DgbALu1bB8CHj8J0rqiGqHy6cSS+jSS6swuELRxqsHpEuNDr+cUc0kUoQRax+mUjSwBw1Byr4KTwdABagwsjY jW7BCEGd c2rKL6VlmQKIPH7H75WdQKn2+ly+poetX5jqltqzN258BSZ4RKMxQVE0V7HtnGU8lJ4idNoaZCzBw/rzRiUJ9CDlAl1Emkqdhl4dEHUoJEttyqi6HWR8h27XVxlMY1ZOLgLJZQ4WZnB8MWHReBhlShU/rJZFCb+ddidfq7YRgv+DOC6i+N6NXe+T+vTqTW+iWW8oCfUOj/gQ5DRnKxThsSxAgN77EauojKDSQhollr7y/6tCVGEnJOxMfqvti2qQY0nDe5n+kVFh7CdE= X-Bogosity: Unsure, tests=bogofilter, spamicity=0.492525, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Tue, Dec 3, 2024 at 7:05=E2=80=AFPM Lorenzo Stoakes wrote: > Right now fs/exec.c invokes expand_downwards(), an otherwise internal > implementation detail of the VMA logic in order to ensure that an arg pag= e > can be obtained by get_user_pages_remote(). > > In order to be able to move the stack expansion logic into mm/vma.c in > order to make it available to userland testing we need to find an > alternative approach here. > > We do so by providing the mmap_read_lock_maybe_expand() function which al= so > helpfully documents what get_arg_page() is doing here and adds an > additional check against VM_GROWSDOWN to make explicit that the stack > expansion logic is only invoked when the VMA is indeed a downward-growing > stack. > > This allows expand_downwards() to become a static function. > > Importantly, the VMA referenced by mmap_read_maybe_expand() must NOT be > currently user-visible in any way, that is place within an rmap or VMA > tree. It must be a newly allocated VMA. > > This is the case when exec invokes this function. > > Signed-off-by: Lorenzo Stoakes > --- > fs/exec.c | 14 +++--------- > include/linux/mm.h | 5 ++--- > mm/mmap.c | 54 +++++++++++++++++++++++++++++++++++++++++++++- > 3 files changed, 58 insertions(+), 15 deletions(-) > > diff --git a/fs/exec.c b/fs/exec.c > index 98cb7ba9983c..1e1f79c514de 100644 > --- a/fs/exec.c > +++ b/fs/exec.c > @@ -205,18 +205,10 @@ static struct page *get_arg_page(struct linux_binpr= m *bprm, unsigned long pos, > /* > * Avoid relying on expanding the stack down in GUP (which > * does not work for STACK_GROWSUP anyway), and just do it > - * by hand ahead of time. > + * ahead of time. > */ > - if (write && pos < vma->vm_start) { > - mmap_write_lock(mm); > - ret =3D expand_downwards(vma, pos); > - if (unlikely(ret < 0)) { > - mmap_write_unlock(mm); > - return NULL; > - } > - mmap_write_downgrade(mm); > - } else > - mmap_read_lock(mm); > + if (!mmap_read_lock_maybe_expand(mm, vma, pos, write)) > + return NULL; [...] > +/* > + * Obtain a read lock on mm->mmap_lock, if the specified address is belo= w the > + * start of the VMA, the intent is to perform a write, and it is a > + * downward-growing stack, then attempt to expand the stack to contain i= t. > + * > + * This function is intended only for obtaining an argument page from an= ELF > + * image, and is almost certainly NOT what you want to use for any other > + * purpose. > + * > + * IMPORTANT - VMA fields are accessed without an mmap lock being held, = so the > + * VMA referenced must not be linked in any user-visible tree, i.e. it m= ust be a > + * new VMA being mapped. > + * > + * The function assumes that addr is either contained within the VMA or = below > + * it, and makes no attempt to validate this value beyond that. > + * > + * Returns true if the read lock was obtained and a stack was perhaps ex= panded, > + * false if the stack expansion failed. > + * > + * On stack expansion the function temporarily acquires an mmap write lo= ck > + * before downgrading it. > + */ > +bool mmap_read_lock_maybe_expand(struct mm_struct *mm, > + struct vm_area_struct *new_vma, > + unsigned long addr, bool write) > +{ > + if (!write || addr >=3D new_vma->vm_start) { > + mmap_read_lock(mm); > + return true; > + } > + > + if (!(new_vma->vm_flags & VM_GROWSDOWN)) > + return false; > + > + mmap_write_lock(mm); > + if (expand_downwards(new_vma, addr)) { > + mmap_write_unlock(mm); > + return false; > + } > + > + mmap_write_downgrade(mm); > + return true; > +} Random thought: For write=3D=3D1, this looks a bit like lock_mm_and_find_vma(mm, addr, NULL), which needs similar stack expansion logic for handling userspace faults. But it's for a sufficiently different situation that maybe it makes sense to keep it like you did it, as a separate function...