From: Jann Horn
To: Andrew Morton, linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org, "Eric W . Biederman", Michel Lespinasse,
 Mauro Carvalho Chehab, Sakari Ailus, Jeff Dike, Richard Weinberger,
 Anton Ivanov, linux-um@lists.infradead.org, Jason Gunthorpe, John Hubbard
Subject: [PATCH v2 2/2] exec: Broadly lock nascent mm until setup_arg_pages()
Date: Wed, 7 Oct 2020 00:54:50 +0200
Message-Id: <20201006225450.751742-3-jannh@google.com>
In-Reply-To: <20201006225450.751742-1-jannh@google.com>
References: <20201006225450.751742-1-jannh@google.com>

While AFAIK there currently is nothing that can modify the VMA tree of a
new mm until userspace has started running under the mm, we should properly
lock the mm here anyway, both to keep lockdep happy when adding locking
assertions and to be safe in the future in case someone e.g. decides to
permit VMA-tree-mutating operations in process_madvise_behavior_valid().

The goal of this patch is to broadly lock the nascent mm in the exec path,
from around the time it is created all the way to the end of
setup_arg_pages() (because setup_arg_pages() accesses bprm->vma).
As long as the mm is write-locked, keep it around in bprm->mm, even after
it has been installed on the task (with an extra reference on the mm, to
reduce complexity in free_bprm()).
After setup_arg_pages(), we have to unlock the mm so that APIs such as
copy_to_user() will work in the following binfmt-specific setup code.

Suggested-by: Jason Gunthorpe
Suggested-by: Michel Lespinasse
Signed-off-by: Jann Horn
---
 fs/exec.c               | 68 ++++++++++++++++++++---------------------
 include/linux/binfmts.h |  2 +-
 2 files changed, 35 insertions(+), 35 deletions(-)

diff --git a/fs/exec.c b/fs/exec.c
index 229dbc7aa61a..fe11d77e397a 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -254,11 +254,6 @@ static int __bprm_mm_init(struct linux_binprm *bprm)
 		return -ENOMEM;
 	vma_set_anonymous(vma);
 
-	if (mmap_write_lock_killable(mm)) {
-		err = -EINTR;
-		goto err_free;
-	}
-
 	/*
 	 * Place the stack at the largest stack address the architecture
 	 * supports. Later, we'll move this to an appropriate place. We don't
@@ -276,12 +271,9 @@ static int __bprm_mm_init(struct linux_binprm *bprm)
 		goto err;
 
 	mm->stack_vm = mm->total_vm = 1;
-	mmap_write_unlock(mm);
 	bprm->p = vma->vm_end - sizeof(void *);
 	return 0;
 err:
-	mmap_write_unlock(mm);
-err_free:
 	bprm->vma = NULL;
 	vm_area_free(vma);
 	return err;
@@ -364,9 +356,9 @@ static int bprm_mm_init(struct linux_binprm *bprm)
 	struct mm_struct *mm = NULL;
 
 	bprm->mm = mm = mm_alloc();
-	err = -ENOMEM;
 	if (!mm)
-		goto err;
+		return -ENOMEM;
+	mmap_write_lock_nascent(mm);
 
 	/* Save current stack limit for all calculations made during exec. */
 	task_lock(current->group_leader);
@@ -374,17 +366,12 @@ static int bprm_mm_init(struct linux_binprm *bprm)
 	task_unlock(current->group_leader);
 
 	err = __bprm_mm_init(bprm);
-	if (err)
-		goto err;
-
-	return 0;
-
-err:
-	if (mm) {
-		bprm->mm = NULL;
-		mmdrop(mm);
-	}
+	if (!err)
+		return 0;
 
+	bprm->mm = NULL;
+	mmap_write_unlock(mm);
+	mmdrop(mm);
 	return err;
 }
 
@@ -735,6 +722,7 @@ static int shift_arg_pages(struct vm_area_struct *vma, unsigned long shift)
 /*
  * Finalizes the stack vm_area_struct. The flags and permissions are updated,
  * the stack is optionally relocated, and some extra space is added.
+ * At the end of this, the mm_struct will be unlocked on success.
  */
 int setup_arg_pages(struct linux_binprm *bprm,
 		    unsigned long stack_top,
@@ -787,9 +775,6 @@ int setup_arg_pages(struct linux_binprm *bprm,
 	bprm->loader -= stack_shift;
 	bprm->exec -= stack_shift;
 
-	if (mmap_write_lock_killable(mm))
-		return -EINTR;
-
 	vm_flags = VM_STACK_FLAGS;
 
 	/*
@@ -807,7 +792,7 @@ int setup_arg_pages(struct linux_binprm *bprm,
 	ret = mprotect_fixup(vma, &prev, vma->vm_start, vma->vm_end,
 			vm_flags);
 	if (ret)
-		goto out_unlock;
+		return ret;
 	BUG_ON(prev != vma);
 
 	if (unlikely(vm_flags & VM_EXEC)) {
@@ -819,7 +804,7 @@ int setup_arg_pages(struct linux_binprm *bprm,
 	if (stack_shift) {
 		ret = shift_arg_pages(vma, stack_shift);
 		if (ret)
-			goto out_unlock;
+			return ret;
 	}
 
 	/* mprotect_fixup is overkill to remove the temporary stack flags */
@@ -846,11 +831,17 @@ int setup_arg_pages(struct linux_binprm *bprm,
 	current->mm->start_stack = bprm->p;
 	ret = expand_stack(vma, stack_base);
 	if (ret)
-		ret = -EFAULT;
+		return -EFAULT;
 
-out_unlock:
+	/*
+	 * From this point on, anything that wants to poke around in the
+	 * mm_struct must lock it by itself.
+	 */
+	bprm->vma = NULL;
 	mmap_write_unlock(mm);
-	return ret;
+	mmput(mm);
+	bprm->mm = NULL;
+	return 0;
 }
 EXPORT_SYMBOL(setup_arg_pages);
 
@@ -1114,8 +1105,6 @@ static int exec_mmap(struct mm_struct *mm)
 	if (ret)
 		return ret;
 
-	mmap_write_lock_nascent(mm);
-
 	if (old_mm) {
 		/*
 		 * Make sure that if there is a core dump in progress
@@ -1127,11 +1116,12 @@ static int exec_mmap(struct mm_struct *mm)
 		if (unlikely(old_mm->core_state)) {
 			mmap_read_unlock(old_mm);
 			mutex_unlock(&tsk->signal->exec_update_mutex);
-			mmap_write_unlock(mm);
 			return -EINTR;
 		}
 	}
 
+	/* bprm->mm stays refcounted, current->mm takes an extra reference */
+	mmget(mm);
 	task_lock(tsk);
 	active_mm = tsk->active_mm;
 	membarrier_exec_mmap(mm);
@@ -1141,7 +1131,6 @@ static int exec_mmap(struct mm_struct *mm)
 	tsk->mm->vmacache_seqnum = 0;
 	vmacache_flush(tsk);
 	task_unlock(tsk);
-	mmap_write_unlock(mm);
 	if (old_mm) {
 		mmap_read_unlock(old_mm);
 		BUG_ON(active_mm != old_mm);
@@ -1397,8 +1386,6 @@ int begin_new_exec(struct linux_binprm * bprm)
 	if (retval)
 		goto out;
 
-	bprm->mm = NULL;
-
 #ifdef CONFIG_POSIX_TIMERS
 	exit_itimers(me->signal);
 	flush_itimer_signals();
@@ -1545,6 +1532,18 @@ void setup_new_exec(struct linux_binprm * bprm)
 	me->mm->task_size = TASK_SIZE;
 	mutex_unlock(&me->signal->exec_update_mutex);
 	mutex_unlock(&me->signal->cred_guard_mutex);
+
+#ifndef CONFIG_MMU
+	/*
+	 * On MMU, setup_arg_pages() wants to access bprm->vma after this point,
+	 * so we can't drop the mmap lock yet.
+	 * On !MMU, we have neither setup_arg_pages() nor bprm->vma, so we
+	 * should drop the lock here.
+	 */
+	mmap_write_unlock(bprm->mm);
+	mmput(bprm->mm);
+	bprm->mm = NULL;
+#endif
 }
 EXPORT_SYMBOL(setup_new_exec);
 
@@ -1581,6 +1580,7 @@ static void free_bprm(struct linux_binprm *bprm)
 {
 	if (bprm->mm) {
 		acct_arg_size(bprm, 0);
+		mmap_write_unlock(bprm->mm);
 		mmput(bprm->mm);
 	}
 	free_arg_pages(bprm);
diff --git a/include/linux/binfmts.h b/include/linux/binfmts.h
index 0571701ab1c5..3bf06212fbae 100644
--- a/include/linux/binfmts.h
+++ b/include/linux/binfmts.h
@@ -22,7 +22,7 @@ struct linux_binprm {
 # define MAX_ARG_PAGES	32
 	struct page *page[MAX_ARG_PAGES];
 #endif
-	struct mm_struct *mm;
+	struct mm_struct *mm; /* nascent mm, write-locked */
 	unsigned long p; /* current top of mem */
 	unsigned long argmin; /* rlimit marker for copy_strings() */
 	unsigned int
-- 
2.28.0.806.g8561365e88-goog