From: Jann Horn
To: Andrew Morton, linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org, "Eric W . Biederman",
 Michel Lespinasse, Mauro Carvalho Chehab, Sakari Ailus, Jeff Dike,
 Richard Weinberger, Anton Ivanov, linux-um@lists.infradead.org,
 Jason Gunthorpe, John Hubbard, Johannes Berg
Subject: [PATCH v3 2/2] exec: Broadly lock nascent mm until setup_arg_pages()
Date: Thu, 15 Oct 2020 02:00:41 +0200
Message-Id: <20201015000041.1734214-3-jannh@google.com>
In-Reply-To: <20201015000041.1734214-1-jannh@google.com>
References: <20201015000041.1734214-1-jannh@google.com>

While AFAIK there currently is nothing that can modify the VMA tree of a
new mm until userspace has started running under the mm, we should properly
lock the mm here anyway, both to keep lockdep happy when adding locking
assertions and to be safe in the future in case someone e.g. decides to
permit VMA-tree-mutating operations in process_madvise_behavior_valid().

The goal of this patch is to broadly lock the nascent mm in the exec path,
from around the time it is created all the way to the end of
setup_arg_pages() (because setup_arg_pages() accesses bprm->vma).
As long as the mm is write-locked, keep it around in bprm->mm, even after
it has been installed on the task (with an extra reference on the mm, to
reduce complexity in free_bprm()).
After setup_arg_pages(), we have to unlock the mm so that APIs such as
copy_to_user() will work in the following binfmt-specific setup code.

Suggested-by: Jason Gunthorpe
Suggested-by: Michel Lespinasse
Signed-off-by: Jann Horn
---
 fs/exec.c               | 68 ++++++++++++++++++++---------------------
 include/linux/binfmts.h |  2 +-
 2 files changed, 35 insertions(+), 35 deletions(-)

diff --git a/fs/exec.c b/fs/exec.c
index 229dbc7aa61a..00edf833781f 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -254,11 +254,6 @@ static int __bprm_mm_init(struct linux_binprm *bprm)
 		return -ENOMEM;
 	vma_set_anonymous(vma);
 
-	if (mmap_write_lock_killable(mm)) {
-		err = -EINTR;
-		goto err_free;
-	}
-
 	/*
 	 * Place the stack at the largest stack address the architecture
 	 * supports. Later, we'll move this to an appropriate place. We don't
@@ -276,12 +271,9 @@ static int __bprm_mm_init(struct linux_binprm *bprm)
 		goto err;
 
 	mm->stack_vm = mm->total_vm = 1;
-	mmap_write_unlock(mm);
 	bprm->p = vma->vm_end - sizeof(void *);
 	return 0;
 err:
-	mmap_write_unlock(mm);
-err_free:
 	bprm->vma = NULL;
 	vm_area_free(vma);
 	return err;
@@ -364,9 +356,9 @@ static int bprm_mm_init(struct linux_binprm *bprm)
 	struct mm_struct *mm = NULL;
 
 	bprm->mm = mm = mm_alloc();
-	err = -ENOMEM;
 	if (!mm)
-		goto err;
+		return -ENOMEM;
+	mmap_write_lock_nascent(mm);
 
 	/* Save current stack limit for all calculations made during exec. */
 	task_lock(current->group_leader);
@@ -374,17 +366,12 @@ static int bprm_mm_init(struct linux_binprm *bprm)
 	task_unlock(current->group_leader);
 
 	err = __bprm_mm_init(bprm);
-	if (err)
-		goto err;
-
-	return 0;
-
-err:
-	if (mm) {
-		bprm->mm = NULL;
-		mmdrop(mm);
-	}
+	if (!err)
+		return 0;
 
+	bprm->mm = NULL;
+	mmap_write_unlock(mm);
+	mmdrop(mm);
 	return err;
 }
 
@@ -735,6 +722,7 @@ static int shift_arg_pages(struct vm_area_struct *vma, unsigned long shift)
 /*
  * Finalizes the stack vm_area_struct. The flags and permissions are updated,
  * the stack is optionally relocated, and some extra space is added.
+ * At the end of this, the mm_struct will be unlocked on success.
  */
 int setup_arg_pages(struct linux_binprm *bprm,
 		    unsigned long stack_top,
@@ -787,9 +775,6 @@ int setup_arg_pages(struct linux_binprm *bprm,
 		bprm->loader -= stack_shift;
 	bprm->exec -= stack_shift;
 
-	if (mmap_write_lock_killable(mm))
-		return -EINTR;
-
 	vm_flags = VM_STACK_FLAGS;
 
 	/*
@@ -807,7 +792,7 @@ int setup_arg_pages(struct linux_binprm *bprm,
 	ret = mprotect_fixup(vma, &prev, vma->vm_start, vma->vm_end,
 			vm_flags);
 	if (ret)
-		goto out_unlock;
+		return ret;
 	BUG_ON(prev != vma);
 
 	if (unlikely(vm_flags & VM_EXEC)) {
@@ -819,7 +804,7 @@ int setup_arg_pages(struct linux_binprm *bprm,
 	if (stack_shift) {
 		ret = shift_arg_pages(vma, stack_shift);
 		if (ret)
-			goto out_unlock;
+			return ret;
 	}
 
 	/* mprotect_fixup is overkill to remove the temporary stack flags */
@@ -846,11 +831,17 @@ int setup_arg_pages(struct linux_binprm *bprm,
 	current->mm->start_stack = bprm->p;
 	ret = expand_stack(vma, stack_base);
 	if (ret)
-		ret = -EFAULT;
+		return -EFAULT;
 
-out_unlock:
+	/*
+	 * From this point on, anything that wants to poke around in the
+	 * mm_struct must lock it by itself.
+	 */
+	bprm->vma = NULL;
 	mmap_write_unlock(mm);
-	return ret;
+	mmput(mm);
+	bprm->mm = NULL;
+	return 0;
 }
 EXPORT_SYMBOL(setup_arg_pages);
 
@@ -1114,8 +1105,6 @@ static int exec_mmap(struct mm_struct *mm)
 	if (ret)
 		return ret;
 
-	mmap_write_lock_nascent(mm);
-
 	if (old_mm) {
 		/*
 		 * Make sure that if there is a core dump in progress
@@ -1127,11 +1116,12 @@ static int exec_mmap(struct mm_struct *mm)
 		if (unlikely(old_mm->core_state)) {
 			mmap_read_unlock(old_mm);
 			mutex_unlock(&tsk->signal->exec_update_mutex);
-			mmap_write_unlock(mm);
 			return -EINTR;
 		}
 	}
 
+	/* bprm->mm stays refcounted, current->mm takes an extra reference */
+	mmget(mm);
 	task_lock(tsk);
 	active_mm = tsk->active_mm;
 	membarrier_exec_mmap(mm);
@@ -1141,7 +1131,6 @@ static int exec_mmap(struct mm_struct *mm)
 	tsk->mm->vmacache_seqnum = 0;
 	vmacache_flush(tsk);
 	task_unlock(tsk);
-	mmap_write_unlock(mm);
 	if (old_mm) {
 		mmap_read_unlock(old_mm);
 		BUG_ON(active_mm != old_mm);
@@ -1397,8 +1386,6 @@ int begin_new_exec(struct linux_binprm * bprm)
 	if (retval)
 		goto out;
 
-	bprm->mm = NULL;
-
 #ifdef CONFIG_POSIX_TIMERS
 	exit_itimers(me->signal);
 	flush_itimer_signals();
@@ -1545,6 +1532,18 @@ void setup_new_exec(struct linux_binprm * bprm)
 	me->mm->task_size = TASK_SIZE;
 	mutex_unlock(&me->signal->exec_update_mutex);
 	mutex_unlock(&me->signal->cred_guard_mutex);
+
+	if (!IS_ENABLED(CONFIG_MMU)) {
+		/*
+		 * On MMU, setup_arg_pages() wants to access bprm->vma after
+		 * this point, so we can't drop the mmap lock yet.
+		 * On !MMU, we have neither setup_arg_pages() nor bprm->vma,
+		 * so we should drop the lock here.
+		 */
+		mmap_write_unlock(bprm->mm);
+		mmput(bprm->mm);
+		bprm->mm = NULL;
+	}
 }
 EXPORT_SYMBOL(setup_new_exec);
 
@@ -1581,6 +1580,7 @@ static void free_bprm(struct linux_binprm *bprm)
 {
 	if (bprm->mm) {
 		acct_arg_size(bprm, 0);
+		mmap_write_unlock(bprm->mm);
 		mmput(bprm->mm);
 	}
 	free_arg_pages(bprm);
diff --git a/include/linux/binfmts.h b/include/linux/binfmts.h
index 0571701ab1c5..3bf06212fbae 100644
--- a/include/linux/binfmts.h
+++ b/include/linux/binfmts.h
@@ -22,7 +22,7 @@ struct linux_binprm {
 # define MAX_ARG_PAGES	32
 	struct page *page[MAX_ARG_PAGES];
 #endif
-	struct mm_struct *mm;
+	struct mm_struct *mm; /* nascent mm, write-locked */
 	unsigned long p; /* current top of mem */
 	unsigned long argmin; /* rlimit marker for copy_strings() */
 	unsigned int
-- 
2.28.0.1011.ga647a8990f-goog
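
As a reading aid for the lifetime rules in the commit message, here is a rough user-space sketch (not kernel code) of how the nascent mm's write lock and references would move through the exec path with this patch applied. All toy_* names are hypothetical stand-ins invented for illustration. The write lock is modelled as a plain flag and the reference count as an int, so the sketch shows ordering only, not real synchronization.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are NOT the kernel's structures. */
struct toy_mm {
	int users;		/* models the mm's user reference count */
	bool write_locked;	/* models the mmap write lock being held */
};

struct toy_bprm {
	struct toy_mm *mm;	/* nascent mm, write-locked while non-NULL */
};

static struct toy_mm *toy_task_mm;	/* models current->mm */

static void toy_mmput(struct toy_mm *mm)
{
	assert(mm->users > 0);
	mm->users--;
}

/* Models bprm_mm_init(): the mm is created and locked right away. */
static void toy_bprm_mm_init(struct toy_bprm *bprm, struct toy_mm *mm)
{
	mm->users = 1;			/* reference owned by bprm->mm */
	mm->write_locked = true;	/* mmap_write_lock_nascent() */
	bprm->mm = mm;
}

/* Models exec_mmap(): the task takes an extra reference of its own. */
static void toy_exec_mmap(struct toy_bprm *bprm)
{
	assert(bprm->mm->write_locked);
	bprm->mm->users++;		/* mmget() */
	toy_task_mm = bprm->mm;
}

/* Models the success tail of setup_arg_pages() on MMU builds. */
static void toy_setup_arg_pages_done(struct toy_bprm *bprm)
{
	bprm->mm->write_locked = false;	/* mmap_write_unlock() */
	toy_mmput(bprm->mm);		/* drop bprm's reference */
	bprm->mm = NULL;
}

/* Models free_bprm(): only unlocks if bprm still owns the locked mm. */
static void toy_free_bprm(struct toy_bprm *bprm)
{
	if (bprm->mm) {
		bprm->mm->write_locked = false;
		toy_mmput(bprm->mm);
		bprm->mm = NULL;	/* toy only: makes repeated calls harmless */
	}
}
```

On the success path the calls would run as toy_bprm_mm_init(), toy_exec_mmap(), toy_setup_arg_pages_done(), leaving one reference (the task's) and the lock released. On an error before setup_arg_pages() finishes, toy_free_bprm() releases both the lock and bprm's reference, which is why bprm->mm doubles as the "still locked" marker.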