From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-8.1 required=3.0 tests=DKIM_INVALID,DKIM_SIGNED, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH,MAILING_LIST_MULTI,SIGNED_OFF_BY, SPF_HELO_NONE,SPF_PASS,USER_AGENT_SANE_2 autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id D2EC3C10F27 for ; Wed, 11 Mar 2020 13:18:47 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 7AB2621D56 for ; Wed, 11 Mar 2020 13:18:47 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=fail reason="signature verification failed" (2048-bit key) header.d=lca.pw header.i=@lca.pw header.b="q2g5HVw4" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 7AB2621D56 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=lca.pw Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 0D6D16B0003; Wed, 11 Mar 2020 09:18:47 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 086CD6B0006; Wed, 11 Mar 2020 09:18:47 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id E6E0C6B0007; Wed, 11 Mar 2020 09:18:46 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0142.hostedemail.com [216.40.44.142]) by kanga.kvack.org (Postfix) with ESMTP id CCAD66B0003 for ; Wed, 11 Mar 2020 09:18:46 -0400 (EDT) Received: from smtpin11.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay01.hostedemail.com (Postfix) with ESMTP id B82FF180B7E31 for ; Wed, 11 Mar 2020 13:18:46 +0000 (UTC) X-FDA: 76583136252.11.floor03_69a7fec196756 X-HE-Tag: floor03_69a7fec196756 X-Filterd-Recvd-Size: 13106 Received: from mail-qt1-f195.google.com (mail-qt1-f195.google.com [209.85.160.195]) by imf34.hostedemail.com (Postfix) with ESMTP for ; Wed, 11 Mar 2020 13:18:45 +0000 (UTC) Received: by mail-qt1-f195.google.com with SMTP id t13so1453183qtn.13 for ; Wed, 11 Mar 2020 06:18:45 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=lca.pw; s=google; h=message-id:subject:from:to:cc:date:in-reply-to:references :mime-version:content-transfer-encoding; bh=zDG99WdLxXHRHqpAlE4Kr1ARB/kw0xCVrFSECDHGZtI=; b=q2g5HVw45ub4qGRZ2TmDqz7+EKzIMByrNXJhF81OYjk/jYlezSEqyhXBqp++FkcEt0 858p4kZ6NhfElMa1zAMCQhA6c8eamD5vSCMRn66bUt/44G1VB2VTwF2CtnOjDfN44vT4 xFrpaapDi0X/QCAMv0RohFR7QhtqWIWsT/R+8svf7qWwliC9FfXnBxsLRwRvYatoOe3y YRihct1KoRRzLNUEAeEz7diIsm691okbgkntQagMJOb34wvIvT2gRYzxjyU+rcCbGSkB NqxeNpU/9TuYa9LNT/nixb+StSpI9kJ/ZQxlkWagradtzMWPPvzcX7eCGD8dUM19aKok f/dQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:message-id:subject:from:to:cc:date:in-reply-to :references:mime-version:content-transfer-encoding; bh=zDG99WdLxXHRHqpAlE4Kr1ARB/kw0xCVrFSECDHGZtI=; b=cfJ1xa0nvhLYavh05aKP0Jl9Jf5xPW/sRDLHSehs+FERm+VzFrwpjIEplwmYeX96af 5jrFf53NZwLj99OTtjrF/1YLEGjdG0IjYBQoAtftUy9nD/hmdO/JpBcz/ESgtkaoMscL IpB00NEitsX4FS5MHx1EnZZxTi2yLztERW0120DPyzhUQk5FrT7GubrXhUHvtNWgHlQO +8bZCx5sJcjA3bfXu5R15eSpnwleUubKIDl9SBvw+QMDL6V7oeLrLoAQjYl5t4Bu0BCV YWdjZ4Y9eE3TB2sJHVuta5kivd64uyA4pPd7v64r0FSQWEz85Boni5wKDzsAPTbnJ1Ly 4cwQ== X-Gm-Message-State: ANhLgQ2fjaOOZTViqavOBJumk/diHNfSWqPLVL5UJHaTSH/6pHkrXjO5 VdWdOTHvJtoGW8tqEOFIx4K+WQ== X-Google-Smtp-Source: ADFU+vvyf/otIhRI5V52VFI74VzOt7p3eyskKWVi8qCEB+MqKvGW/EWFmgBlovQQ04OmuFPDLYwJWg== X-Received: by 2002:ac8:1add:: with SMTP id h29mr2659982qtk.258.1583932723562; Wed, 11 Mar 2020 06:18:43 -0700 (PDT) Received: from dhcp-41-57.bos.redhat.com (nat-pool-bos-t.redhat.com. [66.187.233.206]) by smtp.gmail.com with ESMTPSA id k4sm24944752qtj.74.2020.03.11.06.18.40 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Wed, 11 Mar 2020 06:18:42 -0700 (PDT) Message-ID: <1583932719.7365.171.camel@lca.pw> Subject: Re: [PATCH v2 5/5] exec: Add a exec_update_mutex to replace cred_guard_mutex From: Qian Cai To: "Eric W. Biederman" , Bernd Edlinger Cc: Christian Brauner , Kees Cook , Jann Horn , Jonathan Corbet , Alexander Viro , Andrew Morton , Alexey Dobriyan , Thomas Gleixner , Oleg Nesterov , Frederic Weisbecker , Andrei Vagin , Ingo Molnar , "Peter Zijlstra (Intel)" , Yuyang Du , David Hildenbrand , Sebastian Andrzej Siewior , Anshuman Khandual , David Howells , James Morris , Greg Kroah-Hartman , Shakeel Butt , Jason Gunthorpe , Christian Kellner , Andrea Arcangeli , Aleksa Sarai , "Dmitry V. Levin" , "linux-doc@vger.kernel.org" , "linux-kernel@vger.kernel.org" , "linux-fsdevel@vger.kernel.org" , "linux-mm@kvack.org" , "stable@vger.kernel.org" , "linux-api@vger.kernel.org" Date: Wed, 11 Mar 2020 09:18:39 -0400 In-Reply-To: <87zhcq4jdj.fsf_-_@x220.int.ebiederm.org> References: <87k142lpfz.fsf@x220.int.ebiederm.org> <875zfmloir.fsf@x220.int.ebiederm.org> <87v9nmjulm.fsf@x220.int.ebiederm.org> <202003021531.C77EF10@keescook> <20200303085802.eqn6jbhwxtmz4j2x@wittgenstein> <87v9nlii0b.fsf@x220.int.ebiederm.org> <87a74xi4kz.fsf@x220.int.ebiederm.org> <87r1y8dqqz.fsf@x220.int.ebiederm.org> <87tv32cxmf.fsf_-_@x220.int.ebiederm.org> <87v9ne5y4y.fsf_-_@x220.int.ebiederm.org> <87zhcq4jdj.fsf_-_@x220.int.ebiederm.org> Content-Type: text/plain; charset="UTF-8" X-Mailer: Evolution 3.22.6 (3.22.6-10.el7) Mime-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Sun, 2020-03-08 at 16:38 -0500, Eric W. Biederman wrote: > The cred_guard_mutex is problematic. The cred_guard_mutex is held > over the userspace accesses as the arguments from userspace are read. > The cred_guard_mutex is held of PTRACE_EVENT_EXIT as the the other > threads are killed. The cred_guard_mutex is held over > "put_user(0, tsk->clear_child_tid)" in exit_mm(). >=20 > Any of those can result in deadlock, as the cred_guard_mutex is held > over a possible indefinite userspace waits for userspace. >=20 > Add exec_update_mutex that is only held over exec updating process > with the new contents of exec, so that code that needs not to be > confused by exec changing the mm and the cred in ways that can not > happen during ordinary execution of a process. >=20 > The plan is to switch the users of cred_guard_mutex to > exec_udpate_mutex one by one. This lets us move forward while still > being careful and not introducing any regressions. >=20 > Link: https://lore.kernel.org/lkml/20160921152946.GA24210@dhcp22.suse.c= z/ > Link: https://lore.kernel.org/lkml/AM6PR03MB5170B06F3A2B75EFB98D071AE4E= 60@AM6PR03MB5170.eurprd03.prod.outlook.com/ > Link: https://lore.kernel.org/linux-fsdevel/20161102181806.GB1112@redha= t.com/ > Link: https://lore.kernel.org/lkml/20160923095031.GA14923@redhat.com/ > Link: https://lore.kernel.org/lkml/20170213141452.GA30203@redhat.com/ > Ref: 45c1a159b85b ("Add PTRACE_O_TRACEVFORKDONE and PTRACE_O_TRACEEXIT = facilities.") > Ref: 456f17cd1a28 ("[PATCH] user-vm-unlock-2.5.31-A2") > Signed-off-by: "Eric W. Biederman" This patch will trigger a warning during boot, [=C2=A0=C2=A0=C2=A019.707214][=C2=A0=C2=A0=C2=A0=C2=A0T1] pci 0035:01:00.= 0: enabling device (0545 -> 0547) [=C2=A0=C2=A0=C2=A019.707287][=C2=A0=C2=A0=C2=A0=C2=A0T1] EEH: Capable ad= apter found: recovery enabled. [=C2=A0=C2=A0=C2=A019.732541][=C2=A0=C2=A0=C2=A0=C2=A0T1] cpuidle-powernv= : Default stop: psscr =3D 0x0000000000000330,mask=3D0x00000000003003ff [=C2=A0=C2=A0=C2=A019.732567][=C2=A0=C2=A0=C2=A0=C2=A0T1] cpuidle-powernv= : Deepest stop: psscr =3D 0x0000000000300375,mask=3D0x00000000003003ff [=C2=A0=C2=A0=C2=A019.732598][=C2=A0=C2=A0=C2=A0=C2=A0T1] cpuidle-powernv= : First stop level that may lose SPRs =3D 0x4 [=C2=A0=C2=A0=C2=A019.732617][=C2=A0=C2=A0=C2=A0=C2=A0T1] cpuidle-powernv= : First stop level that may lose timebase =3D 0x10 [=C2=A0=C2=A0=C2=A019.769784][=C2=A0=C2=A0=C2=A0=C2=A0T1] HugeTLB registe= red 2.00 MiB page size, pre-allocated 0 pages [=C2=A0=C2=A0=C2=A019.769810][=C2=A0=C2=A0=C2=A0=C2=A0T1] HugeTLB registe= red 1.00 GiB page size, pre-allocated 0 pages [=C2=A0=C2=A0=C2=A019.789344][=C2=A0=C2=A0T718]=C2=A0 [=C2=A0=C2=A0=C2=A019.789367][=C2=A0=C2=A0T718] =3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D [=C2=A0=C2=A0=C2=A019.789379][=C2=A0=C2=A0T718] WARNING: bad unlock balan= ce detected! [=C2=A0=C2=A0=C2=A019.789393][=C2=A0=C2=A0T718] 5.6.0-rc5-next-20200311+ = #4 Not tainted [=C2=A0=C2=A0=C2=A019.789414][=C2=A0=C2=A0T718] -------------------------= ------------ [=C2=A0=C2=A0=C2=A019.789426][=C2=A0=C2=A0T718] kworker/u257:0/718 is try= ing to release lock (&sig- >exec_update_mutex) at: [=C2=A0=C2=A0=C2=A019.789459][=C2=A0=C2=A0T718] [] free= _bprm+0xe0/0xf0 [=C2=A0=C2=A0=C2=A019.789481][=C2=A0=C2=A0T718] but there are no more loc= ks to release! [=C2=A0=C2=A0=C2=A019.789502][=C2=A0=C2=A0T718]=C2=A0 [=C2=A0=C2=A0=C2=A019.789502][=C2=A0=C2=A0T718] other info that might hel= p us debug this: [=C2=A0=C2=A0=C2=A019.789537][=C2=A0=C2=A0T718] 1 lock held by kworker/u2= 57:0/718: [=C2=A0=C2=A0=C2=A019.789558][=C2=A0=C2=A0T718]=C2=A0=C2=A0#0: c000001fa8= 842808 (&sig->cred_guard_mutex){+.+.}, at: __do_execve_file.isra.33+0x1b0/0xda0 [=C2=A0=C2=A0=C2=A019.789611][=C2=A0=C2=A0T718]=C2=A0 [=C2=A0=C2=A0=C2=A019.789611][=C2=A0=C2=A0T718] stack backtrace: [=C2=A0=C2=A0=C2=A019.789645][=C2=A0=C2=A0T718] CPU: 8 PID: 718 Comm: kwo= rker/u257:0 Not tainted 5.6.0- rc5-next-20200311+ #4 [=C2=A0=C2=A0=C2=A019.789681][=C2=A0=C2=A0T718] Call Trace: [=C2=A0=C2=A0=C2=A019.789703][=C2=A0=C2=A0T718] [c000000dad8cfa70] [c0000= 00000979b40] dump_stack+0xf4/0x164 (unreliable) [=C2=A0=C2=A0=C2=A019.789742][=C2=A0=C2=A0T718] [c000000dad8cfac0] [c0000= 000001c1d78] print_unlock_imbalance_bug+0x118/0x140 [=C2=A0=C2=A0=C2=A019.789780][=C2=A0=C2=A0T718] [c000000dad8cfb40] [c0000= 000001ceaa0] lock_release+0x270/0x520 [=C2=A0=C2=A0=C2=A019.789817][=C2=A0=C2=A0T718] [c000000dad8cfbf0] [c0000= 000009a2898] __mutex_unlock_slowpath+0x68/0x400 [=C2=A0=C2=A0=C2=A019.789854][=C2=A0=C2=A0T718] [c000000dad8cfcc0] [c0000= 000004c6770] free_bprm+0xe0/0xf0 [=C2=A0=C2=A0=C2=A019.789900][=C2=A0=C2=A0T718] [c000000dad8cfcf0] [c0000= 000004c845c] __do_execve_file.isra.33+0x44c/0xda0 __do_execve_file at fs/exec.c:1904 [=C2=A0=C2=A0=C2=A019.789938][=C2=A0=C2=A0T718] [c000000dad8cfde0] [c0000= 000001391d8] call_usermodehelper_exec_async+0x218/0x250 [=C2=A0=C2=A0=C2=A019.789977][=C2=A0=C2=A0T718] [c000000dad8cfe20] [c0000= 0000000b748] ret_from_kernel_thread+0x5c/0x74 > --- > fs/exec.c | 9 +++++++++ > include/linux/sched/signal.h | 9 ++++++++- > init/init_task.c | 1 + > kernel/fork.c | 1 + > 4 files changed, 19 insertions(+), 1 deletion(-) >=20 > diff --git a/fs/exec.c b/fs/exec.c > index d820a7272a76..ffeebb1f167b 100644 > --- a/fs/exec.c > +++ b/fs/exec.c > @@ -1014,6 +1014,7 @@ static int exec_mmap(struct mm_struct *mm) > { > struct task_struct *tsk; > struct mm_struct *old_mm, *active_mm; > + int ret; > =20 > /* Notify parent that we're no longer interested in the old VM */ > tsk =3D current; > @@ -1034,6 +1035,11 @@ static int exec_mmap(struct mm_struct *mm) > return -EINTR; > } > } > + > + ret =3D mutex_lock_killable(&tsk->signal->exec_update_mutex); > + if (ret) > + return ret; > + > task_lock(tsk); > active_mm =3D tsk->active_mm; > membarrier_exec_mmap(mm); > @@ -1438,6 +1444,8 @@ static void free_bprm(struct linux_binprm *bprm) > { > free_arg_pages(bprm); > if (bprm->cred) { > + if (!bprm->mm) > + mutex_unlock(¤t->signal->exec_update_mutex); > mutex_unlock(¤t->signal->cred_guard_mutex); > abort_creds(bprm->cred); > } > @@ -1487,6 +1495,7 @@ void install_exec_creds(struct linux_binprm *bprm= ) > * credentials; any time after this it may be unlocked. > */ > security_bprm_committed_creds(bprm); > + mutex_unlock(¤t->signal->exec_update_mutex); > mutex_unlock(¤t->signal->cred_guard_mutex); > } > EXPORT_SYMBOL(install_exec_creds); > diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.= h > index 88050259c466..a29df79540ce 100644 > --- a/include/linux/sched/signal.h > +++ b/include/linux/sched/signal.h > @@ -224,7 +224,14 @@ struct signal_struct { > =20 > struct mutex cred_guard_mutex; /* guard against foreign influences on > * credential calculations > - * (notably. ptrace) */ > + * (notably. ptrace) > + * Deprecated do not use in new code. > + * Use exec_update_mutex instead. > + */ > + struct mutex exec_update_mutex; /* Held while task_struct is being > + * updated during exec, and may have > + * inconsistent permissions. > + */ > } __randomize_layout; > =20 > /* > diff --git a/init/init_task.c b/init/init_task.c > index 9e5cbe5eab7b..bd403ed3e418 100644 > --- a/init/init_task.c > +++ b/init/init_task.c > @@ -26,6 +26,7 @@ static struct signal_struct init_signals =3D { > .multiprocess =3D HLIST_HEAD_INIT, > .rlim =3D INIT_RLIMITS, > .cred_guard_mutex =3D __MUTEX_INITIALIZER(init_signals.cred_guard_mut= ex), > + .exec_update_mutex =3D __MUTEX_INITIALIZER(init_signals.exec_update_m= utex), > #ifdef CONFIG_POSIX_TIMERS > .posix_timers =3D LIST_HEAD_INIT(init_signals.posix_timers), > .cputimer =3D { > diff --git a/kernel/fork.c b/kernel/fork.c > index 60a1295f4384..12896a6ecee6 100644 > --- a/kernel/fork.c > +++ b/kernel/fork.c > @@ -1594,6 +1594,7 @@ static int copy_signal(unsigned long clone_flags,= struct task_struct *tsk) > sig->oom_score_adj_min =3D current->signal->oom_score_adj_min; > =20 > mutex_init(&sig->cred_guard_mutex); > + mutex_init(&sig->exec_update_mutex); > =20 > return 0; > }