Date: Wed, 15 Jun 2022 10:53:50 +0200
From: Christian Brauner
To: Florian Weimer
Cc: Kees Cook, Andrei Vagin, linux-kernel@vger.kernel.org,
 Dmitry Safonov <0x7f454c46@gmail.com>, linux-mm@kvack.org,
 Eric Biederman
Subject: Re: [PATCH 1/2] fs/exec: allow to unshare a time namespace on vfork+exec
Message-ID: <20220615085350.theicffhehgbmfep@wittgenstein>
References: <20220613060723.197407-1-avagin@gmail.com>
 <202206141412.2B0732FF6C@keescook>
 <874k0mqs5i.fsf@oldenburg.str.redhat.com>
 <20220615080000.qtxeosohhyfabzmg@wittgenstein>
 <87zgiepcmc.fsf@oldenburg.str.redhat.com>
In-Reply-To: <87zgiepcmc.fsf@oldenburg.str.redhat.com>

On Wed, Jun 15, 2022 at 10:14:19AM +0200, Florian Weimer wrote:
> * Christian Brauner:
>
> > For pid namespaces one problem would be that it could end up confusing a
> > process about its own pid. This was a more serious problem when the pid
> > cache was still active in glibc; but fwiw systemd still has a pid cache
> > afair.
>
> Right. glibc still has a TID cache, mainly for use with recursive
> mutexes (where we need a 32-bit thread identifier and can't perform a
> system call on every locking operation for performance reasons).
> Assuming that a non-delayed CLONE_NEWPID would also change the TID
> underneath us, we'd have subtly broken recursive mutexes.

Fwiw, you can't combine CLONE_NEWPID with CLONE_THREAD. This guarantees
that threads can send signals to each other and that all threads within
the same threadgroup can be reached via proc. It'd be awkward to have a
thread whose thread-group leader lives in an ancestor pidns. Even if you
made the whole threadgroup change pid namespaces immediately, it would
mean allocating a new TGID and new TIDs in the new pid namespace, unless
the same numbers happen to still be free there.
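
To make the CLONE_NEWPID/CLONE_THREAD restriction concrete, here is a
minimal sketch that asks the kernel for a new thread which would also
start a new pid namespace. It is not taken from the patch series, and
probe_newpid_thread_clone() is a made-up name. clone(2) documents EINVAL
for this flag combination; a sandbox with a seccomp filter may reject the
call earlier with a different error. The sketch uses the raw system call
and assumes the flags are its first argument, which holds on x86-64 and
arm64 but not on every architecture.

```c
#define _GNU_SOURCE          /* for syscall() on older C libraries */
#include <errno.h>
#include <linux/sched.h>     /* CLONE_* flag values from the kernel UAPI headers */
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Hypothetical helper: request a thread that would also start a new pid
 * namespace. Returns the errno of the failed call, or 0 in the
 * (unexpected) case that the kernel accepted the flags.
 */
int probe_newpid_thread_clone(void)
{
    /* CLONE_THREAD requires CLONE_SIGHAND, which in turn requires CLONE_VM. */
    unsigned long flags = CLONE_VM | CLONE_SIGHAND | CLONE_THREAD | CLONE_NEWPID;

    /*
     * Raw system call with the flags as first argument (an assumption
     * about the architecture, see above). The remaining arguments stay
     * zero because the flag check is expected to fail before any task
     * is created.
     */
    long ret = syscall(SYS_clone, flags, 0UL, 0UL, 0UL, 0UL);

    return ret == -1 ? errno : 0;
}
```

Calling probe_newpid_thread_clone() from main() and printing strerror()
of the result is enough to see the rejection. No privileges should be
needed for that, since the flag check comes before any namespace is
created.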

>
> vfork gets away with not updating the TID cache (which is shared with
> the parent process) because the parent process is suspended while the
> new subprocess is still running and has not execve'ed yet.
>
> Now one could argue that calling unshare automatically means that you
> must not call any glibc functions afterwards (similar to thread-creating
> clone), or at least that you cannot call any functions which are not
> async-signal-safe, but that does not match existing application
> practice. And I think we actually prefer that file servers call chroot

Yeah, that'd be a rather subtle and risky change for pid namespaces.
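
The vfork point quoted above can be observed from userspace. The sketch
below is an illustration only, with made-up names (struct vfork_probe,
run_vfork_probe()). The child records its kernel TID through a raw
system call and sets a flag in memory it shares with the parent; the
parent reads both as soon as vfork() returns. It relies on Linux
behaviour: POSIX leaves almost everything a vfork child does besides
_exit() or exec undefined, so treat this as a demonstration rather than
a pattern to copy.

```c
#define _GNU_SOURCE          /* for syscall() and vfork() declarations */
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical result record, filled in by run_vfork_probe(). */
struct vfork_probe {
    pid_t parent_tid;     /* kernel TID of the caller */
    pid_t child_pid;      /* value vfork() returned in the parent */
    pid_t child_tid;      /* kernel TID the child observed for itself */
    int   done_on_resume; /* child's flag as seen when the parent resumed */
};

/* Returns 0 on success, -1 if vfork() or waitpid() failed. */
int run_vfork_probe(struct vfork_probe *out)
{
    /*
     * volatile: stored by the child and loaded by the parent after
     * vfork() returns the second time, in the same stack frame.
     */
    volatile pid_t child_tid = 0;
    volatile int child_done = 0;

    out->parent_tid = (pid_t)syscall(SYS_gettid);

    pid_t pid = vfork();
    if (pid == -1)
        return -1;

    if (pid == 0) {
        /*
         * Child: runs in the parent's address space while the parent is
         * suspended. The raw system call bypasses any cached TID.
         */
        child_tid = (pid_t)syscall(SYS_gettid);
        child_done = 1;
        _exit(0);
    }

    /* Parent: resumes only once the child has exited (or exec'ed). */
    out->child_pid = pid;
    out->child_tid = child_tid;
    out->done_on_resume = child_done;

    int status;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return 0;
}
```

On Linux the parent should see done_on_resume set and a child_tid that
differs from parent_tid. In other words, anything cached in the shared
memory still describes the parent while the child runs, and that is only
harmless because the parent is not running at that point.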