Date: Fri, 9 Sep 2022 09:51:58 +0200
From: Christian Brauner
To: "Eric W. Biederman"
Cc: Andrei Vagin, Alexey Izbyshev, Florian Weimer, Dmitry Safonov <0x7f454c46@gmail.com>, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Kees Cook
Subject: Re: Potentially undesirable interactions between vfork() and time namespaces
Message-ID: <20220909075158.ed4linrpwwabxabl@wittgenstein>
References: <87czcfhsme.fsf@email.froward.int.ebiederm.org> <874jxkcfoa.fsf@email.froward.int.ebiederm.org> <20220908081003.sjuerd5wiyge4jos@wittgenstein> <87v8pxa51n.fsf@email.froward.int.ebiederm.org>
In-Reply-To: <87v8pxa51n.fsf@email.froward.int.ebiederm.org>

On Thu, Sep 08, 2022 at 05:13:08PM -0500, Eric W. Biederman wrote:
> Christian Brauner writes:
>
> > On Wed, Sep 07, 2022 at 10:15:51AM -0700, Andrei Vagin wrote:
> >> On Wed, Sep 07, 2022 at 08:33:20AM +0300, Alexey Izbyshev wrote:
> >> > >
> >> > > That is something to be double checked.
> >> > >
> >> > > I can't see where it would make sense to unshare a time namespace and
> >> > > then call exec, instead of calling exit. So I suspect we can just
> >> > > change this behavior and no one will notice.
> >> > >
> >> > One can imagine a helper binary that calls unshare, forks some children in
> >> > new namespaces, and then calls exec to hand off actual work to another
> >> > binary (which might not expect being in the new time namespace). I'm purely
> >> > theorizing here, however. Keeping a special case for vfork() based only on
> >> > FUD is likely a net negative, so it'd be nice to hear actual time namespace
> >> > users speak up, and switch to the solution you suggested if they don't care.
> >>
> >> I can speak for one tool that uses time namespaces for the right
> >> reasons. It is CRIU. When a process is restored, the monotonic and
> >> boottime clocks have to be adjusted to match old values. It is for what
> >> the timens was designed for.
> >> These changes doesn't affect CRIU.
> >>
> >> Honestly, I haven't heard about other users of timens yet. I don't take
> >> into account tools like unshare.
> >
> > LXC/LXD does
> >
> >     unshare(CLONE_NEWTIME)
> >     // write offsets to /proc/self/timens_offsets
> >     timens_fd = open("/proc/self/ns/time_for_children", O_RDONLY | O_CLOEXEC)
> >     setns(timens_fd, CLONE_NEWTIME)
> >     exec(payload)
> >
> > so I agree don't change the uapi, please.
> >
> > But as you can see what we do is basically emulating changing time
> > namespace during exec via the setns() prior to the exec call.
>
> If I understand the description of lxc/lxd correctly the proposed change
> will not effect lxc/lxd, as the time namespace is already installed
> before exec. If anything what is proposed would potentially allow
> lxc/lxd to be simplified in the future by removing the setns.
>
> Are you then requesting the behavior of the time namespace not change
> when the proposed change will not effect lxc/lxd?

Don't change /proc/self/ns/time_for_children to a different name.
As stated above, we already emulate the proposed exec behavior in
userspace, so that part is fine.
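
For reference, the LXC/LXD sequence quoted above could be sketched in C
roughly as follows. This is only an illustration under stated
assumptions, not the actual LXC/LXD code: the names
format_timens_offset() and exec_in_new_timens() are made up for this
example, the offsets are assumed to be whole seconds, and it assumes a
kernel with time namespace support and a caller with enough privilege
to unshare the time namespace.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <sched.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#ifndef CLONE_NEWTIME
#define CLONE_NEWTIME 0x00000080 /* fallback for older libc headers */
#endif

/*
 * Hypothetical helper: format one "<clock> <secs> <nsecs>\n" line for
 * /proc/self/timens_offsets. Only "monotonic" and "boottime" are accepted.
 * Returns the line length, or -1 on a bad clock name or a too-small buffer.
 */
int format_timens_offset(char *buf, size_t len, const char *clock,
                         long long secs, long nsecs)
{
    int n;

    assert(buf != NULL && clock != NULL);
    if (strcmp(clock, "monotonic") != 0 && strcmp(clock, "boottime") != 0)
        return -1;
    n = snprintf(buf, len, "%s %lld %ld\n", clock, secs, nsecs);
    return (n < 0 || (size_t)n >= len) ? -1 : n;
}

/* Write a single whole-second offset line to an open timens_offsets fd. */
static int write_offset(int fd, const char *clock, long long secs)
{
    char line[64];
    int n = format_timens_offset(line, sizeof(line), clock, secs, 0);

    if (n < 0 || write(fd, line, (size_t)n) != n)
        return -1;
    return 0;
}

/*
 * Hypothetical helper mirroring the quoted steps: unshare, write offsets,
 * setns() into time_for_children, then exec. Returns -1 on failure and
 * does not return on success.
 */
int exec_in_new_timens(long long monotonic_secs, long long boottime_secs,
                       char *const argv[])
{
    int fd;

    /* New time namespace for children; the caller itself stays put. */
    if (unshare(CLONE_NEWTIME) < 0)
        return -1;

    /* Offsets can only be set before the first task enters the namespace. */
    fd = open("/proc/self/timens_offsets", O_WRONLY | O_CLOEXEC);
    if (fd < 0)
        return -1;
    if (write_offset(fd, "monotonic", monotonic_secs) < 0 ||
        write_offset(fd, "boottime", boottime_secs) < 0) {
        close(fd);
        return -1;
    }
    close(fd);

    /* Move the caller itself into the namespace prepared for children. */
    fd = open("/proc/self/ns/time_for_children", O_RDONLY | O_CLOEXEC);
    if (fd < 0)
        return -1;
    if (setns(fd, CLONE_NEWTIME) < 0) {
        close(fd);
        return -1;
    }
    close(fd);

    execvp(argv[0], argv);
    return -1; /* only reached if exec failed */
}
```

A caller would pass the desired offsets and the payload's argv, for
example exec_in_new_timens(86400, 86400, argv) to shift both clocks by
one day. The setns() step is the userspace emulation referred to above:
it is what puts the exec'ing process itself into the new namespace.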