From mboxrd@z Thu Jan 1 00:00:00 1970
MIME-Version: 1.0
References: <20230202014810.744-1-hdanton@sina.com>
From: Eric Dumazet
Date: Fri, 3 Feb 2023 16:09:12 +0100
Subject: Re: [RFC] net: add new socket option SO_SETNETNS
To: Alok Tiagi
Cc: Hillf Danton, ebiederm@xmission.com, netdev@vger.kernel.org,
 linux-mm@kvack.org, linux-kernel@vger.kernel.org
Content-Type: text/plain; charset="UTF-8"

On Fri, Feb 3, 2023 at 12:59 AM Alok Tiagi wrote:
>
> On Thu, Feb 02, 2023 at 09:10:23PM +0100, Eric Dumazet wrote:
> > On Thu, Feb 2, 2023 at 8:55 PM Alok Tiagi wrote:
> > >
> > > On Thu, Feb 02, 2023 at 09:48:10AM +0800, Hillf Danton wrote:
> > > > On Wed, 1 Feb 2023 19:22:57 +0000 aloktiagi
> > > > > @@ -1535,6 +1535,52 @@ int sk_setsockopt(struct sock *sk, int level, int optname,
> > > > >  			WRITE_ONCE(sk->sk_txrehash, (u8)val);
> > > > >  			break;
> > > > >
> > > > > +	case SO_SETNETNS:
> > > > > +	{
> > > > > +		struct net *other_ns, *my_ns;
> > > > > +
> > > > > +		if (sk->sk_family != AF_INET && sk->sk_family != AF_INET6) {
> > > > > +			ret = -EOPNOTSUPP;
> > > > > +			break;
> > > > > +		}
> > > > > +
> > > > > +		if (sk->sk_type != SOCK_STREAM && sk->sk_type != SOCK_DGRAM) {
> > > > > +			ret = -EOPNOTSUPP;
> > > > > +			break;
> > > > > +		}
> > > > > +
> > > > > +		other_ns = get_net_ns_by_fd(val);
> > > > > +		if (IS_ERR(other_ns)) {
> > > > > +			ret = PTR_ERR(other_ns);
> > > > > +			break;
> > > > > +		}
> > > > > +
> > > > > +		if (!ns_capable(other_ns->user_ns, CAP_NET_ADMIN)) {
> > > > > +			ret = -EPERM;
> > > > > +			goto out_err;
> > > > > +		}
> > > > > +
> > > > > +		/* check that the socket has never been connected or recently disconnected */
> > > > > +		if (sk->sk_state != TCP_CLOSE || sk->sk_shutdown & SHUTDOWN_MASK) {
> > > > > +			ret = -EOPNOTSUPP;
> > > > > +			goto out_err;
> > > > > +		}
> > > > > +
> > > > > +		/* check that the socket is not bound to an interface*/
> > > > > +		if (sk->sk_bound_dev_if != 0) {
> > > > > +			ret = -EOPNOTSUPP;
> > > > > +			goto out_err;
> > > > > +		}
> > > > > +
> > > > > +		my_ns = sock_net(sk);
> > > > > +		sock_net_set(sk, other_ns);
> > > > > +		put_net(my_ns);
> > > > > +		break;
> > > >
> > > >	cpu 0				cpu 2
> > > >	---				---
> > > >					ns = sock_net(sk);
> > > >	my_ns = sock_net(sk);
> > > >	sock_net_set(sk, other_ns);
> > > >	put_net(my_ns);
> > > >					ns is invalid ?
> > >
> > > That is the reason we want the socket to be in an un-connected state. That
> > > should help us avoid this situation.
> >
> > This is not enough....
> >
> > Another thread might look at sock_net(sk), for example from inet_diag
> > or tcp timers
> > (which can be fired even in un-connected state)
> >
> > Even UDP sockets can receive packets while being un-connected,
> > and they need to deref the net pointer.
> >
> > Currently there is no protection about sock_net(sk) being changed on the fly,
> > and the struct net could disappear and be freed.
> >
> > There are ~1500 uses of sock_net(sk) in the kernel, I do not think
> > you/we want to audit all
> > of them to check what could go wrong...
>
> I agree, auditing all the uses of sock_net(sk) is not a feasible option. From my
> exploration of the usage of sock_net(sk) it appeared that it might be safe to
> swap a socket's net ns if it had never been connected but I looked at only a
> subset of such uses.
>
> Introducing a ref counting logic to every access of sock_net(sk) may help get
> around this but involves a bigger change to increment and decrement the count at
> every use of sock_net().
>
> Any suggestions if this could be achieved in another way much closer to the
> socket creation time or any comments on our workaround for injecting sockets using
> seccomp addfd?

Maybe the existing BPF hook in inet_create() could be used?

	err = BPF_CGROUP_RUN_PROG_INET_SOCK(sk);

The BPF program might be able to switch the netns, because at this
time the new socket is not yet visible from external threads.

Although it is not going to catch dual stack uses (open a V6 socket,
then use a v4mapped address at bind()/connect()/...
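
The lifetime problem raised earlier in the thread (a reader holding the
result of sock_net(sk) while another CPU swaps the netns and drops the old
reference) can be modelled in a few lines of userspace C. This is only a
sketch under stated assumptions, not kernel code: every model_* name and
reader_sees_freed_net() are hypothetical, "freeing" is just a flag, and the
two CPUs are replayed sequentially in the order of the cpu 0 / cpu 2 diagram.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct net: a refcount plus a "freed" flag. */
struct model_net {
	int refcnt;
	bool freed;
};

/* Hypothetical stand-in for struct sock: owns one reference on its netns. */
struct model_sock {
	struct model_net *net;
};

static void model_get_net(struct model_net *net)
{
	assert(!net->freed);
	net->refcnt++;
}

static void model_put_net(struct model_net *net)
{
	assert(net->refcnt > 0);
	if (--net->refcnt == 0)
		net->freed = true;	/* the model only marks it; nothing is released */
}

/* Returns a borrowed pointer without taking a reference. */
static struct model_net *model_sock_net(const struct model_sock *sk)
{
	return sk->net;
}

/* Models the swap in the RFC: install the new netns, drop the old reference. */
static void model_setnetns(struct model_sock *sk, struct model_net *other)
{
	struct model_net *old = model_sock_net(sk);

	model_get_net(other);	/* models the reference handed to the socket */
	sk->net = other;
	model_put_net(old);
}

/*
 * Replays the interleaving from the diagram and reports whether the
 * reader ends up holding a pointer to a netns already marked freed.
 */
bool reader_sees_freed_net(bool reader_takes_ref)
{
	struct model_net old_ns = { .refcnt = 1, .freed = false };
	struct model_net new_ns = { .refcnt = 1, .freed = false };
	struct model_sock sk = { .net = &old_ns };
	struct model_net *ns;
	bool stale;

	ns = model_sock_net(&sk);		/* cpu 2: ns = sock_net(sk) */
	if (reader_takes_ref)
		model_get_net(ns);		/* the per-access refcount idea */

	model_setnetns(&sk, &new_ns);		/* cpu 0: swap and put_net() */

	stale = ns->freed;			/* cpu 2: ns is invalid ? */
	if (reader_takes_ref)
		model_put_net(ns);
	return stale;
}
```

In this model reader_sees_freed_net(false) reports a stale pointer, while
reader_sees_freed_net(true) does not. Note that the model performs the read
and the reference grab as one step; in a real kernel those two operations
would themselves have to be made safe against the concurrent swap, which is
part of why touching every sock_net(sk) user is such a large change.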