Date: Fri, 3 Feb 2023 17:50:46 +0000
From: Alok Tiagi
To: Eric Dumazet
Cc: Hillf Danton, ebiederm@xmission.com, netdev@vger.kernel.org,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [RFC] net: add new socket option SO_SETNETNS
References: <20230202014810.744-1-hdanton@sina.com>

On Fri, Feb 03, 2023 at 04:09:12PM +0100, Eric Dumazet wrote:
> On Fri, Feb 3, 2023 at 12:59 AM Alok Tiagi wrote:
> >
> > On Thu, Feb 02, 2023 at 09:10:23PM +0100, Eric Dumazet wrote:
> > > On Thu, Feb 2, 2023 at 8:55 PM Alok Tiagi wrote:
> > > >
> > > > On Thu, Feb 02, 2023 at 09:48:10AM +0800, Hillf Danton wrote:
> > > > > On Wed, 1 Feb 2023 19:22:57 +0000 aloktiagi
> > > > > > @@ -1535,6 +1535,52 @@ int sk_setsockopt(struct sock *sk, int level, int optname,
> > > > > >  		WRITE_ONCE(sk->sk_txrehash, (u8)val);
> > > > > >  		break;
> > > > > >
> > > > > > +	case SO_SETNETNS:
> > > > > > +	{
> > > > > > +		struct net *other_ns, *my_ns;
> > > > > > +
> > > > > > +		if (sk->sk_family != AF_INET && sk->sk_family != AF_INET6) {
> > > > > > +			ret = -EOPNOTSUPP;
> > > > > > +			break;
> > > > > > +		}
> > > > > > +
> > > > > > +		if (sk->sk_type != SOCK_STREAM && sk->sk_type != SOCK_DGRAM) {
> > > > > > +			ret = -EOPNOTSUPP;
> > > > > > +			break;
> > > > > > +		}
> > > > > > +
> > > > > > +		other_ns = get_net_ns_by_fd(val);
> > > > > > +		if (IS_ERR(other_ns)) {
> > > > > > +			ret = PTR_ERR(other_ns);
> > > > > > +			break;
> > > > > > +		}
> > > > > > +
> > > > > > +		if (!ns_capable(other_ns->user_ns, CAP_NET_ADMIN)) {
> > > > > > +			ret = -EPERM;
> > > > > > +			goto out_err;
> > > > > > +		}
> > > > > > +
> > > > > > +		/* check that the socket has never been connected or recently disconnected */
> > > > > > +		if (sk->sk_state != TCP_CLOSE || sk->sk_shutdown & SHUTDOWN_MASK) {
> > > > > > +			ret = -EOPNOTSUPP;
> > > > > > +			goto out_err;
> > > > > > +		}
> > > > > > +
> > > > > > +		/* check that the socket is not bound to an interface*/
> > > > > > +		if (sk->sk_bound_dev_if != 0) {
> > > > > > +			ret = -EOPNOTSUPP;
> > > > > > +			goto out_err;
> > > > > > +		}
> > > > > > +
> > > > > > +		my_ns = sock_net(sk);
> > > > > > +		sock_net_set(sk, other_ns);
> > > > > > +		put_net(my_ns);
> > > > > > +		break;
> > > > >
> > > > >	cpu 0				cpu 2
> > > > >	---				---
> > > > >					ns = sock_net(sk);
> > > > >	my_ns = sock_net(sk);
> > > > >	sock_net_set(sk, other_ns);
> > > > >	put_net(my_ns);
> > > > >					ns is invalid ?
> > > >
> > > > That is the reason we want the socket to be in an un-connected state. That
> > > > should help us avoid this situation.
> > >
> > > This is not enough....
> > >
> > > Another thread might look at sock_net(sk), for example from inet_diag
> > > or tcp timers (which can be fired even in un-connected state)
> > >
> > > Even UDP sockets can receive packets while being un-connected,
> > > and they need to deref the net pointer.
> > >
> > > Currently there is no protection against sock_net(sk) being changed on
> > > the fly, and the struct net could disappear and be freed.
> > >
> > > There are ~1500 uses of sock_net(sk) in the kernel, I do not think
> > > you/we want to audit all of them to check what could go wrong...
> >
> > I agree, auditing all the uses of sock_net(sk) is not a feasible option. From my
> > exploration of the usage of sock_net(sk) it appeared that it might be safe to
> > swap a socket's net ns if it had never been connected but I looked at only a
> > subset of such uses.
> >
> > Introducing a ref counting logic to every access of sock_net(sk) may help get
> > around this but involves a bigger change to increment and decrement the count at
> > every use of sock_net().
> >
> > Any suggestions if this could be achieved in another way much closer to the
> > socket creation time or any comments on our workaround for injecting sockets using
> > seccomp addfd?
>
> Maybe the existing BPF hook in inet_create() could be used ?
>
> err = BPF_CGROUP_RUN_PROG_INET_SOCK(sk);
>
> The BPF program might be able to switch the netns, because at this
> time the new socket is not yet visible from external threads.
>
> Although it is not going to catch dual stack uses (open a V6 socket,
> then use a v4mapped address at bind()/connect()/...

We thought of a similar approach by intercepting the socket() call in seccomp
and injecting a new file descriptor much earlier, but as you said we run into
the issue of handling dual stack sockets, since we do not know in advance if
the socket is going to be used for a v4mapped address.
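
To make the dual stack case concrete, here is a rough user-space sketch of
what such an application does. This is only an illustration: the helper names
make_v4mapped() and open_dual_stack() are made up for this mail and are not
part of the patch. At socket() time the family is AF_INET6 and nothing says
IPv4 will be used; that only becomes visible from the address passed to
bind() (or connect()).

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper: build the v4-mapped IPv6 form (::ffff:a.b.c.d)
 * of a dotted-quad IPv4 address. Returns 0 on success, -1 on bad input.
 */
static int make_v4mapped(const char *dotted, struct in6_addr *out)
{
	struct in_addr v4;

	assert(out != NULL);
	if (inet_pton(AF_INET, dotted, &v4) != 1)
		return -1;

	memset(out, 0, sizeof(*out));
	out->s6_addr[10] = 0xff;
	out->s6_addr[11] = 0xff;
	memcpy(&out->s6_addr[12], &v4, sizeof(v4));
	return 0;
}

/*
 * Hypothetical helper: open an AF_INET6 datagram socket, clear
 * IPV6_V6ONLY, and bind it to a v4-mapped address. Returns the fd,
 * or -1 on any failure (including hosts without IPv6).
 */
static int open_dual_stack(const char *dotted, unsigned short port)
{
	struct sockaddr_in6 sa;
	int off = 0;
	int fd;

	memset(&sa, 0, sizeof(sa));
	sa.sin6_family = AF_INET6;
	sa.sin6_port = htons(port);
	if (make_v4mapped(dotted, &sa.sin6_addr) < 0)
		return -1;

	/* Anything hooked at this point only sees family == AF_INET6. */
	fd = socket(AF_INET6, SOCK_DGRAM, 0);
	if (fd < 0)
		return -1;

	/* The IPv4 use only shows up here, in the bind() address. */
	if (setsockopt(fd, IPPROTO_IPV6, IPV6_V6ONLY, &off, sizeof(off)) < 0 ||
	    bind(fd, (struct sockaddr *)&sa, sizeof(sa)) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}
```

Calling open_dual_stack("127.0.0.1", 0) gives a v6 socket that carries IPv4
traffic, which is the case neither a socket()-time seccomp interception nor a
hook in inet_create() can distinguish from a plain IPv6 socket.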