From: Alexey Gladkov
To: LKML, io-uring@vger.kernel.org, Kernel Hardening, Linux Containers,
    linux-mm@kvack.org
Cc: Alexey Gladkov, Andrew Morton, Christian Brauner, "Eric W. Biederman",
    Jann Horn, Jens Axboe, Kees Cook, Linus Torvalds, Oleg Nesterov
Subject: [RFC PATCH v3 0/8] Count rlimits in each user namespace
Date: Fri, 15 Jan 2021 15:57:21 +0100

Preface
-------
These patches bind the rlimit counters to a user in a user namespace.

This patch set can be applied on top of:

git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git v5.11-rc2

Problem
-------
The RLIMIT_NPROC, RLIMIT_MEMLOCK, RLIMIT_SIGPENDING and RLIMIT_MSGQUEUE
rlimits are implemented with counters placed in user_struct [1]. These
counters are shared by all processes of a user and persist for the lifetime
of the process, even if the processes are in different user namespaces.

To illustrate the impact of rlimits, let's say there is a program that does
not fork. Some service-A wants to run this program as user X in multiple
containers. Since the program never forks, the service wants to set
RLIMIT_NPROC=1.

service-A
 \- program (uid=1000, container1, rlimit_nproc=1)
 \- program (uid=1000, container2, rlimit_nproc=1)

The service-A sets RLIMIT_NPROC=1 and runs the program in container1. When
the service-A tries to run the program with RLIMIT_NPROC=1 in container2, it
fails because user X already has one running process.

The problem is not that the limit from container1 affects container2. The
problem is that the limit is checked against the global counter, which
reflects the number of processes in all containers.

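The failure described above can be reproduced with a small userspace model.
This is only a sketch under stated assumptions: struct user_model and
model_start_process() are hypothetical names made up for illustration, and
the check is a simplification of what the kernel does, not the kernel code
itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical userspace model, not kernel code. It mirrors the current
 * layout in which one counter per uid is shared by every user namespace.
 */
struct user_model {
    unsigned int uid;
    long processes;    /* all processes of this uid, in all containers */
};

/*
 * Simplified stand-in for the RLIMIT_NPROC check: the caller's limit is
 * compared against the shared counter. Returns true if the process may
 * start; the kernel would report an error where this returns false.
 */
static bool model_start_process(struct user_model *user, long rlimit_nproc)
{
    assert(user != NULL);
    if (user->processes >= rlimit_nproc)
        return false;
    user->processes++;
    return true;
}
```

With a single struct user_model for uid 1000, the first call with a limit of
1 succeeds (container1) and the second call with a limit of 1 fails
(container2), although container2 itself runs nothing yet.
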
This problem can be worked around by using a different user for each
container, but then we face a different problem: uid mapping when
transferring files from one container to another.

Eric W. Biederman mentioned this issue [2][3].

Introduced changes
------------------
To address the problem, we bind the rlimit counters to a user namespace.
Each counter reflects the number of processes for a given uid in a given
user namespace. The result is a tree of rlimit counters with the biggest
value at the root (aka init_user_ns). A limit is considered exceeded if it
is exceeded at any level up the tree.

[1] https://lore.kernel.org/containers/87imd2incs.fsf@x220.int.ebiederm.org/
[2] https://lists.linuxfoundation.org/pipermail/containers/2020-August/042096.html
[3] https://lists.linuxfoundation.org/pipermail/containers/2020-October/042524.html

Changelog
---------
v3:
* Added get_ucounts() function to increase the reference count. The existing
  get_ucounts() function was renamed to __get_ucounts().
* The type of ucounts.count changed from atomic_t to refcount_t.
* Dropped 'const' from set_cred_ucounts() arguments.
* Fixed a bug with freeing the cred structure after calling
  cred_alloc_blank().
* Commit messages have been updated.
* Added selftest.

v2:
* RLIMIT_MEMLOCK, RLIMIT_SIGPENDING and RLIMIT_MSGQUEUE are migrated to
  ucounts.
* Added ucounts for the pair of uid and user namespace into cred.
* Added the ability to increase ucount by more than 1.

v1:
* After discussion with Eric W. Biederman, I increased the size of ucounts
  to atomic_long_t.
* Added ucount_max to avoid the fork bomb.

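The counter tree from "Introduced changes" can be sketched in userspace as
follows. This is a minimal model under stated assumptions, not the code from
the patches: struct ucount_model, model_inc() and model_dec() are
hypothetical names, there is no locking or atomicity, and each node simply
carries its own maximum.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: one counter per (uid, user namespace) pair. */
struct ucount_model {
    struct ucount_model *parent;  /* NULL at the root (init_user_ns) */
    long count;                   /* current usage at this level */
    long max;                     /* limit enforced at this level */
};

/*
 * Charge one unit at the leaf and at every ancestor. If any level is
 * already at its maximum, undo the increments made so far and fail.
 */
static bool model_inc(struct ucount_model *leaf)
{
    struct ucount_model *iter, *failed = NULL;

    for (iter = leaf; iter != NULL; iter = iter->parent) {
        if (iter->count >= iter->max) {
            failed = iter;
            break;
        }
        iter->count++;
    }
    if (failed == NULL)
        return true;

    /* Roll back the levels below the one that refused the charge. */
    for (iter = leaf; iter != failed; iter = iter->parent)
        iter->count--;
    return false;
}

/* Release one unit at the leaf and at every ancestor. */
static void model_dec(struct ucount_model *leaf)
{
    for (; leaf != NULL; leaf = leaf->parent) {
        assert(leaf->count > 0);
        leaf->count--;
    }
}
```

In this model two containers that share a uid each get their own leaf, so a
limit of 1 in container1 no longer blocks container2, while the root still
sees the sum of both and can refuse a charge that would exceed its own
maximum.
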
--

Alexey Gladkov (8):
  Use refcount_t for ucounts reference counting
  Add a reference to ucounts for each cred
  Move RLIMIT_NPROC counter to ucounts
  Move RLIMIT_MSGQUEUE counter to ucounts
  Move RLIMIT_SIGPENDING counter to ucounts
  Move RLIMIT_MEMLOCK counter to ucounts
  Move RLIMIT_NPROC check to the place where we increment the counter
  kselftests: Add test to check for rlimit changes in different user
    namespaces

 fs/exec.c                                  |   2 +-
 fs/hugetlbfs/inode.c                       |  17 +-
 fs/io-wq.c                                 |  22 ++-
 fs/io-wq.h                                 |   2 +-
 fs/io_uring.c                              |   2 +-
 fs/proc/array.c                            |   2 +-
 include/linux/cred.h                       |   3 +
 include/linux/hugetlb.h                    |   3 +-
 include/linux/mm.h                         |   4 +-
 include/linux/sched/user.h                 |   6 -
 include/linux/shmem_fs.h                   |   2 +-
 include/linux/signal_types.h               |   4 +-
 include/linux/user_namespace.h             |  31 +++-
 ipc/mqueue.c                               |  29 ++--
 ipc/shm.c                                  |  31 ++--
 kernel/cred.c                              |  46 ++++-
 kernel/exit.c                              |   2 +-
 kernel/fork.c                              |  12 +-
 kernel/signal.c                            |  53 +++---
 kernel/sys.c                               |  13 --
 kernel/ucount.c                            | 111 +++++++++---
 kernel/user.c                              |   2 -
 kernel/user_namespace.c                    |   7 +-
 mm/memfd.c                                 |   4 +-
 mm/mlock.c                                 |  35 ++--
 mm/mmap.c                                  |   3 +-
 mm/shmem.c                                 |   8 +-
 tools/testing/selftests/Makefile           |   1 +
 tools/testing/selftests/rlimits/.gitignore |   2 +
 tools/testing/selftests/rlimits/Makefile   |   6 +
 tools/testing/selftests/rlimits/config     |   1 +
 .../selftests/rlimits/rlimits-per-userns.c | 161 ++++++++++++++++++
 32 files changed, 445 insertions(+), 182 deletions(-)
 create mode 100644 tools/testing/selftests/rlimits/.gitignore
 create mode 100644 tools/testing/selftests/rlimits/Makefile
 create mode 100644 tools/testing/selftests/rlimits/config
 create mode 100644 tools/testing/selftests/rlimits/rlimits-per-userns.c

-- 
2.29.2