From: Alexey Gladkov
To: LKML, io-uring@vger.kernel.org, Kernel Hardening, Linux Containers,
    linux-mm@kvack.org
Cc: Alexey Gladkov, Andrew Morton, Christian Brauner, "Eric W . Biederman",
    Jann Horn, Jens Axboe, Kees Cook, Linus Torvalds, Oleg Nesterov
Subject: [PATCH v4 0/7] Count rlimits in each user namespace
Date: Fri, 22 Jan 2021 14:00:09 +0100
X-Mailer: git-send-email 2.29.2

Preface
-------
These patches bind the rlimit counters to a user in a user namespace.
This patch set can be applied on top of:

git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git v5.11-rc2

Problem
-------
The RLIMIT_NPROC, RLIMIT_MEMLOCK, RLIMIT_SIGPENDING, RLIMIT_MSGQUEUE rlimits
implementation places the counters in user_struct [1]. These limits are global
between processes and persist for the lifetime of the process, even if
processes are in different user namespaces.

To illustrate the impact of rlimits, let's say there is a program that does not
fork.
Some service-A wants to run this program as user X in multiple containers.
Since the program never forks, the service wants to set RLIMIT_NPROC=1.

service-A
 \- program (uid=1000, container1, rlimit_nproc=1)
 \- program (uid=1000, container2, rlimit_nproc=1)

The service-A sets RLIMIT_NPROC=1 and runs the program in container1. When the
service-A tries to run a program with RLIMIT_NPROC=1 in container2, it fails
since user X already has one running process.

The problem is not that the limit from container1 affects container2. The
problem is that the limit is verified against the global counter that reflects
the number of processes in all containers.

This problem can be worked around by using different users for each container,
but in this case we face a different problem of uid mapping when transferring
files from one container to another.

Eric W. Biederman mentioned this issue [2][3].

Introduced changes
------------------
To address the problem, we bind rlimit counters to user namespace. Each counter
reflects the number of processes in a given uid in a given user namespace. The
result is a tree of rlimit counters with the biggest value at the root (aka
init_user_ns). The limit is considered exceeded if it is exceeded anywhere up
the tree.

[1] https://lore.kernel.org/containers/87imd2incs.fsf@x220.int.ebiederm.org/
[2] https://lists.linuxfoundation.org/pipermail/containers/2020-August/042096.html
[3] https://lists.linuxfoundation.org/pipermail/containers/2020-October/042524.html

Changelog
---------
v4:
* Reverted the type change of ucounts.count to refcount_t.
* Fixed typo in the kernel/cred.c

v3:
* Added get_ucounts() function to increase the reference count. The existing
  get_counts() function was renamed to __get_ucounts().
* The type of ucounts.count changed from atomic_t to refcount_t.
* Dropped 'const' from set_cred_ucounts() arguments.
* Fixed a bug with freeing the cred structure after calling cred_alloc_blank().
* Commit messages have been updated.
* Added selftest.

v2:
* RLIMIT_MEMLOCK, RLIMIT_SIGPENDING and RLIMIT_MSGQUEUE are migrated to ucounts.
* Added ucounts for the (uid, user namespace) pair into cred.
* Added the ability to increase ucount by more than 1.

v1:
* After discussion with Eric W. Biederman, I increased the size of ucounts to
  atomic_long_t.
* Added ucount_max to avoid the fork bomb.

--

Alexey Gladkov (7):
  Add a reference to ucounts for each cred
  Move RLIMIT_NPROC counter to ucounts
  Move RLIMIT_MSGQUEUE counter to ucounts
  Move RLIMIT_SIGPENDING counter to ucounts
  Move RLIMIT_MEMLOCK counter to ucounts
  Move RLIMIT_NPROC check to the place where we increment the counter
  kselftests: Add test to check for rlimit changes in different user
    namespaces

 fs/exec.c                                  |   2 +-
 fs/hugetlbfs/inode.c                       |  17 +-
 fs/io-wq.c                                 |  22 ++-
 fs/io-wq.h                                 |   2 +-
 fs/io_uring.c                              |   2 +-
 fs/proc/array.c                            |   2 +-
 include/linux/cred.h                       |   3 +
 include/linux/hugetlb.h                    |   3 +-
 include/linux/mm.h                         |   4 +-
 include/linux/sched/user.h                 |   6 -
 include/linux/shmem_fs.h                   |   2 +-
 include/linux/signal_types.h               |   4 +-
 include/linux/user_namespace.h             |  23 ++-
 ipc/mqueue.c                               |  29 ++--
 ipc/shm.c                                  |  31 ++--
 kernel/cred.c                              |  46 ++++-
 kernel/exit.c                              |   2 +-
 kernel/fork.c                              |  12 +-
 kernel/signal.c                            |  53 +++---
 kernel/sys.c                               |  13 --
 kernel/ucount.c                            | 109 ++++++++++--
 kernel/user.c                              |   2 -
 kernel/user_namespace.c                    |   7 +-
 mm/memfd.c                                 |   4 +-
 mm/mlock.c                                 |  35 ++--
 mm/mmap.c                                  |   3 +-
 mm/shmem.c                                 |   8 +-
 tools/testing/selftests/Makefile           |   1 +
 tools/testing/selftests/rlimits/.gitignore |   2 +
 tools/testing/selftests/rlimits/Makefile   |   6 +
 tools/testing/selftests/rlimits/config     |   1 +
 .../selftests/rlimits/rlimits-per-userns.c | 161 ++++++++++++++++++
 32 files changed, 448 insertions(+), 169 deletions(-)
 create mode 100644 tools/testing/selftests/rlimits/.gitignore
 create mode 100644 tools/testing/selftests/rlimits/Makefile
 create mode 100644 tools/testing/selftests/rlimits/config
 create mode 100644 tools/testing/selftests/rlimits/rlimits-per-userns.c

-- 
2.29.2
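
Illustration (not part of the series): the counter tree described under
"Introduced changes" can be modelled in user space roughly as follows. This is
only a sketch under stated assumptions. struct ns_counter, counter_charge(),
counter_uncharge() and demo_two_containers() are hypothetical names invented
for this example; they are not the kernel's struct ucounts or its helpers, and
the per-level "max" field is a simplification of how the limits are actually
derived. The point it shows is that a charge is applied at the namespace's own
node and at every ancestor up to the root, and is refused if any level would
go over its ceiling.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model; NOT the kernel's struct ucounts. */
struct ns_counter {
	struct ns_counter *parent; /* NULL for the root (stands in for init_user_ns) */
	long count;                /* charges from this node and all of its descendants */
	long max;                  /* assumed ceiling enforced at this level */
};

/*
 * Charge n units at c and at every ancestor. If any level would exceed its
 * max, undo the levels already charged and report failure.
 */
bool counter_charge(struct ns_counter *c, long n)
{
	struct ns_counter *iter, *undo;

	for (iter = c; iter; iter = iter->parent) {
		if (iter->count + n > iter->max)
			goto rollback;
		iter->count += n;
	}
	return true;

rollback:
	for (undo = c; undo != iter; undo = undo->parent)
		undo->count -= n;
	return false;
}

/* Release n units at c and at every ancestor. */
void counter_uncharge(struct ns_counter *c, long n)
{
	for (; c; c = c->parent)
		c->count -= n;
}

/*
 * The service-A scenario: the same uid runs one process in each of two
 * containers, each with a limit of 1. Returns 0 if the model behaves as the
 * cover letter describes, or a non-zero step number otherwise.
 */
int demo_two_containers(void)
{
	struct ns_counter root = { NULL, 0, 1024 };
	struct ns_counter container1 = { &root, 0, 1 };
	struct ns_counter container2 = { &root, 0, 1 };

	if (!counter_charge(&container1, 1))
		return 1; /* first program must start */
	if (!counter_charge(&container2, 1))
		return 2; /* container2 must not be blocked by container1 */
	if (counter_charge(&container1, 1))
		return 3; /* a second process in container1 must be refused */
	if (root.count != 2)
		return 4; /* the refused charge must leave the root untouched */

	counter_uncharge(&container1, 1);
	counter_uncharge(&container2, 1);
	return root.count == 0 ? 0 : 5;
}
```

With a single global counter per uid, step 2 above is the one that fails
today; with one node per user namespace, the root still sees the total of 2
while each container is checked against its own count.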