From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id D7047C433EF for ; Sat, 16 Oct 2021 02:08:47 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 4A05B60F11 for ; Sat, 16 Oct 2021 02:08:47 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org 4A05B60F11 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=sina.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=kvack.org Received: by kanga.kvack.org (Postfix) id 900886B0071; Fri, 15 Oct 2021 22:08:46 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 8AFF36B0072; Fri, 15 Oct 2021 22:08:46 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 7A9F96B0073; Fri, 15 Oct 2021 22:08:46 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0139.hostedemail.com [216.40.44.139]) by kanga.kvack.org (Postfix) with ESMTP id 6778A6B0071 for ; Fri, 15 Oct 2021 22:08:46 -0400 (EDT) Received: from smtpin03.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay03.hostedemail.com (Postfix) with ESMTP id 1DD0182499B9 for ; Sat, 16 Oct 2021 02:08:46 +0000 (UTC) X-FDA: 78700667052.03.185D7CE Received: from r3-23.sinamail.sina.com.cn (r3-23.sinamail.sina.com.cn [202.108.3.23]) by imf23.hostedemail.com (Postfix) with SMTP id 1595990000A0 for ; Sat, 16 Oct 2021 02:08:40 +0000 (UTC) Received: from unknown (HELO localhost.localdomain)([123.123.28.185]) by sina.com (172.16.97.23) with ESMTP id 616A342600024DC0; Sat, 16 Oct 2021 10:08:40 +0800 (CST) X-Sender: hdanton@sina.com X-Auth-ID: hdanton@sina.com X-SMAIL-MID: 86076154919953 From: Hillf Danton To: "Eric W . Biederman" Cc: Rune Kleveland , Yu Zhao , Alexey Gladkov , Jordan Glover , LKML , linux-mm@kvack.org, containers@lists.linux-foundation.org Subject: Re: [CFT][PATCH] ucounts: Fix signal ucount refcounting Date: Sat, 16 Oct 2021 10:08:33 +0800 Message-Id: <20211016020833.1538-1-hdanton@sina.com> In-Reply-To: <87mtnavszx.fsf_-_@disp2133> References: <1M9_d6wrcu6rdPe1ON0_k0lOxJMyyot3KAb1gdyuwzDPC777XVUWPHoTCEVmcK3fYfgu7sIo3PSaLe9KulUdm4TWVuqlbKyYGxRAjsf_Cpk=@protonmail.ch> <87ee9pa6xw.fsf@disp2133> <878rzw77i3.fsf@disp2133> <20210929173611.fo5traia77o63gpw@example.org> <20210930130640.wudkpmn3cmah2cjz@example.org> <878rz8wwb6.fsf@disp2133> <87v92cvhbf.fsf@disp2133> MIME-Version: 1.0 X-Rspamd-Server: rspam06 X-Rspamd-Queue-Id: 1595990000A0 X-Stat-Signature: 4ongenp83qy7zwoanc15eqsssorwccu9 Authentication-Results: imf23.hostedemail.com; dkim=none; spf=pass (imf23.hostedemail.com: domain of hdanton@sina.com designates 202.108.3.23 as permitted sender) smtp.mailfrom=hdanton@sina.com; dmarc=none X-HE-Tag: 1634350120-319724 Content-Transfer-Encoding: quoted-printable X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Fri, 15 Oct 2021 17:10:58 -0500 Eric W. Biederman wrote: >=20 > In commit fda31c50292a ("signal: avoid double atomic counter > increments for user accounting") Linus made a clever optimization to > how rlimits and the struct user_struct. Unfortunately that > optimization does not work in the obvious way when moved to nested > rlimits. The problem is that the last decrement of the per user > namespace per user sigpending counter might also be the last decrement > of the sigpending counter in the parent user namespace as well. Which > means that simply freeing the leaf ucount in __free_sigqueue is not > enough. >=20 > Maintain the optimization and handle the tricky cases by introducing > inc_rlimit_get_ucounts and dec_rlimit_put_ucounts. >=20 > By moving the entire optimization into functions that perform all of > the work it becomes possible to ensure that every level is handled > properly. >=20 > I wish we had a single user across all of the threads whose rlimit > could be charged so we did not need this complexity. >=20 > Cc: stable@vger.kernel.org > Fixes: d64696905554 ("Reimplement RLIMIT_SIGPENDING on top of ucounts") > Signed-off-by: "Eric W. Biederman" > --- >=20 > With a lot of help from Alex who found a way I could reproduce this > I believe I have found the issue. >=20 > Could people who are seeing this issue test and verify this solves the > problem for them? >=20 > include/linux/user_namespace.h | 2 ++ > kernel/signal.c | 25 +++++---------------- > kernel/ucount.c | 41 ++++++++++++++++++++++++++++++++++ > 3 files changed, 49 insertions(+), 19 deletions(-) >=20 > diff --git a/include/linux/user_namespace.h b/include/linux/user_namesp= ace.h > index eb70cabe6e7f..33a4240e6a6f 100644 > --- a/include/linux/user_namespace.h > +++ b/include/linux/user_namespace.h > @@ -127,6 +127,8 @@ static inline long get_ucounts_value(struct ucounts= *ucounts, enum ucount_type t > =20 > long inc_rlimit_ucounts(struct ucounts *ucounts, enum ucount_type type= , long v); > bool dec_rlimit_ucounts(struct ucounts *ucounts, enum ucount_type type= , long v); > +long inc_rlimit_get_ucounts(struct ucounts *ucounts, enum ucount_type = type); > +void dec_rlimit_put_ucounts(struct ucounts *ucounts, enum ucount_type = type); > bool is_ucounts_overlimit(struct ucounts *ucounts, enum ucount_type ty= pe, unsigned long max); > =20 > static inline void set_rlimit_ucount_max(struct user_namespace *ns, > diff --git a/kernel/signal.c b/kernel/signal.c > index a3229add4455..762de58c6e76 100644 > --- a/kernel/signal.c > +++ b/kernel/signal.c > @@ -425,22 +425,10 @@ __sigqueue_alloc(int sig, struct task_struct *t, = gfp_t gfp_flags, > */ > rcu_read_lock(); > ucounts =3D task_ucounts(t); > - sigpending =3D inc_rlimit_ucounts(ucounts, UCOUNT_RLIMIT_SIGPENDING, = 1); > - switch (sigpending) { > - case 1: > - if (likely(get_ucounts(ucounts))) > - break; > - fallthrough; > - case LONG_MAX: > - /* > - * we need to decrease the ucount in the userns tree on any > - * failure to avoid counts leaking. > - */ > - dec_rlimit_ucounts(ucounts, UCOUNT_RLIMIT_SIGPENDING, 1); > - rcu_read_unlock(); > - return NULL; > - } > + sigpending =3D inc_rlimit_get_ucounts(ucounts, UCOUNT_RLIMIT_SIGPENDI= NG); > rcu_read_unlock(); > + if (sigpending =3D=3D LONG_MAX) > + return NULL; > =20 > if (override_rlimit || likely(sigpending <=3D task_rlimit(t, RLIMIT_S= IGPENDING))) { > q =3D kmem_cache_alloc(sigqueue_cachep, gfp_flags); > @@ -449,8 +437,7 @@ __sigqueue_alloc(int sig, struct task_struct *t, gf= p_t gfp_flags, > } > =20 > if (unlikely(q =3D=3D NULL)) { > - if (dec_rlimit_ucounts(ucounts, UCOUNT_RLIMIT_SIGPENDING, 1)) > - put_ucounts(ucounts); > + dec_rlimit_put_ucounts(ucounts, UCOUNT_RLIMIT_SIGPENDING); > } else { > INIT_LIST_HEAD(&q->list); > q->flags =3D sigqueue_flags; > @@ -463,8 +450,8 @@ static void __sigqueue_free(struct sigqueue *q) > { > if (q->flags & SIGQUEUE_PREALLOC) > return; > - if (q->ucounts && dec_rlimit_ucounts(q->ucounts, UCOUNT_RLIMIT_SIGPEN= DING, 1)) { > - put_ucounts(q->ucounts); > + if (q->ucounts) { > + dec_rlimit_put_ucounts(q->ucounts, UCOUNT_RLIMIT_SIGPENDING); > q->ucounts =3D NULL; > } > kmem_cache_free(sigqueue_cachep, q); > diff --git a/kernel/ucount.c b/kernel/ucount.c > index 3b7e176cf7a2..687d77aa66bb 100644 > --- a/kernel/ucount.c > +++ b/kernel/ucount.c > @@ -285,6 +285,47 @@ bool dec_rlimit_ucounts(struct ucounts *ucounts, e= num ucount_type type, long v) > return (new =3D=3D 0); > } > =20 > +static void do_dec_rlimit_put_ucounts(struct ucounts *ucounts, > + struct ucounts *last, enum ucount_type type) > +{ > + struct ucounts *iter; > + for (iter =3D ucounts; iter !=3D last; iter =3D iter->ns->ucounts) { > + long dec =3D atomic_long_add_return(-1, &iter->ucount[type]); > + WARN_ON_ONCE(dec < 0); > + if (dec =3D=3D 0) > + put_ucounts(iter); > + } Given kfree in put_ucounts(), this has difficulty surviving tests like kasan if the put pairs with the get in the below inc_rlimit_get_ucounts()= . > +} > + > +void dec_rlimit_put_ucounts(struct ucounts *ucounts, enum ucount_type = type) > +{ > + do_dec_rlimit_put_ucounts(ucounts, NULL, type); > +} > + > +long inc_rlimit_get_ucounts(struct ucounts *ucounts, enum ucount_type = type) > +{ > + struct ucounts *iter; > + long dec, ret =3D 0; > + > + for (iter =3D ucounts; iter; iter =3D iter->ns->ucounts) { > + long max =3D READ_ONCE(iter->ns->ucount_max[type]); > + long new =3D atomic_long_add_return(1, &iter->ucount[type]); > + if (new < 0 || new > max) > + goto unwind; > + else if (iter =3D=3D ucounts) > + ret =3D new; > + if ((new =3D=3D 1) && (get_ucounts(iter) !=3D iter)) > + goto dec_unwind; Add a line of comment for get to ease readers. Hillf > + } > + return ret; > +dec_unwind: > + dec =3D atomic_long_add_return(1, &iter->ucount[type]); > + WARN_ON_ONCE(dec < 0); > +unwind: > + do_dec_rlimit_put_ucounts(ucounts, iter, type); > + return LONG_MAX; > +} > + > bool is_ucounts_overlimit(struct ucounts *ucounts, enum ucount_type ty= pe, unsigned long max) > { > struct ucounts *iter; > --=20 > 2.20.1