From: Glaive Neo
Date: Wed, 19 May 2021 19:43:56 +0800
Subject: Fw: User-controllable memcg-unaccounted objects of time namespace
To: Michal Hocko
Cc: "hannes@cmpxchg.org", Vladimir Davydov, "shenwenbo@zju.edu.cn", cgroups@vger.kernel.org, linux-mm@kvack.org, "mhocko@kernel.org"

CC reply. Sorry for taking up your time. I was unaware that I had to use plain-text e-mail for the public mailing lists to accept it, and I omitted these addresses to avoid rejection notifications.

Yutian Yang,
Zhejiang University

> -----Original Messages-----
> From: "Michal Hocko"
> Sent Time: 2021-05-19 17:18:45 (Wednesday)
> To: "Yutian Yang"
> Cc:
> Subject: Re: Fw: Re: Re: User-controllable memcg-unaccounted objects of time namespace
>
> Did you plan to post this reply to the mailing list with the whole
> original CC list?
>
> On Wed 19-05-21 16:56:27, Yutian Yang wrote:
> >
> > -----Original Messages-----
> > From: "Yutian Yang"
> > Sent Time: 2021-05-18 19:29:40 (Tuesday)
> > To: "Michal Hocko"
> > Cc: tglx@linutronix.de, "vdavydov.dev@gmail.com", "shenwenbosmile@gmail.com"
> > Subject: Re: Re: User-controllable memcg-unaccounted objects of time namespace
> >
> > Sorry for the delayed response. I believe our patches are necessary, and I will answer your questions one by one.
> >
> > Regarding the practicality of our concerns, we have confirmed that repeatedly creating new namespaces can lead to breaking the memcg limit.
> > Although the number of namespaces can be limited by a per-user quota (e.g., max_time_namespaces), relying on per-user quotas to limit memory usage is unsafe and impractical, as users may have their own considerations when setting these limits. In fact, limiting memory usage is more fundamental than limiting the number of various kernel objects. I believe this is also why fd tables and pipe buffers are accounted by memcg even though they are also subject to per-user quotas. The same reasoning applies to the limits of the pid cgroup controller. Moreover, both net and uts namespaces are properly accounted while the others are not, which is inconsistent.
> >
> > As for the other unaccounted allocations (proc_alloc_inum, vvar_page and likely others), we have not reached them yet, as our detecting tool reported many results that require considerable manual effort to go through. To me, it seems that these allocations also need patches.
> >
> > Lastly, our work is based on a detecting tool, and we only report missing-charging sites that are manually confirmed to be triggerable from syscalls. Results that are obviously unexploitable, such as the uncharged ldt_struct, which is allocated per process, are also filtered out. We would like to keep contributing to memcg and are planning to submit more patches in the future.
> >
> > Thanks!
> >
> > Yutian Yang,
> > Zhejiang University
> >
> > > -----Original Messages-----
> > > From: "Michal Hocko"
> > > Sent Time: 2021-04-16 14:29:52 (Friday)
> > > To: "Yutian Yang"
> > > Cc: tglx@linutronix.de, "shenwenbo@zju.edu.cn", "vdavydov.dev@gmail.com"
> > > Subject: Re: User-controllable memcg-unaccounted objects of time namespace
> > >
> > > Thank you for this and other reports which are trying to track memcg
> > > unaccounted objects. I have a few remarks/questions.
> > >
> > > On Thu 15-04-21 21:29:57, Yutian Yang wrote:
> > > > Hi, our team has found bugs in the time namespace module of Linux kernel v5.10.19, which lead to user-controllable memcg-unaccounted objects.
> > > > They are caused by the code snippets listed below:
> > > >
> > > > /*--------------- kernel/time/namespace.c --------------------*/
> > > > ......
> > > > 91 ns = kmalloc(sizeof(*ns), GFP_KERNEL);
> > > > 92 if (!ns)
> > > > 93         goto fail_dec;
> > > > ......
> > > > /*----------------------------- end -------------------------------*/
> > > >
> > > > The code at line 91 could be triggered by syscall clone if the
> > > > CLONE_NEWTIME flag is set in the parameter. A user could repeatedly
> > > > make the clone syscall and trigger the bugs to occupy more and
> > > > more unaccounted memory. In fact, time namespace objects can be
> > > > allocated by users and are also controllable by users. As a result,
> > > > they need to be accounted and we suggest the following patch:
> > >
> > > Is this a practical concern? I am not really deeply familiar with
> > > namespaces but isn't there any cap on how many of them can be created by
> > > a user? If not, isn't that contained by the pid cgroup controller? If even
> > > that is not the case, care to explain why?
> > >
> > > You are referring to struct time_namespace above (that is 88B) but I can
> > > see there are other unaccounted allocations (proc_alloc_inum, vvar_page
> > > and likely others) so why is the above more important than those?
> > >
> > > Btw. similar feedback applies to other reports similar to this one. I
> > > assume you have some sort of tool to explore those potential runaways
> > > and that is really great but it would be really helpful and highly
> > > appreciated to analyze those reports and try to provide some sort of
> > > risk assessment.
> > >
> > > Thanks!
> > > --
> > > Michal Hocko
> > > SUSE Labs
>
> --
> Michal Hocko
> SUSE Labs
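
As an illustration of the trigger path discussed in this thread, here is a minimal user-space sketch that requests one new time namespace. It is not taken from the original report: the helper name try_new_time_namespace() is hypothetical, and it uses unshare(2) rather than clone(2), on the assumption that CLONE_NEWTIME is only accepted by unshare(2) and clone3(2) because its value overlaps the signal mask of the legacy clone(2) flags. It also assumes a kernel built with time namespace support and a caller holding CAP_SYS_ADMIN; otherwise the call simply fails with an error.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef CLONE_NEWTIME
#define CLONE_NEWTIME 0x00000080 /* value from the kernel's uapi sched.h */
#endif

/*
 * try_new_time_namespace() is a hypothetical helper, not part of the thread.
 * It asks the kernel for one new time namespace for future children of the
 * caller. Returns 0 on success, or the errno value on failure (for example
 * EPERM without CAP_SYS_ADMIN, or EINVAL on kernels without time namespaces).
 */
int try_new_time_namespace(void)
{
    /* Raw syscall so the sketch does not rely on a libc unshare() wrapper. */
    if (syscall(SYS_unshare, CLONE_NEWTIME) == -1)
        return errno;
    return 0;
}
```

A caller could invoke try_new_time_namespace() once and print the returned value. Whether the struct time_namespace allocated on success is charged to the caller's memcg is the question raised in the thread.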