Date: Tue, 23 Jun 2020 21:30:21 -0700
From: Roman Gushchin
To: Xie Xun
CC: Johannes Weiner, Michal Hocko, Vladimir Davydov, Andrew Morton,
 cgroups@vger.kernel.org, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org, shenwenbosmile@gmail.com
Subject: Re: memcg missing charge when setting BPF
Message-ID: <20200624043021.GA3669@carbon.dhcp.thefacebook.com>
References: <1139555701.2821292.1592970418462.ref@mail.yahoo.com>
 <1139555701.2821292.1592970418462@mail.yahoo.com>
In-Reply-To: <1139555701.2821292.1592970418462@mail.yahoo.com>

Hello Xie!

It's actually not a surprise: it's a known limitation/exception.
Partly this is because historically there was no way to account percpu
memory, and some bpf maps use it quite extensively.

Fortunately, this changed recently, and 5.9 will likely get the ability
to account percpu memory. I sent the latest version of the patchset
today:
https://lore.kernel.org/linux-mm/20200623184515.4132564-1-guro@fb.com/T/#m0be45dd71e6a238985181c213d9934731949c089

I also have a patchset in the works which adds memcg accounting of bpf
memory (programs and maps). I plan to send it upstream next week. If
everything goes smoothly, it might appear in 5.9 as well.

Unfortunately, the magnitude of the required changes does not allow
backporting them to older kernels.

Thanks!

PS I'll be completely offline till the end of the week. I'll respond to
all e-mail on Monday, Jun 29th.

Thanks!

On Wed, Jun 24, 2020 at 03:46:58AM +0000, Xie Xun wrote:
> Hello,
>
> I found that programs can consume much more memory than the memcg limit
> by setting BPF many times. This is because allocations made while
> setting BPF are not charged to the memcg.
>
>
> Below is how I did it:
>
> 1. Run the Linux kernel in a QEMU virtual machine (x86_64) with 1GB of
>    physical memory.
>    The kernel is built with memcg and memcg kmem accounting enabled.
>
> 2. Create a docker (runC) container with a memory limit of 100MB.
>
>    docker run --name debian --memory 100000000 --kernel-memory 50000000 \
>    debian:slim /bin/bash
>
> 3. In the container, run a program that sets BPF many times. I use prctl
>    to set BPF.
>
>    while(1)
>      {
>        prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &bpf);
>      }
>
> 4. Physical memory usage (as reported by `free` or `top`) increases by
>    around 40MB, but the memory usage of the container's memcg barely
>    increases (by around 100KB).
>
> 5. Run several processes that set BPF, and almost all physical memory is
>    consumed.
>    Sometimes processes outside the container are also killed due to OOM.
>
> I also tried this with user namespaces enabled, and I can still get host
> processes killed from inside the container this way. So this problem may
> be dangerous for containers that are based on cgroups.
>
>
> kernel version: 5.3.6
> kernel configuration: in attachment (CONFIG_MEMCG_KMEM is on)
>
>
> This blog also shows this problem:
> https://blog.xiexun.tech/break-memcg.html
>
>
> Cause of this problem:
>
> Memory allocations made while setting BPF are not charged to the memcg.
> For example, in kernel/bpf/core.c:bpf_prog_alloc, bpf_prog_alloc_no_stats
> and alloc_percpu_gfp are called to allocate memory. However, neither of
> these allocations is charged to the memcg.
> So if we trigger this path many times, we can consume lots of memory
> without increasing our memcg usage.
>
> /* ------------ */
> struct bpf_prog *bpf_prog_alloc(unsigned int size, gfp_t gfp_extra_flags)
> {
>     gfp_t gfp_flags = GFP_KERNEL | __GFP_ZERO | gfp_extra_flags;
>     struct bpf_prog *prog;
>     int cpu;
>
>     prog = bpf_prog_alloc_no_stats(size, gfp_extra_flags);
>     if (!prog)
>         return NULL;
>
>     prog->aux->stats = alloc_percpu_gfp(struct bpf_prog_stats, gfp_flags);
>
>     /* ... */
>
> }
> /* ------------ */
>
>
> My program that sets BPF:
>
> /* ------------ */
> /* Headers required by the calls below. */
> #include <errno.h>
> #include <stdio.h>
> #include <unistd.h>
> #include <sys/prctl.h>
> #include <linux/filter.h>
> #include <linux/seccomp.h>
>
> int main()
> {
>   /* A one-instruction seccomp filter that allows every syscall. */
>   struct sock_filter insns[] =
>     {
>       {
>         .code = 0x6,   /* BPF_RET | BPF_K */
>         .jt = 0,
>         .jf = 0,
>         .k = SECCOMP_RET_ALLOW
>       }
>     };
>   struct sock_fprog bpf =
>     {
>       .len = 1,
>       .filter = insns
>     };
>   int ret;
>
>   /* Required so that an unprivileged process may install a filter. */
>   ret = prctl(PR_SET_NO_NEW_PRIVS, 1, NULL, 0, 0);
>   if (ret)
>     {
>       printf("error1 %d\n", errno);
>       return 1;
>     }
>   int count = 0;
>   /* Each successful call installs one more filter on this process. */
>   while (1)
>     {
>       ret = prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &bpf);
>       if (ret)
>         {
>           sleep(1);
>           printf("error %d\n", errno);
>         }
>       else
>         {
>           count++;
>           printf("ok %d\n", count);
>         }
>     }
>   return 0;
> }
> /* ------------ */
>
>
> Thanks,
> Xie Xun