From: Vasily Averin
Subject: [PATCH mm v4 0/9] memcg: accounting for objects allocated by mkdir cgroup
Date: Mon, 13 Jun 2022 08:34:32 +0300
Message-ID: <4e685057-b07d-745d-fdaa-1a6a5a681060@openvz.org>
In-Reply-To: <3e1d6eab-57c7-ba3d-67e1-c45aa0dfa2ab@openvz.org>
To: Andrew Morton
Cc: kernel@openvz.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org,
    Shakeel Butt, Roman Gushchin, Michal Koutný, Vlastimil Babka,
    Michal Hocko, Muchun Song, cgroups@vger.kernel.org

In some cases, creating a cgroup allocates a noticeable amount
of memory. This operation can be executed from inside a memory-limited
container, but currently this memory is not accounted to memcg and can
be misused. This allows a container to exceed its assigned memory limit
and avoid the memcg OOM. Moreover, in case of a global memory shortage
on the host, the OOM killer may fail to find the real memory eater and
start killing random processes on the host.

This is especially important for OpenVZ and LXC used in hosting
environments, where containers are used by untrusted end users.

Below are the tracing results of mkdir /sys/fs/cgroup/vvs.test on a
4-CPU VM with Fedora and a self-compiled upstream kernel. The
calculations are not precise: they depend on kernel config options, the
number of CPUs and the enabled controllers, and they ignore possible
page allocations etc. However, this is enough to clarify the general
situation.

All allocations are split into:
- common part, always called for each cgroup type
- per-cgroup allocations

In each group we consider 2 corner cases:
- usual allocations, important for 1-2 CPU nodes/VMs
- percpu allocations, important for 'big irons'

common part:    ~11 KB  +  318 bytes percpu
memcg:          ~17 KB  + 4692 bytes percpu
cpu:            ~2.5 KB + 1036 bytes percpu
cpuset:         ~3 KB   +   12 bytes percpu
blkcg:          ~3 KB   +   12 bytes percpu
pid:            ~1.5 KB +   12 bytes percpu
perf:           ~320 B  +   60 bytes percpu
-------------------------------------------
total:          ~38 KB  + 6142 bytes percpu
currently accounted:      4668 bytes percpu

- It is important to account the usual allocations called in the common
  part, because almost all of the cgroup-specific allocations are small.
  The one exception is the memory cgroup, which allocates a few huge
  objects that should be accounted.
- Percpu allocations called in the common part, in memcg and in the cpu
  cgroup should be accounted; the rest are small and can be ignored.
- KERNFS objects are allocated both in the common part and in most
  cgroups.

Details can be found here:
https://lore.kernel.org/all/d28233ee-bccb-7bc3-c2ec-461fd7f95e6a@openvz.org/

I checked the other cgroup types and found that they can all be
ignored. Additionally, I found an allocation of struct rt_rq called in
the cpu cgroup when CONFIG_RT_GROUP_SCHED is enabled; it allocates a
huge (~1700 bytes) percpu structure and should be accounted too.

v4:
 1) rebased onto linux-next (next-20220610);
    psi_group is no longer part of struct cgroup and is allocated
    on demand
 2) added the approval received from Muchun Song
 3) improved the cover letter description at akpm@'s request

v3:
 1) rebased onto current upstream (v5.18-11267-gb00ed48bb0a7)
 2) fixed a few typos
 3) added the approvals received

v2:
 1) re-split to simplify possible bisect, re-ordered
 2) added accounting for percpu psi_group_cpu and cgroup_rstat_cpu,
    allocated in the common part
 3) added accounting for the percpu allocation of struct rt_rq
    (relevant if CONFIG_RT_GROUP_SCHED is enabled)
 4) improved patch descriptions

Vasily Averin (9):
  memcg: enable accounting for struct cgroup
  memcg: enable accounting for kernfs nodes
  memcg: enable accounting for kernfs iattrs
  memcg: enable accounting for struct simple_xattr
  memcg: enable accounting for percpu allocation of struct psi_group_cpu
  memcg: enable accounting for percpu allocation of struct cgroup_rstat_cpu
  memcg: enable accounting for large allocations in mem_cgroup_css_alloc
  memcg: enable accounting for allocations in alloc_fair_sched_group
  memcg: enable accounting for perpu allocation of struct rt_rq

 fs/kernfs/mount.c      | 6 ++++--
 fs/xattr.c             | 2 +-
 kernel/cgroup/cgroup.c | 2 +-
 kernel/cgroup/rstat.c  | 3 ++-
 kernel/sched/fair.c    | 4 ++--
 kernel/sched/psi.c     | 5 +++--
 kernel/sched/rt.c      | 2 +-
 mm/memcontrol.c        | 4 ++--
 8 files changed, 16 insertions(+), 12 deletions(-)

-- 
2.36.1
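
For illustration only (not part of the series): the per-controller
figures from the cover letter can be turned into a rough estimate of
how much memory one mkdir costs on machines of different sizes, and how
much of it was unaccounted before these patches. This is a sketch under
stated assumptions: the kilobyte figures are taken as 1024 bytes, the
percpu part is assumed to scale linearly with the CPU count, and the
names COSTS, ACCOUNTED_PERCPU and mkdir_cost are hypothetical helpers,
not kernel symbols.

```python
# Rough per-mkdir memory estimate based on the trace summary in the
# cover letter. All values are approximate, as in the original.
KIB = 1024  # assumption: the kilobyte figures mean 1024 bytes

# name: (usual allocation in bytes, percpu bytes per CPU)
COSTS = {
    "common": (11 * KIB, 318),
    "memcg": (17 * KIB, 4692),
    "cpu": (int(2.5 * KIB), 1036),
    "cpuset": (3 * KIB, 12),
    "blkcg": (3 * KIB, 12),
    "pid": (int(1.5 * KIB), 12),
    "perf": (320, 60),
}

# Percpu bytes per CPU already accounted before this series.
ACCOUNTED_PERCPU = 4668


def mkdir_cost(ncpus):
    """Return (total_bytes, unaccounted_bytes) for one cgroup mkdir."""
    usual = sum(u for u, _ in COSTS.values())
    percpu = sum(p for _, p in COSTS.values())
    total = usual + percpu * ncpus
    accounted = ACCOUNTED_PERCPU * ncpus
    return total, total - accounted


for n in (1, 4, 64, 256):
    total, unaccounted = mkdir_cost(n)
    print(f"{n:4d} CPUs: total ~{total / KIB:.1f} KiB, "
          f"unaccounted ~{unaccounted / KIB:.1f} KiB")
```

On small machines the usual allocations dominate the unaccounted part,
while on large ones the percpu term grows with the CPU count, which is
why the series covers both.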