From: Muchun Song
Date: Tue, 2 Mar 2021 10:50:04 +0800
Subject: Re: [External] Re: [PATCH 0/5] Use obj_cgroup APIs to change kmem pages
To: Roman Gushchin
Cc: viro@zeniv.linux.org.uk, Jan Kara, amir73il@gmail.com,
    Alexei Starovoitov, Daniel Borkmann, andrii@kernel.org,
    Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
    kpsingh@kernel.org, mingo@redhat.com, Peter Zijlstra,
    juri.lelli@redhat.com, Vincent Guittot, dietmar.eggemann@arm.com,
    Steven Rostedt, Benjamin Segall, mgorman@suse.de, bristot@redhat.com,
    Johannes Weiner, Michal Hocko, Vladimir Davydov, Andrew Morton,
    Shakeel Butt, Alex Shi, Chris Down, richard.weiyang@gmail.com,
    Vlastimil Babka, mathieu.desnoyers@efficios.com, posk@google.com,
    Jann Horn, Joonsoo Kim, Daniel Vetter, longman@redhat.com,
    Michel Lespinasse, Christian Brauner, "Eric W. Biederman", Kees Cook,
    krisman@collabora.com, esyr@redhat.com, Suren Baghdasaryan,
    Marco Elver, linux-fsdevel, LKML, Networking, bpf, Cgroups,
    Linux Memory Management List, Xiongchun duan
References: <20210301062227.59292-1-songmuchun@bytedance.com>

On Tue, Mar 2, 2021 at 9:12 AM Roman Gushchin wrote:
>
> Hi Muchun!
>
> On Mon, Mar 01, 2021 at 02:22:22PM +0800, Muchun Song wrote:
> > Since Roman's series "The new cgroup slab memory controller" was applied,
> > all slab objects are charged via the new obj_cgroup APIs. These new APIs
> > introduce a struct obj_cgroup instead of using struct mem_cgroup directly
> > to charge slab objects. This prevents long-living objects from pinning the
> > original memory cgroup in memory. But there are still some corner-case
> > objects (e.g. allocations larger than an order-1 page on SLUB) which are
> > not charged via the obj_cgroup APIs.
> > Those objects (including the pages
> > which are allocated directly from the buddy allocator) are charged as kmem
> > pages, which still hold a reference to the memory cgroup.
>
> Yes, this is a good idea, large kmallocs should be treated the same
> way as small ones.
>
> >
> > E.g. we know that the kernel stack is charged as kmem pages because the
> > size of the kernel stack can be greater than 2 pages (e.g. 16KB on x86_64
> > or arm64). Suppose we create a thread (whose kernel stack is charged to
> > memory cgroup A) and then move it from memory cgroup A to memory cgroup
> > B. Because the kernel stack of the thread holds a reference to memory
> > cgroup A, the thread can pin memory cgroup A in memory even if
> > we remove cgroup A. We can observe this scenario with the
> > following script, which shows that the system has added 500 dying cgroups.
> >
> >     #!/bin/bash
> >
> >     cat /proc/cgroups | grep memory
> >
> >     cd /sys/fs/cgroup/memory
> >     echo 1 > memory.move_charge_at_immigrate
> >
> >     for i in {1..500}
> >     do
> >         mkdir kmem_test
> >         echo $$ > kmem_test/cgroup.procs
> >         sleep 3600 &
> >         echo $$ > cgroup.procs
> >         echo `cat kmem_test/cgroup.procs` > cgroup.procs
> >         rmdir kmem_test
> >     done
> >
> >     cat /proc/cgroups | grep memory
>
> Well, moving processes between cgroups always created a lot of issues
> and corner cases and this one is definitely not the worst. So this problem
> looks a bit artificial, unless I'm missing something. But if it doesn't
> introduce any new performance costs and doesn't make the code more complex,
> I have nothing against it.

OK. I just wanted to show that large kmallocs are charged as kmem pages,
so I constructed this test case.

>
> Btw, can you, please, run the spell-checker on commit logs? There are many
> typos (starting from the title of the series, I guess), which make the patchset
> look less appealing.

Sorry for my poor English. I will do that. Thanks for your suggestions.

>
> Thank you!
>
> >
> > This patchset aims to make those kmem pages drop their reference to the
> > memory cgroup by using the obj_cgroup APIs. As a result, the number
> > of dying cgroups does not increase when we run the above test script.
> >
> > Patches 1-3 use the obj_cgroup APIs to charge kmem pages. The remote
> > memory cgroup charging APIs are a mechanism to charge kernel memory to a
> > given memory cgroup, so I also make them use the obj_cgroup APIs.
> > Patches 4-5 do this.
> >
> > Muchun Song (5):
> >   mm: memcontrol: introduce obj_cgroup_{un}charge_page
> >   mm: memcontrol: make page_memcg{_rcu} only applicable for non-kmem
> >     page
> >   mm: memcontrol: reparent the kmem pages on cgroup removal
> >   mm: memcontrol: move remote memcg charging APIs to CONFIG_MEMCG_KMEM
> >   mm: memcontrol: use object cgroup for remote memory cgroup charging
> >
> >  fs/buffer.c                          |  10 +-
> >  fs/notify/fanotify/fanotify.c        |   6 +-
> >  fs/notify/fanotify/fanotify_user.c   |   2 +-
> >  fs/notify/group.c                    |   3 +-
> >  fs/notify/inotify/inotify_fsnotify.c |   8 +-
> >  fs/notify/inotify/inotify_user.c     |   2 +-
> >  include/linux/bpf.h                  |   2 +-
> >  include/linux/fsnotify_backend.h     |   2 +-
> >  include/linux/memcontrol.h           | 109 +++++++++++---
> >  include/linux/sched.h                |   6 +-
> >  include/linux/sched/mm.h             |  30 ++--
> >  kernel/bpf/syscall.c                 |  35 ++---
> >  kernel/fork.c                        |   4 +-
> >  mm/memcontrol.c                      | 276 ++++++++++++++++++++++-------------
> >  mm/page_alloc.c                      |   4 +-
> >  15 files changed, 324 insertions(+), 175 deletions(-)
> >
> > --
> > 2.11.0
> >
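
The quoted test script compares the memory line of /proc/cgroups before and
after the loop. A minimal sketch of turning that comparison into a single
number, assuming the cgroup v1 /proc/cgroups layout (columns subsys_name,
hierarchy, num_cgroups, enabled). The helper names memcg_count and
memcg_delta are hypothetical and are not part of the patchset:

```shell
#!/bin/bash
# Hypothetical helpers (not from the patchset) for reading the memory
# controller's cgroup count, as the quoted test script does by eye.

# Print the num_cgroups column (third field) of the "memory" line.
# $1: a file in /proc/cgroups format; defaults to /proc/cgroups itself.
memcg_count() {
    awk '$1 == "memory" { print $3 }' "${1:-/proc/cgroups}"
}

# Print how much the count grew between two samples.
# $1: count before the test, $2: count after the test.
memcg_delta() {
    echo $(( $2 - $1 ))
}

# Usage (commented out so that sourcing this file has no side effects):
#   before=$(memcg_count)
#   ... run the test script from the cover letter ...
#   after=$(memcg_count)
#   echo "memory cgroups added: $(memcg_delta "$before" "$after")"
```

Going by the cover letter, the delta should be about 500 without the series
and should stay near zero with it applied.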