From mboxrd@z Thu Jan 1 00:00:00 1970
From: Muchun Song <songmuchun@bytedance.com>
To: viro@zeniv.linux.org.uk, jack@suse.cz, amir73il@gmail.com, ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org, kafai@fb.com, songliubraving@fb.com, yhs@fb.com, john.fastabend@gmail.com, kpsingh@kernel.org, mingo@redhat.com, peterz@infradead.org, juri.lelli@redhat.com, vincent.guittot@linaro.org, dietmar.eggemann@arm.com, rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de, bristot@redhat.com, hannes@cmpxchg.org, mhocko@kernel.org, vdavydov.dev@gmail.com, akpm@linux-foundation.org, shakeelb@google.com, guro@fb.com, songmuchun@bytedance.com, alex.shi@linux.alibaba.com, alexander.h.duyck@linux.intel.com, chris@chrisdown.name, richard.weiyang@gmail.com, vbabka@suse.cz, mathieu.desnoyers@efficios.com, posk@google.com, jannh@google.com, iamjoonsoo.kim@lge.com, daniel.vetter@ffwll.ch, longman@redhat.com, walken@google.com, christian.brauner@ubuntu.com, ebiederm@xmission.com, keescook@chromium.org, krisman@collabora.com, esyr@redhat.com, surenb@google.com, elver@google.com
Cc: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, netdev@vger.kernel.org, bpf@vger.kernel.org, cgroups@vger.kernel.org, linux-mm@kvack.org, duanxiongchun@bytedance.com
Subject: [PATCH 3/5] mm: memcontrol: reparent the kmem pages on cgroup removal
Date: Mon, 1 Mar 2021 14:22:25 +0800
Message-Id: <20210301062227.59292-4-songmuchun@bytedance.com>
X-Mailer: git-send-email 2.21.0 (Apple Git-122)
In-Reply-To: <20210301062227.59292-1-songmuchun@bytedance.com>
References: <20210301062227.59292-1-songmuchun@bytedance.com>

Currently, slab objects are already reparented to their parent memcg on
cgroup removal, but some corner-case objects are not (e.g. allocations
larger than an order-1 page on SLUB). Those objects are allocated
directly from the buddy allocator and charged to the memcg as kmem via
__memcg_kmem_charge_page(), so they are left behind when the cgroup is
removed. This patch reparents such kmem pages on cgroup removal, which
is straightforward with the obj_cgroup infrastructure: page->memcg_data
of a kmem page now points to an object cgroup rather than directly to a
memory cgroup.
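To make the mechanism concrete: after this patch, resolving the memcg of
a kmem page takes one extra pointer chase through the object cgroup. The
helper below is only an illustrative sketch distilling page_objcg() and
obj_cgroup_memcg() from the diff; the name kmem_page_memcg() is
hypothetical and no such helper is added by this patch:

/*
 * Sketch, not part of the patch: page->memcg_data of a kmem page holds
 * an obj_cgroup pointer tagged with MEMCG_DATA_KMEM, and objcg->memcg
 * can be atomically switched to the parent memcg when the cgroup is
 * removed. Callers must prevent the returned memcg from being
 * released, e.g. by holding rcu_read_lock().
 */
static inline struct mem_cgroup *kmem_page_memcg(struct page *page)
{
	unsigned long memcg_data = page->memcg_data;
	struct obj_cgroup *objcg;

	VM_BUG_ON_PAGE(!(memcg_data & MEMCG_DATA_KMEM), page);

	objcg = (struct obj_cgroup *)(memcg_data & ~MEMCG_DATA_FLAGS_MASK);
	return READ_ONCE(objcg->memcg);	/* may already be the parent */
}

Since every lookup now funnels through objcg->memcg, reparenting all
kmem pages of a dying memcg reduces to re-pointing objcg->memcg at the
parent; no page has to be found or rewritten.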
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
---
 include/linux/memcontrol.h |  66 +++++++++++--------
 mm/memcontrol.c            | 155 ++++++++++++++++++++++++----------------
 2 files changed, 124 insertions(+), 97 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 1d2c82464c8c..27043478220f 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -370,23 +370,15 @@ static inline bool page_memcg_charged(struct page *page)
 }
 
 /*
- * page_memcg_kmem - get the memory cgroup associated with a kmem page.
- * @page: a pointer to the page struct
+ * After the initialization objcg->memcg is always pointing at
+ * a valid memcg, but can be atomically swapped to the parent memcg.
  *
- * Returns a pointer to the memory cgroup associated with the kmem page,
- * or NULL. This function assumes that the page is known to have a proper
- * memory cgroup pointer. It is only suitable for kmem pages which means
- * PageMemcgKmem() returns true for this page.
+ * The caller must ensure that the returned memcg won't be released:
+ * e.g. acquire the rcu_read_lock or css_set_lock.
  */
-static inline struct mem_cgroup *page_memcg_kmem(struct page *page)
+static inline struct mem_cgroup *obj_cgroup_memcg(struct obj_cgroup *objcg)
 {
-	unsigned long memcg_data = page->memcg_data;
-
-	VM_BUG_ON_PAGE(PageSlab(page), page);
-	VM_BUG_ON_PAGE(memcg_data & MEMCG_DATA_OBJCGS, page);
-	VM_BUG_ON_PAGE(!(memcg_data & MEMCG_DATA_KMEM), page);
-
-	return (struct mem_cgroup *)(memcg_data & ~MEMCG_DATA_FLAGS_MASK);
+	return READ_ONCE(objcg->memcg);
 }
 
 /*
@@ -462,6 +454,17 @@ static inline struct mem_cgroup *page_memcg_check(struct page *page)
 	if (memcg_data & MEMCG_DATA_OBJCGS)
 		return NULL;
 
+	if (memcg_data & MEMCG_DATA_KMEM) {
+		struct obj_cgroup *objcg;
+
+		/*
+		 * The caller must ensure that the returned memcg won't be
+		 * released: e.g. acquire the rcu_read_lock or css_set_lock.
+		 */
+		objcg = (void *)(memcg_data & ~MEMCG_DATA_FLAGS_MASK);
+		return obj_cgroup_memcg(objcg);
+	}
+
 	return (struct mem_cgroup *)(memcg_data & ~MEMCG_DATA_FLAGS_MASK);
 }
 
@@ -520,6 +523,24 @@ static inline struct obj_cgroup **page_objcgs_check(struct page *page)
 	return (struct obj_cgroup **)(memcg_data & ~MEMCG_DATA_FLAGS_MASK);
 }
 
+/*
+ * page_objcg - get the object cgroup associated with a kmem page
+ * @page: a pointer to the page struct
+ *
+ * Returns a pointer to the object cgroup associated with the kmem page,
+ * or NULL. This function assumes that the page is known to have an
+ * associated object cgroup. It's only safe to call this function
+ * against kmem pages (PageMemcgKmem() returns true).
+ */
+static inline struct obj_cgroup *page_objcg(struct page *page)
+{
+	unsigned long memcg_data = page->memcg_data;
+
+	VM_BUG_ON_PAGE(memcg_data & MEMCG_DATA_OBJCGS, page);
+	VM_BUG_ON_PAGE(!(memcg_data & MEMCG_DATA_KMEM), page);
+
+	return (struct obj_cgroup *)(memcg_data & ~MEMCG_DATA_FLAGS_MASK);
+}
 #else
 static inline struct obj_cgroup **page_objcgs(struct page *page)
 {
@@ -530,6 +551,11 @@ static inline struct obj_cgroup **page_objcgs_check(struct page *page)
 {
 	return NULL;
 }
+
+static inline struct obj_cgroup *page_objcg(struct page *page)
+{
+	return NULL;
+}
 #endif
 
 static __always_inline bool memcg_stat_item_in_bytes(int idx)
@@ -748,18 +774,6 @@ static inline void obj_cgroup_put(struct obj_cgroup *objcg)
 	percpu_ref_put(&objcg->refcnt);
 }
 
-/*
- * After the initialization objcg->memcg is always pointing at
- * a valid memcg, but can be atomically swapped to the parent memcg.
- *
- * The caller must ensure that the returned memcg won't be released:
- * e.g. acquire the rcu_read_lock or css_set_lock.
- */
-static inline struct mem_cgroup *obj_cgroup_memcg(struct obj_cgroup *objcg)
-{
-	return READ_ONCE(objcg->memcg);
-}
-
 static inline void mem_cgroup_put(struct mem_cgroup *memcg)
 {
 	if (memcg)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index bfd6efe1e196..39cb8c5bf8b2 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -856,10 +856,16 @@ void __mod_lruvec_page_state(struct page *page, enum node_stat_item idx,
 {
 	struct page *head = compound_head(page); /* rmap on tail pages */
 	struct mem_cgroup *memcg;
-	pg_data_t *pgdat = page_pgdat(page);
+	pg_data_t *pgdat;
 	struct lruvec *lruvec;
 
-	memcg = PageMemcgKmem(head) ? page_memcg_kmem(head) : page_memcg(head);
+	if (PageMemcgKmem(head)) {
+		__mod_lruvec_kmem_state(page_to_virt(head), idx, val);
+		return;
+	}
+
+	pgdat = page_pgdat(head);
+	memcg = page_memcg(head);
 	/* Untracked pages have no memcg, no lruvec. Update only the node */
 	if (!memcg) {
 		__mod_node_page_state(pgdat, idx, val);
@@ -1056,24 +1062,6 @@ static __always_inline struct mem_cgroup *active_memcg(void)
 	return current->active_memcg;
 }
 
-static __always_inline struct mem_cgroup *get_active_memcg(void)
-{
-	struct mem_cgroup *memcg;
-
-	rcu_read_lock();
-	memcg = active_memcg();
-	if (memcg) {
-		/* current->active_memcg must hold a ref. */
-		if (WARN_ON_ONCE(!css_tryget(&memcg->css)))
-			memcg = root_mem_cgroup;
-		else
-			memcg = current->active_memcg;
-	}
-	rcu_read_unlock();
-
-	return memcg;
-}
-
 static __always_inline bool memcg_kmem_bypass(void)
 {
 	/* Allow remote memcg charging from any context. */
@@ -1088,20 +1076,6 @@ static __always_inline bool memcg_kmem_bypass(void)
 }
 
 /**
- * If active memcg is set, do not fallback to current->mm->memcg.
- */
-static __always_inline struct mem_cgroup *get_mem_cgroup_from_current(void)
-{
-	if (memcg_kmem_bypass())
-		return NULL;
-
-	if (unlikely(active_memcg()))
-		return get_active_memcg();
-
-	return get_mem_cgroup_from_mm(current->mm);
-}
-
-/**
  * mem_cgroup_iter - iterate over memory cgroup hierarchy
  * @root: hierarchy root
  * @prev: previously returned memcg, NULL on first invocation
@@ -3148,18 +3122,18 @@ static void __memcg_kmem_uncharge(struct mem_cgroup *memcg, unsigned int nr_page
  */
 int __memcg_kmem_charge_page(struct page *page, gfp_t gfp, int order)
 {
-	struct mem_cgroup *memcg;
+	struct obj_cgroup *objcg;
 	int ret = 0;
 
-	memcg = get_mem_cgroup_from_current();
-	if (memcg && !mem_cgroup_is_root(memcg)) {
-		ret = __memcg_kmem_charge(memcg, gfp, 1 << order);
+	objcg = get_obj_cgroup_from_current();
+	if (objcg) {
+		ret = obj_cgroup_charge_page(objcg, gfp, 1 << order);
 		if (!ret) {
-			page->memcg_data = (unsigned long)memcg |
+			page->memcg_data = (unsigned long)objcg |
 				MEMCG_DATA_KMEM;
 			return 0;
 		}
-		css_put(&memcg->css);
+		obj_cgroup_put(objcg);
 	}
 	return ret;
 }
@@ -3171,17 +3145,18 @@ int __memcg_kmem_charge_page(struct page *page, gfp_t gfp, int order)
  */
 void __memcg_kmem_uncharge_page(struct page *page, int order)
 {
-	struct mem_cgroup *memcg;
+	struct obj_cgroup *objcg;
 	unsigned int nr_pages = 1 << order;
 
 	if (!page_memcg_charged(page))
 		return;
 
-	memcg = page_memcg_kmem(page);
-	VM_BUG_ON_PAGE(mem_cgroup_is_root(memcg), page);
-	__memcg_kmem_uncharge(memcg, nr_pages);
+	VM_BUG_ON_PAGE(!PageMemcgKmem(page), page);
+
+	objcg = page_objcg(page);
+	obj_cgroup_uncharge_page(objcg, nr_pages);
 	page->memcg_data = 0;
-	css_put(&memcg->css);
+	obj_cgroup_put(objcg);
 }
 
 static bool consume_obj_stock(struct obj_cgroup *objcg, unsigned int nr_bytes)
@@ -6798,8 +6773,12 @@ struct uncharge_gather {
 	struct mem_cgroup *memcg;
 	unsigned long nr_pages;
 	unsigned long pgpgout;
-	unsigned long nr_kmem;
 	struct page *dummy_page;
+
+#ifdef CONFIG_MEMCG_KMEM
+	struct obj_cgroup *objcg;
+	unsigned long nr_kmem;
+#endif
 };
 
 static inline void uncharge_gather_clear(struct uncharge_gather *ug)
@@ -6811,12 +6790,21 @@ static void uncharge_batch(const struct uncharge_gather *ug)
 {
 	unsigned long flags;
 
+#ifdef CONFIG_MEMCG_KMEM
+	if (ug->objcg) {
+		obj_cgroup_uncharge_page(ug->objcg, ug->nr_kmem);
+		/* drop reference from uncharge_kmem_page */
+		obj_cgroup_put(ug->objcg);
+	}
+#endif
+
+	if (!ug->memcg)
+		return;
+
 	if (!mem_cgroup_is_root(ug->memcg)) {
 		page_counter_uncharge(&ug->memcg->memory, ug->nr_pages);
 		if (do_memsw_account())
 			page_counter_uncharge(&ug->memcg->memsw, ug->nr_pages);
-		if (!cgroup_subsys_on_dfl(memory_cgrp_subsys) && ug->nr_kmem)
-			page_counter_uncharge(&ug->memcg->kmem, ug->nr_kmem);
 		memcg_oom_recover(ug->memcg);
 	}
 
@@ -6826,26 +6814,40 @@ static void uncharge_batch(const struct uncharge_gather *ug)
 	memcg_check_events(ug->memcg, ug->dummy_page);
 	local_irq_restore(flags);
 
-	/* drop reference from uncharge_page */
+	/* drop reference from uncharge_user_page */
 	css_put(&ug->memcg->css);
 }
 
-static void uncharge_page(struct page *page, struct uncharge_gather *ug)
+#ifdef CONFIG_MEMCG_KMEM
+static void uncharge_kmem_page(struct page *page, struct uncharge_gather *ug)
 {
-	unsigned long nr_pages;
-	struct mem_cgroup *memcg;
+	struct obj_cgroup *objcg = page_objcg(page);
 
-	VM_BUG_ON_PAGE(PageLRU(page), page);
+	if (ug->objcg != objcg) {
+		if (ug->objcg) {
+			uncharge_batch(ug);
+			uncharge_gather_clear(ug);
+		}
+		ug->objcg = objcg;
 
-	if (!page_memcg_charged(page))
-		return;
+		/* pairs with obj_cgroup_put in uncharge_batch */
+		obj_cgroup_get(ug->objcg);
+	}
+
+	ug->nr_kmem += compound_nr(page);
+	page->memcg_data = 0;
+	obj_cgroup_put(ug->objcg);
+}
+#else
+static void uncharge_kmem_page(struct page *page, struct uncharge_gather *ug)
+{
+}
+#endif
+
+static void uncharge_user_page(struct page *page, struct uncharge_gather *ug)
+{
+	struct mem_cgroup *memcg = page_memcg(page);
 
-	/*
-	 * Nobody should be changing or seriously looking at
-	 * page memcg at this point, we have fully exclusive
-	 * access to the page.
-	 */
-	memcg = PageMemcgKmem(page) ? page_memcg_kmem(page) : page_memcg(page);
 	if (ug->memcg != memcg) {
 		if (ug->memcg) {
 			uncharge_batch(ug);
@@ -6856,18 +6858,30 @@ static void uncharge_page(struct page *page, struct uncharge_gather *ug)
 		/* pairs with css_put in uncharge_batch */
 		css_get(&ug->memcg->css);
 	}
+	ug->pgpgout++;
+	ug->dummy_page = page;
+
+	ug->nr_pages += compound_nr(page);
+	page->memcg_data = 0;
+	css_put(&ug->memcg->css);
+}
 
-	nr_pages = compound_nr(page);
-	ug->nr_pages += nr_pages;
+static void uncharge_page(struct page *page, struct uncharge_gather *ug)
+{
+	VM_BUG_ON_PAGE(PageLRU(page), page);
 
+	if (!page_memcg_charged(page))
+		return;
+
+	/*
+	 * Nobody should be changing or seriously looking at
+	 * page memcg at this point, we have fully exclusive
+	 * access to the page.
+	 */
 	if (PageMemcgKmem(page))
-		ug->nr_kmem += nr_pages;
+		uncharge_kmem_page(page, ug);
 	else
-		ug->pgpgout++;
-
-	ug->dummy_page = page;
-	page->memcg_data = 0;
-	css_put(&ug->memcg->css);
+		uncharge_user_page(page, ug);
 }
 
 /**
@@ -6910,8 +6924,7 @@ void mem_cgroup_uncharge_list(struct list_head *page_list)
 	uncharge_gather_clear(&ug);
 	list_for_each_entry(page, page_list, lru)
 		uncharge_page(page, &ug);
-	if (ug.memcg)
-		uncharge_batch(&ug);
+	uncharge_batch(&ug);
 }
 
 /**
-- 
2.11.0