From: Muchun Song <songmuchun@bytedance.com>
To: guro@fb.com, hannes@cmpxchg.org,
    mhocko@kernel.org, akpm@linux-foundation.org, shakeelb@google.com,
    vdavydov.dev@gmail.com
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
    duanxiongchun@bytedance.com, Muchun Song
Subject: [PATCH v5 5/7] mm: memcontrol: use obj_cgroup APIs to charge kmem pages
Date: Sat, 20 Mar 2021 00:38:18 +0800
Message-Id: <20210319163821.20704-6-songmuchun@bytedance.com>
In-Reply-To: <20210319163821.20704-1-songmuchun@bytedance.com>
References: <20210319163821.20704-1-songmuchun@bytedance.com>
MIME-Version: 1.0

Since the merge of Roman's series "The new cgroup slab memory controller",
all slab objects are charged via the new obj_cgroup APIs. The new APIs
introduce a struct obj_cgroup to charge slab objects, which prevents
long-living objects from pinning the original memory cgroup in memory.
But there are still some corner objects (e.g. allocations larger than an
order-1 page on SLUB) which are not charged via the new APIs. Those
objects (including the pages which are allocated directly from the buddy
allocator) are charged as kmem pages, which still hold a reference to the
memory cgroup.

We want to reuse the obj_cgroup APIs to charge the kmem pages. If we do
that, we should store an object cgroup pointer in page->memcg_data for
the kmem pages. Finally, page->memcg_data will have 3 different meanings:

1) For the slab pages, page->memcg_data points to an object cgroups
   vector.
2) For the kmem pages (excluding the slab pages), page->memcg_data points
   to an object cgroup.

3) For the user pages (e.g. the LRU pages), page->memcg_data points to a
   memory cgroup.

We do not change the behavior of page_memcg() and page_memcg_rcu(); they
are also suitable for LRU pages and kmem pages. Why? Because memory
allocations pinning memcgs for a long time exist at a larger scale and
are causing recurring problems in the real world: page cache doesn't get
reclaimed for a long time, or is used by the second, third, fourth, ...
instance of the same job that was restarted into a new cgroup every time.
Unreclaimable dying cgroups pile up, waste memory, and make page reclaim
very inefficient.

We can convert LRU pages and most other raw memcg pins to the objcg
direction to fix this problem, and then page->memcg will always point to
an object cgroup pointer. At that time, LRU pages and kmem pages will be
treated the same, and the implementation of page_memcg() can drop the
kmem page check.

This patch charges the kmem pages by using the new obj_cgroup APIs.
Finally, the page->memcg_data of a kmem page points to an object cgroup.
We can use __page_objcg() to get the object cgroup associated with a
kmem page, or use page_memcg() to get the memory cgroup associated with
a kmem page, but the caller must ensure that the returned memcg won't be
released (e.g. acquire the rcu_read_lock or css_set_lock).
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
---
 include/linux/memcontrol.h | 116 ++++++++++++++++++++++++++++++++-----------
 mm/memcontrol.c            | 110 +++++++++++++++++++++---------------------
 2 files changed, 145 insertions(+), 81 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index e6dc793d587d..395a113e4a3b 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -358,6 +358,62 @@ enum page_memcg_data_flags {
 
 #define MEMCG_DATA_FLAGS_MASK	(__NR_MEMCG_DATA_FLAGS - 1)
 
+static inline bool PageMemcgKmem(struct page *page);
+
+/*
+ * After the initialization objcg->memcg is always pointing at
+ * a valid memcg, but can be atomically swapped to the parent memcg.
+ *
+ * The caller must ensure that the returned memcg won't be released:
+ * e.g. acquire the rcu_read_lock or css_set_lock.
+ */
+static inline struct mem_cgroup *obj_cgroup_memcg(struct obj_cgroup *objcg)
+{
+	return READ_ONCE(objcg->memcg);
+}
+
+/*
+ * __page_memcg - get the memory cgroup associated with a non-kmem page
+ * @page: a pointer to the page struct
+ *
+ * Returns a pointer to the memory cgroup associated with the page,
+ * or NULL. This function assumes that the page is known to have a
+ * proper memory cgroup pointer. It's not safe to call this function
+ * against some type of pages, e.g. slab pages or ex-slab pages or
+ * kmem pages.
+ */
+static inline struct mem_cgroup *__page_memcg(struct page *page)
+{
+	unsigned long memcg_data = page->memcg_data;
+
+	VM_BUG_ON_PAGE(PageSlab(page), page);
+	VM_BUG_ON_PAGE(memcg_data & MEMCG_DATA_OBJCGS, page);
+	VM_BUG_ON_PAGE(memcg_data & MEMCG_DATA_KMEM, page);
+
+	return (struct mem_cgroup *)(memcg_data & ~MEMCG_DATA_FLAGS_MASK);
+}
+
+/*
+ * __page_objcg - get the object cgroup associated with a kmem page
+ * @page: a pointer to the page struct
+ *
+ * Returns a pointer to the object cgroup associated with the page,
+ * or NULL. This function assumes that the page is known to have a
+ * proper object cgroup pointer. It's not safe to call this function
+ * against some type of pages, e.g. slab pages or ex-slab pages or
+ * LRU pages.
+ */
+static inline struct obj_cgroup *__page_objcg(struct page *page)
+{
+	unsigned long memcg_data = page->memcg_data;
+
+	VM_BUG_ON_PAGE(PageSlab(page), page);
+	VM_BUG_ON_PAGE(memcg_data & MEMCG_DATA_OBJCGS, page);
+	VM_BUG_ON_PAGE(!(memcg_data & MEMCG_DATA_KMEM), page);
+
+	return (struct obj_cgroup *)(memcg_data & ~MEMCG_DATA_FLAGS_MASK);
+}
+
 /*
  * page_memcg - get the memory cgroup associated with a page
  * @page: a pointer to the page struct
@@ -367,20 +423,23 @@ enum page_memcg_data_flags {
  * proper memory cgroup pointer. It's not safe to call this function
  * against some type of pages, e.g. slab pages or ex-slab pages.
  *
- * Any of the following ensures page and memcg binding stability:
+ * For a non-kmem page any of the following ensures page and memcg binding
+ * stability:
+ *
  * - the page lock
  * - LRU isolation
  * - lock_page_memcg()
  * - exclusive reference
+ *
+ * For a kmem page a caller should hold an rcu read lock to protect memcg
+ * associated with a kmem page from being released.
  */
 static inline struct mem_cgroup *page_memcg(struct page *page)
 {
-	unsigned long memcg_data = page->memcg_data;
-
-	VM_BUG_ON_PAGE(PageSlab(page), page);
-	VM_BUG_ON_PAGE(memcg_data & MEMCG_DATA_OBJCGS, page);
-
-	return (struct mem_cgroup *)(memcg_data & ~MEMCG_DATA_FLAGS_MASK);
+	if (PageMemcgKmem(page))
+		return obj_cgroup_memcg(__page_objcg(page));
+	else
+		return __page_memcg(page);
 }
 
 /*
@@ -394,11 +453,19 @@ static inline struct mem_cgroup *page_memcg(struct page *page)
  */
 static inline struct mem_cgroup *page_memcg_rcu(struct page *page)
 {
+	unsigned long memcg_data = READ_ONCE(page->memcg_data);
+
 	VM_BUG_ON_PAGE(PageSlab(page), page);
 	WARN_ON_ONCE(!rcu_read_lock_held());
 
-	return (struct mem_cgroup *)(READ_ONCE(page->memcg_data) &
-				     ~MEMCG_DATA_FLAGS_MASK);
+	if (memcg_data & MEMCG_DATA_KMEM) {
+		struct obj_cgroup *objcg;
+
+		objcg = (void *)(memcg_data & ~MEMCG_DATA_FLAGS_MASK);
+		return obj_cgroup_memcg(objcg);
+	}
+
+	return (struct mem_cgroup *)(memcg_data & ~MEMCG_DATA_FLAGS_MASK);
 }
 
 /*
@@ -406,15 +473,21 @@ static inline struct mem_cgroup *page_memcg_rcu(struct page *page)
  * @page: a pointer to the page struct
  *
  * Returns a pointer to the memory cgroup associated with the page,
- * or NULL.  This function unlike page_memcg() can take any page
+ * or NULL. This function unlike page_memcg() can take any page
  * as an argument. It has to be used in cases when it's not known if a page
- * has an associated memory cgroup pointer or an object cgroups vector.
+ * has an associated memory cgroup pointer or an object cgroups vector or
+ * an object cgroup.
+ *
+ * For a non-kmem page any of the following ensures page and memcg binding
+ * stability:
  *
- * Any of the following ensures page and memcg binding stability:
  * - the page lock
  * - LRU isolation
  * - lock_page_memcg()
  * - exclusive reference
+ *
+ * For a kmem page a caller should hold an rcu read lock to protect memcg
+ * associated with a kmem page from being released.
  */
 static inline struct mem_cgroup *page_memcg_check(struct page *page)
 {
@@ -427,6 +500,13 @@ static inline struct mem_cgroup *page_memcg_check(struct page *page)
 	if (memcg_data & MEMCG_DATA_OBJCGS)
 		return NULL;
 
+	if (memcg_data & MEMCG_DATA_KMEM) {
+		struct obj_cgroup *objcg;
+
+		objcg = (void *)(memcg_data & ~MEMCG_DATA_FLAGS_MASK);
+		return obj_cgroup_memcg(objcg);
+	}
+
 	return (struct mem_cgroup *)(memcg_data & ~MEMCG_DATA_FLAGS_MASK);
 }
 
@@ -713,18 +793,6 @@ static inline void obj_cgroup_put(struct obj_cgroup *objcg)
 	percpu_ref_put(&objcg->refcnt);
 }
 
-/*
- * After the initialization objcg->memcg is always pointing at
- * a valid memcg, but can be atomically swapped to the parent memcg.
- *
- * The caller must ensure that the returned memcg won't be released:
- * e.g. acquire the rcu_read_lock or css_set_lock.
- */
-static inline struct mem_cgroup *obj_cgroup_memcg(struct obj_cgroup *objcg)
-{
-	return READ_ONCE(objcg->memcg);
-}
-
 static inline void mem_cgroup_put(struct mem_cgroup *memcg)
 {
 	if (memcg)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 8d28a5a2ee58..962499542531 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -855,18 +855,22 @@ void __mod_lruvec_page_state(struct page *page, enum node_stat_item idx,
 			     int val)
 {
 	struct page *head = compound_head(page); /* rmap on tail pages */
-	struct mem_cgroup *memcg = page_memcg(head);
+	struct mem_cgroup *memcg;
 	pg_data_t *pgdat = page_pgdat(page);
 	struct lruvec *lruvec;
 
+	rcu_read_lock();
+	memcg = page_memcg(head);
 	/* Untracked pages have no memcg, no lruvec. Update only the node */
 	if (!memcg) {
+		rcu_read_unlock();
 		__mod_node_page_state(pgdat, idx, val);
 		return;
 	}
 
 	lruvec = mem_cgroup_lruvec(memcg, pgdat);
 	__mod_lruvec_state(lruvec, idx, val);
+	rcu_read_unlock();
 }
 EXPORT_SYMBOL(__mod_lruvec_page_state);
 
@@ -1055,20 +1059,6 @@ static __always_inline struct mem_cgroup *active_memcg(void)
 	return current->active_memcg;
 }
 
-static __always_inline struct mem_cgroup *get_active_memcg(void)
-{
-	struct mem_cgroup *memcg;
-
-	rcu_read_lock();
-	memcg = active_memcg();
-	/* remote memcg must hold a ref. */
-	if (memcg && WARN_ON_ONCE(!css_tryget(&memcg->css)))
-		memcg = root_mem_cgroup;
-	rcu_read_unlock();
-
-	return memcg;
-}
-
 static __always_inline bool memcg_kmem_bypass(void)
 {
 	/* Allow remote memcg charging from any context. */
@@ -1083,20 +1073,6 @@ static __always_inline bool memcg_kmem_bypass(void)
 }
 
 /**
- * If active memcg is set, do not fallback to current->mm->memcg.
- */
-static __always_inline struct mem_cgroup *get_mem_cgroup_from_current(void)
-{
-	if (memcg_kmem_bypass())
-		return NULL;
-
-	if (unlikely(active_memcg()))
-		return get_active_memcg();
-
-	return get_mem_cgroup_from_mm(current->mm);
-}
-
-/**
  * mem_cgroup_iter - iterate over memory cgroup hierarchy
  * @root: hierarchy root
  * @prev: previously returned memcg, NULL on first invocation
@@ -3152,18 +3128,18 @@ static void __memcg_kmem_uncharge(struct mem_cgroup *memcg, unsigned int nr_pages)
  */
 int __memcg_kmem_charge_page(struct page *page, gfp_t gfp, int order)
 {
-	struct mem_cgroup *memcg;
+	struct obj_cgroup *objcg;
 	int ret = 0;
 
-	memcg = get_mem_cgroup_from_current();
-	if (memcg && !mem_cgroup_is_root(memcg)) {
-		ret = __memcg_kmem_charge(memcg, gfp, 1 << order);
+	objcg = get_obj_cgroup_from_current();
+	if (objcg) {
+		ret = obj_cgroup_charge_pages(objcg, gfp, 1 << order);
 		if (!ret) {
-			page->memcg_data = (unsigned long)memcg |
+			page->memcg_data = (unsigned long)objcg |
 				MEMCG_DATA_KMEM;
 			return 0;
 		}
-		css_put(&memcg->css);
+		obj_cgroup_put(objcg);
 	}
 	return ret;
 }
@@ -3175,16 +3151,16 @@ int __memcg_kmem_charge_page(struct page *page, gfp_t gfp, int order)
  */
 void __memcg_kmem_uncharge_page(struct page *page, int order)
 {
-	struct mem_cgroup *memcg = page_memcg(page);
+	struct obj_cgroup *objcg;
 	unsigned int nr_pages = 1 << order;
 
-	if (!memcg)
+	if (!PageMemcgKmem(page))
 		return;
 
-	VM_BUG_ON_PAGE(mem_cgroup_is_root(memcg), page);
-	__memcg_kmem_uncharge(memcg, nr_pages);
+	objcg = __page_objcg(page);
+	obj_cgroup_uncharge_pages(objcg, nr_pages);
 	page->memcg_data = 0;
-	css_put(&memcg->css);
+	obj_cgroup_put(objcg);
 }
 
 static bool consume_obj_stock(struct obj_cgroup *objcg, unsigned int nr_bytes)
@@ -6799,7 +6775,7 @@ int mem_cgroup_charge(struct page *page, struct mm_struct *mm, gfp_t gfp_mask)
 
 struct uncharge_gather {
 	struct mem_cgroup *memcg;
-	unsigned long nr_pages;
+	unsigned long nr_memory;
 	unsigned long pgpgout;
 	unsigned long nr_kmem;
 	struct page *dummy_page;
@@ -6814,10 +6790,10 @@ static void uncharge_batch(const struct uncharge_gather *ug)
 {
 	unsigned long flags;
 
-	if (!mem_cgroup_is_root(ug->memcg)) {
-		page_counter_uncharge(&ug->memcg->memory, ug->nr_pages);
+	if (ug->nr_memory) {
+		page_counter_uncharge(&ug->memcg->memory, ug->nr_memory);
 		if (do_memsw_account())
-			page_counter_uncharge(&ug->memcg->memsw, ug->nr_pages);
+			page_counter_uncharge(&ug->memcg->memsw, ug->nr_memory);
 		if (!cgroup_subsys_on_dfl(memory_cgrp_subsys) && ug->nr_kmem)
 			page_counter_uncharge(&ug->memcg->kmem, ug->nr_kmem);
 		memcg_oom_recover(ug->memcg);
@@ -6825,7 +6801,7 @@ static void uncharge_batch(const struct uncharge_gather *ug)
 
 	local_irq_save(flags);
 	__count_memcg_events(ug->memcg, PGPGOUT, ug->pgpgout);
-	__this_cpu_add(ug->memcg->vmstats_percpu->nr_page_events, ug->nr_pages);
+	__this_cpu_add(ug->memcg->vmstats_percpu->nr_page_events, ug->nr_memory);
 	memcg_check_events(ug->memcg, ug->dummy_page);
 	local_irq_restore(flags);
 
@@ -6836,40 +6812,60 @@ static void uncharge_batch(const struct uncharge_gather *ug)
 static void uncharge_page(struct page *page, struct uncharge_gather *ug)
 {
 	unsigned long nr_pages;
+	struct mem_cgroup *memcg;
+	struct obj_cgroup *objcg;
 
 	VM_BUG_ON_PAGE(PageLRU(page), page);
 
-	if (!page_memcg(page))
-		return;
-
 	/*
 	 * Nobody should be changing or seriously looking at
-	 * page_memcg(page) at this point, we have fully
+	 * page memcg or objcg at this point, we have fully
	 * exclusive access to the page.
 	 */
+	if (PageMemcgKmem(page)) {
+		objcg = __page_objcg(page);
+		/*
+		 * This get matches the put at the end of the function and
+		 * kmem pages do not hold memcg references anymore.
+		 */
+		memcg = get_mem_cgroup_from_objcg(objcg);
+	} else {
+		memcg = __page_memcg(page);
+	}
 
-	if (ug->memcg != page_memcg(page)) {
+	if (!memcg)
+		return;
+
+	if (ug->memcg != memcg) {
 		if (ug->memcg) {
 			uncharge_batch(ug);
 			uncharge_gather_clear(ug);
 		}
-		ug->memcg = page_memcg(page);
+		ug->memcg = memcg;
 		ug->dummy_page = page;
 
 		/* pairs with css_put in uncharge_batch */
-		css_get(&ug->memcg->css);
+		css_get(&memcg->css);
 	}
 
 	nr_pages = compound_nr(page);
-	ug->nr_pages += nr_pages;
 
-	if (PageMemcgKmem(page))
+	if (PageMemcgKmem(page)) {
+		ug->nr_memory += nr_pages;
 		ug->nr_kmem += nr_pages;
-	else
+
+		page->memcg_data = 0;
+		obj_cgroup_put(objcg);
+	} else {
+		/* LRU pages aren't accounted at the root level */
+		if (!mem_cgroup_is_root(memcg))
+			ug->nr_memory += nr_pages;
 		ug->pgpgout++;
 
-	page->memcg_data = 0;
-	css_put(&ug->memcg->css);
+		page->memcg_data = 0;
+	}
+
+	css_put(&memcg->css);
 }
 
 /**
-- 
2.11.0