From: Nhat Pham <nphamcs@gmail.com>
Date: Tue, 3 Oct 2023 15:09:36 -0700
Subject: Re: [PATCH v3 2/3] hugetlb: memcg: account hugetlb-backed memory in memory controller
To: Johannes Weiner
Cc: Mike Kravetz, akpm@linux-foundation.org, riel@surriel.com, mhocko@kernel.org,
 roman.gushchin@linux.dev, shakeelb@google.com, muchun.song@linux.dev, tj@kernel.org,
 lizefan.x@bytedance.com, shuah@kernel.org, yosryahmed@google.com, fvdl@google.com,
 linux-mm@kvack.org, kernel-team@meta.com, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org
In-Reply-To: <20231003183928.GC20979@cmpxchg.org>
References: <20231003001828.2554080-1-nphamcs@gmail.com> <20231003001828.2554080-3-nphamcs@gmail.com>
 <20231003171329.GB314430@monkey> <20231003183928.GC20979@cmpxchg.org>
On Tue, Oct 3, 2023 at 11:39 AM Johannes Weiner wrote:
>
> On Tue, Oct 03, 2023 at 11:01:24AM -0700, Nhat Pham wrote:
> > On Tue, Oct 3, 2023 at 10:13 AM Mike Kravetz wrote:
> > >
> > > On 10/02/23 17:18, Nhat Pham wrote:
> > > > diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> > > > index de220e3ff8be..74472e911b0a 100644
> > > > --- a/mm/hugetlb.c
> > > > +++ b/mm/hugetlb.c
> > > > @@ -1902,6 +1902,7 @@ void free_huge_folio(struct folio *folio)
> > > >  			 pages_per_huge_page(h), folio);
> > > >  	hugetlb_cgroup_uncharge_folio_rsvd(hstate_index(h),
> > > >  					  pages_per_huge_page(h), folio);
> > > > +	mem_cgroup_uncharge(folio);
> > > >  	if (restore_reserve)
> > > >  		h->resv_huge_pages++;
> > > >
> > > > @@ -3009,11 +3010,20 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
> > > >  	struct hugepage_subpool *spool = subpool_vma(vma);
> > > >  	struct hstate *h = hstate_vma(vma);
> > > >  	struct folio *folio;
> > > > -	long map_chg, map_commit;
> > > > +	long map_chg, map_commit, nr_pages = pages_per_huge_page(h);
> > > >  	long gbl_chg;
> > > > -	int ret, idx;
> > > > +	int memcg_charge_ret, ret, idx;
> > > >  	struct hugetlb_cgroup *h_cg = NULL;
> > > > +	struct mem_cgroup *memcg;
> > > >  	bool deferred_reserve;
> > > > +	gfp_t gfp = htlb_alloc_mask(h) | __GFP_RETRY_MAYFAIL;
> > > > +
> > > > +	memcg = get_mem_cgroup_from_current();
> > > > +	memcg_charge_ret = mem_cgroup_hugetlb_try_charge(memcg, gfp, nr_pages);
> > > > +	if (memcg_charge_ret == -ENOMEM) {
> > > > +		mem_cgroup_put(memcg);
> > > > +		return ERR_PTR(-ENOMEM);
> > > > +	}
> > > >
> > > >  	idx = hstate_index(h);
> > > >  	/*
> > > > @@ -3022,8 +3032,12 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
> > > >  	 * code of zero indicates a reservation exists (no change).
> > > >  	 */
> > > >  	map_chg = gbl_chg = vma_needs_reservation(h, vma, addr);
> > > > -	if (map_chg < 0)
> > > > +	if (map_chg < 0) {
> > > > +		if (!memcg_charge_ret)
> > > > +			mem_cgroup_cancel_charge(memcg, nr_pages);
> > > > +		mem_cgroup_put(memcg);
> > > >  		return ERR_PTR(-ENOMEM);
> > > > +	}
> > > >
> > > >  	/*
> > > >  	 * Processes that did not create the mapping will have no
> > > > @@ -3034,10 +3048,8 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
> > > >  	 */
> > > >  	if (map_chg || avoid_reserve) {
> > > >  		gbl_chg = hugepage_subpool_get_pages(spool, 1);
> > > > -		if (gbl_chg < 0) {
> > > > -			vma_end_reservation(h, vma, addr);
> > > > -			return ERR_PTR(-ENOSPC);
> > > > -		}
> > > > +		if (gbl_chg < 0)
> > > > +			goto out_end_reservation;
> > > >
> > > >  		/*
> > > >  		 * Even though there was no reservation in the region/reserve
> > > > @@ -3119,6 +3131,11 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
> > > >  		hugetlb_cgroup_uncharge_folio_rsvd(hstate_index(h),
> > > >  				pages_per_huge_page(h), folio);
> > > >  	}
> > > > +
> > > > +	if (!memcg_charge_ret)
> > > > +		mem_cgroup_commit_charge(folio, memcg);
> > > > +	mem_cgroup_put(memcg);
> > > > +
> > > >  	return folio;
> > > >
> > > > out_uncharge_cgroup:
> > > > @@ -3130,7 +3147,11 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
> > > > out_subpool_put:
> > > >  	if (map_chg || avoid_reserve)
> > > >  		hugepage_subpool_put_pages(spool, 1);
> > > > +out_end_reservation:
> > > >  	vma_end_reservation(h, vma, addr);
> > > > +	if (!memcg_charge_ret)
> > > > +		mem_cgroup_cancel_charge(memcg, nr_pages);
> > > > +	mem_cgroup_put(memcg);
> > > >  	return ERR_PTR(-ENOSPC);
> > > >  }
> > > >
> > >
> > > IIUC, huge page usage is charged in alloc_hugetlb_folio and uncharged in
> > > free_huge_folio.  During migration, huge pages are allocated via
> > > alloc_migrate_hugetlb_folio, not alloc_hugetlb_folio.
So, there is no
> > > charging for the migration target page and we uncharge the source page.
> > > It looks like there will be no charge for the huge page after migration?
> >
> > Ah I see! This is a bit subtle indeed.
> >
> > For the hugetlb controller, it looks like they update the cgroup info
> > inside move_hugetlb_state(), which calls hugetlb_cgroup_migrate()
> > to transfer the hugetlb cgroup info to the destination folio.
> >
> > Perhaps we can do something analogous here.
> >
> > > If my analysis above is correct, then we may need to be careful about
> > > this accounting.  We may not want both source and target pages to be
> > > charged at the same time.
> >
> > We can create a variant of mem_cgroup_migrate that does not double
> > charge, but instead just copy the mem_cgroup information to the new
> > folio, and then clear that info from the old folio. That way the memory
> > usage counters are untouched.
> >
> > Somebody with more expertise on migration should fact check me
> > of course :)
>
> The only reason mem_cgroup_migrate() double charges right now is
> because it's used by replace_page_cache_folio(). In that context, the
> isolation of the old page isn't quite as thorough as with migration,
> so it cannot transfer and uncharge directly. This goes back a long
> time: 0a31bc97c80c3fa87b32c091d9a930ac19cd0c40
>
> If you rename the current implementation to mem_cgroup_replace_page()
> for that one caller, you can add a mem_cgroup_migrate() variant which
> is charge neutral and clears old->memcg_data. This can be used for
> regular and hugetlb page migration. Something like this (totally
> untested):
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index a4d3282493b6..17ec45bf3653 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -7226,29 +7226,14 @@ void mem_cgroup_migrate(struct folio *old, struct folio *new)
>  	if (mem_cgroup_disabled())
>  		return;
>
> -	/* Page cache replacement: new folio already charged? */
> -	if (folio_memcg(new))
> -		return;
> -
>  	memcg = folio_memcg(old);
>  	VM_WARN_ON_ONCE_FOLIO(!memcg, old);
>  	if (!memcg)
>  		return;
>
> -	/* Force-charge the new page. The old one will be freed soon */
> -	if (!mem_cgroup_is_root(memcg)) {
> -		page_counter_charge(&memcg->memory, nr_pages);
> -		if (do_memsw_account())
> -			page_counter_charge(&memcg->memsw, nr_pages);
> -	}
> -
> -	css_get(&memcg->css);
> +	/* Transfer the charge and the css ref */
>  	commit_charge(new, memcg);
> -
> -	local_irq_save(flags);
> -	mem_cgroup_charge_statistics(memcg, nr_pages);
> -	memcg_check_events(memcg, folio_nid(new));
> -	local_irq_restore(flags);
> +	old->memcg_data = 0;
>  }
>
>  DEFINE_STATIC_KEY_FALSE(memcg_sockets_enabled_key);

Ah, I like this. Will send a fixlet based on this :)

I was scratching my head trying to figure out why we were doing
the double charging in the first place. Thanks for the context, Johannes!