From: Nhat Pham
Date: Tue, 3 Oct 2023 11:01:24 -0700
Subject: Re: [PATCH v3 2/3] hugetlb: memcg: account hugetlb-backed memory in memory controller
To: Mike Kravetz
Cc: akpm@linux-foundation.org, riel@surriel.com, hannes@cmpxchg.org,
    mhocko@kernel.org, roman.gushchin@linux.dev, shakeelb@google.com,
    muchun.song@linux.dev, tj@kernel.org, lizefan.x@bytedance.com,
    shuah@kernel.org, yosryahmed@google.com, fvdl@google.com,
    linux-mm@kvack.org, kernel-team@meta.com,
    linux-kernel@vger.kernel.org, cgroups@vger.kernel.org
References: <20231003001828.2554080-1-nphamcs@gmail.com>
    <20231003001828.2554080-3-nphamcs@gmail.com>
    <20231003171329.GB314430@monkey>
In-Reply-To: <20231003171329.GB314430@monkey>

On Tue, Oct 3, 2023 at 10:13 AM Mike Kravetz wrote:
>
> On 10/02/23 17:18, Nhat Pham wrote:
> > diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> > index de220e3ff8be..74472e911b0a 100644
> > --- a/mm/hugetlb.c
> > +++ b/mm/hugetlb.c
> > @@ -1902,6 +1902,7 @@ void free_huge_folio(struct folio *folio)
> >                                      pages_per_huge_page(h), folio);
> >         hugetlb_cgroup_uncharge_folio_rsvd(hstate_index(h),
> >                                           pages_per_huge_page(h), folio);
> > +       mem_cgroup_uncharge(folio);
> >         if (restore_reserve)
> >                 h->resv_huge_pages++;
> >
> > @@ -3009,11 +3010,20 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
> >         struct hugepage_subpool *spool = subpool_vma(vma);
> >         struct hstate *h = hstate_vma(vma);
> >         struct folio *folio;
> > -       long map_chg, map_commit;
> > +       long map_chg, map_commit, nr_pages = pages_per_huge_page(h);
> >         long gbl_chg;
> > -       int ret, idx;
> > +       int memcg_charge_ret, ret, idx;
> >         struct hugetlb_cgroup *h_cg = NULL;
> > +       struct mem_cgroup *memcg;
> >         bool deferred_reserve;
> > +       gfp_t gfp = htlb_alloc_mask(h) | __GFP_RETRY_MAYFAIL;
> > +
> > +       memcg = get_mem_cgroup_from_current();
> > +       memcg_charge_ret = mem_cgroup_hugetlb_try_charge(memcg, gfp, nr_pages);
> > +       if (memcg_charge_ret == -ENOMEM) {
> > +               mem_cgroup_put(memcg);
> > +               return ERR_PTR(-ENOMEM);
> > +       }
> >
> >         idx = hstate_index(h);
> >         /*
> > @@ -3022,8 +3032,12 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
> >          * code of zero indicates a reservation exists (no change).
> >          */
> >         map_chg = gbl_chg = vma_needs_reservation(h, vma, addr);
> > -       if (map_chg < 0)
> > +       if (map_chg < 0) {
> > +               if (!memcg_charge_ret)
> > +                       mem_cgroup_cancel_charge(memcg, nr_pages);
> > +               mem_cgroup_put(memcg);
> >                 return ERR_PTR(-ENOMEM);
> > +       }
> >
> >         /*
> >          * Processes that did not create the mapping will have no
> > @@ -3034,10 +3048,8 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
> >          */
> >         if (map_chg || avoid_reserve) {
> >                 gbl_chg = hugepage_subpool_get_pages(spool, 1);
> > -               if (gbl_chg < 0) {
> > -                       vma_end_reservation(h, vma, addr);
> > -                       return ERR_PTR(-ENOSPC);
> > -               }
> > +               if (gbl_chg < 0)
> > +                       goto out_end_reservation;
> >
> >                 /*
> >                  * Even though there was no reservation in the region/reserve
> > @@ -3119,6 +3131,11 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
> >                         hugetlb_cgroup_uncharge_folio_rsvd(hstate_index(h),
> >                                         pages_per_huge_page(h), folio);
> >         }
> > +
> > +       if (!memcg_charge_ret)
> > +               mem_cgroup_commit_charge(folio, memcg);
> > +       mem_cgroup_put(memcg);
> > +
> >         return folio;
> >
> >  out_uncharge_cgroup:
> > @@ -3130,7 +3147,11 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
> >  out_subpool_put:
> >         if (map_chg || avoid_reserve)
> >                 hugepage_subpool_put_pages(spool, 1);
> > +out_end_reservation:
> >         vma_end_reservation(h, vma, addr);
> > +       if (!memcg_charge_ret)
> > +               mem_cgroup_cancel_charge(memcg, nr_pages);
> > +       mem_cgroup_put(memcg);
> >         return ERR_PTR(-ENOSPC);
> >  }
> >
>
> IIUC, huge page usage is charged in alloc_hugetlb_folio and uncharged in
> free_huge_folio. During migration, huge pages are allocated via
> alloc_migrate_hugetlb_folio, not alloc_hugetlb_folio. So, there is no
> charging for the migration target page and we uncharge the source page.
> It looks like there will be no charge for the huge page after migration?
>

Ah I see! This is a bit subtle indeed.
For the hugetlb controller, it looks like they update the cgroup info
inside move_hugetlb_state(), which calls hugetlb_cgroup_migrate()
to transfer the hugetlb cgroup info to the destination folio.

Perhaps we can do something analogous here.

> If my analysis above is correct, then we may need to be careful about
> this accounting. We may not want both source and target pages to be
> charged at the same time.

We can create a variant of mem_cgroup_migrate that does not double
charge, but instead just copies the mem_cgroup information to the new
folio, and then clears that info from the old folio. That way the memory
usage counters are untouched.

Somebody with more expertise on migration should fact check me
of course :)

> --
> Mike Kravetz
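
To make the idea above a bit more concrete, here is a rough userspace
sketch of the accounting I have in mind. It is only a toy model, not
kernel code: struct toy_memcg, struct toy_folio and all of the toy_*
helpers are hypothetical names made up for illustration, and locking,
refcounting and the cgroup hierarchy are ignored entirely. The point is
just that handing the owner pointer from the source folio to the target
leaves the usage counter alone, so the later uncharge of the source
becomes a no-op and the page stays charged exactly once.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a memory cgroup: a single usage counter, in pages. */
struct toy_memcg {
	long usage;
};

/* Toy stand-in for a folio: which group it is charged to, and its size. */
struct toy_folio {
	struct toy_memcg *memcg;	/* NULL when the folio carries no charge */
	long nr_pages;
};

/* Charge the folio to the group and remember the owner. */
static void toy_charge(struct toy_folio *folio, struct toy_memcg *memcg)
{
	memcg->usage += folio->nr_pages;
	folio->memcg = memcg;
}

/* Drop the charge, if any. A folio with no owner is left untouched. */
static void toy_uncharge(struct toy_folio *folio)
{
	if (!folio->memcg)
		return;
	folio->memcg->usage -= folio->nr_pages;
	folio->memcg = NULL;
}

/*
 * Hypothetical "migrate" variant: copy the owner to the target folio and
 * clear it on the source. The usage counter is deliberately not modified.
 */
static void toy_migrate_charge(struct toy_folio *src, struct toy_folio *dst)
{
	if (!src->memcg)
		return;
	dst->memcg = src->memcg;
	src->memcg = NULL;
}

/*
 * Walk through the migration scenario from this thread and return the
 * group's usage once the source folio has been freed.
 */
static long toy_usage_after_migration(long nr_pages)
{
	struct toy_memcg cg = { 0 };
	struct toy_folio src = { NULL, nr_pages };
	struct toy_folio dst = { NULL, nr_pages };

	toy_charge(&src, &cg);		/* charged at allocation time */
	toy_migrate_charge(&src, &dst);	/* owner moves, counter untouched */
	toy_uncharge(&src);		/* freeing the source: now a no-op */
	assert(dst.memcg == &cg);	/* target owns the single charge */
	return cg.usage;
}
```

With this model, toy_usage_after_migration(512) leaves 512 pages
charged, and a final toy_uncharge() on the target brings the counter
back to zero. Whether the real thing can be this simple depends on
details of the migration path that I have not checked.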