From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 53478E8FDBF for ; Tue, 3 Oct 2023 23:26:27 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id D805B8D009D; Tue, 3 Oct 2023 19:26:26 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id D30A38D0003; Tue, 3 Oct 2023 19:26:26 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id C20848D009D; Tue, 3 Oct 2023 19:26:26 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0013.hostedemail.com [216.40.44.13]) by kanga.kvack.org (Postfix) with ESMTP id B3F328D0003 for ; Tue, 3 Oct 2023 19:26:26 -0400 (EDT) Received: from smtpin09.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay05.hostedemail.com (Postfix) with ESMTP id 835FF404CB for ; Tue, 3 Oct 2023 23:26:26 +0000 (UTC) X-FDA: 81305736372.09.54FE94C Received: from mail-io1-f42.google.com (mail-io1-f42.google.com [209.85.166.42]) by imf23.hostedemail.com (Postfix) with ESMTP id CA529140007 for ; Tue, 3 Oct 2023 23:26:24 +0000 (UTC) Authentication-Results: imf23.hostedemail.com; dkim=pass header.d=gmail.com header.s=20230601 header.b=SSojYuUw; dmarc=pass (policy=none) header.from=gmail.com; spf=pass (imf23.hostedemail.com: domain of nphamcs@gmail.com designates 209.85.166.42 as permitted sender) smtp.mailfrom=nphamcs@gmail.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1696375584; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=4ny5iyIo3c29FhVEcNOS9Ddt74DJ5wZg8ueoeeWyydk=; b=vN22D80pkqFV8LPQ0taqCpbq8silols71SMkScuFsO9+BKph9peGRYjsqn3curyT39hZQt 3yBJHwdAVOPKrHQ4m9xr9EngHqu92JhXsxf9tEvVxaEDrNvHB/d5Q+MS8JqMEk2E2AXL1N oA557uO7c5AkoP3gsPAhUxLIG1a9+jA= ARC-Authentication-Results: i=1; imf23.hostedemail.com; dkim=pass header.d=gmail.com header.s=20230601 header.b=SSojYuUw; dmarc=pass (policy=none) header.from=gmail.com; spf=pass (imf23.hostedemail.com: domain of nphamcs@gmail.com designates 209.85.166.42 as permitted sender) smtp.mailfrom=nphamcs@gmail.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1696375584; a=rsa-sha256; cv=none; b=3v861RYL8m+iE8zmYxr1wKk3QpzzTl1j+yPBS+LWEOKcnRipLQ1hHLuN9MIV7H+I80VWd8 6qQo0/pp8COBjdx+h8XtPEzaNXpxOpi8S7H6+fJS9i5+5VTGUrOBnSajU5qgEHgiDIzoz3 HHk8V4YNT8vtsCKZyr/Coc0iRVlKAyU= Received: by mail-io1-f42.google.com with SMTP id ca18e2360f4ac-79fb64b5265so60915439f.1 for ; Tue, 03 Oct 2023 16:26:24 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1696375584; x=1696980384; darn=kvack.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=4ny5iyIo3c29FhVEcNOS9Ddt74DJ5wZg8ueoeeWyydk=; b=SSojYuUwCZCu91Ky52OjA9QRp1HkSA5uVwEgW3rAiHqz65TItSFCtexckLYLELV5nV Z6r8tUly09MmfZgBf0o5WiRW0xIjDu1aRt7dxnszu7/9F/56j+60KQBBv4MYews2uQCU 4Z9/GjXFt27X55VYfljQZ+fkjDm4AwnLVkNYb93n2x7w2Ih1aMti3VpKQgKpYUb2pFqG n9xRWEGo9rib75vPJawHj+onJ0LLFq4luISwiBDPmInlbeDKvNSzm4Ix4+RMts0XobkM 9G4ukXT+WvLllLDQias4+7vfnZHMtpDE9qXOFnL49zptk1eeBWixvnQdR2pFlOpd4U4X F58A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1696375584; x=1696980384; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=4ny5iyIo3c29FhVEcNOS9Ddt74DJ5wZg8ueoeeWyydk=; b=V1aqVdvIgPxNFxE4hoWYlQsPTX8nLl+7Oy6i3t9H4lIAuFrYmXZgbqFMv84ilpOa3a bROXAULdcCvK29PSdvDh1CCDKpC3BqfrpqENqOXKU71m9OmWiybUZW9bTlPWiYA8GCpG TQbnj9J7jT1YdQsh+AO8JkdRVFH/GyJ8AbRMrsBuzLU2IOulJab2O8lmEJDVR2PciTfS VnemaSM0wgEH030WLXIStrXfqCZ/02TVNO6PiG8vp3oyHEJ1LbYAFXPA1lB+LIcgU0W/ nr9feOsqoMWgG5yEmaWhdBFGGkxTsT6F3awq1C4HuE3JeMubb4Mx3Pu9letTgpjURsG8 l3VA== X-Gm-Message-State: AOJu0YyKdfkIY2gGae6o9x+W/IMC+dEoAk50DFra5kizIXOe8jCCm+iY 5NEvdQZUcnArK/rFiHouG45uY3ikSwaCvpABeIE= X-Google-Smtp-Source: AGHT+IFIfHB1/JD1gmiRZKngRz/4e5Chk2T8BrzRwvNrmCYseeO/v2iFv+ET8tm+jBP4rg99nzpSdoR615FyoHKPLd0= X-Received: by 2002:a5e:8345:0:b0:78b:d0a9:34fb with SMTP id y5-20020a5e8345000000b0078bd0a934fbmr783370iom.20.1696375583948; Tue, 03 Oct 2023 16:26:23 -0700 (PDT) MIME-Version: 1.0 References: <20231003001828.2554080-1-nphamcs@gmail.com> <20231003001828.2554080-3-nphamcs@gmail.com> <20231003171329.GB314430@monkey> <20231003183928.GC20979@cmpxchg.org> <20231003224214.GE314430@monkey> In-Reply-To: <20231003224214.GE314430@monkey> From: Nhat Pham Date: Tue, 3 Oct 2023 16:26:10 -0700 Message-ID: Subject: Re: [PATCH v3 2/3] hugetlb: memcg: account hugetlb-backed memory in memory controller To: Mike Kravetz Cc: Johannes Weiner , akpm@linux-foundation.org, riel@surriel.com, mhocko@kernel.org, roman.gushchin@linux.dev, shakeelb@google.com, muchun.song@linux.dev, tj@kernel.org, lizefan.x@bytedance.com, shuah@kernel.org, yosryahmed@google.com, fvdl@google.com, linux-mm@kvack.org, kernel-team@meta.com, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Rspam-User: X-Rspamd-Server: rspam12 X-Rspamd-Queue-Id: CA529140007 X-Stat-Signature: 7xx8ent7ecnopeahb8fp8s3cub9wyzzj X-HE-Tag: 1696375584-593577 X-HE-Meta: U2FsdGVkX19pPBCbvSqEW6RB0896RoNwUNivWedWM8u7XPijTOLqNcajNHEYFIt9yGxtgyR4sjq4BVKoYk3Ps2Nt9GtMiLxuUCo2xOU8yfH5GaslGNr1DAWid+k1qPYHVsZRsTCc1gruRVlLWlr/pQNxq5RcT6wBHcOonRRX68+b+e/N3PQrSn4syGCK1YOytby/u/KslPGXAWiqBgpteYzYdH1UV5tEEKhw0bkRp1855JDcdMKSOIxiUEI0WgasjiPoktjF94fm81IneTcbZpQECQ/Yf2UpU+6LUsjIBoykpFvaAKsVCdUH/EAiB7e7CxfsRkYb+fnoytMaXPmgTYIMI77MAPsCQ0nXuMVqLOFFeZyZTZiAxbST2PIFuVqi5xA4zaFDb82OMxlFF/o3+6GmSwfy43T4GZETIc6NJ1pmV3DKhtumGS+00rNrScuY35YfBL0Jwf+WFL1ePxU5LQdqkqoTb16Wns8k7lOzmJAxFY5n6Cd+6GXuFcXkuKqAr6a+anFY5rCr4Z0xFt5HxB+I34gJ5j6TrXMQP30MSAzF7HOIbxkWeakAERqsFIatI5+VSL1Ld5hrxp9pJXv2D0Oatm6xZDL0DYN0xMzwWJhjYA/gEUArhhWjpimUOBoVyJMt/r8mefz4EvC1qnBEguLgAOyFnaWFAYa7cJSEOPCUa7dKKtXgwp3p8nyAscE1bLjPYCoFr3mRSfXNhe8vvg0u7hymODViKVz+plyV5ybgFeQ5HOW8YtKcI5cAaNI81MT3OnWbBfUAsujA+af7uQmk+8fuUqIALED9qTrq53nHaE0LqOdA1p3dW+9yJJxDv2CdPaLwsXefVbkOFC24WPQXH9tWbbrZeWlYP0dmvLiP7ErWYnJn92PmL5/p5Auj99PcpgdZdXy0Ba2g4o2SImcYaQ8UaRUe406zBmQdMU5mEQ1vnCoUJDAEP+51G8zlFumdoDN0LuW1oi02fK4 9r5OmF1a SpBct8Y81Vc8kEYnIPlX7n1Nd13iUbFuzeJx0bcWSx4ro8kQ0Vo6iSV/o8sISlCtpPnX4Q+nt2yd3sgCLD+403306Z+kV0yDwA9VO6OnoNLNaol57vZC2xOv3ibCDkamautqQTWkLp29KLv6OWRTs4tDdsC/vXSLMmvrls6/I/FGa4uoLpAhHuFeKrM4+84ugj8K6QJQhW3VSDgFgN9ftxfP1MhtK1/gsVP1gyLtkxazAX/g4fv7/Ikr39yeWQwaqNrvYpTTAjRKePaA4DJq1KGc8n4dt5k+l4z2F0XbPI+VRvORtsmfrjGjpcdxbE2JzTSojKPGPcJfWVMRm+KlcfCIi+MP1mPbUXsYNroJ+p37tSiDCLx9jjKQo+zFllO8Y2Zh110FEpGg3Km709yEeYo0T0cCEH1YtD6Ue87k52wvIP5ct+xNYxY7HbQ== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Tue, Oct 3, 2023 at 3:42=E2=80=AFPM Mike Kravetz wrote: > > On 10/03/23 15:09, Nhat Pham wrote: > > On Tue, Oct 3, 2023 at 11:39=E2=80=AFAM Johannes Weiner wrote: > > > On Tue, Oct 03, 2023 at 11:01:24AM -0700, Nhat Pham wrote: > > > > On Tue, Oct 3, 2023 at 10:13=E2=80=AFAM Mike Kravetz wrote: > > > > > On 10/02/23 17:18, Nhat Pham wrote: > > > > > > > > > > IIUC, huge page usage is charged in alloc_hugetlb_folio and uncha= rged in > > > > > free_huge_folio. During migration, huge pages are allocated via > > > > > alloc_migrate_hugetlb_folio, not alloc_hugetlb_folio. So, there = is no > > > > > charging for the migration target page and we uncharge the source= page. > > > > > It looks like there will be no charge for the huge page after mig= ration? > > > > > > > > > > > > > Ah I see! This is a bit subtle indeed. > > > > > > > > For the hugetlb controller, it looks like they update the cgroup in= fo > > > > inside move_hugetlb_state(), which calls hugetlb_cgroup_migrate() > > > > to transfer the hugetlb cgroup info to the destination folio. > > > > > > > > Perhaps we can do something analogous here. > > > > > > > > > If my analysis above is correct, then we may need to be careful a= bout > > > > > this accounting. We may not want both source and target pages to= be > > > > > charged at the same time. > > > > > > > > We can create a variant of mem_cgroup_migrate that does not double > > > > charge, but instead just copy the mem_cgroup information to the new > > > > folio, and then clear that info from the old folio. That way the me= mory > > > > usage counters are untouched. > > > > > > > > Somebody with more expertise on migration should fact check me > > > > of course :) > > > > > > The only reason mem_cgroup_migrate() double charges right now is > > > because it's used by replace_page_cache_folio(). In that context, the > > > isolation of the old page isn't quite as thorough as with migration, > > > so it cannot transfer and uncharge directly. This goes back a long > > > time: 0a31bc97c80c3fa87b32c091d9a930ac19cd0c40 > > > > > > If you rename the current implementation to mem_cgroup_replace_page() > > > for that one caller, you can add a mem_cgroup_migrate() variant which > > > is charge neutral and clears old->memcg_data. This can be used for > > > regular and hugetlb page migration. Something like this (totally > > > untested): > > > > > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c > > > index a4d3282493b6..17ec45bf3653 100644 > > > --- a/mm/memcontrol.c > > > +++ b/mm/memcontrol.c > > > @@ -7226,29 +7226,14 @@ void mem_cgroup_migrate(struct folio *old, st= ruct folio *new) > > > if (mem_cgroup_disabled()) > > > return; > > > > > > - /* Page cache replacement: new folio already charged? */ > > > - if (folio_memcg(new)) > > > - return; > > > - > > > memcg =3D folio_memcg(old); > > > VM_WARN_ON_ONCE_FOLIO(!memcg, old); > > > if (!memcg) > > > return; > > > > > > - /* Force-charge the new page. The old one will be freed soon = */ > > > - if (!mem_cgroup_is_root(memcg)) { > > > - page_counter_charge(&memcg->memory, nr_pages); > > > - if (do_memsw_account()) > > > - page_counter_charge(&memcg->memsw, nr_pages); > > > - } > > > - > > > - css_get(&memcg->css); > > > + /* Transfer the charge and the css ref */ > > > commit_charge(new, memcg); > > > - > > > - local_irq_save(flags); > > > - mem_cgroup_charge_statistics(memcg, nr_pages); > > > - memcg_check_events(memcg, folio_nid(new)); > > > - local_irq_restore(flags); > > > + old->memcg_data =3D 0; > > > } > > > > > > DEFINE_STATIC_KEY_FALSE(memcg_sockets_enabled_key); > > > > > > > Ah, I like this. Will send a fixlet based on this :) > > I was scratching my head trying to figure out why we were > > doing the double charging in the first place. Thanks for the context, > > Johannes! > > Be sure to check for code similar to this in folio_migrate_flags: > > void folio_migrate_flags(struct folio *newfolio, struct folio *folio) > { > ... > if (!folio_test_hugetlb(folio)) > mem_cgroup_migrate(folio, newfolio); > } > > There are many places where hugetlb is special cased. Yeah makes sense. I'm actually gonna take advantage of this, and remove the test hugetlb check here, so that it will also migrate the memcg metadata in this case too. See the new patch I just sent out. > -- > Mike Kravetz