From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 15BCCC3DA49 for ; Thu, 25 Jul 2024 21:42:36 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 8E36D6B0095; Thu, 25 Jul 2024 17:42:36 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 893B26B0096; Thu, 25 Jul 2024 17:42:36 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 75B946B0098; Thu, 25 Jul 2024 17:42:36 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0015.hostedemail.com [216.40.44.15]) by kanga.kvack.org (Postfix) with ESMTP id 576BE6B0095 for ; Thu, 25 Jul 2024 17:42:36 -0400 (EDT) Received: from smtpin26.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay01.hostedemail.com (Postfix) with ESMTP id AEE861C0666 for ; Thu, 25 Jul 2024 21:42:35 +0000 (UTC) X-FDA: 82379599470.26.E206359 Received: from mail-qv1-f50.google.com (mail-qv1-f50.google.com [209.85.219.50]) by imf01.hostedemail.com (Postfix) with ESMTP id 8C7EB40017 for ; Thu, 25 Jul 2024 21:42:33 +0000 (UTC) Authentication-Results: imf01.hostedemail.com; dkim=pass header.d=cmpxchg-org.20230601.gappssmtp.com header.s=20230601 header.b=vRDAnLyL; dmarc=pass (policy=none) header.from=cmpxchg.org; spf=pass (imf01.hostedemail.com: domain of hannes@cmpxchg.org designates 209.85.219.50 as permitted sender) smtp.mailfrom=hannes@cmpxchg.org ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1721943713; a=rsa-sha256; cv=none; b=RXzvEaLd1+0bpCo8448+tGb3PNH2Q21jfZJ+qpeowm65Ca1vIT4+/m5+G1PhPnkotjQL1O FbZgseH+CW72Jtsex6J8rB1LgjTan1At8dcU8IHWkDFoI0mgRGaVfbBpbVwHxFFtjq0dNx cKozhlBOnCqGpj6YEmHZaBj+BAxrDmQ= ARC-Authentication-Results: i=1; imf01.hostedemail.com; dkim=pass header.d=cmpxchg-org.20230601.gappssmtp.com header.s=20230601 header.b=vRDAnLyL; dmarc=pass (policy=none) header.from=cmpxchg.org; spf=pass (imf01.hostedemail.com: domain of hannes@cmpxchg.org designates 209.85.219.50 as permitted sender) smtp.mailfrom=hannes@cmpxchg.org ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1721943713; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=JUNNKtqHZbxwfAx7HOnTg7aMz1aKTG+IkoHsU1G/oG4=; b=klLRslbRHkJmHN1L8LMjXvBAE82+RHUHJdOOcqubFHY0J8BbJatscLnd+dO1PrZQUGzuaz A4TTuUJXV6oIcxkqJsArv4DWapUG85QdqTZYln/Rg2zt7feBk/2vUcF2civL8NEQEDsjtD YsinRcsJu1pgSr/cvGVRibNgLqBKV/c= Received: by mail-qv1-f50.google.com with SMTP id 6a1803df08f44-6b79c969329so205546d6.0 for ; Thu, 25 Jul 2024 14:42:33 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=cmpxchg-org.20230601.gappssmtp.com; s=20230601; t=1721943752; x=1722548552; darn=kvack.org; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:from:to:cc:subject:date:message-id:reply-to; bh=JUNNKtqHZbxwfAx7HOnTg7aMz1aKTG+IkoHsU1G/oG4=; b=vRDAnLyLlDJj5Y/pYLClWffdMXAn3HMOIzUibAE8qGwLxngpiybr2GGZ2tfX1PVxpd 6AVsjlkwyO5jJ/XIGXDNRvvJ83RwroRGagng40ab3jntjlJV/LhKo245IogmVZR9G641 uzXXekQUuuAMWyiR9y3hFhr+wZBc4EwoZ6aQg61BkxL8HysfdPaAOU/NeZpO3JDq3cBJ 1a1JllxyC6nBhEJ7VwoJySH65K56tnlAAwfxshRX3ktbiso9qTRb+kQBmUnqda3pBErD TYtR+7EZcDAeVxsZTNjxXxdTVGHdL03CFjfflXMMmDRw846FSLQ8aiErwCNden7t9eeX LBxg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1721943752; x=1722548552; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=JUNNKtqHZbxwfAx7HOnTg7aMz1aKTG+IkoHsU1G/oG4=; b=W/+5kjlYHvx1Fpg0kl7H2bSTtReYn1ViL28MeefLEdBtq+jGL/zfhIwJjr8B8EOjYl j1zj3dGGwGDmUTet4gYhlsXYVHbuUnwdbqlNgPFLgbXUl+qVd1aYOFmSbf8/fO4xgugR y4H/CMXa78pG6aDq6dFOgIOKlTO62f+dl3kTOiZLoXHkdXVCmdtdvsYaNQKutvGMnef5 rlYniEVsf9staQjYVQu73DHtoGBPXW8xBV+DCgT+1xrIi9CR1zB8LbNKGr9cbvYFSEgq ZZ363GE2itdoyZD/nnmTd5yn+3HFqRI65gIgSN2Ir3yOsSitu66u9Gz6hwm5Hzm4Bdae dJGg== X-Forwarded-Encrypted: i=1; AJvYcCVD1EWo2iAicJhjB3BeAuPI2FUVsY4WGJy9e5FovKk5qdyaciXN3vG8TG3LJZHQqrW/CXa4Svc6gGU31wtAdninIIk= X-Gm-Message-State: AOJu0YzPmWKqsOredH5Dy7zdHE9Ja5kpf9RFPnOEz65undPVThuUiuys 2R/rRGBJCoezOQU1AZxGq5OnSGdpCxcJMXV/AdCnPRmj2KBOwcteSkLCP4U3tIa7YayWeVMYAX+ f X-Google-Smtp-Source: AGHT+IFu2YtwJBnJ95W0WxuO+SmfkJkFUWVBgTkLY+R2I9ra50FAos4fu+qCOCY8imKGfO9mdUJ9Ew== X-Received: by 2002:a05:6214:c6b:b0:6b7:4298:2c75 with SMTP id 6a1803df08f44-6bb408c8cf0mr29487846d6.55.1721943752414; Thu, 25 Jul 2024 14:42:32 -0700 (PDT) Received: from localhost ([2603:7000:c01:2716:da5e:d3ff:fee7:26e7]) by smtp.gmail.com with ESMTPSA id 6a1803df08f44-6bb3fa9bcabsm10912776d6.95.2024.07.25.14.42.31 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 25 Jul 2024 14:42:32 -0700 (PDT) Date: Thu, 25 Jul 2024 17:42:27 -0400 From: Johannes Weiner To: Roman Gushchin Cc: Andrew Morton , linux-mm@kvack.org, linux-kernel@vger.kernel.org, Michal Hocko , Shakeel Butt , Muchun Song Subject: Re: [PATCH v2 3/5] mm: memcg: merge multiple page_counters into a single structure Message-ID: <20240725214227.GC1702603@cmpxchg.org> References: <20240724202103.1210065-1-roman.gushchin@linux.dev> <20240724202103.1210065-4-roman.gushchin@linux.dev> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20240724202103.1210065-4-roman.gushchin@linux.dev> X-Rspamd-Server: rspam12 X-Rspamd-Queue-Id: 8C7EB40017 X-Stat-Signature: smagkn1o4885zoxnq3rag4cm1jdtkdj9 X-Rspam-User: X-HE-Tag: 1721943753-10453 X-HE-Meta: U2FsdGVkX1/CDsQu9I2EO5ooEIwoCWktjMko+2+RMy+qaPXkqNXYFwLM9OLomDIeIF9VgfE80Z8SF/Ylm0K1iGG9/495oaY5rPxV3fw80DZnyt7GQoJ9li59Lrn+YZrSFxPw5Sg8RNgE49fLocccp+wIJeZmUAWtgmd/op6J7BHJT1WEYBAVsOyWH+gL1gT//POcZLtVgbjwQ5X4TB2UhfKmcZPHP8ZwVmRqRERIkjnau9E13TDVnkkcmdv6yg42X/qAhwkcqmubsylLKgxiLxpR0+UFFSWHJ0kly8qK9LqMpBcx3oIF+yXouzLk80UeRDSo/J+on6TnFOo+7VAxrir+ElQEctmRZjOM/rcHcAk2L7hNV7rfU3gi8grSaeHoKdEbv8R71s+epxoN9erXxNFi46QThb/9tD2m+9l/vAyNB+B3qM2MSnwL1Jxx4Tx2OeavPoUO27H5WUHLoAM0DjhqjRuNlJYK/icVUt/OemzcV65EBTGN2Mt9uC1tKTucV+USgjbbWtLcL3nLMqpQzlDoAOxw5vehvDqQLgkc5DBkY2Ve1t+XzcrDnZ+Gta/jCLPS+m8pAqT0tobg8pwmh1BCI6qkBvjUbXYoMoC7Z9B3X1Xd36YDkIabmfEwyTq9iCgO4k+Qx2OoilaU1YTscx5/tIW9tUjQmwIoHh4WCko1jXj2a1xMBnsK0eUXsIAaSpJVXVvGU7lGb83lrSXZuAbUYbX3PLV1lvnmpa8ieiqUY9Zel8D3/b7QM7hwpKqwsurh+sP4P+R8SIFgzYEu6g846cbgJOB/c1p5PSUr6da26u0wsxqw10nDep3cRYqahAtTYOZy1nwwqfoOgqZHazbkkv9JjPHflDJEIekc0MjRhzIzSfNcxaY8Ds5HrFpsdgmS7IdnxuwYYcbGQulNaTjR4fL7fxheKxhuQ2vcYH09QbHo6yqP2KuDI43+ziSqCtinuCdZ6sOV9e8RFfA wH1xE5pV VEChHShzTZudTpKip2RV8J47Vh8zUMfK4YyHAIB+LVi0RnJoQ6oJm2VPxlWOA9K36wdzC3NJzBJAWemC6tY6zUYzspSoVZgbGsi4k8LfABQEOZr2tb4TUsPWZKpYRoFyMPNd3WbiiZGTZvU2y5qD8iA8g0WRAKRynpKYKOgEQG50fYxfw0ds5h45MLvrb39U3i7Alff358hRte3lJ6nFoKfKN6a4idBUu+zFY2zSyjPG31I5gupSNSnmera73gHN87QMpvtcIoqAVPoo39gNe0wdY6rKwpf9WdYL/ND3mObKpuJelpTbNx8SDDLwXfjCTQX2afAAQ/ow3ElUn3KP1H2EX04cEruuLOKVc/DMDaP+x/9kKaq45wRLKWSKkYDfL3h5MIWg+FvFFRUxT28EM5tUWO9sDadmb0VkwrElz4nAqPbUesTu52l6HvUU9qCAWeylIKLWjXdNOSSLSQBlsKy5t7FoHTB5CpL5l X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Wed, Jul 24, 2024 at 08:21:01PM +0000, Roman Gushchin wrote: > --- a/include/linux/page_counter.h > +++ b/include/linux/page_counter.h > @@ -5,14 +5,71 @@ > #include > #include > #include > +#include > #include > > +/* > + * Page counters are used by memory and hugetlb cgroups. > + * Memory cgroups are using up to 4 separate counters: > + * memory, swap (memory + swap on cgroup v1), kmem and tcpmem. > + * Hugetlb cgroups are using 2 arrays of counters with HUGE_MAX_HSTATE > + * in each to track the usage and reservations of each supported > + * hugepage size. > + * > + * Protection (min/low) is supported only for the first counter > + * with idx 0 and only if the counter was initialized with the protection > + * support. > + */ > + > +enum mem_counter_type { > +#ifdef CONFIG_MEMCG > + /* Unified memory counter */ > + MCT_MEM, > + > + /* Swap */ > + MCT_SWAP, > + > + /* Memory + swap */ > + MCT_MEMSW = MCT_SWAP, > + > +#ifdef CONFIG_MEMCG_V1 > + /* Kernel memory */ > + MCT_KMEM, > + > + /* Tcp memory */ > + MCT_TCPMEM, > +#endif /* CONFIG_MEMCG_V1 */ > +#endif /* CONFIG_MEMCG */ > + > + /* Maximum number of memcg counters */ > + MCT_NR_MEMCG_ITEMS, > +}; > + > +#ifdef CONFIG_CGROUP_HUGETLB > +#ifdef HUGE_MAX_HSTATE > +#define MCT_NR_HUGETLB_ITEMS HUGE_MAX_HSTATE > +#else > +#define MCT_NR_HUGETLB_ITEMS 1 > +#endif > + > +/* > + * max() can't be used here: even though __builtin_choose_expr() evaluates > + * to true, the false clause generates a compiler error: > + * error: braced-group within expression allowed only inside a function . > + */ > +#define MCT_NR_ITEMS (__builtin_choose_expr(MCT_NR_MEMCG_ITEMS > MCT_NR_HUGETLB_ITEMS, \ > + MCT_NR_MEMCG_ITEMS, MCT_NR_HUGETLB_ITEMS)) > + > +#else /* CONFIG_CGROUP_HUGETLB */ > +#define MCT_NR_ITEMS MCT_NR_MEMCG_ITEMS > +#endif /* CONFIG_CGROUP_HUGETLB */ > + > struct page_counter { > /* > * Make sure 'usage' does not share cacheline with any other field. The > * memcg->memory.usage is a hot member of struct mem_cgroup. > */ > - atomic_long_t usage; > + atomic_long_t usage[MCT_NR_ITEMS]; > CACHELINE_PADDING(_pad1_); > > /* effective memory.min and memory.min usage tracking */ > @@ -25,9 +82,9 @@ struct page_counter { > atomic_long_t low_usage; > atomic_long_t children_low_usage; > > - unsigned long watermark; > - unsigned long local_watermark; /* track min of fd-local resets */ > - unsigned long failcnt; > + unsigned long watermark[MCT_NR_ITEMS]; > + unsigned long local_watermark[MCT_NR_ITEMS]; /* track min of fd-local resets */ > + unsigned long failcnt[MCT_NR_ITEMS]; > > /* Keep all the read most fields in a separete cacheline. */ > CACHELINE_PADDING(_pad2_); > @@ -35,8 +92,9 @@ struct page_counter { > bool protection_support; > unsigned long min; > unsigned long low; > - unsigned long high; > - unsigned long max; > + unsigned long high[MCT_NR_ITEMS]; > + unsigned long max[MCT_NR_ITEMS]; > + > struct page_counter *parent; > } ____cacheline_internodealigned_in_smp; This hardcodes way too much user specifics into what should be a self-contained piece of abstraction. Should anybody else try to use the 'struct page_counter' for a single resource elsewhere, they'd end up with up to 5 counters, watermarks, failcnts, highs and maxs etc. It's clear that the hierarchical aspect of the page_counter no longer works. It was okay-ish when all we had was a simple hard limit. Now we carry to much stuff in it that is only useful to a small set of users. Instead of doubling down and repeating the mistakes we made for struct page, it would be better to move user specific stuff out of it entirely. I think there are two patterns that would be more natural: a) pass a callback function and have a priv pointer in page_counter for user-specifics; or b) remove the hierarchy aspect from the page counter entirely, let the *caller* walk the tree, call the page counter code to trycharge (and log watermark) only that level, and handle everything caller specific on the caller side. All users already have parent linkage in the css, so this would eliminate the parent pointer in page_counter altogether. The protection stuff could be moved to memcg proper, eliminating yet more fields that aren't interesting to any other user. You'd save more memory, while at the same time keeping struct page_counter clean and generic.