From: Lance Yang <ioworker0@gmail.com>
Date: Fri, 12 Jul 2024 20:22:31 +0800
Subject: Re: [PATCH v1 2/2] mm: mTHP stats for pagecache folio allocations
To: Baolin Wang
Cc: Ryan Roberts, Andrew Morton, Hugh Dickins, Jonathan Corbet, "Matthew Wilcox (Oracle)", David Hildenbrand, Barry Song, linux-kernel@vger.kernel.org, linux-mm@kvack.org
In-Reply-To: <9e0d84e5-2319-4425-9760-2c6bb23fc390@linux.alibaba.com>
References: <20240711072929.3590000-1-ryan.roberts@arm.com> <20240711072929.3590000-3-ryan.roberts@arm.com> <9e0d84e5-2319-4425-9760-2c6bb23fc390@linux.alibaba.com>
On Fri, Jul 12, 2024 at 11:00 AM Baolin Wang wrote:
>
>
> On 2024/7/11 15:29, Ryan Roberts wrote:
> > Expose 3 new mTHP stats for file (pagecache) folio allocations:
> >
> >   /sys/kernel/mm/transparent_hugepage/hugepages-*kB/stats/file_alloc
> >   /sys/kernel/mm/transparent_hugepage/hugepages-*kB/stats/file_fallback
> >   /sys/kernel/mm/transparent_hugepage/hugepages-*kB/stats/file_fallback_charge
> >
> > This will
> > provide some insight on the sizes of large folios being
> > allocated for file-backed memory, and how often allocation is failing.
> >
> > All non-order-0 (and most order-0) folio allocations are currently done
> > through filemap_alloc_folio(), and folios are charged in a subsequent
> > call to filemap_add_folio(). So count file_fallback when allocation
> > fails in filemap_alloc_folio() and count file_alloc or
> > file_fallback_charge in filemap_add_folio(), based on whether charging
> > succeeded or not. There are some users of filemap_add_folio() that
> > allocate their own order-0 folio by other means, so we would not count
> > an allocation failure in this case, but we also don't care about order-0
> > allocations. This approach feels like it should be good enough and
> > doesn't require any (impractically large) refactoring.
> >
> > The existing mTHP stats interface is reused to provide consistency to
> > users. And because we are reusing the same interface, we can reuse the
> > same infrastructure on the kernel side. The one small wrinkle is that
> > the set of folio sizes supported by the pagecache is not identical to
> > the set supported by anon and shmem; pagecache supports order-1, unlike
> > anon and shmem, and the max pagecache order may be less than PMD-size
> > (see arm64 with 64K base pages), again unlike anon and shmem. So we now
> > create a hugepages-*kB directory for the union of the sizes supported by
> > all 3 memory types and populate it with the relevant stats and controls.
>
> Personally, I like the idea; it can help analyze the allocation of
> large folios for the page cache.
>
> However, I have a slight concern about the consistency of the interface.
>
> For 64K, the fields layout:
> ├── hugepages-64kB
> │   ├── enabled
> │   ├── shmem_enabled
> │   └── stats
> │       ├── anon_fault_alloc
> │       ├── anon_fault_fallback
> │       ├── anon_fault_fallback_charge
> │       ├── file_alloc
> │       ├── file_fallback
> │       ├── file_fallback_charge
> │       ├── shmem_alloc
> │       ├── shmem_fallback
> │       ├── shmem_fallback_charge
> │       ├── split
> │       ├── split_deferred
> │       ├── split_failed
> │       ├── swpout
> │       └── swpout_fallback
>
> But for 8K (for pagecache), you removed some fields (of course, I
> understand why they are not supported).
>
> ├── hugepages-8kB
> │   └── stats
> │       ├── file_alloc
> │       ├── file_fallback
> │       └── file_fallback_charge
>
> This might not be user-friendly for some user-space parsing tools, as
> they lack certain fields for the same pattern interfaces. Of course,
> this might not be an issue if we have clear documentation describing the
> differences here :)
>
> Another possible approach is to maintain the same field layout to keep
> it consistent, but prohibit writing to the fields that are not supported
> by the pagecache, and any stats read from them would be 0.

I agree that maintaining a uniform field layout, especially at the
stats level, might be necessary ;)

Keeping a consistent interface could future-proof the design.
It allows for the possibility that features not currently supported
for 8kB pages might be enabled in the future.

Anyway, like David said, it's always tough messing with such stuff ;p

Thanks,
Lance

> >
> > Signed-off-by: Ryan Roberts
> > ---
> >   Documentation/admin-guide/mm/transhuge.rst |  13 +++
> >   include/linux/huge_mm.h                    |   6 +-
> >   include/linux/pagemap.h                    |  17 ++-
> >   mm/filemap.c                               |   6 +-
> >   mm/huge_memory.c                           | 117 ++++++++++++++++-----
> >   5 files changed, 128 insertions(+), 31 deletions(-)
> >
> > diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst
> > index 058485daf186..d4857e457add 100644
> > --- a/Documentation/admin-guide/mm/transhuge.rst
> > +++ b/Documentation/admin-guide/mm/transhuge.rst
> > @@ -512,6 +512,19 @@ shmem_fallback_charge
> >          falls back to using small pages even though the allocation was
> >          successful.
> >
> > +file_alloc
> > +       is incremented every time a file huge page is successfully
> > +       allocated.
> > +
> > +file_fallback
> > +       is incremented if a file huge page is attempted to be allocated
> > +       but fails and instead falls back to using small pages.
> > +
> > +file_fallback_charge
> > +       is incremented if a file huge page cannot be charged and instead
> > +       falls back to using small pages even though the allocation was
> > +       successful.
> > +
> >   split
> >          is incremented every time a huge page is successfully split into
> >          smaller orders. This can happen for a variety of reasons but a
> >
> > diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> > index cb93b9009ce4..b4fba11976f2 100644
> > --- a/include/linux/huge_mm.h
> > +++ b/include/linux/huge_mm.h
> > @@ -117,6 +117,9 @@ enum mthp_stat_item {
> >          MTHP_STAT_SHMEM_ALLOC,
> >          MTHP_STAT_SHMEM_FALLBACK,
> >          MTHP_STAT_SHMEM_FALLBACK_CHARGE,
> > +       MTHP_STAT_FILE_ALLOC,
> > +       MTHP_STAT_FILE_FALLBACK,
> > +       MTHP_STAT_FILE_FALLBACK_CHARGE,
> >          MTHP_STAT_SPLIT,
> >          MTHP_STAT_SPLIT_FAILED,
> >          MTHP_STAT_SPLIT_DEFERRED,
> > @@ -292,11 +295,10 @@ unsigned long thp_vma_allowable_orders(struct vm_area_struct *vma,
> >
> >   struct thpsize {
> >          struct kobject kobj;
> > -       struct list_head node;
> >          int order;
> >   };
> >
> > -#define to_thpsize(kobj) container_of(kobj, struct thpsize, kobj)
> > +#define to_thpsize(_kobj) container_of(_kobj, struct thpsize, kobj)
> >
> >   #define transparent_hugepage_use_zero_page()                           \
> >          (transparent_hugepage_flags &                                   \
> >
> > diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
> > index 6e2f72d03176..f45a1ba6d9b6 100644
> > --- a/include/linux/pagemap.h
> > +++ b/include/linux/pagemap.h
> > @@ -365,6 +365,7 @@ static inline void mapping_set_gfp_mask(struct address_space *m, gfp_t mask)
> >    */
> >   #define MAX_XAS_ORDER          (XA_CHUNK_SHIFT * 2 - 1)
> >   #define MAX_PAGECACHE_ORDER    min(MAX_XAS_ORDER, PREFERRED_MAX_PAGECACHE_ORDER)
> > +#define PAGECACHE_LARGE_ORDERS ((BIT(MAX_PAGECACHE_ORDER + 1) - 1) & ~BIT(0))
> >
> >   /**
> >    * mapping_set_large_folios() - Indicate the file supports large folios.
> > @@ -562,14 +563,26 @@ static inline void *detach_page_private(struct page *page)
> >   }
> >
> >   #ifdef CONFIG_NUMA
> > -struct folio *filemap_alloc_folio_noprof(gfp_t gfp, unsigned int order);
> > +struct folio *__filemap_alloc_folio_noprof(gfp_t gfp, unsigned int order);
> >   #else
> > -static inline struct folio *filemap_alloc_folio_noprof(gfp_t gfp, unsigned int order)
> > +static inline struct folio *__filemap_alloc_folio_noprof(gfp_t gfp, unsigned int order)
> >   {
> >          return folio_alloc_noprof(gfp, order);
> >   }
> >   #endif
> >
> > +static inline struct folio *filemap_alloc_folio_noprof(gfp_t gfp, unsigned int order)
> > +{
> > +       struct folio *folio;
> > +
> > +       folio = __filemap_alloc_folio_noprof(gfp, order);
> > +
> > +       if (!folio)
> > +               count_mthp_stat(order, MTHP_STAT_FILE_FALLBACK);
> > +
> > +       return folio;
> > +}
> > +
> >   #define filemap_alloc_folio(...)                               \
> >          alloc_hooks(filemap_alloc_folio_noprof(__VA_ARGS__))
> >
> > diff --git a/mm/filemap.c b/mm/filemap.c
> > index 53d5d0410b51..131d514fca29 100644
> > --- a/mm/filemap.c
> > +++ b/mm/filemap.c
> > @@ -963,6 +963,8 @@ int filemap_add_folio(struct address_space *mapping, struct folio *folio,
> >          int ret;
> >
> >          ret = mem_cgroup_charge(folio, NULL, gfp);
> > +       count_mthp_stat(folio_order(folio),
> > +                       ret ?
> > MTHP_STAT_FILE_FALLBACK_CHARGE : MTHP_STAT_FILE_ALLOC);
> >          if (ret)
> >                  return ret;
> >
> > @@ -990,7 +992,7 @@ int filemap_add_folio(struct address_space *mapping, struct folio *folio,
> >   EXPORT_SYMBOL_GPL(filemap_add_folio);
> >
> >   #ifdef CONFIG_NUMA
> > -struct folio *filemap_alloc_folio_noprof(gfp_t gfp, unsigned int order)
> > +struct folio *__filemap_alloc_folio_noprof(gfp_t gfp, unsigned int order)
> >   {
> >          int n;
> >          struct folio *folio;
> > @@ -1007,7 +1009,7 @@ struct folio *filemap_alloc_folio_noprof(gfp_t gfp, unsigned int order)
> >          }
> >          return folio_alloc_noprof(gfp, order);
> >   }
> > -EXPORT_SYMBOL(filemap_alloc_folio_noprof);
> > +EXPORT_SYMBOL(__filemap_alloc_folio_noprof);
> >   #endif
> >
> >   /*
> > diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> > index f9696c94e211..559553e2a662 100644
> > --- a/mm/huge_memory.c
> > +++ b/mm/huge_memory.c
> > @@ -452,8 +452,9 @@ static const struct attribute_group hugepage_attr_group = {
> >
> >   static void hugepage_exit_sysfs(struct kobject *hugepage_kobj);
> >   static void thpsize_release(struct kobject *kobj);
> > +static void thpsize_child_release(struct kobject *kobj);
> >   static DEFINE_SPINLOCK(huge_anon_orders_lock);
> > -static LIST_HEAD(thpsize_list);
> > +static LIST_HEAD(thpsize_child_list);
> >
> >   static ssize_t thpsize_enabled_show(struct kobject *kobj,
> >                                      struct kobj_attribute *attr, char *buf)
> > @@ -537,6 +538,18 @@ static const struct kobj_type thpsize_ktype = {
> >          .sysfs_ops = &kobj_sysfs_ops,
> >   };
> >
> > +static const struct kobj_type thpsize_child_ktype = {
> > +       .release = &thpsize_child_release,
> > +       .sysfs_ops = &kobj_sysfs_ops,
> > +};
> > +
> > +struct thpsize_child {
> > +       struct kobject kobj;
> > +       struct list_head node;
> > +};
> > +
> > +#define to_thpsize_child(_kobj) container_of(_kobj, struct thpsize_child, kobj)
> > +
> >   DEFINE_PER_CPU(struct mthp_stat, mthp_stats) = {{{0}}};
> >
> >   static unsigned long sum_mthp_stat(int order, enum
> > mthp_stat_item item)
> > @@ -557,7 +570,7 @@ static unsigned long sum_mthp_stat(int order, enum mthp_stat_item item)
> >   static ssize_t _name##_show(struct kobject *kobj,                      \
> >                  struct kobj_attribute *attr, char *buf)                 \
> >   {                                                                      \
> > -       int order = to_thpsize(kobj)->order;                            \
> > +       int order = to_thpsize(kobj->parent)->order;                    \
> >                                                                          \
> >          return sysfs_emit(buf, "%lu\n", sum_mthp_stat(order, _index));  \
> >   }                                                                      \
> > @@ -591,41 +604,93 @@ static struct attribute *stats_attrs[] = {
> >   };
> >
> >   static struct attribute_group stats_attr_group = {
> > -       .name = "stats",
> >          .attrs = stats_attrs,
> >   };
> >
> > -static struct thpsize *thpsize_create(int order, struct kobject *parent)
> > +DEFINE_MTHP_STAT_ATTR(file_alloc, MTHP_STAT_FILE_ALLOC);
> > +DEFINE_MTHP_STAT_ATTR(file_fallback, MTHP_STAT_FILE_FALLBACK);
> > +DEFINE_MTHP_STAT_ATTR(file_fallback_charge, MTHP_STAT_FILE_FALLBACK_CHARGE);
> > +
> > +static struct attribute *file_stats_attrs[] = {
> > +       &file_alloc_attr.attr,
> > +       &file_fallback_attr.attr,
> > +       &file_fallback_charge_attr.attr,
> > +       NULL,
> > +};
> > +
> > +static struct attribute_group file_stats_attr_group = {
> > +       .attrs = file_stats_attrs,
> > +};
> > +
> > +static int thpsize_create(int order, struct kobject *parent)
> >   {
> >          unsigned long size = (PAGE_SIZE << order) / SZ_1K;
> > +       struct thpsize_child *stats;
> >          struct thpsize *thpsize;
> >          int ret;
> >
> > +       /*
> > +        * Each child object (currently only "stats" directory) holds a
> > +        * reference to the top-level thpsize object, so we can drop our ref to
> > +        * the top-level once stats is setup. Then we just need to drop a
> > +        * reference on any children to clean everything up.
> > +        * We can't just use
> > +        * the attr group name for the stats subdirectory because there may be
> > +        * multiple attribute groups to populate inside stats and overlaying
> > +        * using the name property isn't supported in that way; each attr group
> > +        * name, if provided, must be unique in the parent directory.
> > +        */
> > +
> >          thpsize = kzalloc(sizeof(*thpsize), GFP_KERNEL);
> > -       if (!thpsize)
> > -               return ERR_PTR(-ENOMEM);
> > +       if (!thpsize) {
> > +               ret = -ENOMEM;
> > +               goto err;
> > +       }
> > +       thpsize->order = order;
> >
> >          ret = kobject_init_and_add(&thpsize->kobj, &thpsize_ktype, parent,
> >                                     "hugepages-%lukB", size);
> >          if (ret) {
> >                  kfree(thpsize);
> > -               return ERR_PTR(ret);
> > +               goto err;
> >          }
> >
> > -       ret = sysfs_create_group(&thpsize->kobj, &thpsize_attr_group);
> > -       if (ret) {
> > +       stats = kzalloc(sizeof(*stats), GFP_KERNEL);
> > +       if (!stats) {
> >                  kobject_put(&thpsize->kobj);
> > -               return ERR_PTR(ret);
> > +               ret = -ENOMEM;
> > +               goto err;
> >          }
> >
> > -       ret = sysfs_create_group(&thpsize->kobj, &stats_attr_group);
> > +       ret = kobject_init_and_add(&stats->kobj, &thpsize_child_ktype,
> > +                                  &thpsize->kobj, "stats");
> > +       kobject_put(&thpsize->kobj);
> >          if (ret) {
> > -               kobject_put(&thpsize->kobj);
> > -               return ERR_PTR(ret);
> > +               kfree(stats);
> > +               goto err;
> >          }
> >
> > -       thpsize->order = order;
> > -       return thpsize;
> > +       if (BIT(order) & THP_ORDERS_ALL_ANON) {
> > +               ret = sysfs_create_group(&thpsize->kobj, &thpsize_attr_group);
> > +               if (ret)
> > +                       goto err_put;
> > +
> > +               ret = sysfs_create_group(&stats->kobj, &stats_attr_group);
> > +               if (ret)
> > +                       goto err_put;
> > +       }
> > +
> > +       if (BIT(order) & PAGECACHE_LARGE_ORDERS) {
> > +               ret = sysfs_create_group(&stats->kobj, &file_stats_attr_group);
> > +               if (ret)
> > +                       goto err_put;
> > +       }
> > +
> > +       list_add(&stats->node, &thpsize_child_list);
> > +       return 0;
> > +err_put:
>
> IIUC, I think you should call 'sysfs_remove_group' to remove the group
> before putting the kobject.
>
> > +       kobject_put(&stats->kobj);
> > +err:
> > +       return ret;
> >   }
> >
> >   static void thpsize_release(struct kobject *kobj)
> > @@ -633,10 +698,14 @@ static void thpsize_release(struct kobject *kobj)
> >          kfree(to_thpsize(kobj));
> >   }
> >
> > +static void thpsize_child_release(struct kobject *kobj)
> > +{
> > +       kfree(to_thpsize_child(kobj));
> > +}
> > +
> >   static int __init hugepage_init_sysfs(struct kobject **hugepage_kobj)
> >   {
> >          int err;
> > -       struct thpsize *thpsize;
> >          unsigned long orders;
> >          int order;
> >
> > @@ -665,16 +734,14 @@ static int __init hugepage_init_sysfs(struct kobject **hugepage_kobj)
> >                  goto remove_hp_group;
> >          }
> >
> > -       orders = THP_ORDERS_ALL_ANON;
> > +       orders = THP_ORDERS_ALL_ANON | PAGECACHE_LARGE_ORDERS;
> >          order = highest_order(orders);
> >          while (orders) {
> > -               thpsize = thpsize_create(order, *hugepage_kobj);
> > -               if (IS_ERR(thpsize)) {
> > +               err = thpsize_create(order, *hugepage_kobj);
> > +               if (err) {
> >                          pr_err("failed to create thpsize for order %d\n", order);
> > -                       err = PTR_ERR(thpsize);
> >                          goto remove_all;
> >                  }
> > -               list_add(&thpsize->node, &thpsize_list);
> >                  order = next_order(&orders, order);
> >          }
> >
> > @@ -692,11 +759,11 @@ static int __init hugepage_init_sysfs(struct kobject **hugepage_kobj)
> >
> >   static void __init hugepage_exit_sysfs(struct kobject *hugepage_kobj)
> >   {
> > -       struct thpsize *thpsize, *tmp;
> > +       struct thpsize_child *child, *tmp;
> >
> > -       list_for_each_entry_safe(thpsize, tmp, &thpsize_list, node) {
> > -               list_del(&thpsize->node);
> > -               kobject_put(&thpsize->kobj);
> > +       list_for_each_entry_safe(child, tmp, &thpsize_child_list, node) {
> > +               list_del(&child->node);
> > +               kobject_put(&child->kobj);
> >          }
> >
> >          sysfs_remove_group(hugepage_kobj, &khugepaged_attr_group);