From: Barry Song
Date: Wed, 24 Jul 2024 22:23:12 +1200
Subject: Re: [PATCH v2 3/3] mm: mTHP stats for pagecache folio allocations
To: Ryan Roberts
Cc: Andrew Morton, Hugh Dickins, Jonathan Corbet, "Matthew Wilcox (Oracle)",
 David Hildenbrand, Lance Yang, Baolin Wang, Gavin Shan,
 linux-kernel@vger.kernel.org, linux-mm@kvack.org
References: <20240716135907.4047689-1-ryan.roberts@arm.com>
 <20240716135907.4047689-4-ryan.roberts@arm.com>
 <223acdf2-ba25-4c31-94df-a212d31bd6c3@arm.com>

On Wed, Jul 24, 2024 at 8:12 PM Ryan Roberts wrote:
>
> On 23/07/2024 23:07, Barry Song wrote:
> > On Fri, Jul 19, 2024 at 8:56 PM Ryan Roberts wrote:
> >>
> >> On 19/07/2024 01:12, Barry Song wrote:
> >>> On Wed, Jul 17, 2024 at 1:59 AM Ryan Roberts wrote:
> >>>>
> >>>> Expose 3 new mTHP stats for file (pagecache) folio allocations:
> >>>>
> >>>>   /sys/kernel/mm/transparent_hugepage/hugepages-*kB/stats/file_alloc
> >>>>   /sys/kernel/mm/transparent_hugepage/hugepages-*kB/stats/file_fallback
> >>>>   /sys/kernel/mm/transparent_hugepage/hugepages-*kB/stats/file_fallback_charge
> >>>>
> >>>> This will provide some insight
> >>>> on the sizes of large folios being
> >>>> allocated for file-backed memory, and how often allocation is failing.
> >>>>
> >>>> All non-order-0 (and most order-0) folio allocations are currently done
> >>>> through filemap_alloc_folio(), and folios are charged in a subsequent
> >>>> call to filemap_add_folio(). So count file_fallback when allocation
> >>>> fails in filemap_alloc_folio() and count file_alloc or
> >>>> file_fallback_charge in filemap_add_folio(), based on whether charging
> >>>> succeeded or not. There are some users of filemap_add_folio() that
> >>>> allocate their own order-0 folio by other means, so we would not count
> >>>> an allocation failure in this case, but we also don't care about order-0
> >>>> allocations. This approach feels like it should be good enough and
> >>>> doesn't require any (impractically large) refactoring.
> >>>>
> >>>> The existing mTHP stats interface is reused to provide consistency to
> >>>> users. And because we are reusing the same interface, we can reuse the
> >>>> same infrastructure on the kernel side.
> >>>>
> >>>> Signed-off-by: Ryan Roberts
> >>>> ---
> >>>>  Documentation/admin-guide/mm/transhuge.rst | 13 +++++++++++++
> >>>>  include/linux/huge_mm.h                    |  3 +++
> >>>>  include/linux/pagemap.h                    | 16 ++++++++++++++--
> >>>>  mm/filemap.c                               |  6 ++++--
> >>>>  mm/huge_memory.c                           |  7 +++++++
> >>>>  5 files changed, 41 insertions(+), 4 deletions(-)
> >>>>
> >>>> diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst
> >>>> index 058485daf186..d4857e457add 100644
> >>>> --- a/Documentation/admin-guide/mm/transhuge.rst
> >>>> +++ b/Documentation/admin-guide/mm/transhuge.rst
> >>>> @@ -512,6 +512,19 @@ shmem_fallback_charge
> >>>>         falls back to using small pages even though the allocation was
> >>>>         successful.
> >>>>
> >>>> +file_alloc
> >>>> +       is incremented every time a file huge page is successfully
> >>>> +       allocated.
> >>>> +
> >>>> +file_fallback
> >>>> +       is incremented if a file huge page is attempted to be allocated
> >>>> +       but fails and instead falls back to using small pages.
> >>>> +
> >>>> +file_fallback_charge
> >>>> +       is incremented if a file huge page cannot be charged and instead
> >>>> +       falls back to using small pages even though the allocation was
> >>>> +       successful.
> >>>> +
> >>>
> >>> I realized that when we talk about fallback, it doesn't necessarily mean
> >>> small pages; it could also refer to smaller huge pages.
> >>
> >> Yes, good point. I'll update the documentation to reflect that for all
> >> memory types.
> >>
> >>>
> >>> anon_fault_alloc
> >>>        is incremented every time a huge page is successfully
> >>>        allocated and charged to handle a page fault.
> >>>
> >>> anon_fault_fallback
> >>>        is incremented if a page fault fails to allocate or charge
> >>>        a huge page and instead falls back to using huge pages with
> >>>        lower orders or small pages.
> >>>
> >>> anon_fault_fallback_charge
> >>>        is incremented if a page fault fails to charge a huge page and
> >>>        instead falls back to using huge pages with lower orders or
> >>>        small pages even though the allocation was successful.
> >>>
> >>> This also applies to files, right?
> >>
> >> It does in the place you highlight below, but page_cache_ra_order() just falls
> >> back immediately to order-0. Regardless, I think we should just update the
> >> user-facing docs to allow for a lower high order. That gives the implementation
> >> more flexibility.
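While you're updating the wording, maybe the file stanzas could simply mirror
the anon ones above. Just a suggestion for the file_fallback text, not a hard
ask:

file_fallback
        is incremented if a file huge page is attempted to be allocated
        but fails and instead falls back to using huge pages with lower
        orders or small pages.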
> >>
> >>>
> >>>                 do {
> >>>                         gfp_t alloc_gfp = gfp;
> >>>
> >>>                         err = -ENOMEM;
> >>>                         if (order > 0)
> >>>                                 alloc_gfp |= __GFP_NORETRY | __GFP_NOWARN;
> >>>                         folio = filemap_alloc_folio(alloc_gfp, order);
> >>>                         if (!folio)
> >>>                                 continue;
> >>>
> >>>                         /* Init accessed so avoid atomic
> >>>                          * mark_page_accessed later */
> >>>                         if (fgp_flags & FGP_ACCESSED)
> >>>                                 __folio_set_referenced(folio);
> >>>
> >>>                         err = filemap_add_folio(mapping, folio, index, gfp);
> >>>                         if (!err)
> >>>                                 break;
> >>>                         folio_put(folio);
> >>>                         folio = NULL;
> >>>                 } while (order-- > 0);
> >>>
> >>>
> >>>> split
> >>>>        is incremented every time a huge page is successfully split into
> >>>>        smaller orders. This can happen for a variety of reasons but a
> >>>> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> >>>> index b8c63c3e967f..4f9109fcdded 100644
> >>>> --- a/include/linux/huge_mm.h
> >>>> +++ b/include/linux/huge_mm.h
> >>>> @@ -123,6 +123,9 @@ enum mthp_stat_item {
> >>>>         MTHP_STAT_SHMEM_ALLOC,
> >>>>         MTHP_STAT_SHMEM_FALLBACK,
> >>>>         MTHP_STAT_SHMEM_FALLBACK_CHARGE,
> >>>> +       MTHP_STAT_FILE_ALLOC,
> >>>> +       MTHP_STAT_FILE_FALLBACK,
> >>>> +       MTHP_STAT_FILE_FALLBACK_CHARGE,
> >>>>         MTHP_STAT_SPLIT,
> >>>>         MTHP_STAT_SPLIT_FAILED,
> >>>>         MTHP_STAT_SPLIT_DEFERRED,
> >>>> diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
> >>>> index 6e2f72d03176..95a147b5d117 100644
> >>>> --- a/include/linux/pagemap.h
> >>>> +++ b/include/linux/pagemap.h
> >>>> @@ -562,14 +562,26 @@ static inline void *detach_page_private(struct page *page)
> >>>>  }
> >>>>
> >>>>  #ifdef CONFIG_NUMA
> >>>> -struct folio *filemap_alloc_folio_noprof(gfp_t gfp, unsigned int order);
> >>>> +struct folio *__filemap_alloc_folio_noprof(gfp_t gfp, unsigned int order);
> >>>>  #else
> >>>> -static inline struct folio *filemap_alloc_folio_noprof(gfp_t gfp, unsigned int order)
> >>>> +static inline struct folio *__filemap_alloc_folio_noprof(gfp_t gfp, unsigned int order)
> >>>>  {
> >>>>         return folio_alloc_noprof(gfp, order);
> >>>>  }
> >>>>  #endif
> >>>>
> >>>> +static inline struct folio *filemap_alloc_folio_noprof(gfp_t gfp, unsigned int order)
> >>>> +{
> >>>> +       struct folio *folio;
> >>>> +
> >>>> +       folio = __filemap_alloc_folio_noprof(gfp, order);
> >>>> +
> >>>> +       if (!folio)
> >>>> +               count_mthp_stat(order, MTHP_STAT_FILE_FALLBACK);
> >>>> +
> >>>> +       return folio;
> >>>> +}
> >>>
> >>> Do we need to add and export __filemap_alloc_folio_noprof()?
> >>
> >> It is exported. See the below change in filemap.c. Previously
> >> filemap_alloc_folio_noprof() was exported, but that is now an inline and
> >> __filemap_alloc_folio_noprof() is exported in its place.
> >>
> >>> In any case,
> >>> we won't call count_mthp_stat(order, MTHP_STAT_FILE_FALLBACK) and
> >>> will only allocate the folio instead?
> >>
> >> Sorry, I don't understand what you mean by this?
> >
> > Ryan, my question is whether exporting __filemap_alloc_folio_noprof() might
> > lead to situations where people call __filemap_alloc_folio_noprof() and
> > forget to call count_mthp_stat(order, MTHP_STAT_FILE_FALLBACK). Currently,
> > it seems that counting is always necessary?
>
> OK, to make sure I've understood: I think you are saying that there is a risk
> that people will call __filemap_alloc_folio_noprof() and bypass the stat
> counting? But it's the same problem with all "_noprof" functions; if those are
> called directly, it bypasses the memory allocation profiling infrastructure. So
> this just needs to be a case of "don't do that" IMHO. filemap_alloc_folio() is
> the public API.

Indeed. Maybe I'm overthinking it.

> > Another option is that we still call count_mthp_stat() within
> > filemap_alloc_folio_noprof() and make it noinline if
> > __filemap_alloc_folio_noprof() is not inline?
>
> I could certainly make filemap_alloc_folio_noprof() always out-of-line and then
> handle the counting privately in the compilation unit. But before my change
> filemap_alloc_folio_noprof() was inline for the !CONFIG_NUMA case and I was
> trying not to change that. I think what you're suggesting would be tidier
> though. I'll make the change. But I don't think it solves the wider problem of
> the possibility that people can call private APIs. That's what review is for.

Agreed.
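For the record, the shape I had in mind is roughly the below. This is an
untested sketch against your patch (and it ignores how the !CONFIG_NUMA
variant would be folded in), not a final implementation:

        /*
         * mm/filemap.c: keep the wrapper out-of-line so the stat
         * accounting stays private to this compilation unit.
         */
        struct folio *filemap_alloc_folio_noprof(gfp_t gfp, unsigned int order)
        {
                struct folio *folio = __filemap_alloc_folio_noprof(gfp, order);

                /* Allocation failed at this order: record the fallback. */
                if (!folio)
                        count_mthp_stat(order, MTHP_STAT_FILE_FALLBACK);

                return folio;
        }
        EXPORT_SYMBOL(filemap_alloc_folio_noprof);

That way __filemap_alloc_folio_noprof() could become static and nothing
outside mm/filemap.c would be able to skip the accounting.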
>
> Thanks,
> Ryan
>
> >
> >>
> >>>
> >>>> +
> >>>>  #define filemap_alloc_folio(...)                                \
> >>>>         alloc_hooks(filemap_alloc_folio_noprof(__VA_ARGS__))
> >>>>
> >>>> diff --git a/mm/filemap.c b/mm/filemap.c
> >>>> index 53d5d0410b51..131d514fca29 100644
> >>>> --- a/mm/filemap.c
> >>>> +++ b/mm/filemap.c
> >>>> @@ -963,6 +963,8 @@ int filemap_add_folio(struct address_space *mapping, struct folio *folio,
> >>>>         int ret;
> >>>>
> >>>>         ret = mem_cgroup_charge(folio, NULL, gfp);
> >>>> +       count_mthp_stat(folio_order(folio),
> >>>> +                       ret ? MTHP_STAT_FILE_FALLBACK_CHARGE : MTHP_STAT_FILE_ALLOC);
> >>>>         if (ret)
> >>>>                 return ret;
> >>>
> >>> Would the following be better?
> >>>
> >>>         ret = mem_cgroup_charge(folio, NULL, gfp);
> >>>         if (ret) {
> >>>                 count_mthp_stat(folio_order(folio),
> >>>                                 MTHP_STAT_FILE_FALLBACK_CHARGE);
> >>>                 return ret;
> >>>         }
> >>>         count_mthp_stat(folio_order(folio), MTHP_STAT_FILE_ALLOC);
> >>>
> >>> Anyway, it's up to you. The code just feels a bit off to me :-)
> >>
> >> Yes, agree your version is better. I also noticed that anon and shmem increment
> >> FALLBACK whenever FALLBACK_CHARGE is incremented, so I should apply those same
> >> semantics here.
> >>
> >> Thanks,
> >> Ryan
> >>
> >>
> >>>
> >>>>
> >>>> @@ -990,7 +992,7 @@ int filemap_add_folio(struct address_space *mapping, struct folio *folio,
> >>>>  EXPORT_SYMBOL_GPL(filemap_add_folio);
> >>>>
> >>>>  #ifdef CONFIG_NUMA
> >>>> -struct folio *filemap_alloc_folio_noprof(gfp_t gfp, unsigned int order)
> >>>> +struct folio *__filemap_alloc_folio_noprof(gfp_t gfp, unsigned int order)
> >>>>  {
> >>>>         int n;
> >>>>         struct folio *folio;
> >>>> @@ -1007,7 +1009,7 @@ struct folio *filemap_alloc_folio_noprof(gfp_t gfp, unsigned int order)
> >>>>         }
> >>>>         return folio_alloc_noprof(gfp, order);
> >>>>  }
> >>>> -EXPORT_SYMBOL(filemap_alloc_folio_noprof);
> >>>> +EXPORT_SYMBOL(__filemap_alloc_folio_noprof);
> >>>>  #endif
> >>>>
> >>>>  /*
> >>>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> >>>> index 578ac212c172..26d558e3e80f 100644
> >>>> --- a/mm/huge_memory.c
> >>>> +++ b/mm/huge_memory.c
> >>>> @@ -608,7 +608,14 @@ static struct attribute_group anon_stats_attr_grp = {
> >>>>         .attrs = anon_stats_attrs,
> >>>>  };
> >>>>
> >>>> +DEFINE_MTHP_STAT_ATTR(file_alloc, MTHP_STAT_FILE_ALLOC);
> >>>> +DEFINE_MTHP_STAT_ATTR(file_fallback, MTHP_STAT_FILE_FALLBACK);
> >>>> +DEFINE_MTHP_STAT_ATTR(file_fallback_charge, MTHP_STAT_FILE_FALLBACK_CHARGE);
> >>>> +
> >>>>  static struct attribute *file_stats_attrs[] = {
> >>>> +       &file_alloc_attr.attr,
> >>>> +       &file_fallback_attr.attr,
> >>>> +       &file_fallback_charge_attr.attr,
> >>>>  #ifdef CONFIG_SHMEM
> >>>>         &shmem_alloc_attr.attr,
> >>>>         &shmem_fallback_attr.attr,
> >>>> --
> >>>> 2.43.0
> >>>>
> >>>
> >

Thanks
Barry
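P.S. Putting your two follow-ups together (the restructured error path, plus
bumping FALLBACK whenever FALLBACK_CHARGE is bumped), I'd expect the
filemap_add_folio() hunk to end up roughly like this. Untested, just to
confirm we're reading it the same way:

        ret = mem_cgroup_charge(folio, NULL, gfp);
        if (ret) {
                /* Charge failed: count both fallback and fallback_charge. */
                count_mthp_stat(folio_order(folio), MTHP_STAT_FILE_FALLBACK);
                count_mthp_stat(folio_order(folio),
                                MTHP_STAT_FILE_FALLBACK_CHARGE);
                return ret;
        }
        count_mthp_stat(folio_order(folio), MTHP_STAT_FILE_ALLOC);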