From: Ryan Roberts <ryan.roberts@arm.com>
To: Barry Song <21cnbao@gmail.com>
Cc: akpm@linux-foundation.org, baolin.wang@linux.alibaba.com,
corbet@lwn.net, da.gomez@samsung.com, david@redhat.com,
hughd@google.com, ioworker0@gmail.com,
linux-kernel@vger.kernel.org, linux-mm@kvack.org,
willy@infradead.org, ziy@nvidia.com
Subject: Re: [PATCH v1] mm: shmem: Rename mTHP shmem counters
Date: Tue, 9 Jul 2024 09:35:31 +0100
Message-ID: <ca200353-da41-4f2e-b036-f22ab4bbf1b6@arm.com>
In-Reply-To: <CAGsJ_4xy+ADGF2te4Xh=DMABDGEAAp-7QGPgJF6MstOYXVNYtA@mail.gmail.com>
On 09/07/2024 09:13, Barry Song wrote:
> On Tue, Jul 9, 2024 at 7:55 PM Ryan Roberts <ryan.roberts@arm.com> wrote:
>>
>> On 09/07/2024 02:44, Barry Song wrote:
>>> On Tue, Jul 9, 2024 at 12:30 AM Ryan Roberts <ryan.roberts@arm.com> wrote:
>>>>
>>>> On 08/07/2024 12:36, Barry Song wrote:
>>>>> On Mon, Jul 8, 2024 at 11:24 PM Ryan Roberts <ryan.roberts@arm.com> wrote:
>>>>>>
>>>>>> The legacy PMD-sized THP counters at /proc/vmstat include
>>>>>> thp_file_alloc, thp_file_fallback and thp_file_fallback_charge, which
>>>>>> rather confusingly refer to shmem THP and do not include any other types
>>>>>> of file pages. This is inconsistent since in most other places in the
>>>>>> kernel, THP counters are explicitly separated for anon, shmem and file
>>>>>> flavours. However, we are stuck with it since it constitutes a user ABI.
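For illustration only (not part of this patch), here is a minimal userspace sketch of reading those legacy counters. It assumes the usual /proc/vmstat layout of one "name value" pair per line. The helpers read_file() and vmstat_lookup() are hypothetical names, not an existing library API.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Read up to bufsz - 1 bytes of a text file into buf, NUL-terminated. */
int read_file(const char *path, char *buf, size_t bufsz)
{
	FILE *f;
	size_t n;

	assert(bufsz > 0);
	f = fopen(path, "r");
	if (!f)
		return -1;
	n = fread(buf, 1, bufsz - 1, f);
	buf[n] = '\0';
	fclose(f);
	return 0;
}

/*
 * Find "name value" in /proc/vmstat-style text (assumed format). The
 * whole name must match, so "thp_file_fallback" does not hit the
 * "thp_file_fallback_charge" line. Returns 0 on success, -1 if absent.
 */
int vmstat_lookup(const char *text, const char *name, unsigned long long *value)
{
	size_t len = strlen(name);
	const char *line = text;

	while (line && *line) {
		if (strncmp(line, name, len) == 0 && line[len] == ' ') {
			*value = strtoull(line + len + 1, NULL, 10);
			return 0;
		}
		line = strchr(line, '\n');
		if (line)
			line++;	/* skip to the start of the next line */
	}
	return -1;
}
```

Typical use would be read_file("/proc/vmstat", buf, sizeof(buf)) with a generously sized buffer (say 16 KiB), then vmstat_lookup(buf, "thp_file_alloc", &v). As discussed in this thread, these legacy values only ever reflect PMD-sized shmem.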
>>>>>>
>>>>>> Recently, commit 66f44583f9b6 ("mm: shmem: add mTHP counters for
>>>>>> anonymous shmem") added equivalent mTHP stats for shmem, keeping the
>>>>>> same "file_" prefix in the names. But in future, we may want to add
>>>>>> extra stats to cover actual file pages, at which point, it would all
>>>>>> become very confusing.
>>>>>>
>>>>>> So let's take the opportunity to rename these new counters "shmem_"
>>>>>> before the change makes it upstream and the ABI becomes immutable.
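A minimal sketch of where the renamed per-size counters would be read from. It assumes the per-size stats directory layout /sys/kernel/mm/transparent_hugepage/hugepages-<size>kB/stats/ from transhuge.rst, with file names taken from the DEFINE_MTHP_STAT_ATTR() lines in this patch. mthp_stat_path() and mthp_stat_read() are hypothetical helpers.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Base directory of the per-size mTHP controls (assumed layout). */
#define THP_SYSFS "/sys/kernel/mm/transparent_hugepage"

/*
 * Build the path of one per-size stat file; e.g. order 9 with 4 KiB
 * base pages gives THP_SYSFS "/hugepages-2048kB/stats/shmem_alloc".
 * Returns 0 on success, -1 if the path did not fit in buf.
 */
int mthp_stat_path(char *buf, size_t len, unsigned long page_size,
		   unsigned int order, const char *stat)
{
	unsigned long size_kb = (page_size << order) / 1024;
	int n;

	assert(order > 0);			/* order 0 is not an mTHP size */
	assert(strchr(stat, '/') == NULL);	/* a bare file name */
	n = snprintf(buf, len, THP_SYSFS "/hugepages-%lukB/stats/%s",
		     size_kb, stat);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Read one counter; returns -1 if the file is missing or unparsable. */
int mthp_stat_read(const char *path, unsigned long long *value)
{
	FILE *f = fopen(path, "r");
	int ok;

	if (!f)
		return -1;
	ok = fscanf(f, "%llu", value) == 1;
	fclose(f);
	return ok ? 0 : -1;
}
```

On a live system one would pass sysconf(_SC_PAGESIZE) as page_size and only query orders whose hugepages-*kB directory exists.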
>>>>>
>>>>> Personally, I think this approach is much clearer. However, I recall
>>>>> we discussed this before [1], and it seems that inconsistency is a
>>>>> concern?
>>>>
>>>> Embarrassingly, I don't recall that conversation at all :-| but at least what I
>>>> said then is consistent with what I've done in this patch.
>>>>
>>>> I think David's conclusion from that thread was to call them FILE_, and add both
>>>> shmem and pagecache counts to those counters, meaning we can keep the same name
>>>> as legacy THP counters. But those legacy THP counters only count shmem, and I
>>>> don't think we would get away with adding pagecache counts to those at this
>>>> point? (argument: they have been around for a long time and there is a risk that
>>>> user space relies on them and if they were to dramatically increase due to
>>>> pagecache addition now that could break things). In that case, there is still
>>>> inconsistency, but it's worse; the names are consistent but the semantics are
>>>> inconsistent.
>>>>
>>>> So my vote is to change to SHMEM_ as per this patch :)
>>>
>>> I have no objections. However, I dislike the documentation for
>>> thp_file_*. Perhaps we can clean it all up together ?
>>
>> I agree that we should clean this documentation up and I'm happy to roll it into
>> v2. However, I don't think what you have suggested is quite right.
>>
>> thp_file_alloc, thp_file_fallback and thp_file_fallback_charge *only* count
>> shmem. They don't count pagecache. So perhaps the change should be "...every
>> time a shmem huge page (despite being named after "file", the counter measures
>> only shmem) is..."?
>
> I understand what you are saying, and I know that thp_file_* has only
> included shmem so far. My question is whether it will include regular
> files in the future? If not, I am perfectly fine with your approach.

My whole reasoning for this patch is based on the assertion that, since
THP_FILE_ALLOC has been there for 8 years and in all that time has only counted
shmem, it's highly likely that someone depends on that semantic and we
can't change it. I don't have any actual evidence of code that relies on it, though.
I propose that I change the docs to reflect what's actually happening today (i.e.
shmem *only*). If we later decide we also want to report page cache numbers
through that same counter, then we can change the docs at that point. But if I
get my way, we'll soon have mTHP counters for FILE, which will count page
cache only. So you'll be able to get all the fine-grained info out of those, and
there will be no need to mess with the legacy counters.
>
> READ_ONLY_THP_FOR_FS isn't applicable in this path as it is created
> by khugepaged collapse.
>
>>
>> thp_file_mapped includes both file and shmem, so agree with your change there.
>>
>> What do you think?
>>
>>
>>>
>>> diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst
>>> index 709fe10b60f4..65df48cb3bbb 100644
>>> --- a/Documentation/admin-guide/mm/transhuge.rst
>>> +++ b/Documentation/admin-guide/mm/transhuge.rst
>>> @@ -417,21 +417,22 @@ thp_collapse_alloc_failed
>>> the allocation.
>>>
>>> thp_file_alloc
>>> - is incremented every time a file huge page is successfully
>>> - allocated.
>>> + is incremented every time a file (including shmem) huge page is
>>> + successfully allocated.
>>>
>>> thp_file_fallback
>>> - is incremented if a file huge page is attempted to be allocated
>>> - but fails and instead falls back to using small pages.
>>> + is incremented if a file (including shmem) huge page is attempted
>>> + to be allocated but fails and instead falls back to using small
>>> + pages.
>>>
>>> thp_file_fallback_charge
>>> - is incremented if a file huge page cannot be charged and instead
>>> - falls back to using small pages even though the allocation was
>>> - successful.
>>> + is incremented if a file (including shmem) huge page cannot be
>>> + charged and instead falls back to using small pages even though
>>> + the allocation was successful.
>>>
>>> thp_file_mapped
>>> - is incremented every time a file huge page is mapped into
>>> - user address space.
>>> + is incremented every time a file (including shmem) huge page is
>>> + mapped into user address space.
>>>
>>> thp_split_page
>>> is incremented every time a huge page is split into base
>>>
>>>>
>>>>>
>>>>> [1] https://lore.kernel.org/linux-mm/05d0096e4ec3e572d1d52d33a31a661321ac1551.1713755580.git.baolin.wang@linux.alibaba.com/
>>>>>
>>>>>
>>>>>>
>>>>>> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
>>>>>> ---
>>>>>>
>>>>>> Hi All,
>>>>>>
>>>>>> Applies on top of today's mm-unstable (2073cda629a4) and tested with mm
>>>>>> selftests; no regressions observed.
>>>>>>
>>>>>> The backstory here is that I'd like to introduce some counters for regular file
>>>>>> folio allocations to observe how often large folio allocation succeeds, but
>>>>>> these shmem counters are named "file" which is going to make things confusing.
>>>>>> So hoping to solve that before commit 66f44583f9b6 ("mm: shmem: add mTHP
>>>>>> counters for anonymous shmem") goes upstream (it is currently in mm-stable).
>>>>>>
>>>>>> Admittedly, this change means the mTHP stat names are not the same as the legacy
>>>>>> PMD-size THP names, but I think that's a smaller issue than having "file_" mTHP
>>>>>> stats that only count shmem, then having to introduce "file2_" or "pgcache_"
>>>>>> stats for the regular file memory, which is even more inconsistent IMHO. I guess
>>>>>> the alternative is to count both shmem and file in these mTHP stats (that's how
>>>>>> they were documented anyway) but I think it's better to be able to consider them
>>>>>> separately like we do for all the other counters.
>>>>>>
>>>>>> Thanks,
>>>>>> Ryan
>>>>>>
>>>>>> Documentation/admin-guide/mm/transhuge.rst | 12 ++++++------
>>>>>> include/linux/huge_mm.h | 6 +++---
>>>>>> mm/huge_memory.c | 12 ++++++------
>>>>>> mm/shmem.c | 8 ++++----
>>>>>> 4 files changed, 19 insertions(+), 19 deletions(-)
>>>>>>
>>>>>> diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst
>>>>>> index 747c811ee8f1..8b891689fc13 100644
>>>>>> --- a/Documentation/admin-guide/mm/transhuge.rst
>>>>>> +++ b/Documentation/admin-guide/mm/transhuge.rst
>>>>>> @@ -496,16 +496,16 @@ swpout_fallback
>>>>>> Usually because failed to allocate some continuous swap space
>>>>>> for the huge page.
>>>>>>
>>>>>> -file_alloc
>>>>>> - is incremented every time a file huge page is successfully
>>>>>> +shmem_alloc
>>>>>> + is incremented every time a shmem huge page is successfully
>>>>>> allocated.
>>>>>>
>>>>>> -file_fallback
>>>>>> - is incremented if a file huge page is attempted to be allocated
>>>>>> +shmem_fallback
>>>>>> + is incremented if a shmem huge page is attempted to be allocated
>>>>>> but fails and instead falls back to using small pages.
>>>>>>
>>>>>> -file_fallback_charge
>>>>>> - is incremented if a file huge page cannot be charged and instead
>>>>>> +shmem_fallback_charge
>>>>>> + is incremented if a shmem huge page cannot be charged and instead
>>>>>> falls back to using small pages even though the allocation was
>>>>>> successful.
>>>>>>
>>>>>> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
>>>>>> index acb6ac24a07e..cff002be83eb 100644
>>>>>> --- a/include/linux/huge_mm.h
>>>>>> +++ b/include/linux/huge_mm.h
>>>>>> @@ -269,9 +269,9 @@ enum mthp_stat_item {
>>>>>> MTHP_STAT_ANON_FAULT_FALLBACK_CHARGE,
>>>>>> MTHP_STAT_SWPOUT,
>>>>>> MTHP_STAT_SWPOUT_FALLBACK,
>>>>>> - MTHP_STAT_FILE_ALLOC,
>>>>>> - MTHP_STAT_FILE_FALLBACK,
>>>>>> - MTHP_STAT_FILE_FALLBACK_CHARGE,
>>>>>> + MTHP_STAT_SHMEM_ALLOC,
>>>>>> + MTHP_STAT_SHMEM_FALLBACK,
>>>>>> + MTHP_STAT_SHMEM_FALLBACK_CHARGE,
>>>>>> MTHP_STAT_SPLIT,
>>>>>> MTHP_STAT_SPLIT_FAILED,
>>>>>> MTHP_STAT_SPLIT_DEFERRED,
>>>>>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>>>>>> index 9ec64aa2be94..f9696c94e211 100644
>>>>>> --- a/mm/huge_memory.c
>>>>>> +++ b/mm/huge_memory.c
>>>>>> @@ -568,9 +568,9 @@ DEFINE_MTHP_STAT_ATTR(anon_fault_fallback, MTHP_STAT_ANON_FAULT_FALLBACK);
>>>>>> DEFINE_MTHP_STAT_ATTR(anon_fault_fallback_charge, MTHP_STAT_ANON_FAULT_FALLBACK_CHARGE);
>>>>>> DEFINE_MTHP_STAT_ATTR(swpout, MTHP_STAT_SWPOUT);
>>>>>> DEFINE_MTHP_STAT_ATTR(swpout_fallback, MTHP_STAT_SWPOUT_FALLBACK);
>>>>>> -DEFINE_MTHP_STAT_ATTR(file_alloc, MTHP_STAT_FILE_ALLOC);
>>>>>> -DEFINE_MTHP_STAT_ATTR(file_fallback, MTHP_STAT_FILE_FALLBACK);
>>>>>> -DEFINE_MTHP_STAT_ATTR(file_fallback_charge, MTHP_STAT_FILE_FALLBACK_CHARGE);
>>>>>> +DEFINE_MTHP_STAT_ATTR(shmem_alloc, MTHP_STAT_SHMEM_ALLOC);
>>>>>> +DEFINE_MTHP_STAT_ATTR(shmem_fallback, MTHP_STAT_SHMEM_FALLBACK);
>>>>>> +DEFINE_MTHP_STAT_ATTR(shmem_fallback_charge, MTHP_STAT_SHMEM_FALLBACK_CHARGE);
>>>>>> DEFINE_MTHP_STAT_ATTR(split, MTHP_STAT_SPLIT);
>>>>>> DEFINE_MTHP_STAT_ATTR(split_failed, MTHP_STAT_SPLIT_FAILED);
>>>>>> DEFINE_MTHP_STAT_ATTR(split_deferred, MTHP_STAT_SPLIT_DEFERRED);
>>>>>> @@ -581,9 +581,9 @@ static struct attribute *stats_attrs[] = {
>>>>>> &anon_fault_fallback_charge_attr.attr,
>>>>>> &swpout_attr.attr,
>>>>>> &swpout_fallback_attr.attr,
>>>>>> - &file_alloc_attr.attr,
>>>>>> - &file_fallback_attr.attr,
>>>>>> - &file_fallback_charge_attr.attr,
>>>>>> + &shmem_alloc_attr.attr,
>>>>>> + &shmem_fallback_attr.attr,
>>>>>> + &shmem_fallback_charge_attr.attr,
>>>>>> &split_attr.attr,
>>>>>> &split_failed_attr.attr,
>>>>>> &split_deferred_attr.attr,
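To make the link between the enum items, the sysfs attribute names and the per-order storage concrete, here is a simplified, single-threaded userspace model. It is a sketch under stated assumptions, not kernel code: the kernel keeps these counters per-CPU and sums them when the file is read, only a subset of the enum is modelled, and the order bound check reflects my reading of count_mthp_stat(). MODEL_PMD_ORDER and mthp_stat_show() are hypothetical names.

```c
#include <assert.h>
#include <string.h>

#define MODEL_PMD_ORDER 9	/* assumption: 4 KiB base pages, 2 MiB PMD */

/* A subset of enum mthp_stat_item, after the rename (model only). */
enum model_mthp_stat_item {
	MTHP_STAT_SHMEM_ALLOC,
	MTHP_STAT_SHMEM_FALLBACK,
	MTHP_STAT_SHMEM_FALLBACK_CHARGE,
	MTHP_STAT_SPLIT,
	MODEL_MTHP_STAT_COUNT
};

/* sysfs file name for each item, as DEFINE_MTHP_STAT_ATTR() pairs them. */
static const char *const mthp_stat_names[MODEL_MTHP_STAT_COUNT] = {
	[MTHP_STAT_SHMEM_ALLOC]           = "shmem_alloc",
	[MTHP_STAT_SHMEM_FALLBACK]        = "shmem_fallback",
	[MTHP_STAT_SHMEM_FALLBACK_CHARGE] = "shmem_fallback_charge",
	[MTHP_STAT_SPLIT]                 = "split",
};

/* One row per order; the kernel keeps this per-CPU and sums on read. */
static unsigned long mthp_stats[MODEL_PMD_ORDER + 1][MODEL_MTHP_STAT_COUNT];

void count_mthp_stat(int order, enum model_mthp_stat_item item)
{
	/* Order 0 is a base page, not an mTHP size; ignore it. */
	if (order <= 0 || order > MODEL_PMD_ORDER)
		return;
	mthp_stats[order][item]++;
}

/* What reading hugepages-<size>kB/stats/<name> would return, or -1. */
long mthp_stat_show(int order, const char *name)
{
	assert(order > 0 && order <= MODEL_PMD_ORDER);
	for (int i = 0; i < MODEL_MTHP_STAT_COUNT; i++)
		if (strcmp(mthp_stat_names[i], name) == 0)
			return (long)mthp_stats[order][i];
	return -1;
}
```

In this model, a tool still asking for "file_alloc" simply gets -1, which is the user-visible effect of renaming before the ABI is released.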
>>>>>> diff --git a/mm/shmem.c b/mm/shmem.c
>>>>>> index 921d59c3d669..f24dfbd387ba 100644
>>>>>> --- a/mm/shmem.c
>>>>>> +++ b/mm/shmem.c
>>>>>> @@ -1777,7 +1777,7 @@ static struct folio *shmem_alloc_and_add_folio(struct vm_fault *vmf,
>>>>>> if (pages == HPAGE_PMD_NR)
>>>>>> count_vm_event(THP_FILE_FALLBACK);
>>>>>> #ifdef CONFIG_TRANSPARENT_HUGEPAGE
>>>>>> - count_mthp_stat(order, MTHP_STAT_FILE_FALLBACK);
>>>>>> + count_mthp_stat(order, MTHP_STAT_SHMEM_FALLBACK);
>>>>>> #endif
>>>>>> order = next_order(&suitable_orders, order);
>>>>>> }
>>>>>> @@ -1804,8 +1804,8 @@ static struct folio *shmem_alloc_and_add_folio(struct vm_fault *vmf,
>>>>>> count_vm_event(THP_FILE_FALLBACK_CHARGE);
>>>>>> }
>>>>>> #ifdef CONFIG_TRANSPARENT_HUGEPAGE
>>>>>> - count_mthp_stat(folio_order(folio), MTHP_STAT_FILE_FALLBACK);
>>>>>> - count_mthp_stat(folio_order(folio), MTHP_STAT_FILE_FALLBACK_CHARGE);
>>>>>> + count_mthp_stat(folio_order(folio), MTHP_STAT_SHMEM_FALLBACK);
>>>>>> + count_mthp_stat(folio_order(folio), MTHP_STAT_SHMEM_FALLBACK_CHARGE);
>>>>>> #endif
>>>>>> }
>>>>>> goto unlock;
>>>>>> @@ -2181,7 +2181,7 @@ static int shmem_get_folio_gfp(struct inode *inode, pgoff_t index,
>>>>>> if (folio_test_pmd_mappable(folio))
>>>>>> count_vm_event(THP_FILE_ALLOC);
>>>>>> #ifdef CONFIG_TRANSPARENT_HUGEPAGE
>>>>>> - count_mthp_stat(folio_order(folio), MTHP_STAT_FILE_ALLOC);
>>>>>> + count_mthp_stat(folio_order(folio), MTHP_STAT_SHMEM_ALLOC);
>>>>>> #endif
>>>>>> goto alloced;
>>>>>> }
>>>>>> --
>>>>>> 2.43.0
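As I read the mm/shmem.c hunks above, the accounting goes: each order whose allocation fails counts shmem_fallback at that order; a folio that is allocated but cannot be charged counts both shmem_fallback and shmem_fallback_charge; and only PMD-sized folios also bump the legacy thp_file_* events. Below is a hedged userspace model of that logic, not the kernel code. Its assumptions: 4 KiB pages (PMD order 9); the alloc_fails/charge_fails bitmasks are hypothetical test hooks; shmem_alloc is counted inline here, whereas the real code counts it in shmem_get_folio_gfp(), whose caller then falls back to a small folio.

```c
#include <assert.h>

#define MODEL_PMD_ORDER 9	/* assumption: 4 KiB base pages */

struct shmem_counters {
	/* legacy /proc/vmstat events: PMD size only */
	unsigned long thp_file_alloc, thp_file_fallback, thp_file_fallback_charge;
	/* per-order mTHP stats */
	unsigned long alloc[MODEL_PMD_ORDER + 1];
	unsigned long fallback[MODEL_PMD_ORDER + 1];
	unsigned long fallback_charge[MODEL_PMD_ORDER + 1];
};

/*
 * Try each order set in @orders, highest first. Returns the order of the
 * large folio obtained, or 0 if the caller must use a small page instead.
 */
int model_shmem_alloc(struct shmem_counters *c, unsigned long orders,
		      unsigned long alloc_fails, unsigned long charge_fails)
{
	assert((orders >> (MODEL_PMD_ORDER + 1)) == 0);
	for (int order = MODEL_PMD_ORDER; order > 0; order--) {
		if (!(orders & (1UL << order)))
			continue;
		if (alloc_fails & (1UL << order)) {
			if (order == MODEL_PMD_ORDER)
				c->thp_file_fallback++;
			c->fallback[order]++;
			continue;	/* try the next smaller order */
		}
		if (charge_fails & (1UL << order)) {
			/* Allocated but not chargeable: counts as both. */
			if (order == MODEL_PMD_ORDER) {
				c->thp_file_fallback++;
				c->thp_file_fallback_charge++;
			}
			c->fallback[order]++;
			c->fallback_charge[order]++;
			return 0;
		}
		if (order == MODEL_PMD_ORDER)
			c->thp_file_alloc++;
		c->alloc[order]++;
		return order;
	}
	return 0;
}
```

One consequence visible in the model: every shmem_fallback_charge event is also a shmem_fallback event, and at PMD order the legacy thp_file_* events move in step with the hugepages-2048kB shmem_* counters.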
>>>>>>
>>>>>
>
> Thanks
> Barry