* Re: [PATCH] mm: khugepaged: fix NR_FILE_PAGES accounting in collapse_file()
2026-01-29 18:40 [PATCH] mm: khugepaged: fix NR_FILE_PAGES accounting in collapse_file() Shakeel Butt
@ 2026-01-29 18:50 ` Andrew Morton
2026-01-29 18:55 ` Shakeel Butt
2026-01-29 22:47 ` Matthew Wilcox
` (2 subsequent siblings)
3 siblings, 1 reply; 8+ messages in thread
From: Andrew Morton @ 2026-01-29 18:50 UTC (permalink / raw)
To: Shakeel Butt
Cc: Johannes Weiner, Rik van Riel, Song Liu, Kiryl Shutsemau,
Usama Arif, David Hildenbrand, Lorenzo Stoakes, Zi Yan,
Baolin Wang, Liam R . Howlett, Nico Pache, Ryan Roberts,
Dev Jain, Barry Song, Lance Yang, Meta kernel team, linux-mm,
cgroups, linux-kernel
On Thu, 29 Jan 2026 10:40:54 -0800 Shakeel Butt <shakeel.butt@linux.dev> wrote:
> In META's fleet, we are seeing high level cgroups with zero file memcg
> stat but their descendants have non-zero file stat. This should not be
> possible. On further inspection by looking at kernel data structures
> though drgn, it was revealed that the high level cgroups have negative
> file stat which was aggregated from their children.
>
> Another interesting point was that this specific issue start happening
> more often as we started deploying thp-always more widely which
> indicates some correlation between file memory and THPs and indeed it
> was found that file memcg stat accounting is buggy in the collapse code
> path from the start.
So this has no known runtime effect apart from incorrect accounting?
> Fixes: 99cb0dbd47a1 ("mm,thp: add read-only THP support for (non-shmem) FS")
Should we cc:stable?
^ permalink raw reply [flat|nested] 8+ messages in thread* Re: [PATCH] mm: khugepaged: fix NR_FILE_PAGES accounting in collapse_file()
2026-01-29 18:50 ` Andrew Morton
@ 2026-01-29 18:55 ` Shakeel Butt
0 siblings, 0 replies; 8+ messages in thread
From: Shakeel Butt @ 2026-01-29 18:55 UTC (permalink / raw)
To: Andrew Morton
Cc: Johannes Weiner, Rik van Riel, Song Liu, Kiryl Shutsemau,
Usama Arif, David Hildenbrand, Lorenzo Stoakes, Zi Yan,
Baolin Wang, Liam R . Howlett, Nico Pache, Ryan Roberts,
Dev Jain, Barry Song, Lance Yang, Meta kernel team, linux-mm,
cgroups, linux-kernel
On Thu, Jan 29, 2026 at 10:50:15AM -0800, Andrew Morton wrote:
> On Thu, 29 Jan 2026 10:40:54 -0800 Shakeel Butt <shakeel.butt@linux.dev> wrote:
>
> > In META's fleet, we are seeing high level cgroups with zero file memcg
> > stat but their descendants have non-zero file stat. This should not be
> > possible. On further inspection by looking at kernel data structures
> > though drgn, it was revealed that the high level cgroups have negative
> > file stat which was aggregated from their children.
> >
> > Another interesting point was that this specific issue start happening
> > more often as we started deploying thp-always more widely which
> > indicates some correlation between file memory and THPs and indeed it
> > was found that file memcg stat accounting is buggy in the collapse code
> > path from the start.
>
> So this has no known runtime effect apart from incorrect accounting?
Yes just an accounting bug.
>
> > Fixes: 99cb0dbd47a1 ("mm,thp: add read-only THP support for (non-shmem) FS")
>
> Should we cc:stable?
I think so.
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] mm: khugepaged: fix NR_FILE_PAGES accounting in collapse_file()
2026-01-29 18:40 [PATCH] mm: khugepaged: fix NR_FILE_PAGES accounting in collapse_file() Shakeel Butt
2026-01-29 18:50 ` Andrew Morton
@ 2026-01-29 22:47 ` Matthew Wilcox
2026-01-30 0:47 ` Shakeel Butt
2026-01-29 22:49 ` Usama Arif
2026-01-30 0:32 ` Johannes Weiner
3 siblings, 1 reply; 8+ messages in thread
From: Matthew Wilcox @ 2026-01-29 22:47 UTC (permalink / raw)
To: Shakeel Butt
Cc: Andrew Morton, Johannes Weiner, Rik van Riel, Song Liu,
Kiryl Shutsemau, Usama Arif, David Hildenbrand, Lorenzo Stoakes,
Zi Yan, Baolin Wang, Liam R . Howlett, Nico Pache, Ryan Roberts,
Dev Jain, Barry Song, Lance Yang, Meta kernel team, linux-mm,
cgroups, linux-kernel
On Thu, Jan 29, 2026 at 10:40:54AM -0800, Shakeel Butt wrote:
> Fixes: 99cb0dbd47a1 ("mm,thp: add read-only THP support for (non-shmem) FS")
Are you sure this is the right Fixes? 99cb0dbd47a1 wasn't cgroup
aware:
if (nr_none) {
struct zone *zone = page_zone(new_page);
__mod_node_page_state(zone->zone_pgdat, NR_FILE_PAGES, nr_none);
- __mod_node_page_state(zone->zone_pgdat, NR_SHMEM, nr_none);
+ if (is_shmem)
+ __mod_node_page_state(zone->zone_pgdat,
+ NR_SHMEM, nr_none);
}
b8eddff8886b added cgroup support:
if (is_shmem)
- __inc_node_page_state(new_page, NR_SHMEM_THPS);
+ __inc_lruvec_page_state(new_page, NR_SHMEM_THPS);
else {
- __inc_node_page_state(new_page, NR_FILE_THPS);
+ __inc_lruvec_page_state(new_page, NR_FILE_THPS);
filemap_nr_thps_inc(mapping);
}
which would seem like the right Fixes: to me?
> Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
> ---
> mm/khugepaged.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index 1d994b6c58c6..1cf8e154e214 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -2200,8 +2200,8 @@ static enum scan_result collapse_file(struct mm_struct *mm, unsigned long addr,
> else
> lruvec_stat_mod_folio(new_folio, NR_FILE_THPS, HPAGE_PMD_NR);
>
> + lruvec_stat_mod_folio(new_folio, NR_FILE_PAGES, HPAGE_PMD_NR);
> if (nr_none) {
> - lruvec_stat_mod_folio(new_folio, NR_FILE_PAGES, nr_none);
> /* nr_none is always 0 for non-shmem. */
> lruvec_stat_mod_folio(new_folio, NR_SHMEM, nr_none);
> }
> @@ -2238,6 +2238,8 @@ static enum scan_result collapse_file(struct mm_struct *mm, unsigned long addr,
> */
> list_for_each_entry_safe(folio, tmp, &pagelist, lru) {
> list_del(&folio->lru);
> + lruvec_stat_mod_folio(folio, NR_FILE_PAGES,
> + -folio_nr_pages(folio));
> folio->mapping = NULL;
> folio_clear_active(folio);
> folio_clear_unevictable(folio);
> --
> 2.47.3
>
>
^ permalink raw reply [flat|nested] 8+ messages in thread* Re: [PATCH] mm: khugepaged: fix NR_FILE_PAGES accounting in collapse_file()
2026-01-29 22:47 ` Matthew Wilcox
@ 2026-01-30 0:47 ` Shakeel Butt
0 siblings, 0 replies; 8+ messages in thread
From: Shakeel Butt @ 2026-01-30 0:47 UTC (permalink / raw)
To: Matthew Wilcox
Cc: Andrew Morton, Johannes Weiner, Rik van Riel, Song Liu,
Kiryl Shutsemau, Usama Arif, David Hildenbrand, Lorenzo Stoakes,
Zi Yan, Baolin Wang, Liam R . Howlett, Nico Pache, Ryan Roberts,
Dev Jain, Barry Song, Lance Yang, Meta kernel team, linux-mm,
cgroups, linux-kernel
On Thu, Jan 29, 2026 at 10:47:16PM +0000, Matthew Wilcox wrote:
> On Thu, Jan 29, 2026 at 10:40:54AM -0800, Shakeel Butt wrote:
> > Fixes: 99cb0dbd47a1 ("mm,thp: add read-only THP support for (non-shmem) FS")
>
> Are you sure this is the right Fixes? 99cb0dbd47a1 wasn't cgroup
> aware:
>
> if (nr_none) {
> struct zone *zone = page_zone(new_page);
>
> __mod_node_page_state(zone->zone_pgdat, NR_FILE_PAGES, nr_none);
> - __mod_node_page_state(zone->zone_pgdat, NR_SHMEM, nr_none);
> + if (is_shmem)
> + __mod_node_page_state(zone->zone_pgdat,
> + NR_SHMEM, nr_none);
> }
>
> b8eddff8886b added cgroup support:
>
> if (is_shmem)
> - __inc_node_page_state(new_page, NR_SHMEM_THPS);
> + __inc_lruvec_page_state(new_page, NR_SHMEM_THPS);
> else {
> - __inc_node_page_state(new_page, NR_FILE_THPS);
> + __inc_lruvec_page_state(new_page, NR_FILE_THPS);
> filemap_nr_thps_inc(mapping);
> }
>
> which would seem like the right Fixes: to me?
>
b8eddff8886b is about different stats NR_SHMEM_THPS & NR_FILE_THPS.
However as Johannes responded this code was broken even before memcg
awareness for NR_FILE_PAGES and NR_SHMEM.
I will send v2 with correct handling of NR_SHMEM as well.
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] mm: khugepaged: fix NR_FILE_PAGES accounting in collapse_file()
2026-01-29 18:40 [PATCH] mm: khugepaged: fix NR_FILE_PAGES accounting in collapse_file() Shakeel Butt
2026-01-29 18:50 ` Andrew Morton
2026-01-29 22:47 ` Matthew Wilcox
@ 2026-01-29 22:49 ` Usama Arif
2026-01-30 0:32 ` Johannes Weiner
3 siblings, 0 replies; 8+ messages in thread
From: Usama Arif @ 2026-01-29 22:49 UTC (permalink / raw)
To: Shakeel Butt, Andrew Morton
Cc: Johannes Weiner, Rik van Riel, Song Liu, Kiryl Shutsemau,
David Hildenbrand, Lorenzo Stoakes, Zi Yan, Baolin Wang,
Liam R . Howlett, Nico Pache, Ryan Roberts, Dev Jain, Barry Song,
Lance Yang, Meta kernel team, linux-mm, cgroups, linux-kernel
On 29/01/2026 18:40, Shakeel Butt wrote:
> In META's fleet, we are seeing high level cgroups with zero file memcg
> stat but their descendants have non-zero file stat. This should not be
> possible. On further inspection by looking at kernel data structures
> though drgn, it was revealed that the high level cgroups have negative
> file stat which was aggregated from their children.
>
> Another interesting point was that this specific issue start happening
> more often as we started deploying thp-always more widely which
> indicates some correlation between file memory and THPs and indeed it
> was found that file memcg stat accounting is buggy in the collapse code
> path from the start.
>
> When collapse_file() replaces small folios with a large THP, it fails to
> properly update the NR_FILE_PAGES memcg stat for both the old folios
> being freed and the new THP being added. It assumes the old and new
> folios belong to the same cgroup. However this assumption breaks in
> couple of scenarios:
>
> 1. Binary (executable) package downloader running in a different cgroup
> than the actual job executing the downloaded package.
>
> 2. File shared and mapped by processes running in different cgroups. One
> process read-in the file and the second process either through
> madvise(COLLAPSE) or khugepaged on behalf of second process
> collapsing the file.
>
> So, the current code has two bugs:
>
> 1. For non-shmem files, NR_FILE_PAGES is never incremented for the new
> THP because nr_none is always 0 for non-shmem, and the stat update is
> inside the "if (nr_none)" block.
>
> 2. When freeing old folios, NR_FILE_PAGES is never decremented because
> folio->mapping is set to NULL directly without calling
> filemap_unaccount_folio().
>
> These bugs cause incorrect per-memcg accounting when the process
> triggering the collapse (MADV_COLLAPSE or khugepaged) belongs to a
> different memcg than the process that originally faulted in the pages:
>
> - Process A (memcg X) reads file, creating 512 small page cache folios
> charged to memcg X (NR_FILE_PAGES += 512 for memcg X)
>
> - Process B (memcg Y) triggers collapse via MADV_COLLAPSE or khugepaged
> scans B's mm. The new THP is charged to memcg Y.
>
> - Old folios freed: NR_FILE_PAGES not decremented (bug)
> New THP added: NR_FILE_PAGES not incremented (bug)
>
> - Later, THP removed from page cache: NR_FILE_PAGES -= 512 for memcg Y
>
> Result: memcg X has +512 inflated pages, memcg Y has -512 (negative!)
>
> Fix this by:
> 1. Always incrementing NR_FILE_PAGES by HPAGE_PMD_NR for the new THP
> 2. Decrementing NR_FILE_PAGES for each old folio before clearing its
> mapping pointer
>
> For shmem with holes (nr_none > 0), the net change is still +nr_none
> since we decrement (HPAGE_PMD_NR - nr_none) old pages and increment
> HPAGE_PMD_NR new pages.
>
> Fixes: 99cb0dbd47a1 ("mm,thp: add read-only THP support for (non-shmem) FS")
> Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev
Acked-by: Usama Arif <usamaarif642@gmail.com>
^ permalink raw reply [flat|nested] 8+ messages in thread* Re: [PATCH] mm: khugepaged: fix NR_FILE_PAGES accounting in collapse_file()
2026-01-29 18:40 [PATCH] mm: khugepaged: fix NR_FILE_PAGES accounting in collapse_file() Shakeel Butt
` (2 preceding siblings ...)
2026-01-29 22:49 ` Usama Arif
@ 2026-01-30 0:32 ` Johannes Weiner
2026-01-30 0:50 ` Shakeel Butt
3 siblings, 1 reply; 8+ messages in thread
From: Johannes Weiner @ 2026-01-30 0:32 UTC (permalink / raw)
To: Shakeel Butt
Cc: Andrew Morton, Rik van Riel, Song Liu, Kiryl Shutsemau,
Usama Arif, David Hildenbrand, Lorenzo Stoakes, Zi Yan,
Baolin Wang, Liam R . Howlett, Nico Pache, Ryan Roberts,
Dev Jain, Barry Song, Lance Yang, Meta kernel team, linux-mm,
cgroups, linux-kernel
On Thu, Jan 29, 2026 at 10:40:54AM -0800, Shakeel Butt wrote:
> @@ -2200,8 +2200,8 @@ static enum scan_result collapse_file(struct mm_struct *mm, unsigned long addr,
> else
> lruvec_stat_mod_folio(new_folio, NR_FILE_THPS, HPAGE_PMD_NR);
>
> + lruvec_stat_mod_folio(new_folio, NR_FILE_PAGES, HPAGE_PMD_NR);
The memcg breakage is more visible, but I think this has been broken
for NUMA stats even longer. new_folio could also come from a different
node than the subpages, after all.
> if (nr_none) {
> - lruvec_stat_mod_folio(new_folio, NR_FILE_PAGES, nr_none);
> /* nr_none is always 0 for non-shmem. */
> lruvec_stat_mod_folio(new_folio, NR_SHMEM, nr_none);
So AFAICT NR_SHMEM needs the same treatment.
It looks like that's been broken since f3f0e1d2150b ("khugepaged: add
support of collapse for tmpfs/shmem pages").
Anon pages avoided this, because their accounting is done in rmap.c
when the pmd is mapped and the ptes zapped.
^ permalink raw reply [flat|nested] 8+ messages in thread* Re: [PATCH] mm: khugepaged: fix NR_FILE_PAGES accounting in collapse_file()
2026-01-30 0:32 ` Johannes Weiner
@ 2026-01-30 0:50 ` Shakeel Butt
0 siblings, 0 replies; 8+ messages in thread
From: Shakeel Butt @ 2026-01-30 0:50 UTC (permalink / raw)
To: Johannes Weiner
Cc: Andrew Morton, Rik van Riel, Song Liu, Kiryl Shutsemau,
Usama Arif, David Hildenbrand, Lorenzo Stoakes, Zi Yan,
Baolin Wang, Liam R . Howlett, Nico Pache, Ryan Roberts,
Dev Jain, Barry Song, Lance Yang, Meta kernel team, linux-mm,
cgroups, linux-kernel
On Thu, Jan 29, 2026 at 07:32:14PM -0500, Johannes Weiner wrote:
> On Thu, Jan 29, 2026 at 10:40:54AM -0800, Shakeel Butt wrote:
> > @@ -2200,8 +2200,8 @@ static enum scan_result collapse_file(struct mm_struct *mm, unsigned long addr,
> > else
> > lruvec_stat_mod_folio(new_folio, NR_FILE_THPS, HPAGE_PMD_NR);
> >
> > + lruvec_stat_mod_folio(new_folio, NR_FILE_PAGES, HPAGE_PMD_NR);
>
> The memcg breakage is more visible, but I think this has been broken
> for NUMA stats even longer. new_folio could also come from a different
> node than the subpages, after all.
Indeed you are right, so I should blame Kiryl instead of Song :P
>
> > if (nr_none) {
> > - lruvec_stat_mod_folio(new_folio, NR_FILE_PAGES, nr_none);
> > /* nr_none is always 0 for non-shmem. */
> > lruvec_stat_mod_folio(new_folio, NR_SHMEM, nr_none);
>
> So AFAICT NR_SHMEM needs the same treatment.
>
> It looks like that's been broken since f3f0e1d2150b ("khugepaged: add
> support of collapse for tmpfs/shmem pages").
Thanks, I will send v2 with correct handling of NR_SHMEM as well.
^ permalink raw reply [flat|nested] 8+ messages in thread