linux-mm.kvack.org archive mirror
From: Barry Song <21cnbao@gmail.com>
To: zhongjinji <zhongjinji@honor.com>
Cc: zhanghongru06@gmail.com, Liam.Howlett@oracle.com,
	 akpm@linux-foundation.org, axelrasmussen@google.com,
	david@kernel.org,  hannes@cmpxchg.org, jackmanb@google.com,
	linux-kernel@vger.kernel.org,  linux-mm@kvack.org,
	lorenzo.stoakes@oracle.com, mhocko@suse.com,  rppt@kernel.org,
	surenb@google.com, vbabka@suse.cz, weixugc@google.com,
	 yuanchu@google.com, zhanghongru@xiaomi.com, ziy@nvidia.com
Subject: Re: [PATCH 2/3] mm/vmstat: get fragmentation statistics from per-migragetype count
Date: Sat, 29 Nov 2025 15:55:19 +0800
Message-ID: <CAGsJ_4wUQdQyB_3y0Buf3uG34hvgpMAP3qHHwJM3=R01RJOuvw@mail.gmail.com>
In-Reply-To: <CAGsJ_4xtc7ipFKYNQkGa-dSn7C8S7-J8LURqYrehfgenfPT=+w@mail.gmail.com>

On Sat, Nov 29, 2025 at 8:00 AM Barry Song <21cnbao@gmail.com> wrote:
>
> > >       if (order >= pageblock_order && !is_migrate_isolate(migratetype))
> > >               __mod_zone_page_state(zone, NR_FREE_PAGES_BLOCKS, -nr_pages);
> > > diff --git a/mm/vmstat.c b/mm/vmstat.c
> > > index bb09c032eecf..9334bbbe1e16 100644
> > > --- a/mm/vmstat.c
> > > +++ b/mm/vmstat.c
> > > @@ -1590,32 +1590,16 @@ static void pagetypeinfo_showfree_print(struct seq_file *m,
> > >                                       zone->name,
> > >                                       migratetype_names[mtype]);
> > >               for (order = 0; order < NR_PAGE_ORDERS; ++order) {
> > > -                     unsigned long freecount = 0;
> > > -                     struct free_area *area;
> > > -                     struct list_head *curr;
> > > +                     unsigned long freecount;
> > >                       bool overflow = false;
> > >
> > > -                     area = &(zone->free_area[order]);
> > > -
> > > -                     list_for_each(curr, &area->free_list[mtype]) {
> > > -                             /*
> > > -                              * Cap the free_list iteration because it might
> > > -                              * be really large and we are under a spinlock
> > > -                              * so a long time spent here could trigger a
> > > -                              * hard lockup detector. Anyway this is a
> > > -                              * debugging tool so knowing there is a handful
> > > -                              * of pages of this order should be more than
> > > -                              * sufficient.
> > > -                              */
> > > -                             if (++freecount >= 100000) {
> > > -                                     overflow = true;
> > > -                                     break;
> > > -                             }
> > > +                     /* Keep the same output format for user-space tools compatibility */
> > > +                     freecount = READ_ONCE(zone->free_area[order].mt_nr_free[mtype]);
> >
> > I think it might be better for using an array of size NR_PAGE_ORDERS to store
> > the free count for each order. Like the code below.
>
> Right. If we want the freecount to accurately reflect the current system
> state, we still need to take the zone lock.
>
> Multiple independent WRITE_ONCE and READ_ONCE operations do not guarantee
> correctness. They may ensure single-copy atomicity per access, but not for the
> overall result.

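On second thought, the original code drops and re-acquires the zone lock
between orders, so it never gave a consistent snapshot across orders
anyway, and cross-variable consistency may not be a real issue. Since
the writers already update mt_nr_free under the zone lock, plain
increments and decrements would do on the write side, and wrapping the
lockless read in data_race() to silence KCSAN warnings should be
sufficient? I mean something like the following.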

@@ -843,8 +842,8 @@ static inline void move_to_free_list(struct page *page, struct zone *zone,
                     get_pageblock_migratetype(page), old_mt, nr_pages);

        list_move_tail(&page->buddy_list, &area->free_list[new_mt]);
-       WRITE_ONCE(area->mt_nr_free[old_mt], area->mt_nr_free[old_mt] - 1);
-       WRITE_ONCE(area->mt_nr_free[new_mt], area->mt_nr_free[new_mt] + 1);
+       area->mt_nr_free[old_mt]--;
+       area->mt_nr_free[new_mt]++;

        account_freepages(zone, -nr_pages, old_mt);
        account_freepages(zone, nr_pages, new_mt);
@@ -875,8 +874,7 @@ static inline void __del_page_from_free_list(struct page *page, struct zone *zon
        __ClearPageBuddy(page);
        set_page_private(page, 0);
        area->nr_free--;
-       WRITE_ONCE(area->mt_nr_free[migratetype],
-               area->mt_nr_free[migratetype] - 1);
+       area->mt_nr_free[migratetype]--;

        if (order >= pageblock_order && !is_migrate_isolate(migratetype))
                __mod_zone_page_state(zone, NR_FREE_PAGES_BLOCKS, -nr_pages);
diff --git a/mm/vmstat.c b/mm/vmstat.c
index 7e1e931eb209..d74004eb8c4d 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1599,7 +1599,7 @@ static void pagetypeinfo_showfree_print(struct seq_file *m,
                        bool overflow = false;

                        /* Keep the same output format for user-space tools compatibility */
-                       freecount = READ_ONCE(zone->free_area[order].mt_nr_free[mtype]);
+                       freecount = data_race(zone->free_area[order].mt_nr_free[mtype]);
                        if (freecount >= 100000) {
                                overflow = true;
                                freecount = 100000;
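
To make the idea above concrete outside the kernel, here is a minimal
userspace sketch, not the actual patch. Everything in it is an assumption
made for illustration: `struct toy_zone`, `toy_move()`, `toy_read_capped()`,
the `NR_ORDERS`/`NR_TYPES` sizes, and the `racy_read()` macro, which is only
a stand-in for the kernel's data_race() annotation (it has no userspace
equivalent and is just a plain load here). A pthread mutex plays the role
of the zone lock.

```c
#include <pthread.h>
#include <stdbool.h>

#define NR_ORDERS     11          /* assumption: stand-in for NR_PAGE_ORDERS */
#define NR_TYPES      3           /* assumption: stand-in for the migratetype count */
#define FREECOUNT_CAP 100000UL    /* same cap the pagetypeinfo output uses */

/*
 * Hypothetical stand-in for data_race(): a plain, unannotated load.
 * In the kernel the annotation only tells KCSAN the race is intentional;
 * portable userspace code would use a relaxed atomic load instead.
 */
#define racy_read(x) (x)

struct toy_zone {
	pthread_mutex_t lock;     /* plays the role of zone->lock */
	unsigned long mt_nr_free[NR_ORDERS][NR_TYPES];
};

/* Writer side: plain updates, since every writer holds the lock. */
static void toy_move(struct toy_zone *z, int order, int old_mt, int new_mt)
{
	pthread_mutex_lock(&z->lock);
	z->mt_nr_free[order][old_mt]--;
	z->mt_nr_free[order][new_mt]++;
	pthread_mutex_unlock(&z->lock);
}

/*
 * Reader side: lockless, so the value may be slightly stale. It is
 * capped so the reported number never exceeds FREECOUNT_CAP.
 */
static unsigned long toy_read_capped(struct toy_zone *z, int order, int mt,
				     bool *overflow)
{
	unsigned long freecount = racy_read(z->mt_nr_free[order][mt]);

	*overflow = freecount >= FREECOUNT_CAP;
	return *overflow ? FREECOUNT_CAP : freecount;
}
```

Each read is independent, so two counters read back to back can come from
different moments. That is the same per-order granularity the old
list-walking code had, since it also dropped the lock between orders.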

Thanks
Barry

