linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Shakeel Butt <shakeelb@google.com>
To: Roman Gushchin <guro@fb.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>,
	Michal Hocko <mhocko@kernel.org>,
	 Andrew Morton <akpm@linux-foundation.org>,
	Linux MM <linux-mm@kvack.org>,
	 LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH] mm: memcontrol: account kernel stack per node
Date: Tue, 30 Jun 2020 07:34:13 -0700	[thread overview]
Message-ID: <CALvZod6BMBuJrw2SnaUR8gWUzoX1NYnTC0=XbMYM+B_F4O5sUQ@mail.gmail.com> (raw)
In-Reply-To: <20200630032353.GA26969@carbon.dhcp.thefacebook.com>

On Mon, Jun 29, 2020 at 8:24 PM Roman Gushchin <guro@fb.com> wrote:
>
> On Mon, Jun 29, 2020 at 05:44:13PM -0700, Shakeel Butt wrote:
> > Currently the kernel stack is being accounted per-zone. There is no need
> > to do that. In addition due to being per-zone, memcg has to keep a
> > separate MEMCG_KERNEL_STACK_KB. Make the stat per-node and deprecate
> > MEMCG_KERNEL_STACK_KB as memcg_stat_item is an extension of
> > node_stat_item.
> >
> > Signed-off-by: Shakeel Butt <shakeelb@google.com>
> > ---
> >  drivers/base/node.c        |  4 ++--
> >  fs/proc/meminfo.c          |  4 ++--
> >  include/linux/memcontrol.h |  2 --
> >  include/linux/mmzone.h     |  8 ++++----
> >  kernel/fork.c              | 29 ++++++++++-------------------
> >  kernel/scs.c               |  2 +-
> >  mm/memcontrol.c            |  2 +-
> >  mm/page_alloc.c            | 16 ++++++++--------
> >  mm/vmstat.c                |  8 ++++----
> >  9 files changed, 32 insertions(+), 43 deletions(-)
> >
> > diff --git a/drivers/base/node.c b/drivers/base/node.c
> > index 0cf13e31603c..508b80f6329b 100644
> > --- a/drivers/base/node.c
> > +++ b/drivers/base/node.c
> > @@ -440,9 +440,9 @@ static ssize_t node_read_meminfo(struct device *dev,
> >                      nid, K(node_page_state(pgdat, NR_FILE_MAPPED)),
> >                      nid, K(node_page_state(pgdat, NR_ANON_MAPPED)),
> >                      nid, K(i.sharedram),
> > -                    nid, sum_zone_node_page_state(nid, NR_KERNEL_STACK_KB),
> > +                    nid, node_page_state(pgdat, NR_KERNEL_STACK_KB),
> >  #ifdef CONFIG_SHADOW_CALL_STACK
> > -                    nid, sum_zone_node_page_state(nid, NR_KERNEL_SCS_KB),
> > +                    nid, node_page_state(pgdat, NR_KERNEL_SCS_KB),
> >  #endif
> >                      nid, K(sum_zone_node_page_state(nid, NR_PAGETABLE)),
> >                      nid, 0UL,
> > diff --git a/fs/proc/meminfo.c b/fs/proc/meminfo.c
> > index f262bff3ca31..887a5532e449 100644
> > --- a/fs/proc/meminfo.c
> > +++ b/fs/proc/meminfo.c
> > @@ -101,10 +101,10 @@ static int meminfo_proc_show(struct seq_file *m, void *v)
> >       show_val_kb(m, "SReclaimable:   ", sreclaimable);
> >       show_val_kb(m, "SUnreclaim:     ", sunreclaim);
> >       seq_printf(m, "KernelStack:    %8lu kB\n",
> > -                global_zone_page_state(NR_KERNEL_STACK_KB));
> > +                global_node_page_state(NR_KERNEL_STACK_KB));
> >  #ifdef CONFIG_SHADOW_CALL_STACK
> >       seq_printf(m, "ShadowCallStack:%8lu kB\n",
> > -                global_zone_page_state(NR_KERNEL_SCS_KB));
> > +                global_node_page_state(NR_KERNEL_SCS_KB));
> >  #endif
> >       show_val_kb(m, "PageTables:     ",
> >                   global_zone_page_state(NR_PAGETABLE));
> > diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> > index ba1e42715ecf..a3ddb236898e 100644
> > --- a/include/linux/memcontrol.h
> > +++ b/include/linux/memcontrol.h
> > @@ -33,8 +33,6 @@ enum memcg_stat_item {
> >       MEMCG_SWAP = NR_VM_NODE_STAT_ITEMS,
> >       MEMCG_SOCK,
> >       MEMCG_PERCPU_B,
> > -     /* XXX: why are these zone and not node counters? */
> > -     MEMCG_KERNEL_STACK_KB,
> >       MEMCG_NR_STAT,
> >  };
> >
> > diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> > index 8e859444927a..b79f73ce8b57 100644
> > --- a/include/linux/mmzone.h
> > +++ b/include/linux/mmzone.h
> > @@ -153,10 +153,6 @@ enum zone_stat_item {
> >       NR_ZONE_WRITE_PENDING,  /* Count of dirty, writeback and unstable pages */
> >       NR_MLOCK,               /* mlock()ed pages found and moved off LRU */
> >       NR_PAGETABLE,           /* used for pagetables */
> > -     NR_KERNEL_STACK_KB,     /* measured in KiB */
> > -#if IS_ENABLED(CONFIG_SHADOW_CALL_STACK)
> > -     NR_KERNEL_SCS_KB,       /* measured in KiB */
> > -#endif
> >       /* Second 128 byte cacheline */
> >       NR_BOUNCE,
> >  #if IS_ENABLED(CONFIG_ZSMALLOC)
> > @@ -201,6 +197,10 @@ enum node_stat_item {
> >       NR_KERNEL_MISC_RECLAIMABLE,     /* reclaimable non-slab kernel pages */
> >       NR_FOLL_PIN_ACQUIRED,   /* via: pin_user_page(), gup flag: FOLL_PIN */
> >       NR_FOLL_PIN_RELEASED,   /* pages returned via unpin_user_page() */
> > +     NR_KERNEL_STACK_KB,     /* measured in KiB */
> > +#if IS_ENABLED(CONFIG_SHADOW_CALL_STACK)
> > +     NR_KERNEL_SCS_KB,       /* measured in KiB */
> > +#endif
> >       NR_VM_NODE_STAT_ITEMS
> >  };
> >
> > diff --git a/kernel/fork.c b/kernel/fork.c
> > index 73fdfa9674b5..ee5393350ef7 100644
> > --- a/kernel/fork.c
> > +++ b/kernel/fork.c
> > @@ -278,7 +278,7 @@ static inline void free_thread_stack(struct task_struct *tsk)
> >
> >               for (i = 0; i < THREAD_SIZE / PAGE_SIZE; i++) {
> >                       mod_memcg_page_state(vm->pages[i],
> > -                                          MEMCG_KERNEL_STACK_KB,
> > +                                          NR_KERNEL_STACK_KB,
> >                                            -(int)(PAGE_SIZE / 1024));
>
> Hello, Shakeel!
>
> Thank you for the cleanup, it makes total sense to me.
>
> However I have some concerns: mod_memcg_page_state() does change only memcg's counters,
> but not lruvec counters. So to make it per-node per-memcg (aka  lruvec)
> we need to use mod_lruvec_state(), otherwise we won't have global per-node values.
>
> >
> >                       memcg_kmem_uncharge_page(vm->pages[i], 0);
> > @@ -381,32 +381,23 @@ static void account_kernel_stack(struct task_struct *tsk, int account)
> >  {
> >       void *stack = task_stack_page(tsk);
> >       struct vm_struct *vm = task_stack_vm_area(tsk);
> > +     struct page *page;
> >
> >       BUILD_BUG_ON(IS_ENABLED(CONFIG_VMAP_STACK) && PAGE_SIZE % 1024 != 0);
> >
> >       if (vm) {
> > -             int i;
> > -
> >               BUG_ON(vm->nr_pages != THREAD_SIZE / PAGE_SIZE);
> > +             page = vm->pages[0];
> >
> > -             for (i = 0; i < THREAD_SIZE / PAGE_SIZE; i++) {
> > -                     mod_zone_page_state(page_zone(vm->pages[i]),
> > -                                         NR_KERNEL_STACK_KB,
> > -                                         PAGE_SIZE / 1024 * account);
> > -             }
> >       } else {
> > -             /*
> > -              * All stack pages are in the same zone and belong to the
> > -              * same memcg.
> > -              */
> > -             struct page *first_page = virt_to_page(stack);
> > -
> > -             mod_zone_page_state(page_zone(first_page), NR_KERNEL_STACK_KB,
> > -                                 THREAD_SIZE / 1024 * account);
> > -
> > -             mod_memcg_obj_state(stack, MEMCG_KERNEL_STACK_KB,
> > +             page = virt_to_page(stack);
> > +             mod_memcg_obj_state(stack, NR_KERNEL_STACK_KB,
> >                                   account * (THREAD_SIZE / 1024));
> >       }
> > +
> > +     /* All stack pages are in the same node. */
> > +     mod_node_page_state(page_pgdat(page), NR_KERNEL_STACK_KB,
> > +                         THREAD_SIZE / 1024 * account);
> >  }
>
> And then we probably don't need a separate change for memcg- and per-node counters.
>

Yes, I thought about combining memcg and per-node counters but got
worried that the cached stacks for CONFIG_VMAP_STACK would not be
accounted for in the per-node global counters but I see that we
already don't account for them in both counters. I will further
simplify these. Thanks for the suggestion.

Shakeel


      reply	other threads:[~2020-06-30 14:34 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-06-30  0:44 Shakeel Butt
2020-06-30  3:23 ` Roman Gushchin
2020-06-30 14:34   ` Shakeel Butt [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CALvZod6BMBuJrw2SnaUR8gWUzoX1NYnTC0=XbMYM+B_F4O5sUQ@mail.gmail.com' \
    --to=shakeelb@google.com \
    --cc=akpm@linux-foundation.org \
    --cc=guro@fb.com \
    --cc=hannes@cmpxchg.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox