Re: [PATCH] Synchronize task mm counters on context switch

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

From: Daniel Colascione <dancol@google.com>
To: Minchan Kim <minchan@kernel.org>
Cc: linux-mm@kvack.org, Andrew Morton <akpm@linux-foundation.org>
Subject: Re: [PATCH] Synchronize task mm counters on context switch
Date: Wed, 21 Feb 2018 16:23:43 -0800	[thread overview]
Message-ID: <CAKOZuetc7DepPPO6DmMp9APNz5+8+KansNBr_ijuuyCTu=v1mg@mail.gmail.com> (raw)
In-Reply-To: <20180222001635.GB27147@rodete-desktop-imager.corp.google.com>

[-- Attachment #1: Type: text/plain, Size: 3085 bytes --]

Thanks for taking a look.

On Wed, Feb 21, 2018 at 4:16 PM, Minchan Kim <minchan@kernel.org> wrote:

> Hi Daniel,
>
> On Wed, Feb 21, 2018 at 11:05:04AM -0800, Daniel Colascione wrote:
> > On Mon, Feb 5, 2018 at 2:03 PM, Daniel Colascione <dancol@google.com>
> wrote:
> >
> > > When SPLIT_RSS_COUNTING is in use (which it is on SMP systems,
> > > generally speaking), we buffer certain changes to mm-wide counters
> > > through counters local to the current struct task, flushing them to
> > > the mm after seeing 64 page faults, as well as on task exit and
> > > exec. This scheme can leave a large amount of memory unaccounted-for
> > > in process memory counters, especially for processes with many threads
> > > (each of which gets 64 "free" faults), and it produces an
> > > inconsistency with the same memory counters scanned VMA-by-VMA using
> > > smaps. This inconsistency can persist for an arbitrarily long time,
> > > since there is no way to force a task to flush its counters to its mm.
>
> Nice catch. Incosistency is bad but we usually have done it for
> performance.
> So, FWIW, it would be much better to describe what you are suffering from
> for matainter to take it.
>

The problem is that the per-process counters in /proc/pid/status lag behind
the actual memory allocations, leading to an inaccurate view of overall
memory consumed by each process.


> > > This patch flushes counters on context switch. This way, we bound the
> > > amount of unaccounted memory without forcing tasks to flush to the
> > > mm-wide counters on each minor page fault. The flush operation should
> > > be cheap: we only have a few counters, adjacent in struct task, and we
> > > don't atomically write to the mm counters unless we've changed
> > > something since the last flush.
> > >
> > > Signed-off-by: Daniel Colascione <dancol@google.com>
> > > ---
> > >  kernel/sched/core.c | 3 +++
> > >  1 file changed, 3 insertions(+)
> > >
> > > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > > index a7bf32aabfda..7f197a7698ee 100644
> > > --- a/kernel/sched/core.c
> > > +++ b/kernel/sched/core.c
> > > @@ -3429,6 +3429,9 @@ asmlinkage __visible void __sched schedule(void)
> > >         struct task_struct *tsk = current;
> > >
> > >         sched_submit_work(tsk);
> > > +       if (tsk->mm)
> > > +               sync_mm_rss(tsk->mm);
> > > +
> > >         do {
> > >                 preempt_disable();
> > >                 __schedule(false);
> > >
> >
> >
> > Ping? Is this approach just a bad idea? We could instead just manually
> sync
> > all mm-attached tasks at counter-retrieval time.
>
> IMHO, yes, it should be done when user want to see which would be really
> cold path while this shecule function is hot.
>

The problem with doing it that way is that we need to look at each task
attached to a particular mm. AFAIK (and please tell me if I'm wrong), the
only way to do that is to iterate over all processes, and for each process
attached to the mm we want, iterate over all its tasks (since each one has
to have the same mm, I think). Does that sound right?

[-- Attachment #2: Type: text/html, Size: 4248 bytes --]

next prev parent reply	other threads:[~2018-02-22  0:23 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-02-05 22:03 Daniel Colascione
2018-02-21 19:05 ` Daniel Colascione
2018-02-22  0:16   ` Minchan Kim
2018-02-22  0:23     ` Daniel Colascione [this message]
2018-02-22  2:06       ` Minchan Kim
2018-02-22  2:46         ` [PATCH] Synchronize task mm counters on demand Daniel Colascione
2018-02-23  2:01           ` Minchan Kim
2018-02-23  2:09             ` Daniel Colascione
2018-02-23  2:24               ` Daniel Colascione
2018-02-23  2:28               ` Minchan Kim
2018-02-23  2:43                 ` Daniel Colascione
2018-02-23  3:12                   ` Minchan Kim
2018-02-23  9:50           ` f66871fb4c: WARNING:inconsistent_lock_state kernel test robot
2018-02-22  2:49         ` [PATCH] Synchronize task mm counters on context switch Daniel Colascione
2018-02-23  8:11           ` Michal Hocko
2018-02-23 16:34             ` Daniel Colascione
2018-02-23 17:50               ` Michal Hocko
2018-02-23 18:47                 ` Daniel Colascione
2018-02-27 10:02                   ` Michal Hocko
2018-02-22  9:40         ` Peter Zijlstra
2018-02-22 13:06           ` Minchan Kim
2018-02-22 16:23           ` Daniel Colascione

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAKOZuetc7DepPPO6DmMp9APNz5+8+KansNBr_ijuuyCTu=v1mg@mail.gmail.com' \
    --to=dancol@google.com \
    --cc=akpm@linux-foundation.org \
    --cc=linux-mm@kvack.org \
    --cc=minchan@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox