From: Yafang Shao <laoar.shao@gmail.com>
To: Shakeel Butt <shakeelb@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>,
"Peter Zijlstra (Intel)" <peterz@infradead.org>,
Andrew Morton <akpm@linux-foundation.org>,
Linux MM <linux-mm@kvack.org>,
"open list:BLOCK LAYER" <linux-block@vger.kernel.org>,
LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH v2 0/2] psi: enhance psi with the help of ebpf
Date: Fri, 17 Jul 2020 09:43:48 +0800
Message-ID: <CALOAHbBkQbw49T=22zdiK9BzEvy7fEmCmhhJh3mTkm3JvjsD_g@mail.gmail.com>
In-Reply-To: <CALvZod4D4H70XJYUcY=5NxHcRUff+17Qz2OegXVm8wnoZ1TfuA@mail.gmail.com>
On Fri, Jul 17, 2020 at 1:04 AM Shakeel Butt <shakeelb@google.com> wrote:
>
> On Wed, Jul 15, 2020 at 8:19 PM Yafang Shao <laoar.shao@gmail.com> wrote:
> >
> > On Thu, Jul 16, 2020 at 12:36 AM Shakeel Butt <shakeelb@google.com> wrote:
> > >
> > > Hi Yafang,
> > >
> > > On Tue, Mar 31, 2020 at 3:05 AM Yafang Shao <laoar.shao@gmail.com> wrote:
> > > >
> > > > PSI gives us a powerful way to analyze memory pressure issues, but we can
> > > > make it more powerful with the help of tracepoints, kprobes, ebpf, etc.
> > > > Especially with ebpf we can flexibly get more details of the memory
> > > > pressure.
> > > >
> > > > In order to achieve this goal, a new parameter is added to
> > > > psi_memstall_{enter, leave}, which indicates the specific type of a
> > > > memstall. There are ten memstall types so far:
> > > > MEMSTALL_KSWAPD
> > > > MEMSTALL_RECLAIM_DIRECT
> > > > MEMSTALL_RECLAIM_MEMCG
> > > > MEMSTALL_RECLAIM_HIGH
> > > > MEMSTALL_KCOMPACTD
> > > > MEMSTALL_COMPACT
> > > > MEMSTALL_WORKINGSET_REFAULT
> > > > MEMSTALL_WORKINGSET_THRASH
> > > > MEMSTALL_MEMDELAY
> > > > MEMSTALL_SWAPIO
> > > > By tracing this newly added argument with a kprobe or tracepoint, we
> > > > can know which type of memstall it is and then make the corresponding
> > > > improvement. It can also help us to analyze the latency spikes caused by
> > > > memory pressure.
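
For illustration only, the ten types listed above could be carried as an
enum with a name table for trace output. This is a sketch under
assumptions: the enum name memstall_type, the MEMSTALL_NR_TYPES sentinel,
the ordering and the helper memstall_name() are hypothetical and are not
taken from the patch.

```c
/*
 * Hypothetical enum: the constant names follow the list above, but the
 * enum name, the ordering and the sentinel are assumptions.
 */
enum memstall_type {
	MEMSTALL_KSWAPD,
	MEMSTALL_RECLAIM_DIRECT,
	MEMSTALL_RECLAIM_MEMCG,
	MEMSTALL_RECLAIM_HIGH,
	MEMSTALL_KCOMPACTD,
	MEMSTALL_COMPACT,
	MEMSTALL_WORKINGSET_REFAULT,
	MEMSTALL_WORKINGSET_THRASH,
	MEMSTALL_MEMDELAY,
	MEMSTALL_SWAPIO,
	MEMSTALL_NR_TYPES	/* sentinel: number of types */
};

/* Human-readable names, indexed by type, for printing in a tracer. */
static const char *const memstall_names[MEMSTALL_NR_TYPES] = {
	[MEMSTALL_KSWAPD]		= "kswapd",
	[MEMSTALL_RECLAIM_DIRECT]	= "reclaim_direct",
	[MEMSTALL_RECLAIM_MEMCG]	= "reclaim_memcg",
	[MEMSTALL_RECLAIM_HIGH]		= "reclaim_high",
	[MEMSTALL_KCOMPACTD]		= "kcompactd",
	[MEMSTALL_COMPACT]		= "compact",
	[MEMSTALL_WORKINGSET_REFAULT]	= "workingset_refault",
	[MEMSTALL_WORKINGSET_THRASH]	= "workingset_thrash",
	[MEMSTALL_MEMDELAY]		= "memdelay",
	[MEMSTALL_SWAPIO]		= "swapio",
};

/* Map a traced type value to a name; out-of-range values are "unknown". */
const char *memstall_name(int type)
{
	if (type < 0 || type >= MEMSTALL_NR_TYPES)
		return "unknown";
	return memstall_names[type];
}
```

A tracer would call memstall_name() on the traced argument and fall back
to "unknown" for values it does not recognize.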
> > > >
> > > > But note that we can't use it to build memory pressure for a specific type
> > > > of memstall, e.g. memcg pressure, compaction pressure, etc., because it
> > > > doesn't implement various types of task->in_memstall, e.g.
> > > > task->in_memcgstall, task->in_compactionstall, etc.
> > > >
> > > > Although there are already some tracepoints that can help us to achieve
> > > > this goal, e.g.
> > > > vmscan:mm_vmscan_kswapd_{wake, sleep}
> > > > vmscan:mm_vmscan_direct_reclaim_{begin, end}
> > > > vmscan:mm_vmscan_memcg_reclaim_{begin, end}
> > > > /* no tracepoint for memcg high reclaim*/
> > > > compaction:mm_compaction_kcompactd_{wake, sleep}
> > > > compaction:mm_compaction_{begin, end}
> > > > /* no tracepoint for workingset refault */
> > > > /* no tracepoint for workingset thrashing */
> > > > /* no tracepoint for memdelay */
> > > > /* no tracepoint for swapio */
> > > > psi_memstall_{enter, leave} gives us a unified entry point for all
> > > > types of memstall, and we don't need to add the many begin and end
> > > > tracepoints that haven't been implemented yet.
> > > >
> > > > Patch #2 gives us an example of how to use it with ebpf. With the help of
> > > > ebpf we can trace a specific task, application, container, etc. It can
> > > > also help us to analyze the spread of latencies and whether they were
> > > > clustered at a point of time or spread out over long periods of time.
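
To make "the spread of latencies" concrete, here is a minimal user-space
sketch of the power-of-two histogram that ebpf tools commonly build from
enter/leave timestamps. It is not the code from patch #2; the names
stall_hist, stall_hist_slot() and stall_hist_add() and the slot count are
assumptions chosen for illustration.

```c
#include <stdint.h>

#define STALL_HIST_SLOTS 32	/* assumed number of log2 buckets */

struct stall_hist {
	uint64_t slots[STALL_HIST_SLOTS];
};

/*
 * Bucket index for a duration in microseconds: 0 for 0..1 us, otherwise
 * floor(log2(us)), clamped to the last slot.
 */
static unsigned int stall_hist_slot(uint64_t us)
{
	unsigned int slot = 0;

	while (us > 1 && slot < STALL_HIST_SLOTS - 1) {
		us >>= 1;
		slot++;
	}
	return slot;
}

/* Account one memstall, given its enter and leave timestamps in ns. */
void stall_hist_add(struct stall_hist *h, uint64_t enter_ns, uint64_t leave_ns)
{
	if (leave_ns < enter_ns)
		return;		/* ignore inconsistent timestamps */
	h->slots[stall_hist_slot((leave_ns - enter_ns) / 1000)]++;
}
```

A single tall bucket suggests stalls of similar length, while a long tail
points at occasional latency spikes.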
> > > >
> > > > To summarize, with the pressure data in /proc/pressure/memory we know that
> > > > the system is under memory pressure, and then with the newly added tracing
> > > > facility in this patchset we can find the reason for this memory pressure,
> > > > and then think about how to address it.
> > > > The workflow can be illustrated as below.
> > > >
> > > >                    REASON       ACTION
> > > >                  | compaction | improve compaction |
> > > >                  | vmscan     | improve vmscan     |
> > > > Memory pressure -| workingset | improve workingset |
> > > >                  | etc        | ...                |
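
The first step of this workflow, reading the pressure data, can be
sketched as below. Each line of /proc/pressure/memory has the form
"some avg10=0.00 avg60=0.00 avg300=0.00 total=0" (or "full ..."). The
struct psi_line and the helper psi_parse_line() are hypothetical names
used only for this example.

```c
#include <stdio.h>

/* One parsed line of /proc/pressure/memory. */
struct psi_line {
	char kind[8];			/* "some" or "full" */
	double avg10, avg60, avg300;	/* stall percentages */
	unsigned long long total_us;	/* cumulative stall time in us */
};

/* Parse a single PSI line; returns 0 on success, -1 on a malformed line. */
int psi_parse_line(const char *line, struct psi_line *out)
{
	int n = sscanf(line, "%7s avg10=%lf avg60=%lf avg300=%lf total=%llu",
		       out->kind, &out->avg10, &out->avg60, &out->avg300,
		       &out->total_us);

	return n == 5 ? 0 : -1;
}
```

A monitor would read the file line by line with fgets() and pass each line
to psi_parse_line(); once avg10 crosses a chosen threshold, the tracing
described above can be used to find the reason.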
> > > >
> > >
> > > I have not looked at the patch series in detail but I wanted to get
> > > your thoughts if it is possible to achieve what I am trying to do with
> > > this patch series.
> > >
> > > At the moment I am only interested in global reclaim and I wanted to
> > > > enable alerts like "alert if there is a process stuck in global reclaim
> > > for x seconds in last y seconds window" or "alert if all the processes
> > > are stuck in global reclaim for some z seconds".
> > >
> > > > I see that using this series I can identify global reclaim but I am
> > > > wondering if alerts or notifications are possible. Android is using psi
> > > > monitors for such alerts but it does not use cgroups, so most of the
> > > > memstalls are related to global reclaim stalls. For cgroup environments,
> > > > do we need to add support to the psi monitor similar to this patch
> > > > series?
> > >
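
The condition "stuck in global reclaim for x seconds in the last y seconds
window" can be written down as a small user-space check. This is only a
sketch of the arithmetic, assuming the stall intervals of one task have
already been collected (for example from enter/leave trace events) and do
not overlap; stall_interval, stalled_in_window() and should_alert() are
hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One observed stall of a task, as enter/leave timestamps in ns. */
struct stall_interval {
	uint64_t enter_ns;
	uint64_t leave_ns;
};

/* Total stalled time that falls inside [now - window, now]. */
uint64_t stalled_in_window(const struct stall_interval *iv, size_t n,
			   uint64_t now_ns, uint64_t window_ns)
{
	uint64_t start = now_ns > window_ns ? now_ns - window_ns : 0;
	uint64_t total = 0;

	for (size_t i = 0; i < n; i++) {
		/* Clip each interval to the window before summing. */
		uint64_t lo = iv[i].enter_ns > start ? iv[i].enter_ns : start;
		uint64_t hi = iv[i].leave_ns < now_ns ? iv[i].leave_ns : now_ns;

		if (hi > lo)
			total += hi - lo;
	}
	return total;
}

/* "Alert if stalled for at least x ns in the last y ns." */
bool should_alert(const struct stall_interval *iv, size_t n, uint64_t now_ns,
		  uint64_t window_ns, uint64_t threshold_ns)
{
	return stalled_in_window(iv, n, now_ns, window_ns) >= threshold_ns;
}
```

A stall that is still in progress can be represented by setting leave_ns
to the current time before calling should_alert().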
> >
> > Hi Shakeel,
> >
> > We use the PSI tracepoints in our kernel to analyze the individual
> > latency caused by memory pressure, but the PSI tracepoints are
> > implemented in a newer version, as below:
> > trace_psi_memstall_enter(_RET_IP_);
> > trace_psi_memstall_leave(_RET_IP_);
> > We then use the _RET_IP_ to identify the specific PSI type.
> >
> > If the _RET_IP_ is in try_to_free_mem_cgroup_pages(), then the
> > pressure is caused by the memory cgroup, IOW, the memcg limit is
> > reached and it has to do memcg reclaim. Otherwise we can consider it
> > as global memory pressure.
> > try_to_free_mem_cgroup_pages
> >     psi_memstall_enter
> >         if (static_branch_likely(&psi_disabled))
> >             return;
> >         *flags = current->in_memstall;
> >         if (*flags)
> >             return;
> >         trace_psi_memstall_enter(_RET_IP_); <<<<< memcg pressure
> >
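
The classification rule described above (a caller inside
try_to_free_mem_cgroup_pages() means memcg pressure, anything else is
treated as global) can be sketched in user space as a symbol lookup on
the traced return address. This assumes a symbol table sorted by address,
such as one loaded from /proc/kallsyms; struct ksym, ksym_lookup() and
classify_memstall() are hypothetical names, not part of the kernel or of
the patch.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One symbol table entry: start address and name. */
struct ksym {
	uint64_t addr;
	const char *name;
};

/*
 * Find the symbol containing ip: the last entry with addr <= ip.
 * syms must be sorted by ascending addr. Returns NULL if ip is below
 * the first symbol.
 */
const struct ksym *ksym_lookup(const struct ksym *syms, size_t n, uint64_t ip)
{
	size_t lo = 0, hi = n;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (syms[mid].addr <= ip)
			lo = mid + 1;
		else
			hi = mid;
	}
	return lo ? &syms[lo - 1] : NULL;
}

enum pressure_kind { PRESSURE_GLOBAL, PRESSURE_MEMCG };

/* Apply the rule from the mail to a traced _RET_IP_ value. */
enum pressure_kind classify_memstall(const struct ksym *syms, size_t n,
				     uint64_t ret_ip)
{
	const struct ksym *s = ksym_lookup(syms, n, ret_ip);

	if (s && strcmp(s->name, "try_to_free_mem_cgroup_pages") == 0)
		return PRESSURE_MEMCG;
	return PRESSURE_GLOBAL;
}
```

Matching on the symbol name rather than on a fixed address keeps the check
independent of where a particular kernel build places the function.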
>
> Thanks for the response. I am looking for 'always on' monitoring. More
> specifically defining the system level SLIs based on PSI. My concern
> with ftrace is its global shared state and also it is not really for
> 'always on' monitoring. You have mentioned ebpf. Is ebpf fine for
> 'always on' monitoring and is it possible to notify user space by ebpf
> on specific conditions (e.g. a process stuck in global reclaim for 60
> seconds)?
>
In my experience ebpf is fine for 'always on' monitoring, but I'm
not sure whether it is possible to notify user space on specific
conditions.
Notifying user space would be a useful feature, so I think it is worth a try.
--
Thanks
Yafang
Thread overview: 7+ messages
2020-03-31 10:04 Yafang Shao
2020-03-31 10:04 ` [PATCH v2 1/2] psi: introduce various types of memstall Yafang Shao
2020-03-31 10:04 ` [PATCH v2 2/2] psi, tracepoint: introduce tracepoints for psi_memstall_{enter, leave} Yafang Shao
2020-07-15 16:36 ` [PATCH v2 0/2] psi: enhance psi with the help of ebpf Shakeel Butt
2020-07-16 3:18 ` Yafang Shao
2020-07-16 17:04 ` Shakeel Butt
2020-07-17 1:43 ` Yafang Shao [this message]