From: Baptiste Lepers <baptiste.lepers@gmail.com>
To: Michal Hocko <mhocko@kernel.org>
Cc: Daniel Jordan <daniel.m.jordan@oracle.com>,
mgorman@techsingularity.net, akpm@linux-foundation.org,
dhowells@redhat.com, linux-mm@kvack.org, hannes@cmpxchg.org
Subject: Re: Lock overhead in shrink_inactive_list / Slow page reclamation
Date: Mon, 14 Jan 2019 18:25:45 +1100
Message-ID: <CABdVr8QT_FS+dFrhDjKu3hfP8TzFXS83DxhX=nTtuLNg3kVckg@mail.gmail.com>
In-Reply-To: <20190114070600.GC21345@dhcp22.suse.cz>
On Mon, Jan 14, 2019 at 6:06 PM Michal Hocko <mhocko@kernel.org> wrote:
>
> On Mon 14-01-19 10:12:37, Baptiste Lepers wrote:
> > On Sat, Jan 12, 2019 at 4:53 AM Daniel Jordan
> > <daniel.m.jordan@oracle.com> wrote:
> > >
> > > On Fri, Jan 11, 2019 at 02:59:38PM +0100, Michal Hocko wrote:
> > > > On Fri 11-01-19 16:52:17, Baptiste Lepers wrote:
> > > > > Hello,
> > > > >
> > > > > We have a performance issue with the page cache. One of our workloads
> > > > > spends more than 50% of its time on the lru_lock taken by
> > > > > shrink_inactive_list in mm/vmscan.c.
> > > >
> > > > Who does contend on the lock? Are there direct reclaimers or is it
> > > > solely kswapd with paths that are faulting the new page cache in?
> > >
> > > Yes, and could you please post your performance data showing the time in
> > > lru_lock? Whatever you have is fine, but using perf with -g would give
> > > callstacks and help answer Michal's question about who's contending.
> >
> > Thanks for the quick answer.
> >
> > The time spent on the lru_lock is mainly due to direct reclaimers
> > (reading an mmapped page, which triggers some readahead). We have
> > tried to play with the readahead values, but that doesn't change
> > performance much. We have disabled swap on the machine, so kswapd
> > doesn't run.
>
> kswapd runs even without swap storage.
>
> > Our programs run in memory cgroups, but I don't think that the issue
> > directly comes from cgroups (I might be wrong though).
>
> Do you use hard/high limits on those cgroups? Because those would be a
> source of the reclaim.
>
> > Here is the callchain that I have using perf report --no-children
> > (also pasted at https://pastebin.com/151x4QhR):
> >
> > 44.30% swapper [kernel.vmlinux] [k] intel_idle
> > # The machine is idle mainly because it waits on that lru_lock,
> > # which is the 2nd function in the report:
> > 10.98% testradix [kernel.vmlinux] [k] native_queued_spin_lock_slowpath
> > |--10.33%--_raw_spin_lock_irq
> > | |
> > | --10.12%--shrink_inactive_list
> > | shrink_node_memcg
> > | shrink_node
> > | do_try_to_free_pages
> > | try_to_free_mem_cgroup_pages
> > | try_charge
> > | mem_cgroup_try_charge
>
> And here it shows this is indeed the case. You are hitting the hard
> limit and that causes direct reclaim to shrink the memcg.
>
> If you do not really need strong isolation between cgroups, then I
> would suggest not setting the hard limit and relying on the global
> memory reclaim to do the background reclaim, which is less aggressive
> and more proactive.
Thanks for the suggestion.

We actually need the hard limit in that case, but the problem also
occurs without cgroups (we mmap a 1TB file and only have 64GB of
RAM). Basically the page cache fills up quickly and then reading the
mmapped file becomes "slow" (400-500MB/s instead of the initial
2.6GB/s). I'm just wondering whether there is a way to make page
reclamation a bit faster, especially given that our workload is
read-only.
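
For reference, here is a minimal sketch of the kind of workload we are
hitting this with (illustrative only; the path is a placeholder and
this is not our actual testradix program):

/*
 * Minimal illustration: mmap a file much larger than RAM and stream
 * through it read-only, so every major fault allocates page cache and,
 * once memory is full, goes through direct reclaim.
 * The path below is a placeholder.
 */
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
        const char *path = "/mnt/data/bigfile";  /* placeholder */
        int fd = open(path, O_RDONLY);
        struct stat st;
        unsigned long long sum = 0;
        long page;
        char *p;
        off_t off;

        if (fd < 0 || fstat(fd, &st) < 0) {
                perror(path);
                return 1;
        }

        p = mmap(NULL, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
        if (p == MAP_FAILED) {
                perror("mmap");
                return 1;
        }

        page = sysconf(_SC_PAGESIZE);
        for (off = 0; off < st.st_size; off += page)
                sum += (unsigned char)p[off];  /* one touch per page */

        printf("checksum %llu\n", sum);
        munmap(p, st.st_size);
        close(fd);
        return 0;
}

Since the mapping is PROT_READ and the pages are never dirtied,
everything on the inactive list should be cheap to drop.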
shrink_inactive_list only seems to reclaim 32 pages (SWAP_CLUSTER_MAX)
per batch with the default setting and takes the lru_lock twice to do
so, so that's two lock acquisitions per 128KB reclaimed. Increasing
the SWAP_CLUSTER_MAX value helped a bit, but this is still quite slow.
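
To make it concrete, this is the rough shape of shrink_inactive_list()
as I read it in mm/vmscan.c (heavily abridged and with arguments
trimmed, so treat it as a sketch of the locking pattern rather than
the exact source):

static unsigned long shrink_inactive_list(unsigned long nr_to_scan, ...)
{
        LIST_HEAD(page_list);
        unsigned long nr_taken, nr_reclaimed;

        /* First acquisition: isolate up to nr_to_scan pages
         * (SWAP_CLUSTER_MAX = 32 by default) from the inactive LRU
         * onto a private list. */
        spin_lock_irq(&pgdat->lru_lock);
        nr_taken = isolate_lru_pages(nr_to_scan, lruvec, &page_list, ...);
        spin_unlock_irq(&pgdat->lru_lock);

        /* The actual reclaim work happens outside the lock. */
        nr_reclaimed = shrink_page_list(&page_list, pgdat, sc, ...);

        /* Second acquisition: put the pages that could not be freed
         * back on the LRU and update the counters. */
        spin_lock_irq(&pgdat->lru_lock);
        putback_inactive_pages(lruvec, &page_list);
        spin_unlock_irq(&pgdat->lru_lock);

        return nr_reclaimed;
}

Since all the direct reclaimers hit the same per-node lock, that
pattern matches the spin-lock slowpath time in the profile above.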
And thanks for the clarification about kswapd, I didn't know it was
running even without swap :)
Baptiste.
> --
> Michal Hocko
> SUSE Labs