From: Yang Shi <yang.shi@linux.alibaba.com>
To: mhocko@kernel.org, willy@infradead.org,
ldufour@linux.vnet.ibm.com, akpm@linux-foundation.org,
peterz@infradead.org, mingo@redhat.com, acme@kernel.org,
alexander.shishkin@linux.intel.com, jolsa@redhat.com,
namhyung@kernel.org
Cc: yang.shi@linux.alibaba.com, linux-mm@kvack.org,
linux-kernel@vger.kernel.org
Subject: [RFC v2 0/2] mm: zap pages with read mmap_sem in munmap for large mapping
Date: Tue, 19 Jun 2018 07:34:14 +0800
Message-ID: <1529364856-49589-1-git-send-email-yang.shi@linux.alibaba.com>
Background:
Recently, when we ran some vm scalability tests on machines with large memory,
we ran into a couple of mmap_sem scalability issues when unmapping large memory
spaces. Please refer to https://lkml.org/lkml/2017/12/14/733 and
https://lkml.org/lkml/2018/2/20/576.
History:
akpm then suggested unmapping a large mapping section by section, dropping
mmap_sem after each section, to mitigate the problem
(see https://lkml.org/lkml/2018/3/6/784).
The V1 patch series was submitted to the mailing list per Andrew's suggestion
(see https://lkml.org/lkml/2018/3/20/786). I then received a lot of great
feedback and suggestions.
This topic was then discussed at the LSFMM 2018 summit. At the summit, Michal
Hocko suggested (also in the v1 patch review) trying a "two phases" approach:
zap pages with read mmap_sem, then do the vma cleanup with write mmap_sem (for
discussion details, see https://lwn.net/Articles/753269/).
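
To make the "two phases" idea concrete, here is a toy user-space sketch in
Python. It is not the kernel code in this series. ToyAddressSpace, lock_log and
the "work unit" counts are hypothetical names and assumptions made up for
illustration. It only records how much work would be done under each lock mode.

```python
from dataclasses import dataclass, field


@dataclass
class ToyAddressSpace:
    """Toy stand-in for an mm: maps a VMA start address to its resident pages."""
    vmas: dict = field(default_factory=dict)
    lock_log: list = field(default_factory=list)  # (lock mode, work units)

    def mmap_region(self, start, npages):
        self.vmas[start] = set(range(npages))

    def munmap_single_phase(self, start):
        # Baseline: page zapping and VMA removal both happen under the
        # write (exclusive) lock. Assumed cost: 1 unit per page + 1 for the VMA.
        pages = self.vmas.pop(start)
        self.lock_log.append(("write", len(pages) + 1))

    def munmap_two_phase(self, start):
        # Phase 1: zap the pages while holding the lock in read (shared) mode.
        pages = self.vmas[start]
        zapped = len(pages)
        pages.clear()
        self.lock_log.append(("read", zapped))
        # Phase 2: remove the now-empty VMA under the write (exclusive) lock.
        del self.vmas[start]
        self.lock_log.append(("write", 1))


def write_units(space):
    """Total work units spent while the lock was held exclusively."""
    return sum(units for mode, units in space.lock_log if mode == "write")


if __name__ == "__main__":
    baseline, patched = ToyAddressSpace(), ToyAddressSpace()
    baseline.mmap_region(0x1000, 1024)
    patched.mmap_region(0x1000, 1024)
    baseline.munmap_single_phase(0x1000)
    patched.munmap_two_phase(0x1000)
    print(write_units(baseline), write_units(patched))  # prints "1025 1"
```

In this model the exclusive hold time no longer grows with the number of pages,
which is the point of moving the zap under read mmap_sem.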
So I came up with the V2 patch series per this suggestion. I don't call
madvise(MADV_DONTNEED) directly, since it is a little different from what
munmap does; instead I use unmap_region(), as do_munmap() does.
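
As a rough user-space illustration of one visible difference between the two
calls (not necessarily the difference that matters inside the kernel), the
sketch below assumes Linux and Python 3.8+. dontneed_then_close() is a
hypothetical helper name. On a private anonymous mapping, MADV_DONTNEED drops
the pages but keeps the mapping, so reads see zero-filled pages, while closing
the mapping (which unmaps it) makes the range unusable.

```python
import mmap

PAGE = mmap.PAGESIZE


def dontneed_then_close():
    # Private anonymous mapping; fileno -1 means no backing file.
    m = mmap.mmap(-1, 4 * PAGE, flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS)
    m[:5] = b"hello"

    # MADV_DONTNEED: the pages are discarded, but the mapping stays valid.
    m.madvise(mmap.MADV_DONTNEED)
    after_dontneed = bytes(m[:5])

    # close() unmaps the region; the object can no longer be accessed.
    m.close()
    try:
        m[:5]
        usable_after_close = True
    except ValueError:
        usable_after_close = False
    return after_dontneed, usable_after_close


if __name__ == "__main__":
    print(dontneed_then_close())
```

On Linux the first element is expected to be five zero bytes and the second to
be False; other systems may treat MADV_DONTNEED differently.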
The patches may need more cleanup and refactoring, but it seems better to let
the community start reviewing them early to make sure I'm on the right track.
Regression and performance data:
Tests were run on a machine with 32 cores (E5-2680 @ 2.70GHz) and 384GB of
memory.
For regression testing I ran the full LTP and trinity (munmap) with the
threshold set to 4K in the code (for regression testing only), so that the new
code path gets better coverage, since the trinity (munmap) test manipulates 4K
mappings.
No regressions were reported, and the system survived the trinity (munmap)
test for 4 hours, at which point I aborted the test.
Throughput of page faults (#/s) with the below stress-ng test:
    stress-ng --mmap 0 --mmap-bytes 80G --mmap-file --metrics --perf \
              --timeout 600s
    pristine        patched         delta
    89.41K/sec      97.29K/sec      +8.8%
The number looks a little bit better than v1.
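
For reference, the delta is the relative change of the patched throughput over
the pristine one. A minimal Python check of that arithmetic, using the two
numbers from the table above (percent_delta is just an illustrative helper
name):

```python
def percent_delta(baseline, patched):
    """Relative change of patched over baseline, in percent."""
    return (patched - baseline) / baseline * 100.0


pristine_kps = 89.41  # K page faults/sec, pristine kernel
patched_kps = 97.29   # K page faults/sec, patched kernel
print(f"{percent_delta(pristine_kps, patched_kps):+.1f}%")  # prints "+8.8%"
```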
Yang Shi (2):
  uprobes: make vma_has_uprobes non-static
  mm: mmap: zap pages with read mmap_sem for large mapping

 include/linux/uprobes.h |   7 ++++
 kernel/events/uprobes.c |   2 +-
 mm/mmap.c               | 148 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 3 files changed, 155 insertions(+), 2 deletions(-)