From: Yang Shi <yang.shi@linux.alibaba.com>
To: Andrea Arcangeli <aarcange@redhat.com>
Cc: Hugh Dickins <hughd@google.com>,
Andrew Morton <akpm@linux-foundation.org>,
mhocko@kernel.org, vbabka@suse.cz, hannes@cmpxchg.org,
linux-mm@kvack.org, linux-kernel@vger.kernel.org,
Kirill Tkhai <ktkhai@virtuozzo.com>
Subject: Re: [PATCH 1/2] mm: vmscan: skip KSM page in direct reclaim if priority is low
Date: Fri, 21 Dec 2018 21:36:13 -0800 [thread overview]
Message-ID: <7bf27972-b7b6-4404-d289-bdfd7e3a0c52@linux.alibaba.com> (raw)
In-Reply-To: <20181221140142.GA4322@redhat.com>
On 12/21/18 6:01 AM, Andrea Arcangeli wrote:
> Hello Yang,
>
> On Thu, Dec 20, 2018 at 10:33:26PM -0800, Yang Shi wrote:
>>
>> On 12/20/18 10:04 PM, Hugh Dickins wrote:
>>> On Thu, 20 Dec 2018, Andrew Morton wrote:
>>>> Is anyone interested in reviewing this? Seems somewhat serious.
>>>> Thanks.
>>> Somewhat serious, but no need to rush.
>>>
>>>> From: Yang Shi <yang.shi@linux.alibaba.com>
>>>> Subject: mm: vmscan: skip KSM page in direct reclaim if priority is low
>>>>
>>>> When running a stress test, we occasionally run into the below hang issue:
>>> Artificial load presumably.
>>>
>>>> INFO: task ksmd:205 blocked for more than 360 seconds.
>>>> Tainted: G E 4.9.128-001.ali3000_nightly_20180925_264.alios7.x86_64 #1
>>> 4.9-stable does not contain Andrea's 4.13 commit 2c653d0ee2ae
>>> ("ksm: introduce ksm_max_page_sharing per page deduplication limit").
>>>
>>> The patch below is more economical than Andrea's, but I don't think
>>> a second workaround should be added, unless Andrea's is shown to be
>>> insufficient, even with its ksm_max_page_sharing tuned down to suit.
>>>
>>> Yang, please try to reproduce on upstream, or backport Andrea's to
>>> 4.9-stable - thanks.
> I think it's reasonable to backport it and it should be an easy
> backport. Just make sure to also backport
> b4fecc67cc569b14301f5a1111363d5818b8da5e, which fixed the only bug
> in the initial patch; it triggered with
> "merge_across_nodes = 0" (not the default).
>
> We shipped it in production years ago and it was pretty urgent for
> those workloads that initially ran into this issue.
Hi Andrea,
Thank you and Hugh for pointing out these commits. I will backport them to
our kernel. I'm not sure whether 4.9-stable needs them as well.
>
>> I believe Andrea's commit could work around this problem too by limiting
>> the number of pages sharing one KSM page.
>>
>> However, IMHO, even if only a few hundred pages share one
>> KSM page, it still does not seem worth reclaiming it in direct reclaim
>> at low priority. According to Andrea's commit log, it still takes a few
> You still have to walk the entire chain for compaction and memory
> hotplug, otherwise the KSM page becomes practically
> unmovable. Allowing the rmap chain to grow infinitely is still not
> ok.
Yes, definitely agree.
>
> Whether the page should be reclaimed in direct reclaim is already
> told by page_referenced(): the more mappings there are, the more likely
> at least one was touched and has the young bit set in the pte.
>
>> msec to walk the rmap for 256 shared pages.
> Those ~2.5msec were in the context of page migration: in the previous
> sentence I specified it takes 10usec for the IPI and all the other work
> page migration has to do (which also largely depends on multiple
> factors like the total number of CPUs).
>
> page_referenced() doesn't flush the TLB during the rmap walk when it
> clears the accessed bit, so it's orders of magnitude faster than the
> real page migration at walking the KSM rmap chain.
>
> If the page migration latency of 256 max mappings is a concern, the max
> sharing can be configured at runtime, or the default max sharing can be
> reduced to 10 to give a max latency of ~100usec; it would still
> give a fairly decent 10x compression ratio. That's a minor detail to
> change if that's a concern.
>
> The only difference compared to all other page types is KSM pages can
> occasionally merge very aggressively and the apps have no way to limit
> the merging or even avoid it. We simply can't ask the app to create
> fewer identical pages.
>
> This is why the max sharing has to be limited inside KSM; then we
> don't need anything special in the VM anymore to treat KSM pages.
>
> By contrast, the max sharing of COW anon memory post fork is limited by
> the number of fork invocations, and for MAP_SHARED the sharing is limited
> by the number of mmaps; those don't tend to escalate into the millions, or
> they would run into other limits first. It's reasonable to expect the
> developer to optimize the app to create fewer mmaps or to use threads
> instead of processes to reduce the VM overhead in general (which will
> improve the rmap walks too).
>
> Note the MAP_SHARED/PRIVATE/anon-COW sharing can exceed 256 mappings
> too: you just have to fork 257 times in a row, or much more realistically
> mmap the same glibc library 257 times in a row. So if anything, KSM is
> now less of a concern for occasional page_referenced() worst-case
> latencies than all the rest of the page types.
>
> KSM, by enforcing the max sharing, is now the most rmap-walk
> computational-complexity friendly of all the page types out there. So
> there's no need to treat it specially in low priority reclaim scans.
Thanks a lot. The above is very informative and helpful. I agree that with
the max sharing limit a KSM page's rmap chain can't grow insanely, which
makes it less of a concern in the reclaim path. I don't insist on keeping my
patch, although we can still think of artificial scenarios that may go
insane; those should be very unlikely in a real-world workload with a sane
max page sharing limit.
BTW, happy holidays, guys.
Regards,
Yang
>
> Thanks,
> Andrea
Thread overview: 9+ messages
2018-11-07 19:16 Yang Shi
2018-11-07 19:16 ` [PATCH 2/2] mm: ksm: do not block on page lock when searching stable tree Yang Shi
2018-11-23 7:03 ` Kirill Tkhai
2018-12-20 22:45 ` [PATCH 1/2] mm: vmscan: skip KSM page in direct reclaim if priority is low Andrew Morton
2018-12-21 6:04 ` Hugh Dickins
2018-12-21 6:33 ` Yang Shi
2018-12-21 14:01 ` Andrea Arcangeli
2018-12-22 5:36 ` Yang Shi [this message]