From: Gavin Guo <gavin.guo@canonical.com>
To: Andrea Arcangeli <aarcange@redhat.com>
Cc: Rik van Riel <riel@redhat.com>, Hugh Dickins <hughd@google.com>,
Davidlohr Bueso <dave@stgolabs.net>,
linux-mm@kvack.org, Petr Holasek <pholasek@redhat.com>,
Andrew Morton <akpm@linux-foundation.org>,
Arjan van de Ven <arjan@linux.intel.com>
Subject: Re: [PATCH 1/1] ksm: introduce ksm_max_page_sharing per page deduplication limit
Date: Thu, 22 Sep 2016 18:48:49 +0800 [thread overview]
Message-ID: <CA+eFSM33iAS98t5QU_+iOGH7F2VvMErwRvuuHnQU2JowZ8cMHg@mail.gmail.com> (raw)
In-Reply-To: <20160921153421.GA4716@redhat.com>
Hi Andrea,
On Wed, Sep 21, 2016 at 11:34 PM, Andrea Arcangeli <aarcange@redhat.com> wrote:
> Hello Gavin,
>
> On Wed, Sep 21, 2016 at 11:12:19PM +0800, Gavin Guo wrote:
>> Recently, a similar bug can also be observed under the numad process
>> with the v4.4 Ubuntu kernel or the latest upstream kernel. However, I
>> think the patch should be useful to mitigate the symptom. I tried to
>> search the mailing list and found the patch finally didn't be merged
>> into the upstream kernel. If there are any problems which drop the
>> patch?
>
> Zero known problems, in fact it's running in production in both RHEL7
> and RHEL6 for a while. The RHEL customers are not affected anymore for
> a while now.
>
> It's a critical computational complexity fix, if using KSM in
> enterprise production. Hugh already Acked it as well.
>
> It's included in -mm and Andrew submitted it once upstream, but it
> bounced probably perhaps it was not the right time in the merge window
> cycle.
>
> Or perhaps because it's complex but I wouldn't know how to simplify it
> but there's no bug at all in the code.
>
> I would suggest Andrew to send it once again when he feels it's a good
> time to do so.
>
>> The numad process tried to migrate a qemu process of 33GB memory.
>> Finally, it stuck in the csd_lock_wait function which causes the qemu
>> process hung and the virtual machine has high CPU usage and hung also.
>> With KSM disabled, the symptom disappeared.
>
> Until it's merged upstream you can cherrypick from my aa.git tree
> these three commits:
>
> https://git.kernel.org/cgit/linux/kernel/git/andrea/aa.git/commit/?id=9384142e4ce830898abcefc4f0479c4533fa5bbc
> https://git.kernel.org/cgit/linux/kernel/git/andrea/aa.git/commit/?id=4b293be7e20c8e8731a4fdc3c3bf6047304d0cc8
> https://git.kernel.org/cgit/linux/kernel/git/andrea/aa.git/commit/?id=44c0d79c2c223c54ffe3fabc893963fc5963d611
>
> They're in -mm too.
>
>> What happens here is that do_migrate_pages (frame #10) acquires the
>> mmap_sem semaphore that everything else is waiting for (and that
>> eventually produce the hang warnings), and it holds that semaphore for
>> the duration of the page migration.
>>
>> crash> bt 2950
>> PID: 2950 TASK: ffff885f97745280 CPU: 49 COMMAND: "numad"
>> [exception RIP: smp_call_function_single+219]
>> RIP: ffffffff81103a0b RSP: ffff885f8fb4fb28 RFLAGS: 00000202
>> RAX: 0000000000000000 RBX: 0000000000000013 RCX: 0000000000000000
>> RDX: 0000000000000003 RSI: 0000000000000100 RDI: 0000000000000286
>> RBP: ffff885f8fb4fb70 R8: 0000000000000000 R9: 0000000000080000
>> R10: 0000000000000000 R11: ffff883faf917c88 R12: ffffffff810725f0
>> R13: 0000000000000013 R14: ffffffff810725f0 R15: ffff885f8fb4fbc8
>> CS: 0010 SS: 0018
>> #0 [ffff885f8fb4fb30] kvm_unmap_rmapp at ffffffffc01f1c3e [kvm]
>> #1 [ffff885f8fb4fb78] smp_call_function_many at ffffffff81103db3
>> #2 [ffff885f8fb4fbc0] native_flush_tlb_others at ffffffff8107279d
>> #3 [ffff885f8fb4fc08] flush_tlb_page at ffffffff81072a95
>> #4 [ffff885f8fb4fc30] ptep_clear_flush at ffffffff811d048e
>> #5 [ffff885f8fb4fc60] try_to_unmap_one at ffffffff811cb1c7
>> #6 [ffff885f8fb4fcd0] rmap_walk_ksm at ffffffff811e6f91
>> #7 [ffff885f8fb4fd28] rmap_walk at ffffffff811cc1bf
>> #8 [ffff885f8fb4fd80] try_to_unmap at ffffffff811cc46b
>> #9 [ffff885f8fb4fdc8] migrate_pages at ffffffff811f26d8
>> #10 [ffff885f8fb4fe80] do_migrate_pages at ffffffff811e15f7
>> #11 [ffff885f8fb4fef8] sys_migrate_pages at ffffffff811e187d
>> #12 [ffff885f8fb4ff50] entry_SYSCALL_64_fastpath at ffffffff818244f2
>>
>> After some investigations, I've tried to disassemble the coredump and
>> finally find the stable_node->hlist is as long as 2306920 entries.
>
> Yep, this is definitely getting fixed by the three commits above and
> the problem is in rmap_walk_ksm like you found above. With that
> applied you can't ever run into hangs anymore with KSM enabled, no
> matter the workload and the amount of memory in guest and host.
>
> numad isn't required to reproduce it, some swapping is enough.
>
> It limits the de-duplication factor to 256 times, like a x256 times
> compression, a x256 compression factor is clearly more than enough. So
> effectively the list you found that was too long, gets hard-limited to
> 256 entries with my patch applied. The limit is configurable at runtime:
>
> /* Maximum number of page slots sharing a stable node */
> static int ksm_max_page_sharing = 256;
>
> If you want to increase the limit (careful: that will increase
> the rmap_walk_ksm computation time) you can echo $newsharinglimit >
> /sys/kernel/mm/ksm/max_page_sharing.
>
> Hope this helps,
> Andrea
Thank you for the detail explanation. I've cherry-picked these patches
and now doing the verification. I'll get back to you if there is any
problem. Thanks!
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2016-09-22 10:48 UTC|newest]
Thread overview: 29+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-11-10 18:44 RFC [PATCH 0/1] " Andrea Arcangeli
2015-11-10 18:44 ` [PATCH 1/1] " Andrea Arcangeli
2015-12-09 16:19 ` Petr Holasek
2015-12-09 17:15 ` Andrea Arcangeli
2015-12-09 18:10 ` Andrea Arcangeli
2015-12-10 16:06 ` Petr Holasek
2015-12-11 0:31 ` Andrew Morton
2016-01-14 23:36 ` Hugh Dickins
2016-01-16 17:49 ` Andrea Arcangeli
2016-01-16 18:00 ` Arjan van de Ven
2016-01-18 8:14 ` Hugh Dickins
2016-01-18 14:43 ` Arjan van de Ven
2016-01-18 9:10 ` Hugh Dickins
2016-01-18 9:45 ` Hugh Dickins
2016-01-18 17:46 ` Andrea Arcangeli
2016-03-17 21:34 ` Hugh Dickins
2016-03-17 21:50 ` Andrew Morton
2016-03-18 16:27 ` Andrea Arcangeli
2016-01-18 11:01 ` Mel Gorman
2016-01-18 22:19 ` Andrea Arcangeli
2016-01-19 10:43 ` Mel Gorman
2016-04-06 20:33 ` Rik van Riel
2016-04-06 22:02 ` Andrea Arcangeli
2016-09-21 15:12 ` Gavin Guo
2016-09-21 15:34 ` Andrea Arcangeli
2016-09-22 10:48 ` Gavin Guo [this message]
2016-10-28 6:26 ` Gavin Guo
2016-10-28 18:31 ` Andrea Arcangeli
2017-04-20 3:14 ` Gavin Guo
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=CA+eFSM33iAS98t5QU_+iOGH7F2VvMErwRvuuHnQU2JowZ8cMHg@mail.gmail.com \
--to=gavin.guo@canonical.com \
--cc=aarcange@redhat.com \
--cc=akpm@linux-foundation.org \
--cc=arjan@linux.intel.com \
--cc=dave@stgolabs.net \
--cc=hughd@google.com \
--cc=linux-mm@kvack.org \
--cc=pholasek@redhat.com \
--cc=riel@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox