From: Kirill Tkhai <ktkhai@virtuozzo.com>
To: Oleksandr Natalenko <oleksandr@redhat.com>
Cc: linux-kernel@vger.kernel.org, Vlastimil Babka <vbabka@suse.cz>,
Michal Hocko <mhocko@suse.com>,
Matthew Wilcox <willy@infradead.org>,
Pavel Tatashin <pasha.tatashin@soleen.com>,
Timofey Titovets <nefelim4ag@gmail.com>,
Aaron Tomlin <atomlin@redhat.com>,
linux-mm@kvack.org
Subject: Re: [PATCH RFC 0/4] mm/ksm: add option to automerge VMAs
Date: Tue, 14 May 2019 12:12:16 +0300 [thread overview]
Message-ID: <8f146863-5963-81b2-ed20-6428d1da353c@virtuozzo.com> (raw)
In-Reply-To: <20190514063043.ojhsb6d3ohxx4wur@butterfly.localdomain>
On 14.05.2019 09:30, Oleksandr Natalenko wrote:
> Hi.
>
> On Mon, May 13, 2019 at 03:37:56PM +0300, Kirill Tkhai wrote:
>>> Yes, I get your point. But the intention is to avoid another hacky trick
>>> (LD_PRELOAD), thus *something* should *preferably* be done on the
>>> kernel level instead.
>>
>> I don't think so. Does userspace hack introduce some overhead? It does not
>> look so. Why should we think about mergeable VMAs in page fault handler?!
>> This is the last thing we want to think in page fault handler.
>>
>> Also, there is difficult synchronization in page fault handlers, and it's
>> easy to make a mistake. So, there is a mistake in [3/4], and you call
>> ksm_enter() with mmap_sem read locked, while normal way is to call it
>> with write lock (see madvise_need_mmap_write()).
>>
>> So, let's don't touch this path. Small optimization for unlikely case will
>> introduce problems in optimization for likely case in the future.
>
> Yup, you're right, I've missed the fact that write lock is needed there.
> Re-vamping locking there is not my intention, so lets find another
> solution.
>
>>> Also, just for the sake of another piece of stats here:
>>>
>>> $ echo "$(cat /sys/kernel/mm/ksm/pages_sharing) * 4 / 1024" | bc
>>> 526
>>
>> This all requires attentive analysis. The number looks pretty big for me.
>> What are the pages you get merged there? This may be just zero pages,
>> you have identical.
>>
>> E.g., your browser want to work fast. It introduces smart schemes,
>> and preallocates many pages in background (mmap + write 1 byte to a page),
>> so in further it save some time (no page fault + alloc), when page is
>> really needed. But your change merges these pages and kills this
>> optimization. Sounds not good, does this?
>>
>> I think, we should not think we know and predict better than application
>> writers, what they want from kernel. Let's people decide themselves
>> in dependence of their workload. The only exception is some buggy
>> or old applications, which impossible to change, so force madvise
>> workaround may help. But only in case there are really such applications...
>>
>> I'd researched what pages you have duplicated in these 526 MB. Maybe
>> you find, no action is required or a report to userspace application
>> to use madvise is needed.
>
> OK, I agree, this is a good argument to move decision to userspace.
>
>>> 2) what kinds of opt-out we should maintain? Like, what if force_madvise
>>> is called, but the task doesn't want some VMAs to be merged? This will
>>> required new flag anyway, it seems. And should there be another
>>> write-only file to unmerge everything forcibly for specific task?
>>
>> For example,
>>
>> Merge:
>> #echo $task > /sys/kernel/mm/ksm/force_madvise
>
> Immediate question: what should be actually done on this? I see 2
> options:
>
> 1) mark all VMAs as mergeable + set some flag for mmap() to mark all
> further allocations as mergeable as well;
> 2) just mark all the VMAs as mergeable; userspace can call this
> periodically to mark new VMAs.
>
> My prediction is that 2) is less destructive, and the decision is
> preserved predominantly to userspace, thus it would be a desired option.
Let's see, how we use KSM now. It's good for virtual machines: people
install the same distribution in several VMs, and they have the same
packages and the same files. When you read a file inside VM, its pages
are file cache for the VM, but they are anonymous pages for host kernel.
Hypervisor marks VM memory as mergeable, and host KSM merges the same
anonymous pages together. Many of file cache inside VM is constant
content, so we have good KSM compression on such the file pages.
The result we have is explainable and expected.
But we don't know anything about pages, you have merged on your laptop.
We can't make any assumptions before analysis of applications, which
produce such the pages. Let's check what happens before we try to implement
some specific design (if we really need something to implement).
The rest is just technical details. We may implement everything we need
on top of this (even implement a polling of /proc/[pid]/maps and write
a task and address of vma to force_madvise or similar file).
Kirill
next prev parent reply other threads:[~2019-05-14 9:12 UTC|newest]
Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-05-10 7:21 Oleksandr Natalenko
2019-05-10 7:21 ` [PATCH RFC 1/4] mm/ksm: introduce ksm_enter() helper Oleksandr Natalenko
2019-05-10 7:21 ` [PATCH RFC 2/4] mm/ksm: introduce VM_UNMERGEABLE Oleksandr Natalenko
2019-05-10 7:21 ` [PATCH RFC 3/4] mm/ksm: allow anonymous memory automerging Oleksandr Natalenko
2019-05-10 7:21 ` [PATCH RFC 4/4] mm/ksm: add automerging documentation Oleksandr Natalenko
2019-05-13 10:38 ` [PATCH RFC 0/4] mm/ksm: add option to automerge VMAs Kirill Tkhai
2019-05-13 11:33 ` Oleksandr Natalenko
2019-05-13 11:48 ` Timofey Titovets
2019-05-13 12:01 ` Oleksandr Natalenko
2019-05-13 12:06 ` Oleksandr Natalenko
2019-05-13 12:37 ` Kirill Tkhai
2019-05-14 6:30 ` Oleksandr Natalenko
2019-05-14 9:12 ` Kirill Tkhai [this message]
2019-05-14 13:33 ` Oleksandr Natalenko
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=8f146863-5963-81b2-ed20-6428d1da353c@virtuozzo.com \
--to=ktkhai@virtuozzo.com \
--cc=atomlin@redhat.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mhocko@suse.com \
--cc=nefelim4ag@gmail.com \
--cc=oleksandr@redhat.com \
--cc=pasha.tatashin@soleen.com \
--cc=vbabka@suse.cz \
--cc=willy@infradead.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox