linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Oleksandr Natalenko <oleksandr@redhat.com>
To: Kirill Tkhai <ktkhai@virtuozzo.com>
Cc: linux-kernel@vger.kernel.org, Vlastimil Babka <vbabka@suse.cz>,
	Michal Hocko <mhocko@suse.com>,
	Matthew Wilcox <willy@infradead.org>,
	Pavel Tatashin <pasha.tatashin@soleen.com>,
	Timofey Titovets <nefelim4ag@gmail.com>,
	Aaron Tomlin <atomlin@redhat.com>,
	linux-mm@kvack.org
Subject: Re: [PATCH RFC 0/4] mm/ksm: add option to automerge VMAs
Date: Mon, 13 May 2019 13:33:15 +0200	[thread overview]
Message-ID: <20190513113314.lddxv4kv5ajjldae@butterfly.localdomain> (raw)
In-Reply-To: <36a71f93-5a32-b154-b01d-2a420bca2679@virtuozzo.com>

Hi.

On Mon, May 13, 2019 at 01:38:43PM +0300, Kirill Tkhai wrote:
> On 10.05.2019 10:21, Oleksandr Natalenko wrote:
> > By default, KSM works only on memory that is marked by madvise(). And the
> > only way to get around that is to either:
> > 
> >   * use LD_PRELOAD; or
> >   * patch the kernel with something like UKSM or PKSM.
> >
> > Instead, lets implement a so-called "always" mode, which allows marking
> > VMAs as mergeable on do_anonymous_page() call automatically.
> >
> > The submission introduces a new sysctl knob as well as kernel cmdline option
> > to control which mode to use. The default mode is to maintain old
> > (madvise-based) behaviour.
> >
> > Due to security concerns, this submission also introduces VM_UNMERGEABLE
> > vmaflag for apps to explicitly opt out of automerging. Because of adding
> > a new vmaflag, the whole work is available for 64-bit architectures only.
> >> This patchset is based on earlier Timofey's submission [1], but it doesn't
> > use dedicated kthread to walk through the list of tasks/VMAs.
> > 
> > For my laptop it saves up to 300 MiB of RAM for usual workflow (browser,
> > terminal, player, chats etc). Timofey's submission also mentions
> > containerised workload that benefits from automerging too.
> 
> This all approach looks complicated for me, and I'm not sure the shown profit
> for desktop is big enough to introduce contradictory vma flags, boot option
> and advance page fault handler. Also, 32/64bit defines do not look good for
> me. I had tried something like this on my laptop some time ago, and
> the result was bad even in absolute (not in memory percentage) meaning.
> Isn't LD_PRELOAD trick enough to desktop? Your workload is same all the time,
> so you may statically insert correct preload to /etc/profile and replace
> your mmap forever.
>
> Speaking about containers, something like this may have a sense, I think.
> The probability of that several containers have the same pages are higher,
> than that desktop applications have the same pages; also LD_PRELOAD for
> containers is not applicable. 

Yes, I get your point. But the intention is to avoid another hacky trick
(LD_PRELOAD), thus *something* should *preferably* be done on the
kernel level instead.

> But 1)this could be made for trusted containers only (are there similar
> issues with KSM like with hardware side-channel attacks?!);

Regarding side-channel attacks, yes, I think so. Were those openssl guys
who complained about it?..

> 2) the most
> shared data for containers in my experience is file cache, which is not
> supported by KSM.
> 
> There are good results by the link [1], but it's difficult to analyze
> them without knowledge about what happens inside them there.
> 
> Some of tests have "VM" prefix. What the reason the hypervisor don't mark
> their VMAs as mergeable? Can't this be fixed in hypervisor? What is the
> generic reason that VMAs are not marked in all the tests?

Timofey, could you please address this?

Also, just for the sake of another piece of stats here:

$ echo "$(cat /sys/kernel/mm/ksm/pages_sharing) * 4 / 1024" | bc
526

> In case of there is a fundamental problem of calling madvise, can't we
> just implement an easier workaround like a new write-only file:
> 
> #echo $task > /sys/kernel/mm/ksm/force_madvise
> 
> which will mark all anon VMAs as mergeable for a passed task's mm?
> 
> A small userspace daemon may write mergeable tasks there from time to time.
> 
> Then we won't need to introduce additional vm flags and to change
> anon pagefault handler, and the changes will be small and only
> related to mm/ksm.c, and good enough for both 32 and 64 bit machines.

Yup, looks appealing. Two concerns, though:

1) we are falling back to scanning through the list of tasks (I guess
this is what we wanted to avoid, although this time it happens in the
userspace);

2) what kinds of opt-out we should maintain? Like, what if force_madvise
is called, but the task doesn't want some VMAs to be merged? This will
required new flag anyway, it seems. And should there be another
write-only file to unmerge everything forcibly for specific task?

Thanks.

P.S. Cc'ing Pavel properly this time.

-- 
  Best regards,
    Oleksandr Natalenko (post-factum)
    Senior Software Maintenance Engineer


  reply	other threads:[~2019-05-13 11:33 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-05-10  7:21 Oleksandr Natalenko
2019-05-10  7:21 ` [PATCH RFC 1/4] mm/ksm: introduce ksm_enter() helper Oleksandr Natalenko
2019-05-10  7:21 ` [PATCH RFC 2/4] mm/ksm: introduce VM_UNMERGEABLE Oleksandr Natalenko
2019-05-10  7:21 ` [PATCH RFC 3/4] mm/ksm: allow anonymous memory automerging Oleksandr Natalenko
2019-05-10  7:21 ` [PATCH RFC 4/4] mm/ksm: add automerging documentation Oleksandr Natalenko
2019-05-13 10:38 ` [PATCH RFC 0/4] mm/ksm: add option to automerge VMAs Kirill Tkhai
2019-05-13 11:33   ` Oleksandr Natalenko [this message]
2019-05-13 11:48     ` Timofey Titovets
2019-05-13 12:01       ` Oleksandr Natalenko
2019-05-13 12:06         ` Oleksandr Natalenko
2019-05-13 12:37     ` Kirill Tkhai
2019-05-14  6:30       ` Oleksandr Natalenko
2019-05-14  9:12         ` Kirill Tkhai
2019-05-14 13:33           ` Oleksandr Natalenko

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190513113314.lddxv4kv5ajjldae@butterfly.localdomain \
    --to=oleksandr@redhat.com \
    --cc=atomlin@redhat.com \
    --cc=ktkhai@virtuozzo.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@suse.com \
    --cc=nefelim4ag@gmail.com \
    --cc=pasha.tatashin@soleen.com \
    --cc=vbabka@suse.cz \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox