linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Oleksandr Natalenko <oleksandr@redhat.com>
To: Kirill Tkhai <ktkhai@virtuozzo.com>
Cc: linux-kernel@vger.kernel.org, Vlastimil Babka <vbabka@suse.cz>,
	Michal Hocko <mhocko@suse.com>,
	Matthew Wilcox <willy@infradead.org>,
	Pavel Tatashin <pasha.tatashin@soleen.com>,
	Timofey Titovets <nefelim4ag@gmail.com>,
	Aaron Tomlin <atomlin@redhat.com>,
	linux-mm@kvack.org
Subject: Re: [PATCH RFC 0/4] mm/ksm: add option to automerge VMAs
Date: Tue, 14 May 2019 08:30:43 +0200	[thread overview]
Message-ID: <20190514063043.ojhsb6d3ohxx4wur@butterfly.localdomain> (raw)
In-Reply-To: <a3870e32-3a27-e6df-fcb2-79080cdd167a@virtuozzo.com>

Hi.

On Mon, May 13, 2019 at 03:37:56PM +0300, Kirill Tkhai wrote:
> > Yes, I get your point. But the intention is to avoid another hacky trick
> > (LD_PRELOAD), thus *something* should *preferably* be done on the
> > kernel level instead.
> 
> I don't think so. Does userspace hack introduce some overhead? It does not
> look so. Why should we think about mergeable VMAs in page fault handler?!
> This is the last thing we want to think in page fault handler.
> 
> Also, there is difficult synchronization in page fault handlers, and it's
> easy to make a mistake. So, there is a mistake in [3/4], and you call
> ksm_enter() with mmap_sem read locked, while normal way is to call it
> with write lock (see madvise_need_mmap_write()).
> 
> So, let's don't touch this path. Small optimization for unlikely case will
> introduce problems in optimization for likely case in the future.

Yup, you're right, I've missed the fact that write lock is needed there.
Re-vamping locking there is not my intention, so lets find another
solution.

> > Also, just for the sake of another piece of stats here:
> > 
> > $ echo "$(cat /sys/kernel/mm/ksm/pages_sharing) * 4 / 1024" | bc
> > 526
> 
> This all requires attentive analysis. The number looks pretty big for me.
> What are the pages you get merged there? This may be just zero pages,
> you have identical.
> 
> E.g., your browser want to work fast. It introduces smart schemes,
> and preallocates many pages in background (mmap + write 1 byte to a page),
> so in further it save some time (no page fault + alloc), when page is
> really needed. But your change merges these pages and kills this
> optimization. Sounds not good, does this?
> 
> I think, we should not think we know and predict better than application
> writers, what they want from kernel. Let's people decide themselves
> in dependence of their workload. The only exception is some buggy
> or old applications, which impossible to change, so force madvise
> workaround may help. But only in case there are really such applications...
> 
> I'd researched what pages you have duplicated in these 526 MB. Maybe
> you find, no action is required or a report to userspace application
> to use madvise is needed.

OK, I agree, this is a good argument to move decision to userspace.

> > 2) what kinds of opt-out we should maintain? Like, what if force_madvise
> > is called, but the task doesn't want some VMAs to be merged? This will
> > required new flag anyway, it seems. And should there be another
> > write-only file to unmerge everything forcibly for specific task?
> 
> For example,
> 
> Merge:
> #echo $task > /sys/kernel/mm/ksm/force_madvise

Immediate question: what should be actually done on this? I see 2
options:

1) mark all VMAs as mergeable + set some flag for mmap() to mark all
further allocations as mergeable as well;
2) just mark all the VMAs as mergeable; userspace can call this
periodically to mark new VMAs.

My prediction is that 2) is less destructive, and the decision is
preserved predominantly to userspace, thus it would be a desired option.

> Unmerge:
> #echo -$task > /sys/kernel/mm/ksm/force_madvise

Okay.

> In case of task don't want to merge some VMA, we just should skip it at all.

This way we lose some flexibility, IMO, but I get you point.

Thanks.

-- 
  Best regards,
    Oleksandr Natalenko (post-factum)
    Senior Software Maintenance Engineer


  reply	other threads:[~2019-05-14  6:30 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-05-10  7:21 Oleksandr Natalenko
2019-05-10  7:21 ` [PATCH RFC 1/4] mm/ksm: introduce ksm_enter() helper Oleksandr Natalenko
2019-05-10  7:21 ` [PATCH RFC 2/4] mm/ksm: introduce VM_UNMERGEABLE Oleksandr Natalenko
2019-05-10  7:21 ` [PATCH RFC 3/4] mm/ksm: allow anonymous memory automerging Oleksandr Natalenko
2019-05-10  7:21 ` [PATCH RFC 4/4] mm/ksm: add automerging documentation Oleksandr Natalenko
2019-05-13 10:38 ` [PATCH RFC 0/4] mm/ksm: add option to automerge VMAs Kirill Tkhai
2019-05-13 11:33   ` Oleksandr Natalenko
2019-05-13 11:48     ` Timofey Titovets
2019-05-13 12:01       ` Oleksandr Natalenko
2019-05-13 12:06         ` Oleksandr Natalenko
2019-05-13 12:37     ` Kirill Tkhai
2019-05-14  6:30       ` Oleksandr Natalenko [this message]
2019-05-14  9:12         ` Kirill Tkhai
2019-05-14 13:33           ` Oleksandr Natalenko

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190514063043.ojhsb6d3ohxx4wur@butterfly.localdomain \
    --to=oleksandr@redhat.com \
    --cc=atomlin@redhat.com \
    --cc=ktkhai@virtuozzo.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@suse.com \
    --cc=nefelim4ag@gmail.com \
    --cc=pasha.tatashin@soleen.com \
    --cc=vbabka@suse.cz \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox