From: Hugh Dickins <hugh.dickins@tiscali.co.uk>
To: Izik Eidus <ieidus@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>,
Rik van Riel <riel@redhat.com>, Chris Wright <chrisw@redhat.com>,
Nick Piggin <nickpiggin@yahoo.com.au>,
Andrew Morton <akpm@linux-foundation.org>,
"Justin M. Forbes" <jmforbes@linuxtx.org>,
linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH 13/12] ksm: fix munlock during exit_mmap deadlock
Date: Thu, 27 Aug 2009 20:11:05 +0100 (BST) [thread overview]
Message-ID: <Pine.LNX.4.64.0908271958330.1973@sister.anvils> (raw)
In-Reply-To: <4A95AE06.305@redhat.com>
On Thu, 27 Aug 2009, Izik Eidus wrote:
> Andrea Arcangeli wrote:
> > On Wed, Aug 26, 2009 at 11:54:36PM +0300, Izik Eidus wrote:
> >
> > > But before getting into this, why is it so important to break the ksm
> > > pages when madvise(UNMERGEABLE) get called?
> > >
Good question.
> >
> > The moment ksm pages are swappable, there's no apparent reason why
> > anybody should ask the kernel to break any ksm page if the application
> > themselfs aren't writing to them in the first place (triggering
> > copy-on-write in app context which already handles TIF_MEMDIE just
> > fine).
> >
>
> I think I am the one that should be blamed for breaking the ksm pages when
> running unmeregable (If I remember right),
No, I think it was me to blame for that: looking back at the /dev/ksm KSM
and your draft of madvise KSM, I don't see breaking on unmerge in either.
> but I think Hugh had a good case why we want to keep it... ? (If I remember
> right again...)
There were several reasons for adding it, but all rather weak.
The "good case" you're half-remembering is probably that if we didn't
break ksm when doing madvise MADV_UNMERGEABLE, we'd lose track of any
KSM pages in the vma we're removing VM_MERGEABLE from (since we only
ever scan VM_MERGEABLEs), and so would be liable to build up more and
more unswappable KSM pages, well beyond the limit which can be imposed
by max_kernel_pages.
It is important to keep that accounting right, at least for so long as
they're unswappable; but it amounts to only a weak case, because we
could perfectly well have two vm_flags, one to say actively try to
merge here, and another to say there might still be KSM pages here.
(Perhaps with some restructuring that could instead be driven from
the stable tree end, rather than through the mm_slots and vmas.)
Reasons for avoiding two vm_flags: simplicity; minimizing KSM
footprint outside of ksm.c; and... VM_MERGEABLE has selfishly taken
bit 31 of vm_flags, the next vm_flag is going to involve some thought
on the best way to expand it (on 32-bit arches). It's not an atomic
field so it shouldn't be hard; we used to minimize the use of 64-bit
on 32-bit but maybe gcc's unsigned long long handling is pretty good
nowadays and it's no issue? Or maybe the issue was always avoiding
64-bit arithmetic, no issue with bitflags? I don't know myself.
Other reasons for breaking ksm when unmerging: my sense that we ought
to provide a way to undo whatever is done (useful when testing, but
does that make it worth the effort? particularly if much of the testing
goes into testing precisely that feature!); and a notion that people
worried about covert channels would want to be able to undo merging
absolutely (but would they have been using MADV_MERGEABLE in the
first place, even have KSM configured in and running? seems unlikely).
It may be that MADV_UNMERGEABLE isn't really needed (I think I even
admitted once that probably nobody would use it other than we testing
it). Yet I hesitate to rip it out: somehow it still seems right to
have it in there. Why did you have unregistering in the /dev/ksm KSM?
>
> > In oom deadlock terms madvise(UNMERGEABLE) is the only place that is
> > 100% fine at breaking KSM pages, because it runs with right tsk->mm
> > and page allocation will notice TIF_MEMDIE set on tsk.
> >
> > If we remove "echo 2" only remaining "unsafe" spot is the break_cow in
> > kksmd context when memcmp fails and similar during the scan.
> >
> >
> I didnt talk here about the bug..., I talked about the behavior...
> It is the feeling that the oom will kill applications calling into
> UNMERGEABLE, even thought this application shouldn't die, just because it had
> big amount of memory shared and it unmerged it in the wrong time?...
The OOM killer makes its choices based upon total_vm (and some other
things): doesn't even consider rss, and certainly not KSM sharing.
So whenever out-of-memory, the OOM killer might choose to kill a very
highly KSM-merged process, freeing very little memory, just because
it _appears_ big from the outside. There's nothing special to the
unmerging case, other than that being a good way to require lots of
page allocations in a single system call. I don't see anything
unfair about it being killed at the point of unmerging: much more
unfair that it be killed when highly merged.
>
> But probably this thoughts have no end, and we are better stick with something
> practical that can work clean and simple...
>
> So what I think is this:
> echo 2 is something we want in this version beacuse we dont support swapping
> of the shared pages, so we got to allow some how to break the pages...
>
> and echo 2 got to have UNMERGEABLE break the shared pages when its madvise get
> called...
>
> So maybe it is just better to leave it like that?
> > > When thinking about it, lets say I want to use ksm to scan 2 applications
> > > and merged their STATIC identical data, and then i want to stop scanning
> > > them after i know ksm merged the pages, as soon as i will try to
> > > unregister this 2 applications ksm will unmerge the pages, so we dont
> > > allow such thing for the user (we can tell him ofcurse for such case to
> > > use normal way of sharing, so this isnt a really strong case for this)
> > >
> >
> > For the app it will be tricky to know when the pages are merged
> > though, right now it could only wait a "while"... so I don't really
> > see madvise(UNMERGEABLE) as useful regardless how we implement
> > it... but then this goes beyond the scope of this bug because as said
> > madvise(UNMERGEABLE) is the only place that breaks ksm pages as safe
> > as regular write fault in oom context because of it running in the
> > process context (not echo 2 or kksmd context).
> >
> Yea, I agree about that this case was idiotic :), Actually I thought about
> case where application get little bit more info, but leave it, it is not worth
> it, traditional sharing is much better for such cases.
I didn't seem idiotic to me, but I hadn't realized the ksmd timelapse
uncertainty Andrea points out. Well, I'm not keen to change the way
it's working at present, but I do think you're right to question all
these aspects of unmerging.
Hugh
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2009-08-27 19:11 UTC|newest]
Thread overview: 53+ messages / expand[flat|nested] mbox.gz Atom feed top
2009-08-03 12:08 [PATCH 0/12] ksm: stats, oom, doc, misc Hugh Dickins
2009-08-03 12:10 ` [PATCH 1/12] ksm: rename kernel_pages_allocated Hugh Dickins
2009-08-03 14:21 ` Izik Eidus
2009-08-03 16:48 ` Andrea Arcangeli
2009-08-03 12:11 ` [PATCH 2/12] ksm: move pages_sharing updates Hugh Dickins
2009-08-03 14:34 ` Izik Eidus
2009-08-03 16:53 ` Andrea Arcangeli
2009-08-03 17:34 ` Hugh Dickins
2009-08-03 12:11 ` [PATCH 3/12] ksm: pages_unshared and pages_volatile Hugh Dickins
2009-08-03 14:54 ` Izik Eidus
2009-08-04 21:49 ` Andrew Morton
2009-08-05 11:39 ` Hugh Dickins
2009-08-05 15:11 ` Andrea Arcangeli
2009-08-03 12:12 ` [PATCH 4/12] ksm: break cow once unshared Hugh Dickins
2009-08-03 16:00 ` Izik Eidus
2009-08-03 12:14 ` [PATCH 5/12] ksm: keep quiet while list empty Hugh Dickins
2009-08-03 16:55 ` Izik Eidus
2009-08-04 21:59 ` Andrew Morton
2009-08-05 11:54 ` Hugh Dickins
2009-08-03 12:15 ` [PATCH 6/12] ksm: five little cleanups Hugh Dickins
2009-08-04 12:41 ` Izik Eidus
2009-08-03 12:16 ` [PATCH 7/12] ksm: fix endless loop on oom Hugh Dickins
2009-08-04 12:55 ` Izik Eidus
2009-08-03 12:17 ` [PATCH 8/12] ksm: distribute remove_mm_from_lists Hugh Dickins
2009-08-04 13:03 ` Izik Eidus
2009-08-03 12:18 ` [PATCH 9/12] ksm: fix oom deadlock Hugh Dickins
2009-08-04 19:32 ` Izik Eidus
2009-08-25 14:58 ` Andrea Arcangeli
2009-08-25 15:22 ` [PATCH 13/12] ksm: fix munlock during exit_mmap deadlock Andrea Arcangeli
2009-08-25 17:49 ` Hugh Dickins
2009-08-25 18:10 ` Andrea Arcangeli
2009-08-25 18:58 ` Hugh Dickins
2009-08-25 19:45 ` Andrea Arcangeli
2009-08-26 16:18 ` Justin M. Forbes
2009-08-26 19:17 ` Hugh Dickins
2009-08-26 19:44 ` Andrea Arcangeli
2009-08-26 19:57 ` Hugh Dickins
2009-08-26 20:28 ` Andrea Arcangeli
2009-08-26 20:54 ` Izik Eidus
2009-08-26 21:14 ` Andrea Arcangeli
2009-08-26 21:49 ` Izik Eidus
2009-08-27 19:11 ` Hugh Dickins [this message]
2009-08-27 19:35 ` Izik Eidus
2009-08-26 22:00 ` David Rientjes
2009-08-26 20:29 ` Hugh Dickins
2009-08-25 17:35 ` [PATCH 9/12] ksm: fix oom deadlock Hugh Dickins
2009-08-25 17:47 ` Andrea Arcangeli
2009-08-03 12:19 ` [PATCH 10/12] ksm: sysfs and defaults Hugh Dickins
2009-08-04 19:34 ` Izik Eidus
2009-08-03 12:21 ` [PATCH 11/12] ksm: add some documentation Hugh Dickins
2009-08-04 19:35 ` Izik Eidus
2009-08-03 12:22 ` [PATCH 12/12] ksm: remove VM_MERGEABLE_FLAGS Hugh Dickins
2009-08-04 19:35 ` Izik Eidus
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Pine.LNX.4.64.0908271958330.1973@sister.anvils \
--to=hugh.dickins@tiscali.co.uk \
--cc=aarcange@redhat.com \
--cc=akpm@linux-foundation.org \
--cc=chrisw@redhat.com \
--cc=ieidus@redhat.com \
--cc=jmforbes@linuxtx.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=nickpiggin@yahoo.com.au \
--cc=riel@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox