linux-mm.kvack.org archive mirror
From: Hugh Dickins <hugh.dickins@tiscali.co.uk>
To: Andrea Arcangeli <aarcange@redhat.com>
Cc: Izik Eidus <ieidus@redhat.com>, Rik van Riel <riel@redhat.com>,
	Chris Wright <chrisw@redhat.com>,
	Nick Piggin <nickpiggin@yahoo.com.au>,
	Andrew Morton <akpm@linux-foundation.org>,
	"Justin M. Forbes" <jmforbes@linuxtx.org>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH 9/12] ksm: fix oom deadlock
Date: Tue, 25 Aug 2009 18:35:56 +0100 (BST)
Message-ID: <Pine.LNX.4.64.0908251738070.30372@sister.anvils>
In-Reply-To: <20090825145832.GP14722@random.random>

On Tue, 25 Aug 2009, Andrea Arcangeli wrote:
> On Mon, Aug 03, 2009 at 01:18:16PM +0100, Hugh Dickins wrote:
> > tables which have been freed for reuse; and even do_anonymous_page
> > and __do_fault need to check they're not being called by break_ksm
> > to reinstate a pte after zap_pte_range has zapped that page table.
> 
> This deadlocks exit_mmap in an infinite loop when there's some region
> locked. mlock calls gup and pretends to page fault successfully if
> there's a vma existing on the region, but it doesn't page fault
> anymore because of the mm_count being 0 already, so follow_page fails
> and gup retries the page fault forever.

That's right.  Justin alerted me to this issue last night, and at first
I was utterly mystified (and couldn't reproduce).  But a look at the
.jpg in the Fedora bugzilla, and another look at KSM 9/12, brought
me to the same conclusion that you've reached.
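
A toy userspace model of the loop described above may help; it is a
sketch, not kernel code.  Every name in it (toy_mm, toy_fault,
toy_follow_page, toy_gup) is a hypothetical stand-in, and the retry cap
exists only so the model terminates: the loop under discussion has no
such cap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: these are stand-ins, not the kernel's structures. */
struct toy_mm {
	int mm_users;		/* 0 once the mm is exiting */
	bool pte_present;	/* is the page mapped? */
};

/* Stand-in for the ksm_test_exit() check on mm_users. */
static bool toy_test_exit(const struct toy_mm *mm)
{
	return mm->mm_users == 0;
}

/*
 * Stand-in for the fault path: it reports success either way, but
 * declines to instantiate a pte once the mm is exiting.
 */
static int toy_fault(struct toy_mm *mm)
{
	if (!toy_test_exit(mm))
		mm->pte_present = true;
	return 0;
}

/* Stand-in for follow_page(): succeeds only when a pte is present. */
static bool toy_follow_page(const struct toy_mm *mm)
{
	return mm->pte_present;
}

/*
 * Stand-in for the gup retry loop.  Returns the number of faults
 * taken, or -1 if the (model-only) retry cap was reached.
 */
static int toy_gup(struct toy_mm *mm, int max_retries)
{
	int faults = 0;

	assert(mm != NULL);
	while (!toy_follow_page(mm)) {
		if (faults == max_retries)
			return -1;
		toy_fault(mm);
		faults++;
	}
	return faults;
}
```

With mm_users above zero the model maps the page after one fault; with
mm_users at zero every "successful" fault leaves the pte absent, so
only the cap ends the loop.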

The _right_ solution (without even knowing of this problem) is
coincidentally being discussed currently in a different thread,
"make munlock fast when mlock is canceled by sigkill".  It's just
idiotic that munlock (in this case, munlocking pages on exit) should
be trying to fault in pages, and that causes its own problems when
mlock of a large area goes OOM and invokes the OOM killer on itself
(the munlock hangs trying to fault in what the mlock failed to do:
at this instant I forget whether that deadlocks the system, or
causes the wrong processes to be killed - I've several other OOM
fixes to make).

I have now made a patch with munlock_vma_pages_range() doing a
follow_page() loop instead of faulting in; but I've not yet tested
it properly, and it's rather mixed up with three other topics
(a coredump GUP flag to __get_user_pages to govern the ZERO_PAGE
shortcut, instead of confused guesses; reinstating the ZERO_PAGE in
do_anonymous_page; cleaning away unnecessary GUP flags).  It's something
that will need exposure in mmotm before going any further, whereas
this ksm_test_exit() issue needs a safe fix quicker than that.
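
As a rough illustration of the follow_page()-style approach (a toy
model under stated assumptions, not the patch itself): the walker
visits each slot once, clears the lock on pages that are present, and
skips holes instead of faulting them in.  The names toy_page and
toy_munlock_range are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: one entry per page slot in the range. */
struct toy_page {
	bool present;	/* is a page mapped at this slot? */
	bool mlocked;	/* is that page marked as locked? */
};

/*
 * Walk the range once.  Present pages have their lock cleared; absent
 * slots are skipped rather than faulted in, so the walk always ends
 * after n steps.  Returns how many pages were unlocked.
 */
static size_t toy_munlock_range(struct toy_page *pages, size_t n)
{
	size_t cleared = 0;

	assert(pages != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (!pages[i].present)
			continue;	/* hole: nothing to munlock */
		if (pages[i].mlocked) {
			pages[i].mlocked = false;
			cleared++;
		}
	}
	return cleared;
}
```

The point of the design is that the walk never depends on a fault
making progress, so it cannot spin the way a fault-and-retry loop can.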

I was pondering what to do when you wrote in.

> And generally I don't like to add those checks to page fault fast path.

I'd prefer not to have them either, but haven't yet worked out how to
get along safely without them.

> 
> Given we check mm_users == 0 (ksm_test_exit) after taking mmap_sem in
> unmerge_and_remove_all_rmap_items, why do we actually need to care
> that a page fault happens? We hold mmap_sem so we're guaranteed to see
> mm_users == 0 and we won't ever break COW on that mm with mm_users ==
> 0 so I think those troublesome checks from page fault can be simply
> removed.

break_ksm called from madvise(,,MADV_UNMERGEABLE) does have down_write
of mmap_sem.  break_ksm called from "echo 2 >/sys/kernel/mm/ksm/run"
has down_read of mmap_sem (taken in unmerge_and_remove_all_rmap_items).
break_ksm called from any of ksmd's break_cows has down_read of mmap_sem.

But the mmap_sem is not enough to exclude the mm exiting
(until __ksm_exit does its little down_write,up_write dance):
break_cow etc. do the ksm_test_exit check on mm_users before
proceeding any further, but that's just not enough to prevent
break_ksm's handle_pte_fault racing with exit_mmap - hence the
ksm_test_exits in mm/memory.c, to stop ptes being instantiated
after the final zap thinks it's wiped the pagetables.
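
The ordering problem can be sketched as a deterministic toy
interleaving (a userspace model with hypothetical names, not the
kernel's code): the early mm_users check passes, exit then slips in,
and only a second check inside the fault path stops a pte appearing
after the final zap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: stand-ins for the mm and its populated ptes. */
struct toy_mm {
	int mm_users;	/* 0 once the mm is exiting */
	int nr_ptes;	/* ptes currently instantiated */
};

/* Stand-in for the ksm_test_exit() check on mm_users. */
static bool toy_test_exit(const struct toy_mm *mm)
{
	return mm->mm_users == 0;
}

/* Stand-in for the last user going away and the final zap running. */
static void toy_exit(struct toy_mm *mm)
{
	mm->mm_users = 0;
	mm->nr_ptes = 0;
}

/*
 * Stand-in for the fault path.  With recheck set it repeats the exit
 * test before instantiating a pte; without it, it instantiates blindly.
 */
static void toy_fault(struct toy_mm *mm, bool recheck)
{
	assert(mm != NULL);
	if (recheck && toy_test_exit(mm))
		return;
	mm->nr_ptes++;
}

/*
 * Replay the racy ordering: early check, then exit, then fault.
 * Returns the number of ptes left behind after the final zap.
 */
static int toy_race(bool recheck)
{
	struct toy_mm mm = { 1, 0 };

	if (toy_test_exit(&mm))		/* early check: mm still live */
		return 0;
	toy_exit(&mm);			/* exit wins the race here */
	toy_fault(&mm, recheck);	/* the fault arrives too late */
	return mm.nr_ptes;
}
```

In this model toy_race(false) leaves a stray pte behind, while
toy_race(true) leaves none, which is the job the checks in the fault
path are doing.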

Let's look at your actual patch...

Hugh

