From: Hugh Dickins <hughd@google.com>
To: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	David Rientjes <rientjes@google.com>,
	Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: mm: fix BUG in __split_huge_page_pmd
Date: Tue, 15 Oct 2013 10:53:10 -0700 (PDT)
Message-ID: <alpine.LNX.2.00.1310151029040.12481@eggly.anvils>
In-Reply-To: <20131015144827.C45DDE0090@blue.fi.intel.com>

On Tue, 15 Oct 2013, Kirill A. Shutemov wrote:
> Andrea Arcangeli wrote:
> > Hi Hugh,
> > 
> > On Tue, Oct 15, 2013 at 04:08:28AM -0700, Hugh Dickins wrote:
> > > Occasionally we hit the BUG_ON(pmd_trans_huge(*pmd)) at the end of
> > > __split_huge_page_pmd(): seen when doing madvise(,,MADV_DONTNEED).
> > > 
> > > It's invalid: we don't always have down_write of mmap_sem there:
> > > a racing do_huge_pmd_wp_page() might have copied-on-write to another
> > > huge page before our split_huge_page() got the anon_vma lock.
> > > 
> > 
> > I don't get exactly the scenario with do_huge_pmd_wp_page(), could you
> > elaborate?
> 
> I think the scenario is as follows:
> 
> 	CPU0:					CPU1
> 
> __split_huge_page_pmd()
> 	page = pmd_page(*pmd);
> 					do_huge_pmd_wp_page() copies the
> 					page and changes the pmd (the same one as on CPU0)
> 					to point to the newly copied page.
> 	split_huge_page(page)
> 	where page is the original page,
> 	not the one allocated on COW.
> 	pmd still points to a huge page.
> 
> 
> Hugh, have I got it correctly?

Yes, that's correct: that's what I've been assuming the race is,
with CPU0's split_huge_page_pmd() being called from zap_pmd_range()
in the service of madvise(,,MADV_DONTNEED).
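
To make the interleaving concrete, here is a minimal single-threaded
userspace sketch of it.  Everything in it is a toy stand-in (the toy_*
types and functions are hypothetical, not kernel code), and the racing
COW is injected by hand at the point where CPU1 would win.  It shows why
a check that the pmd is no longer huge after one split can fail, and one
possible way to cope, namely looking at the pmd again and retrying;
whether the patch under discussion takes exactly that form is not shown
in this message.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins, all hypothetical: these are not kernel types. */
struct toy_page { bool huge; };
struct toy_pmd  { struct toy_page *page; };

/* Rough analogue of pmd_trans_huge(*pmd): does the pmd map a huge page? */
static bool toy_pmd_trans_huge(const struct toy_pmd *pmd)
{
	return pmd->page != NULL && pmd->page->huge;
}

/* Rough analogue of split_huge_page(page): the page stops being huge. */
static void toy_split_huge_page(struct toy_page *page)
{
	page->huge = false;
}

/* "CPU1": copy-on-write repoints the pmd at a freshly copied huge page. */
static void toy_cow(struct toy_pmd *pmd, struct toy_page *copy)
{
	copy->huge = true;
	pmd->page = copy;
}

/*
 * One pass, as in the scenario above.  If cow_copy is non-NULL, the COW
 * is injected between the page lookup and the split.  Returns true if
 * the pmd is still huge afterwards, which is the condition the BUG_ON
 * assumed could never hold.
 */
static bool toy_split_pmd_once(struct toy_pmd *pmd, struct toy_page *cow_copy)
{
	struct toy_page *page = pmd->page;	/* CPU0: page = pmd_page(*pmd) */

	if (cow_copy)
		toy_cow(pmd, cow_copy);		/* CPU1 wins the race */
	toy_split_huge_page(page);		/* CPU0 splits the old page */
	return toy_pmd_trans_huge(pmd);
}

/*
 * Retrying variant: keep splitting until the pmd is no longer huge.
 * The COW is injected at most once.  Returns the number of passes.
 */
static int toy_split_pmd_retry(struct toy_pmd *pmd, struct toy_page *cow_copy)
{
	int passes = 0;

	while (toy_pmd_trans_huge(pmd)) {
		struct toy_page *page = pmd->page;

		if (cow_copy) {
			toy_cow(pmd, cow_copy);
			cow_copy = NULL;
		}
		toy_split_huge_page(page);
		passes++;
	}
	return passes;
}
```

Calling toy_split_pmd_once() with a COW copy leaves the pmd huge, while
toy_split_pmd_retry() needs a second pass to finish the job.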

I don't have the stacktrace to hand: could perfectly well dig it out
in an hour or two, but honestly, it adds nothing more to the picture.
I have no trace of the CPU1 side of things, and have merely surmised
that it was doing a COW.

As to whether the MADV_DONTNEED down_read optimization is important:
I don't recall; I might be able to find the justification in old mail,
but commit 0a27a14a6292 doesn't actually say.  In general, though, we're
much better off using down_read than down_write where it's safe to do so.

But more importantly, MADV_DONTNEED down_read across zap_page_range
is building on the fact that file invalidation/truncation can already
call zap_page_range without touching mmap_sem at all: not a problem
for traditional anon-only THP, but something you'll have had to worry
about for THPageCache.

I'm afraid Andrea's mail about concurrent madvises gives me far more
to think about than I have time for: seems to get into problems he
knows a lot about but I'm unfamiliar with.  If this patch looks good
for now on its own, let's put it in; but no problem if you guys prefer
to wait for a fuller solution of more problems, we can ride with this
one internally for the moment.

And I should admit that the crash has occurred too rarely for us yet
to be able to judge whether this patch actually fixes it in practice.

Hugh
