linux-mm.kvack.org archive mirror
From: Linus Torvalds <torvalds@linux-foundation.org>
To: David Ahern <david.ahern@oracle.com>,
	David Miller <davem@davemloft.net>,
	sparclinux@vger.kernel.org
Cc: linux-mm <linux-mm@kvack.org>, LKML <linux-kernel@vger.kernel.org>
Subject: Re: 4.0.0-rc4: panic in free_block
Date: Sat, 21 Mar 2015 11:49:12 -0700
Message-ID: <CA+55aFwXmDom=GKE=K2QVqp_RUtOPQ0v5kCArATqQEKUOZ6OrA@mail.gmail.com>
In-Reply-To: <550DAE23.7030000@oracle.com>

On Sat, Mar 21, 2015 at 10:45 AM, David Ahern <david.ahern@oracle.com> wrote:
>
> You raise a lot of valid questions and something to look into. But if the
> root cause were such a fundamental issue (CPU memory ordering, compiler bug,
> etc) why would it only occur on this one code path -- free with SLAB and
> NUMA -- and so consistently?

So the consistency could easily come from a compiler bug (or a missing
barrier in the kernel code) that just happens to trigger in a single
place (or in a few places, but then that's the only place that gets
exercised heavily enough to show it).

I agree that an actual hardware bug is unlikely, although that too is
possible: I can pretty much guarantee that if it were a CPU bug, it
wouldn't be some "memory ordering is entirely broken" bug in general,
it would be some very specific case that only happens with just the
right instruction timing and mix.

That said, while I bring up a CPU bug as a possibility, I really do
agree that it is *very* unlikely. Memory ordering is hard, and yes,
you can get it wrong, but at the same time CPU designers very much
know about it and tend to be pretty damn good about it. And as you
say, it generally wouldn't be *that* consistent. It might be
consistent for one particular kernel build (due to very particular
instruction mix and timings), but over lots of versions of the code
and many different debug options? Very very very unlikely.

> Continuing to poke around, but open to any suggestions. I have enabled every
> DEBUG I can find in the memory code and nothing is popping out. In terms of
> races wouldn't all the DEBUG checks affect timing? Yet, I am still seeing
> the same stack traces due to the same root cause.

Yes, generally debug options would change timings sufficiently that
any particular low-level race would certainly go away or at least
become much harder to hit. So if you have enabled spinlock debugging
etc, I don't really believe in a hw bug. It's more likely that there
is some kernel architecture-specific code that triggers it. Or even
generic code that just happens to work on other cases due to random
issues (i.e. memory alignment etc.).

I *would* suggest looking at that "memmove()" code. It really looks
like crap. It seems to do things byte-at-a-time for the overlapping
case, and the code seems to depend on memcpy always doing things
low-to-high, but there are multiple different memcpy implementations
so I don't know that that is always true. If one of the memcpy
functions sometimes copies the other way depending on size etc, it
could screw up.
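
To make the direction dependency concrete, here is a minimal user-space sketch in C. It is not the sparc64 code; the names `copy_forward`, `copy_backward` and `sketch_memmove` are made up for illustration. The point it shows: handing the "destination below source" overlap case to a plain copy routine is only correct if that routine is guaranteed to go low-to-high, which is exactly the assumption questioned above.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: copies strictly low-to-high, one byte at a time. */
static void copy_forward(unsigned char *dst, const unsigned char *src, size_t n)
{
	for (size_t i = 0; i < n; i++)
		dst[i] = src[i];
}

/* Hypothetical helper: copies strictly high-to-low, one byte at a time. */
static void copy_backward(unsigned char *dst, const unsigned char *src, size_t n)
{
	while (n--)
		dst[n] = src[n];
}

/*
 * Illustrative memmove: a forward copy is safe when dst is at or below
 * src, or when the regions do not overlap at all.  Otherwise dst sits
 * inside [src, src + n) and only a backward copy avoids overwriting
 * source bytes before they have been read.
 */
void *sketch_memmove(void *dst, const void *src, size_t n)
{
	unsigned char *d = dst;
	const unsigned char *s = src;

	if ((uintptr_t)d <= (uintptr_t)s || (uintptr_t)d >= (uintptr_t)s + n)
		copy_forward(d, s, n);
	else
		copy_backward(d, s, n);
	return dst;
}
```

If `copy_forward` were swapped for a routine that sometimes copies high-to-low (say, above some size threshold), the `d <= s` overlap case would silently corrupt data.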

Basically, that sparc64 memmove() implementation looks like it was
written by a dyslexic 5-year-old as a throw-away hack, and then never
got fixed.

Davem? I don't read sparc assembly, so I'm *really* not going to try
to verify that (a) all the memcpy implementations always copy
low-to-high and (b) that I even read the address comparisons in
memmove.S right.

I mention memmove just because it's actually fairly unusual for the
kernel. At the same time, if it really is broken for overlapping
regions, I'd expect *some* other places to show breakage too. So it's
probably fine, even if it does look very very bad to do things one
byte at a time backwards as a fallback.
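
One way to settle the "is it actually broken for overlapping regions" question without reading assembly is a brute-force check. The sketch below is a user-space harness, not kernel code, and `count_memmove_mismatches`, `memmove_fn` and `forward_only_move` are hypothetical names. It compares a candidate against a copy that goes through a temporary buffer, for every source offset, destination offset and length inside a small buffer. Testing the real sparc64 routines would need them linked in, which is assumed rather than shown here.

```c
#include <stddef.h>
#include <string.h>

#define BUF_LEN 64

typedef void *(*memmove_fn)(void *, const void *, size_t);

/*
 * Returns how many (src, dst, len) combinations produce a result that
 * differs from a reference copy done through a temporary buffer.
 */
static int count_memmove_mismatches(memmove_fn fn)
{
	unsigned char init[BUF_LEN], expect[BUF_LEN], got[BUF_LEN], tmp[BUF_LEN];
	int bad = 0;

	for (size_t i = 0; i < BUF_LEN; i++)
		init[i] = (unsigned char)i;

	for (size_t src = 0; src < BUF_LEN; src++) {
		for (size_t dst = 0; dst < BUF_LEN; dst++) {
			size_t max = BUF_LEN - (src > dst ? src : dst);

			for (size_t len = 0; len <= max; len++) {
				/* Reference: bounce through tmp, so overlap cannot matter. */
				memcpy(expect, init, BUF_LEN);
				memcpy(tmp, init + src, len);
				memcpy(expect + dst, tmp, len);

				/* Candidate: copy in place, regions may overlap. */
				memcpy(got, init, BUF_LEN);
				fn(got + dst, got + src, len);

				if (memcmp(expect, got, BUF_LEN) != 0)
					bad++;
			}
		}
	}
	return bad;
}

/* Deliberately wrong for overlaps with dst above src: always low-to-high. */
static void *forward_only_move(void *dst, const void *src, size_t n)
{
	unsigned char *d = dst;
	const unsigned char *s = src;

	for (size_t i = 0; i < n; i++)
		d[i] = s[i];
	return dst;
}
```

Passing the C library's `memmove` should give zero mismatches, while `forward_only_move` should give a nonzero count. A 64-byte buffer will not exercise large-size code paths, so any size thresholds in a real implementation would need a bigger `BUF_LEN`.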

                             Linus

