From: Linus Torvalds <torvalds@linux-foundation.org>
To: David Ahern <david.ahern@oracle.com>
Cc: linux-mm <linux-mm@kvack.org>, LKML <linux-kernel@vger.kernel.org>
Subject: Re: 4.0.0-rc4: panic in free_block
Date: Fri, 20 Mar 2015 17:47:51 -0700
Message-ID: <CA+55aFwyuVWHMq_oc_hfwWcu6RaPGSifXD9-adX2_TOa-L+PHA@mail.gmail.com>
In-Reply-To: <550CB8D1.9030608@oracle.com>
On Fri, Mar 20, 2015 at 5:18 PM, David Ahern <david.ahern@oracle.com> wrote:
> On 3/20/15 4:49 PM, David Ahern wrote:
>>
>> I did ask around and apparently this bug is hit only with the new M7
>> processors. DaveM: that's why you are not hitting this.
Quite frankly, this smells even more like an architecture bug. It
could be anywhere: it could be a CPU memory ordering issue, a compiler
bug, or a missing barrier or other thing.
How confident are you in the M7 memory ordering rules? It's a fairly
new core, no? With new speculative reads etc? Maybe the Linux
spinlocks don't have the right serialization, and more aggressive
reordering in the new core shows a bug?
Looking at this code, if this is a race, I see a few things that are
worth checking out:
- it does a very much overlapping "memmove()". The
sparc/lib/memmove.S file looks suspiciously bad (is that a
byte-at-a-time loop? Is it even correctly checking overlap?)
- it relies on both percpu data and a spinlock. I'm sure the sparc
spinlock code has been tested *extensively* with old cores, but maybe
some new speculative read ends up breaking them?
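To make the overlap question concrete, here is a minimal sketch of the check any memmove() has to get right. It is plain portable C, not the sparc assembly; the name ref_memmove is hypothetical and exists only for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical reference memmove (illustration only, not the sparc code):
 * still byte-at-a-time, but it picks the copy direction from the overlap.
 */
static void *ref_memmove(void *dst, const void *src, size_t n)
{
	unsigned char *d = dst;
	const unsigned char *s = src;

	if (d == s || n == 0)
		return dst;

	/*
	 * If dst starts inside [src, src + n), a forward copy would
	 * overwrite source bytes before they are read, so copy backward.
	 */
	if ((uintptr_t)d > (uintptr_t)s && (uintptr_t)d - (uintptr_t)s < n) {
		while (n--)
			d[n] = s[n];
	} else {
		/* dst below src, or no overlap at all: forward is safe. */
		for (size_t i = 0; i < n; i++)
			d[i] = s[i];
	}
	return dst;
}
```

A move that shifts the tail of an array down to its start has dst below src, so it takes the forward branch. A routine that copied forward in wider chunks without checking the distance between the two pointers is the kind of thing that could go wrong there.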
I'm assuming M7 is still TSO and 'ldstub' has acquire semantics? Is it
configurable like some sparc versions? I'm wondering whether the
Solaris locks might have some extra memory barriers due to supporting
the other (weaker) sparc memory models, and maybe they hid some M7
"feature" by mistake...
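As a rough sketch of what "acquire semantics" on the lock instruction buys, here is a test-and-set spinlock written with C11 atomics. This is userspace C, not the kernel's sparc spinlock code, and the names tas_lock, tas_lock_acquire, tas_lock_try and tas_lock_release are made up for this example.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical test-and-set lock (illustration only). */
struct tas_lock {
	atomic_flag locked;
};

/* One attempt: returns true if the lock was free and is now ours. */
static inline bool tas_lock_try(struct tas_lock *l)
{
	/*
	 * Acquire ordering: accesses inside the critical section may not
	 * be reordered to before this atomic read-modify-write.
	 */
	return !atomic_flag_test_and_set_explicit(&l->locked,
						  memory_order_acquire);
}

static inline void tas_lock_acquire(struct tas_lock *l)
{
	while (!tas_lock_try(l))
		; /* spin until the flag was observed clear */
}

static inline void tas_lock_release(struct tas_lock *l)
{
	/*
	 * Release ordering: accesses inside the critical section may not
	 * be reordered to after this store.
	 */
	atomic_flag_clear_explicit(&l->locked, memory_order_release);
}
```

If the hardware's atomic test-and-set did not actually order later loads after it, a speculative read of data protected by the lock could see a stale value even though the lock code itself looks correct. An extra barrier after the acquire would paper over exactly that.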
*Some* of the sparc memcpy routines have odd membars in them. Why
would a TSO machine need a memory barrier inside a memcpy? That just
makes me go "Ehh?"
> Here's another data point: If I disable NUMA I don't see the problem.
> Performance drops, but no NULL pointer splats which would have been panics.
So the NUMA case triggers the per-node "n->shared" logic, which
*should* be protected by "n->list_lock". Maybe there is some bug there
- but since that code seems to do ok on x86-64 (and apparently older
sparc too), I really would look at arch-specific issues first.
Linus