From: Shane Nay <snay@google.com>
To: fa.linux.kernel@googlegroups.com
Cc: "H. Peter Anvin" <hpa@zytor.com>,
Stefan Assmann <sassmann@kpanic.de>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org,
akpm@linux-foundation.org, tony.luck@intel.com,
andi@firstfloor.org, mingo@elte.hu, rick@vanrein.org,
rdunlap@xenotime.net, Nancy Yuen <yuenn@google.com>,
Michael Ditto <mditto@google.com>
Subject: Re: [PATCH v2 0/3] support for broken memory modules (BadRAM)
Date: Fri, 24 Jun 2011 14:10:45 -0700 (PDT)
Message-ID: <532cc290-4b9c-4eb2-91d4-aa66c01bb3a0@glegroupsg2000goo.googlegroups.com>
In-Reply-To: <fa.fHPNPTsllvyE/7DxrKwiwgVbVww@ifi.uio.no>
> > For those with a lot of ranges,
> > like Google, the command line is insufficient.
>
> Not if you recognise that there is a pattern :-)
>
> Google does not seem to have realised that, and is simply listing
> the pages that are defected. IMHO, but being the BadRAM author I
> can hardly be called objective, this is the added value of BadRAM,
> that it understands the nature of the problem and solves it with
> an elegant concept at the right level of abstraction.
No, we do recognize a pattern when there is one. It depends on the specific defect at play. There are several different defect types, and the incidence rate varies with the defect being observed. We do observe "classic" failures of the type you are describing, where, given the physical addressing information (bank, row, column), we can reproducibly cause errors to occur along that path.
One problem is that the badram syntax doesn't mesh cleanly with all modern systems. For instance, not all chipsets use power-of-two bank interleave. Holes in addressing also create trouble on some systems.
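To make the interleave point concrete, here is a minimal sketch, assuming a BadRAM pattern is an (address, mask) pair in which every bit set in the mask must equal the corresponding bit of the address. The 4 KiB page size, the 32-bit address width, and the helper names `matches`, `tightest_pattern`, and `count_covered` are illustrative assumptions, not code from the BadRAM patch or from any Google tree.

```python
PAGE_SHIFT = 12                      # assumption: 4 KiB pages
ADDR_BITS = 32                       # assumption: 32-bit addresses for the demo
FULL_MASK = (1 << ADDR_BITS) - 1


def matches(addr, base, mask):
    """True if addr agrees with base on every bit that is set in mask."""
    return (addr & mask) == (base & mask)


def tightest_pattern(addrs):
    """Return the most specific single (base, mask) pair covering all addrs.

    A bit stays in the mask only if every address agrees with the first
    address on that bit; any bit that differs becomes a wildcard.
    """
    base = addrs[0]
    mask = FULL_MASK
    for addr in addrs[1:]:
        mask &= ~(addr ^ base) & FULL_MASK
    return base, mask


def count_covered(base, mask, total_pages):
    """Count how many of the first total_pages pages the pattern excludes."""
    return sum(matches(pfn << PAGE_SHIFT, base, mask)
               for pfn in range(total_pages))


# Four faulty pages at a power-of-two stride: the pattern is exact.
stride4 = [pfn << PAGE_SHIFT for pfn in (0, 4, 8, 12)]
print(count_covered(*tightest_pattern(stride4), 16))   # 4

# Four faulty pages at a stride of three: one pattern must cover all 16 pages.
stride3 = [pfn << PAGE_SHIFT for pfn in (0, 3, 6, 9)]
print(count_covered(*tightest_pattern(stride3), 16))   # 16
```

With a stride of three pages, every bit in the affected field differs somewhere, so a single mask degenerates into a wildcard over the whole region.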
Other defects look like white noise; these are typically indicative of manufacturing process defects.
When we find a crisp pattern in the data, it's not always the entirety of that bit-maskable pattern that is affected. There can be interleaved subtractions from the underlying pattern, orthogonal to the interleave.
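A small sketch of what such a subtraction costs, assuming the same address/mask matching rule applied at page-frame granularity. The function names and the sample data are hypothetical and only illustrate the idea; they are not measurements.

```python
def pattern_pfns(base_pfn, mask, total_pages):
    """Set of page frame numbers that a (base_pfn, mask) pattern excludes."""
    return {pfn for pfn in range(total_pages)
            if (pfn & mask) == (base_pfn & mask)}


def healthy_but_excluded(base_pfn, mask, observed_bad, total_pages):
    """Pages the pattern removes even though no error was observed on them."""
    covered = pattern_pfns(base_pfn, mask, total_pages)
    return sorted(covered - set(observed_bad))


# Hypothetical data: the mask 0b0011 selects every fourth page (0, 4, 8, 12),
# but errors were only ever observed on pages 0, 4 and 12.
print(healthy_but_excluded(0, 0b0011, [0, 4, 12], 16))   # [8]
```

An explicit page list would exclude only the three observed pages, while the bit-maskable pattern also gives up page 8.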
IMHO, badram is a good tool for its intended purpose. The two approaches aren't really mutually exclusive anyway. We're cleaning up our existing patches to send out early next week. However, we did at one time have a way of inserting e820 entries generated from badram syntax on the command line, along with passed-in e820 entries and extended versions. That bit isn't in our tree right now, but it's possible, and we're looking to see if we can make it work with the existing code.
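One way to picture generating e820-style entries from badram syntax is to expand a pattern into a list of (start, length) ranges, merging adjacent pages. This is only a sketch under stated assumptions: the matching rule is the address/mask comparison described above, the page size is 4 KiB, and `pattern_to_ranges` is a hypothetical helper, not the code referred to in this message.

```python
def pattern_to_ranges(base, mask, mem_size, page_size=4096):
    """Expand an (address, mask) pattern into merged (start, length) ranges.

    Only page-granular bits are compared, so mask bits below the page size
    are ignored. Consecutive matching pages are merged into a single range.
    """
    page_mask = mask & ~(page_size - 1)
    ranges = []
    for addr in range(0, mem_size, page_size):
        if (addr & page_mask) != (base & page_mask):
            continue
        if ranges and ranges[-1][0] + ranges[-1][1] == addr:
            start, length = ranges[-1]
            ranges[-1] = (start, length + page_size)   # extend previous range
        else:
            ranges.append((addr, page_size))           # start a new range
    return ranges


# Bits 14 and up must match: one contiguous 16 KiB range.
print([(hex(s), hex(n)) for s, n in pattern_to_ranges(0x4000, 0xFFFFC000, 0x10000)])

# Bits 14 and 15 are wildcards: four separate single-page ranges.
print([(hex(s), hex(n)) for s, n in pattern_to_ranges(0x1000, 0xFFFF3000, 0x10000)])
```

The second call shows why a compact pattern can expand into many entries: each wildcard bit above the page offset doubles the number of disjoint ranges.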
> s (and
> living by them) for failing memory pages. One property of BadRAM,
> namely that it does not slow down your system (you have less
> pages on hand, but that's all) may or may not apply to an e820-based
> approach. I don't know if e820 is ever consulted after boot?
>
> > How common are nontrivial patterns on real hardware? This would be
> > interesting to hear from Google or another large user.
>
> Yes. And "non-trivial" would mean that the patterns waste more space
> than fair, *because of* the generalisation to patterns.
>
> If you plug 10 DIMMs into your machine, and each has a faulty row
> somewhere, then you will get into trouble if you stick to 5 patterns.
> But if you happen to run into a faulty DIMM from time to time, the
> patterns should be your way out.
>
> > I have to say I think Google's point that truncating the list is
> > unacceptable...
>
> Of course, that is true. This is why memmap=... does not work.
> It has nothing to do with BadRAM however, there will never be more
> than 5 patterns.
>
> > that would mean running in a known-bad configuration,
> > and even a hard crash would be better.
>
> ..which is so sensible that it was of course taken into account in
> the BadRAM design!
>
>
> Cheers,
> -Rick