From: Rick van Rein <rick@vanrein.org>
To: "H. Peter Anvin" <hpa@zytor.com>, Stefan Assmann <sassmann@kpanic.de>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	akpm@linux-foundation.org, tony.luck@intel.com,
	andi@firstfloor.org, mingo@elte.hu, rick@vanrein.org,
	rdunlap@xenotime.net, Nancy Yuen <yuenn@google.com>,
	Michael Ditto <mditto@google.com>
Subject: Re: [PATCH v2 0/3] support for broken memory modules (BadRAM)
Date: Thu, 23 Jun 2011 10:33:20 +0000
Message-ID: <20110623103320.GB2910@phantom.vanrein.org>
In-Reply-To: <4E0251AB.8090702@zytor.com> <4E024E31.50901@kpanic.de> <4E023142.1080605@zytor.com>

Hello,

> We already support the equivalent functionality with
> memmap=<address>$<length> for those with only a few ranges...

This is not a realistic option for people whose memory has failed.
Google is quite right when they say they hit thousands of faulty
pages.  If you have, say, a static discharge damaging the buffers
from the cell array to the outside world, then the entire row or
column behind that buffer will fail.  I've seen many such examples.

> For those with a lot of ranges,
> like Google, the command line is insufficient.

Not if you recognise that there is a pattern :-)

Google does not seem to have realised that, and is simply listing
the pages that are defective.  IMHO (though as the BadRAM author I
can hardly be called objective) this is the added value of BadRAM:
it understands the nature of the problem and solves it with
an elegant concept at the right level of abstraction.
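
To illustrate what "pattern" means here: a BadRAM pattern is an
address/mask pair, and an address counts as bad when it agrees with the
pattern address on every bit that is set in the mask.  The Python sketch
below is not BadRAM code.  The helper names, the 64 MiB memory size and
the fault itself (all addresses whose bits 12..15 equal 0x3) are
assumptions, chosen only to show how one pair can stand in for a long
list of pages.

```python
PAGE_SHIFT = 12  # 4 KiB pages, as on x86


def is_bad(addr, pat_addr, pat_mask):
    """True if addr agrees with pat_addr on every bit set in pat_mask."""
    return (addr & pat_mask) == (pat_addr & pat_mask)


def bad_pages(pat_addr, pat_mask, mem_bytes):
    """List the page frame numbers that a pattern marks bad (brute force)."""
    return [pfn for pfn in range(mem_bytes >> PAGE_SHIFT)
            if is_bad(pfn << PAGE_SHIFT, pat_addr, pat_mask)]


# Hypothetical fault: every address whose bits 12..15 equal 0x3 is bad.
pattern = (0x00003000, 0x0000f000)
pages = bad_pages(*pattern, mem_bytes=64 << 20)  # assumed 64 MiB of RAM

print(len(pages), pages[:3])
```

In this made-up case a single pair describes one page in every sixteen,
which a plain page list would have to spell out entry by entry.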

> So far the use case I had in mind wasn't "thousands of entries". However
> expanding the e820 table is probably an issue that could be dealt with
> separately ?

This could help with other approaches as well -- as mentioned,
there have been attempts to get BadRAM into GRUB, so that the
kernel needn't be aware of it.  But adding BadRAM or expanding
the e820 table are both cases of changing the kernel, and in that
case I thought it'd be best to actually solve the problem and
not upgrade the messenger.

> Well if too much low memory is bad, you're screwed anyway, not? :)

If the kernel is always loaded in a fixed location, yes.  That
is one assumption that the kernel makes (made?) that will only
work if all your memory is good.

> At the moment I don't see any arguments why this patchset couldn't play
> along nicely or get enhanced to support what Google needs, but I don't
> know Googles patches yet.

Changes to e820 should not interfere with setting flags (and
living by them) for failing memory pages.  One property of BadRAM,
namely that it does not slow down your system (you have fewer
pages on hand, but that's all), may or may not apply to an e820-based
approach.  I don't know whether e820 is ever consulted after boot.

> How common are nontrivial patterns on real hardware?  This would be
> interesting to hear from Google or another large user.

Yes.  And "non-trivial" would mean that the patterns waste more space
than is reasonable, *because of* the generalisation to patterns.

If you plug 10 DIMMs into your machine, and each has a faulty row
somewhere, then you will get into trouble if you stick to 5 patterns.
But if you happen to run into a faulty DIMM from time to time, the
patterns should be your way out.
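
The cost of generalising can be made concrete.  One way to fit more
faults into a fixed number of slots is to merge two patterns into a
single one that covers both, by dropping from the mask every bit on
which they disagree; the merged pattern then also marks some good
memory as bad.  The sketch below is an assumption about how such a merge
could work, not the BadRAM implementation, and the function names, the
32-bit address width and the two example pages are made up for
illustration.

```python
ADDR_BITS = 32  # assumed physical address width for this example


def merge(a, b):
    """Return one address/mask pattern that covers both a and b."""
    (addr_a, mask_a), (addr_b, mask_b) = a, b
    # Keep only bits that both patterns constrain and on which they agree.
    mask = mask_a & mask_b & ~(addr_a ^ addr_b) & ((1 << ADDR_BITS) - 1)
    return (addr_a & mask, mask)


def covered(pattern):
    """Bytes matched by a pattern: 2 ** (number of don't-care bits)."""
    _, mask = pattern
    return 1 << (ADDR_BITS - bin(mask).count("1"))


p1 = (0x12345000, 0xfffff000)  # one bad 4 KiB page
p2 = (0x12348000, 0xfffff000)  # a second, distinct bad page nearby
m = merge(p1, p2)

# p1 and p2 do not overlap, so the waste is simply the difference.
waste = covered(m) - covered(p1) - covered(p2)
print(hex(m[0]), hex(m[1]), waste)
```

Here two bad pages collapse into one pattern covering eight pages, so
six good pages are given up to save a slot.  The further apart the
faults are, the more bits drop out of the mask and the larger the waste.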

> I have to say I think Google's point that truncating the list is
> unacceptable...

Of course, that is true.  This is why memmap=... does not work.
It has nothing to do with BadRAM, however: there will never be more
than 5 patterns.

> that would mean running in a known-bad configuration,
> and even a hard crash would be better.

...which is so sensible that it was of course taken into account in
the BadRAM design!


Cheers,
 -Rick

