linux-mm.kvack.org archive mirror
From: Andi Kleen <ak@suse.de>
To: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Brian Lindahl <Brian.Lindahl@spacedev.com>, linux-mm@kvack.org
Subject: Re: ECC error correction - page isolation
Date: Fri, 2 Jun 2006 05:10:49 +0200
Message-ID: <200606020510.49877.ak@suse.de>
In-Reply-To: <447F94B3.7030807@yahoo.com.au>

> Good summary. I'll just add a couple of things: in recent kernels
> we have a page migration facility which should be able to take care
> of moving process and pagecache pages for you, without walking rmap
> or killing the process (assuming you're talking about correctable
> ECC errors).

I think he means uncorrected errors. Correctable errors can be fixed up
by a scrubber without anything else noticing.
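To illustrate what a scrubber does with a correctable error, here is a toy sketch in C. It assumes a Hamming(7,4) single-error-correcting code rather than the wider SECDED codes real memory controllers use, and the names hamming74_encode, hamming74_syndrome, hamming74_data and scrub are hypothetical. The scrubber walks memory, computes each word's syndrome, and writes the corrected word back whenever one bit has flipped.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Codeword position p (1..7) is stored in bit (p - 1). */
#define BIT(cw, p) (((unsigned)(cw) >> ((p) - 1)) & 1u)

/* Encode 4 data bits: parity at positions 1, 2, 4; data at 3, 5, 6, 7. */
static uint8_t hamming74_encode(uint8_t data)
{
    assert(data < 16);
    unsigned d0 = data & 1u, d1 = (data >> 1) & 1u;
    unsigned d2 = (data >> 2) & 1u, d3 = (data >> 3) & 1u;
    unsigned p1 = d0 ^ d1 ^ d3; /* covers positions 3, 5, 7 */
    unsigned p2 = d0 ^ d2 ^ d3; /* covers positions 3, 6, 7 */
    unsigned p4 = d1 ^ d2 ^ d3; /* covers positions 5, 6, 7 */
    return (uint8_t)(p1 | p2 << 1 | d0 << 2 | p4 << 3 | d1 << 4 | d2 << 5 | d3 << 6);
}

/* Extract the 4 data bits from a clean codeword. */
static uint8_t hamming74_data(uint8_t cw)
{
    return (uint8_t)(BIT(cw, 3) | BIT(cw, 5) << 1 | BIT(cw, 6) << 2 | BIT(cw, 7) << 3);
}

/* XOR of the positions of all set bits: 0 for a valid codeword,
 * otherwise the position of the single flipped bit. */
static unsigned hamming74_syndrome(uint8_t cw)
{
    unsigned s = 0;
    for (unsigned p = 1; p <= 7; p++)
        if (BIT(cw, p))
            s ^= p;
    return s;
}

/* One scrub pass: correct and write back every single-bit error.
 * Returns how many words were fixed. */
static size_t scrub(uint8_t *mem, size_t n)
{
    size_t fixed = 0;
    for (size_t i = 0; i < n; i++) {
        unsigned s = hamming74_syndrome(mem[i]);
        if (s != 0) {
            mem[i] ^= (uint8_t)(1u << (s - 1)); /* rewrite the corrected word */
            fixed++;
        }
    }
    return fixed;
}
```

Because the syndrome of a single flipped bit equals that bit's position, the correction is a single XOR. A double-bit error would be silently miscorrected by this toy code, which is why real hardware adds an extra overall parity bit so it can at least detect that case.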

OK, if your system doesn't support getting rid of them without an atomic
operation, you might need to "stop the world" on MP systems, but that's
relatively easy using stop_machine().
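As a userspace analogy only (this is not kernel code and not how stop_machine() is implemented), quiescing everyone else around a non-atomic rewrite can be sketched with a POSIX read-write lock. Ordinary users take the read side and may run concurrently with each other; the hypothetical repair path takes the write side, so no update can land between its read and its write-back. The names word, BAD_BIT, user_increment and stop_world_and_fix are made up for the example.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>

#define BAD_BIT (UINT64_C(1) << 63) /* hypothetical "corrupted" bit */

static _Atomic uint64_t word; /* hypothetical shared memory word */
static pthread_rwlock_t world = PTHREAD_RWLOCK_INITIALIZER;

/* Ordinary users: atomic updates that may run concurrently with each
 * other, but never while the world is "stopped". */
static void user_increment(void)
{
    pthread_rwlock_rdlock(&world);
    atomic_fetch_add(&word, 1);
    pthread_rwlock_unlock(&world);
}

/* Repair path: a deliberately non-atomic read-correct-write, standing in
 * for hardware that cannot clear the error with one atomic operation.
 * The write lock waits for all users to leave and keeps new ones out. */
static void stop_world_and_fix(void)
{
    pthread_rwlock_wrlock(&world);
    uint64_t v = atomic_load(&word); /* read */
    v &= ~BAD_BIT;                   /* correct */
    atomic_store(&word, v);          /* write back */
    pthread_rwlock_unlock(&world);
}

static void *user_thread(void *arg)
{
    int iterations = *(const int *)arg;
    for (int i = 0; i < iterations; i++)
        user_increment();
    return NULL;
}
```

Without the lock, an increment landing between the load and the store in stop_world_and_fix() would be lost; with it, the final count is exact no matter when the repair runs.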

> This may not quite have the right in-kernel API for your use yet, but
> it shouldn't be difficult to add.
> 
> > 
> > If it's kernel space there are several cases:
> > - Free page (count == 0). Easy: ignore it.
> 
> Also, if you want to isolate the free page, you can allocate it,
> and tuck it away in a list somewhere (or just forget about it
> completely).
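A toy sketch of that idea in C, assuming a simple singly linked free list rather than the kernel's real page allocator; struct toy_page, alloc_page(), isolate_free_page() and bad_list are hypothetical names. The bad page is pulled off the free list as if it were being allocated, then parked on a list that is never handed out again.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NPAGES 8 /* hypothetical tiny machine with 8 page frames */

struct toy_page {
    struct toy_page *next; /* link on either the free list or the bad list */
    bool is_free;
};

static struct toy_page pages[NPAGES];
static struct toy_page *free_list;
static struct toy_page *bad_list; /* isolated pages, never handed out again */

static void init_pages(void)
{
    free_list = bad_list = NULL;
    for (int i = NPAGES - 1; i >= 0; i--) {
        pages[i].is_free = true;
        pages[i].next = free_list;
        free_list = &pages[i];
    }
}

/* Normal allocation: pop the head of the free list. */
static struct toy_page *alloc_page(void)
{
    struct toy_page *p = free_list;
    if (p != NULL) {
        free_list = p->next;
        p->is_free = false;
        p->next = NULL;
    }
    return p;
}

/* "Allocate" one specific bad page by unlinking it from the free list,
 * then tuck it away on bad_list. Returns false if it was not free. */
static bool isolate_free_page(struct toy_page *victim)
{
    if (!victim->is_free)
        return false;
    struct toy_page **pp = &free_list;
    while (*pp != NULL && *pp != victim)
        pp = &(*pp)->next;
    if (*pp == NULL)
        return false;
    *pp = victim->next;
    victim->is_free = false;
    victim->next = bad_list;
    bad_list = victim;
    return true;
}
```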

Normally it's rare that a bit breaks completely. Usually bits just toggle
for some reason and are OK again if you rewrite them (doing the rewrite without
triggering an MCE can be tricky, BTW). Or the glitch wasn't in the RAM transistors
themselves but on some bus, in which case it might also be OK again on retry.

What happens more often is that a DIMM (or rather a chip on a DIMM) breaks
completely. In this case you need to remove the whole chip. This
can often be done in hardware using "chipkill" (which is kind of a special
case of hardware RAM RAID).
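The RAID comparison can be loosely illustrated with RAID-5-style XOR parity: if each word is spread across several chips plus a parity chip, the contents of one chip known to be dead can be rebuilt from the survivors. This is a simplification; real chipkill uses symbol-oriented ECC that can also locate the failing chip, and NCHIPS, parity_of() and rebuild_chip() are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define NCHIPS 8 /* hypothetical: 8 data chips, one byte of the word each */

/* Parity "chip": XOR of the bytes held by every data chip. */
static uint8_t parity_of(const uint8_t data[NCHIPS])
{
    uint8_t p = 0;
    for (int i = 0; i < NCHIPS; i++)
        p ^= data[i];
    return p;
}

/* Rebuild the byte of a chip known to be dead from the survivors and
 * the parity: XOR-ing everything else cancels out all other bytes. */
static uint8_t rebuild_chip(const uint8_t data[NCHIPS], uint8_t parity, int dead)
{
    assert(dead >= 0 && dead < NCHIPS);
    uint8_t v = parity;
    for (int i = 0; i < NCHIPS; i++)
        if (i != dead)
            v ^= data[i];
    return v;
}
```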

Anyway, in this case you usually need to remove a large memory area, much
bigger than a page, and it's more like memory hot unplug (which we don't
quite support yet, but it's being worked on ...)
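On later kernels that do support memory hot unplug, memory is offlined in fixed-size blocks exposed under /sys/devices/system/memory/, with the block size given in hex in block_size_bytes and block N covering physical addresses from N * block_size up to (N + 1) * block_size. A minimal sketch, assuming the physical range of the failed DIMM is already known, of working out which blocks would have to go; the function names are hypothetical. Actually offlining a block means writing "offline" to its state file as root, which may fail if the block holds unmovable pages.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Memory blocks covering the physical range [start, start + len).
 * block_size would be read from block_size_bytes (a hex string). */
static void blocks_for_range(uint64_t start, uint64_t len, uint64_t block_size,
                             uint64_t *first, uint64_t *last)
{
    assert(len > 0 && block_size > 0);
    *first = start / block_size;
    *last = (start + len - 1) / block_size;
}

/* Format the sysfs state path for memory block n; returns snprintf's result. */
static int block_state_path(char *buf, size_t size, uint64_t n)
{
    return snprintf(buf, size, "/sys/devices/system/memory/memory%" PRIu64 "/state", n);
}
```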

Of course that's all for normal systems. If you're in a spacecraft (as I
gather from the original poster's domain name)
crossing the Van Allen belts or during a solar storm, it might be very different.
But even then I would expect bits more often to just flip than to break completely.
Maybe for a Jupiter probe it's different and chips might really spoil.

-Andi

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .


Thread overview: 6+ messages
2006-06-01 18:06 Brian Lindahl
2006-06-01 23:46 ` Andi Kleen
2006-06-02  1:30   ` Nick Piggin
2006-06-02  3:10     ` Andi Kleen [this message]
2006-06-02  3:15       ` Nick Piggin
2006-06-05 23:36 Brian Lindahl
