From: Andi Kleen <ak@suse.de>
To: Brian Lindahl <Brian.Lindahl@spacedev.com>
Cc: linux-mm@kvack.org
Subject: Re: ECC error correction - page isolation
Date: Fri, 2 Jun 2006 01:46:33 +0200
Message-ID: <200606020146.33703.ak@suse.de>
In-Reply-To: <069061BE1B26524C85EC01E0F5CC3CC30163E1F1@rigel.headquarters.spacedev.com>

On Thursday 01 June 2006 20:06, Brian Lindahl wrote:
> We have a board that gives us access to ECC error counts and ECC error
> status (4 bits, each corresponding to a different error). A background
> process performs a scrub (read, rewrite) on individual raw memory pages to
> activate the ECC. When the error count changes (an error is detected), I'd
> like to be able to isolate the page, if unused. The pages are scrubbed as
> raw physical addresses (page numbers) via an ioctl command on /dev/mem. Is
> there a facility that will allow me to map this physical address range to a
> page entity in the kernel so that I can isolate it and mark it as unusable,
> or reboot if it's active? Is there a better way to do this (i.e. avoiding
> the mapping phase and interact directly with physical page entities in the
> kernel)? Where should I begin my journey into mm in the kernel? What
> structures, functions and globals should I be looking at?
>
> Going this deep in the kernel is pretty foreign to me, so any help would be
> appreciated. Thanks in advance!

I did a prototype for something like this years ago. It is relatively
complicated.

If you get machine checks on normal accesses you first have to get
yourself into a safe context. The handler runs in exception context, so
the work has to be handed off to a thread before it can take locks
safely. For a scrubber that can be ignored. Doing it from arbitrary
context requires some tricks.
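
One way to picture the hand-off is a small lock-free ring: the side that
must not take locks only pushes the faulting physical address, and a
worker thread pops it later and does the real work. This is a userspace
model under my own assumptions, not kernel code; ecc_ring, ecc_ring_push
and ecc_ring_pop are made-up names, and it only works with exactly one
producer and one consumer.

```c
/* Userspace model only. All names here are hypothetical. */
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define ECC_RING_SIZE 16u
static_assert((ECC_RING_SIZE & (ECC_RING_SIZE - 1u)) == 0,
              "ring size must be a power of two");

struct ecc_ring {
    uint64_t    slot[ECC_RING_SIZE];
    atomic_uint head;   /* next slot to write, advanced only by the producer */
    atomic_uint tail;   /* next slot to read, advanced only by the consumer */
};

/* Producer side: takes no locks and never blocks.
 * Returns false if the ring is full and the event was dropped. */
static bool ecc_ring_push(struct ecc_ring *r, uint64_t phys_addr)
{
    unsigned head = atomic_load_explicit(&r->head, memory_order_relaxed);
    unsigned tail = atomic_load_explicit(&r->tail, memory_order_acquire);

    if (head - tail == ECC_RING_SIZE)
        return false;
    r->slot[head % ECC_RING_SIZE] = phys_addr;
    /* Publish the slot contents before making the entry visible. */
    atomic_store_explicit(&r->head, head + 1, memory_order_release);
    return true;
}

/* Consumer side: runs in the worker thread, which may take locks
 * once it has the address. Returns false if the ring is empty. */
static bool ecc_ring_pop(struct ecc_ring *r, uint64_t *phys_addr)
{
    unsigned tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    unsigned head = atomic_load_explicit(&r->head, memory_order_acquire);

    if (head == tail)
        return false;
    *phys_addr = r->slot[tail % ECC_RING_SIZE];
    atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
    return true;
}
```

A zero-initialized struct ecc_ring is an empty ring. A full ring drops
the event rather than waiting, since the producer side cannot sleep.
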

Then you have to take a look at the struct page associated with
the address. If it's an rmap page (you'll need a 2.6 kernel) you
can walk the rmap chains to find the processes that have
the page mapped. You can look at the PTEs and
the page bits to see if it's dirty or not. A clean page
can just be dropped. Otherwise you have
to kill the processes (or send them a signal they could handle).

There is no generic function to do the rmap walk right now, but it's not too
hard to write.
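
The decision for a user page, once the rmap walk has collected the
mappings, could look roughly like this. It is a userspace model, not
kernel code: struct mapping, decide_user_page and the enum are names I
made up, and the array stands in for whatever the walk would produce.

```c
/* Userspace model only. All types and names are hypothetical. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct mapping {            /* one process that has the bad page mapped */
    int  pid;
    bool pte_dirty;         /* dirty bit seen in that process's PTE */
};

enum user_action {
    USER_DROP_PAGE,         /* clean everywhere: the page can just be dropped */
    USER_KILL_MAPPERS       /* dirty: kill the processes or signal them */
};

/* The page counts as dirty if the page bits say so or if any PTE
 * that maps it has its dirty bit set. */
static enum user_action decide_user_page(bool page_dirty,
                                         const struct mapping *maps,
                                         size_t n)
{
    assert(maps != NULL || n == 0);

    if (page_dirty)
        return USER_KILL_MAPPERS;
    for (size_t i = 0; i < n; i++) {
        if (maps[i].pte_dirty)
            return USER_KILL_MAPPERS;
    }
    return USER_DROP_PAGE;
}
```
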

If it's kernel space there are several cases:
- Free page (count == 0). Easy: ignore it.
- Reserved - e.g. page itself or kernel code - panic
- Slab (slab bit set) - panic
- Page table (cannot be detected right now, but you could
  change your architecture to set special bits) - handle like
  process error
- Buffer cache: toss it if clean, or report an I/O error if it was dirty
- Probably more cases

Most can be figured out by looking at the various bits in struct page.
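
The cases above can be written down as one classification function.
Again this is only a userspace model: struct page_model is a stand-in I
invented for the relevant bits, not the real struct page, and the
page_table flag assumes the architecture has been changed to set such a
bit. Anything that matches none of the listed cases comes back as
K_UNKNOWN.

```c
/* Userspace model only. All types and names are hypothetical. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct page_model {
    int  count;          /* reference count; 0 means the page is free */
    bool reserved;
    bool slab;
    bool page_table;     /* assumes the architecture sets a special bit */
    bool buffer_cache;
    bool dirty;
};

enum kernel_action {
    K_IGNORE,               /* free page */
    K_PANIC,                /* reserved or slab */
    K_LIKE_PROCESS_ERROR,   /* page table */
    K_TOSS,                 /* clean buffer cache page */
    K_IO_ERROR,             /* dirty buffer cache page */
    K_UNKNOWN               /* one of the "probably more cases" */
};

static enum kernel_action classify_kernel_page(const struct page_model *p)
{
    assert(p != NULL);

    if (p->count == 0)
        return K_IGNORE;
    if (p->reserved || p->slab)
        return K_PANIC;
    if (p->page_table)
        return K_LIKE_PROCESS_ERROR;
    if (p->buffer_cache)
        return p->dirty ? K_IO_ERROR : K_TOSS;
    return K_UNKNOWN;
}
```

The free-page test comes first so that stale flag bits on an unused page
never lead to a panic.
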
-Andi