Re: [PATCH 0/3] mm: Swap checksum - Cesar Eduardo Barros

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

From: Cesar Eduardo Barros <cesarb@cesarb.net>
To: Minchan Kim <minchan.kim@gmail.com>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Hugh Dickins <hughd@google.com>
Subject: Re: [PATCH 0/3] mm: Swap checksum
Date: Mon, 24 May 2010 07:50:31 -0300	[thread overview]
Message-ID: <4BFA59F7.2020606@cesarb.net> (raw)
In-Reply-To: <AANLkTin_BV6nWlmX6aXTaHvzH-DnsFIVxP5hz4aZYlqH@mail.gmail.com>

Em 23-05-2010 23:05, Minchan Kim escreveu:
> On Mon, May 24, 2010 at 9:57 AM, Cesar Eduardo Barros<cesarb@cesarb.net>  wrote:
>> The internal ECC of the disk will not save you - a quick Google search found
>> an instance of someone with silent data corruption caused by a faulty *power
>> supply*.[1]
>>
>> And if it is silent corruption, without the checksums you will not notice it
>> - it will just be dismissed as "oh, Firefox just crashed again" or similar
>> (the same as bit flips on RAM without ECC).
>
> When I read your comment, suddenly some thought occurred to me.
> If we can't believe ECC of the disk, why do we separate error
> detection logic between file system and swap disk?
>
> I mean it make sense that put crc detection into block layer?
> It can make sure any block I/O.

There are differences as to where the checksum will be stored.

For the filesystem (and for the software suspend image), you have to 
store the checksums in the disk itself, and the correct place (and 
ordering requirements) depends on the filesystem. Also, most filesystems 
do not currently have on-disk checksums.

For the swap, it is much simpler; the checksums can be stored in memory 
(since they do not matter after a reboot; the swap contents are simply 
discarded). This also gives better performance, since the checksums do 
not have to be separately written, and more flexibility, since the 
kernel can use whatever kind of checksum it wants and can store the 
checksum in whatever data structure it choses, without worrying about 
compatibility.

And, in fact, there is a CRC code in the block layer; it is 
CONFIG_BLK_DEV_INTEGRITY. However, it is not a generic solution; it 
needs some extra prerequisites (like a disk low-level formatted with 
sectors with >512 bytes).

> And what's BER of disk?
> Is it usual to meet the problem?

It is unusual enough that most people who meet it will not notice.

And the filesystem developers (who understand about these things more 
than me) seem to be trending towards adding checksums to their 
filesystems. The point of this patch is to meet the same level of safety 
as btrfs (thus the choice of crc32c, which is what btrfs uses).

> In normal desktop, some app killed are not critical. If the
> application is critical, maybe app have to logic fault handling.
> Firefox has session restore feature and Office program has temporal
> save feature.

In fact, crashing is the "best" outcome here. The worst outcome is your 
application silently corrupting the data you then save to disk.

> On the other hand, in server, does it designed well to use swap disk
> until we meet bit error of disk?

Servers are the ones which would benefit the most, as their RAM is 
usually very reliable (they tend to use ECC memory). Their disk 
subsystems, however, are also more reliable.

Desktop systems (especially "no-name" brand ones) do not gain as much, 
since their RAM is usually unprotected; however, they are also the ones 
which have better chance of having low-quality power supplies, uncommon 
storage media (USB flash drives as the root disk is an example), and 
problematic I/O subsystems.

> My feel is that it seem to be rather overkill.

Yes, it is a bit overkill, except when you are using software suspend. 
While the software suspend image is not protected by this patch (I am 
already thinking of a separate patch to add checksums to it), the 
swapped out pages are (software suspend uses both a memory image saved 
to the swap partition and the normal swapped out pages).

But you do not have to use it if you think it is overkill - I even added 
a kernel parameter to easily disable it.

>> The swap checksum only protects the page against being silently corrupted
>> while on the disk and at least to some degree on the I/O path between the
>> memory and the disk. It does not protect against broken kernel-mode code
>> writing to the wrong address, nor against broken hardware (or hardware
>> misconfigured by broken drivers) doing DMA to wrong addresses. It also does
>> not protect against hardware errors in the RAM itself (you have ECC memory
>> for that).
>>
>> That is, the code assumes the memory containing the checksums will not be
>> corrupted, because if it is, you have worse problems (and the CRC error here
>> would be a *good* thing, since it would make you notice something is not
>> quite right).
>>
>
> Which is high between BER of RAM and disk?
> It's a just question. :)

I have no idea.

However, we can do nothing in software against RAM errors; it would kill 
performance too much. Against disk errors, however, we can do a lot 
(software RAID-1 is just one of the simplest examples).

-- 
Cesar Eduardo Barros
cesarb@cesarb.net
cesar.barros@gmail.com

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

next prev parent reply	other threads:[~2010-05-24 10:50 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-05-22 18:08 Cesar Eduardo Barros
2010-05-22 18:08 ` [PATCH 1/3] mm/swapfile.c: better messages for swap_info_get Cesar Eduardo Barros
2010-05-22 18:13   ` Borislav Petkov
2010-05-22 18:18     ` Cesar Eduardo Barros
2010-05-22 18:08 ` [PATCH 2/3] kernel/power/swap.c: do not use end_swap_bio_read Cesar Eduardo Barros
2010-05-22 18:08 ` [PATCH 3/3] mm: Swap checksum Cesar Eduardo Barros
2010-05-23 15:19   ` Avi Kivity
2010-05-23 18:58     ` Cesar Eduardo Barros
2010-05-24  6:41       ` Avi Kivity
2010-05-24  7:32         ` Nick Piggin
2010-05-24 10:51           ` Avi Kivity
2010-05-24 11:24         ` Cesar Eduardo Barros
2010-05-23 14:03 ` [PATCH 0/3] " Minchan Kim
2010-05-23 18:32   ` Cesar Eduardo Barros
2010-05-24  0:09     ` Minchan Kim
2010-05-24  0:57       ` Cesar Eduardo Barros
2010-05-24  2:05         ` Minchan Kim
2010-05-24 10:50           ` Cesar Eduardo Barros [this message]
2010-05-25 23:52             ` Minchan Kim
2010-05-26 10:21               ` Cesar Eduardo Barros
2010-05-26 15:31                 ` Minchan Kim
2010-05-26 21:28                   ` Valdis.Kletnieks
2010-05-26 22:45                     ` Minchan Kim
2010-05-26 23:19                       ` Cesar Eduardo Barros
2010-05-26 23:27                         ` Minchan Kim

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4BFA59F7.2020606@cesarb.net \
    --to=cesarb@cesarb.net \
    --cc=hughd@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=minchan.kim@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox