From: Nick Piggin <npiggin@suse.de>
To: Avi Kivity <avi@redhat.com>
Cc: Cesar Eduardo Barros <cesarb@cesarb.net>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 3/3] mm: Swap checksum
Date: Mon, 24 May 2010 17:32:59 +1000
Message-ID: <20100524073259.GW2516@laptop>
In-Reply-To: <4BFA1F92.2080802@redhat.com>

On Mon, May 24, 2010 at 09:41:22AM +0300, Avi Kivity wrote:
> On 05/23/2010 09:58 PM, Cesar Eduardo Barros wrote:
> >On 23-05-2010 12:19, Avi Kivity wrote:
> >>On 64-bit, we may be able to store the checksum in the pte, if the swap
> >>device is small enough.
> >
> >Which pte?
> 
> All of them.
> 
> >Correct me if I am wrong, but I do not think all pages written to
> >the swap have exactly one pte pointing to them. And I have not
> >looked at the shmem.c code yet, but does it even use ptes?
> 
> Well, the ptes need the swap address written into them, so they are
> already found and updated somehow.  All that's needed is to update
> the value written to also include the checksum.
> 
> >It might be possible (find all ptes and write the 32-bit checksum
> >to them, do something else for shmem, have two different code
> >paths for small/large swapfiles), but I do not know if the memory
> >savings are worth the extra complexity (especially the need for
> >two separate code paths).
> 
> Certainly not at first, but later it may be worthwhile.
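
To make the "checksum in the pte" idea concrete, here is a minimal
userspace sketch of packing a swap type, a swap offset and a 32-bit
checksum into one 64-bit value. The bit layout and the names swp_pack(),
swp_type(), swp_offset() and swp_csum() are assumptions made up for
illustration; they are not any architecture's real swap pte format.
With 26 offset bits and 4 KiB pages, this layout would cap each swap
device at 256 GiB, which is the "small enough" restriction mentioned
above.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical 64-bit swap entry layout (illustration only):
 *   bit  0      : present bit, always 0 for a swapped-out page
 *   bits 1..5   : swap type (which swap device)
 *   bits 6..31  : swap offset in pages (26 bits)
 *   bits 32..63 : 32-bit checksum of the page contents
 */
#define SWP_TYPE_SHIFT    1
#define SWP_TYPE_BITS     5
#define SWP_OFFSET_SHIFT  (SWP_TYPE_SHIFT + SWP_TYPE_BITS)
#define SWP_OFFSET_BITS   26
#define SWP_CSUM_SHIFT    32

static inline uint64_t swp_pack(unsigned type, uint64_t offset, uint32_t csum)
{
    /* The caller must fall back to another scheme for larger devices. */
    assert(type < (1u << SWP_TYPE_BITS));
    assert(offset < (UINT64_C(1) << SWP_OFFSET_BITS));

    return ((uint64_t)type << SWP_TYPE_SHIFT) |
           (offset << SWP_OFFSET_SHIFT) |
           ((uint64_t)csum << SWP_CSUM_SHIFT);
}

static inline unsigned swp_type(uint64_t entry)
{
    return (unsigned)((entry >> SWP_TYPE_SHIFT) & ((1u << SWP_TYPE_BITS) - 1));
}

static inline uint64_t swp_offset(uint64_t entry)
{
    return (entry >> SWP_OFFSET_SHIFT) & ((UINT64_C(1) << SWP_OFFSET_BITS) - 1);
}

static inline uint32_t swp_csum(uint64_t entry)
{
    return (uint32_t)(entry >> SWP_CSUM_SHIFT);
}
```

Every pte that maps the page would have to carry the same packed value,
and shmem would still need its own place to keep the checksum, which is
the extra complexity raised above.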
> 
> >
> >>If we take the trouble to touch the page, we may as well compare it
> >>against zero, and if so drop it instead of swapping it out.
> >
> >The problem with this is that the page is touched deep inside the
> >crc32c code, which might even be using hardware instructions
> >(crc32c-intel). So we would need to read it two times to compare
> >against zero.
> 
> The second read is very cheap since the page is already in cache.
> Also, we fail early when any word is nonzero, so usually the compare
> exits quickly.
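
The early-exit compare described above could look roughly like the
following userspace sketch. SKETCH_PAGE_SIZE and page_is_zero() are
hypothetical names, and a 4 KiB page is assumed; this is not the code
from the patch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096u  /* assumed page size */

/* Return true only if every word in the page is zero. */
static bool page_is_zero(const void *page)
{
    const unsigned char *p = page;

    for (size_t i = 0; i < SKETCH_PAGE_SIZE; i += sizeof(unsigned long)) {
        unsigned long word;

        /* memcpy avoids alignment and aliasing assumptions. */
        memcpy(&word, p + i, sizeof(word));
        if (word != 0)
            return false;   /* fail early on the first nonzero word */
    }
    return true;
}
```

For a page with nonzero data near its start the loop stops after a few
words; only an all-zero page (or one that is zero almost to the end)
pays for a full pass.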

For a page being written back from pagecache to disk, or for a
page being swapped out, the contents are likely to be cache cold and
unlikely to be used again soon. A crc routine for that case would
therefore do well to minimise cache pollution.


> >One possibility could be to compare the full page against zero
> >only if its crc is a specific value (the crc32c of a page full of
> >zeros). This would not be too slow (we would be wasting time only
> >when we have a very high probability of saving much more time),
> >and not need to touch the crc32c code at all. I would only have to
> >look at how this messes up the state tracking (i.e. how to make it
> >track the fact that, instead of getting written out, this is now a
> >zeroed page).
> 
> Instead of returning a swap pte to be written to the page tables,
> return a zeroed pte.

A pte_none pte, to be precise.
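
As a rough userspace sketch of the scheme discussed above (do the full
zero compare only when the checksum matches that of an all-zero page),
assuming a 4 KiB page and a plain bitwise CRC32C. The names crc32c_sw(),
all_zero(), classify_page() and the swap_action values are hypothetical,
and the CRC seed convention here may differ from what the kernel's
crc32c code uses:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096u  /* assumed page size */

/* Bitwise CRC32C (Castagnoli, reflected polynomial 0x82F63B78). */
static uint32_t crc32c_sw(const unsigned char *buf, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;

    for (size_t i = 0; i < len; i++) {
        crc ^= buf[i];
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/* First byte is zero and every byte equals its successor => all zero. */
static bool all_zero(const unsigned char *buf, size_t len)
{
    return len == 0 || (buf[0] == 0 && memcmp(buf, buf + 1, len - 1) == 0);
}

enum swap_action { SWAP_WRITE_OUT, SWAP_DROP_ZERO_PAGE };

/* Compute the page checksum; re-read the page only on a CRC match. */
static enum swap_action classify_page(const unsigned char *page,
                                      uint32_t *crc_out)
{
    static uint32_t zero_crc;
    static bool zero_crc_ready;

    if (!zero_crc_ready) {
        static const unsigned char zero_page[SKETCH_PAGE_SIZE];

        zero_crc = crc32c_sw(zero_page, SKETCH_PAGE_SIZE);
        zero_crc_ready = true;
    }

    *crc_out = crc32c_sw(page, SKETCH_PAGE_SIZE);

    /* A CRC match can be a collision, so confirm before dropping. */
    if (*crc_out == zero_crc && all_zero(page, SKETCH_PAGE_SIZE))
        return SWAP_DROP_ZERO_PAGE;
    return SWAP_WRITE_OUT;
}
```

In the kernel, the SWAP_DROP_ZERO_PAGE case would correspond to
installing a pte_none pte instead of a swap pte, so the next fault on
that address gets a fresh zeroed page.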

I wonder, though. If we no longer trust block devices to give the
correct data back, should we provide a meta block device to do error
detection? No production filesystem on Linux has checksums (well, ext4
has a few). Of the ones that do add checksumming, I'd say most will not
checksum data (and for direct IO it is not done at all).

