Re: [Ksummit-discuss] [TECH TOPIC] Project Banbury

ksummit.lists.linux.dev archive mirror

From: James Bottomley <James.Bottomley@HansenPartnership.com>
To: Matthew Wilcox <willy6545@gmail.com>,
	ksummit-discuss@lists.linuxfoundation.org
Subject: Re: [Ksummit-discuss] [TECH TOPIC] Project Banbury
Date: Sun, 16 Sep 2018 09:37:50 -0700
Message-ID: <1537115870.3056.1.camel@HansenPartnership.com>
In-Reply-To: <CAFhKne8kiF6k-QUJ9x-cCyBcVvfuWKdcUtQZNz=1sx_iHR+64g@mail.gmail.com>

On Fri, 2018-09-14 at 18:28 +0100, Matthew Wilcox wrote:
> We've all pulled the wrong drive out of a machine or unplugged a USB
> key before the write back has completely finished. You try to plug it
> back in, but the damage is done. The pending writes are lost, the
> filesystem is damaged and full of errors and you are having a Bad
> Day. What if ... plugging the drive back in could be made to work?

For a lot of modern external storage devices this simply can't be made
to work.  The reason is that they all have an internal write-back cache
to make operations faster; SATA devices may lie about it and USB devices
always lie about it.  For these devices we have a set of writes that we
think have completed but in fact only hit the device cache.  When you
pull the device out, the cache is lost and so are those writes.  This
is unfixable on the host side unless there's some way we can get the
device to tell us it has a write-back cache and behave correctly with
regard to flushes.

Even for devices that behave correctly, we currently have no real way
to repeat the I/O that was lost in the powered-down cache.  Do you have
a way to cope with this case?  It doesn't seem to be accounted for in
your plan.  The reason is that we use barrier-type cache semantics,
which assume everything issued before a barrier is available to the
device (either on disk or in the cache).  The block layer would need
some way to replay I/Os (in order) from the last barrier, because some
of them might have been lost from the cache.

Provided we have write-through caches (not a given), the lower-layer
error handling will mostly take care of repeating the lost but
unacknowledged I/O as long as you preserve the queue.  So I agree that
part can work, but the big requirement is having a write-through cache.

James

Thread overview: 9+ messages
2018-09-14 17:28 Matthew Wilcox
2018-09-16 10:53 ` Hannes Reinecke
2018-09-16 12:45   ` Matthew Wilcox
2018-09-18  8:17     ` Hannes Reinecke
2018-09-16 16:03 ` Laurent Pinchart
2018-09-16 16:25   ` Linus Torvalds
2018-09-16 16:37 ` James Bottomley [this message]
2018-09-16 19:25   ` Theodore Y. Ts'o
2018-09-16 23:58 ` David Howells
