From: James Bottomley
To: Matthew Wilcox, ksummit-discuss@lists.linuxfoundation.org
Date: Sun, 16 Sep 2018 09:37:50 -0700
Subject: Re: [Ksummit-discuss] [TECH TOPIC] Project Banbury
Message-ID: <1537115870.3056.1.camel@HansenPartnership.com>

On Fri, 2018-09-14 at 18:28 +0100, Matthew Wilcox wrote:
> We've all pulled the wrong drive out of a machine or unplugged a USB
> key before the write back has completely finished. You try to plug it
> back in, but the damage is done. The pending writes are lost, the
> filesystem is damaged and full of errors and you are having a Bad
> Day. What if ... plugging the drive back in could be made to work?

For a lot of modern external storage devices this simply can't be made
to work. The reason is that they all have an internal write-back cache
to make operations faster; if they're SATA they may lie about it, and
if they're USB they always lie about it. For these devices there is a
set of writes that we think have completed but that in fact only hit
the device cache. When you pulled the device out, the cache was lost
and so were those writes. This is unfixable on the host side unless
there's some way to get the device to tell us it has a write-back
cache and to behave correctly with regard to flushes.
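
A toy model of that failure mode, in Python rather than kernel code. The
class and method names are invented for illustration, and the device is
assumed to acknowledge a write as soon as it lands in its volatile cache:

```python
class WriteBackDevice:
    """Hypothetical device that acknowledges writes from a volatile cache."""

    def __init__(self):
        self.media = {}  # persistent storage: block number -> data
        self.cache = {}  # volatile write-back cache, lost on power-off

    def write(self, block, data):
        # The data only reaches the cache, yet the host is told it completed.
        self.cache[block] = data
        return True

    def flush(self):
        # An honoured flush moves everything in the cache onto the media.
        self.media.update(self.cache)
        self.cache.clear()

    def unplug(self):
        # Pulling the device drops power, and with it the cache contents.
        self.cache.clear()

    def read(self, block):
        # Serve from the cache first, then fall back to the media.
        if block in self.cache:
            return self.cache[block]
        return self.media.get(block)


dev = WriteBackDevice()
dev.write(0, b"superblock")
dev.flush()                       # block 0 is now durable
acked = dev.write(1, b"journal")  # acknowledged, but only in the cache
dev.unplug()                      # block 1 is gone; the host was never told
```

After the unplug, block 0 is still readable but block 1 is not, even though
the host saw both writes complete. Nothing on the host side records that
block 1 needs to be sent again.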

Even for devices that behave correctly, we currently have no real way
to repeat the I/O that was lost in the powered-down cache. Do you have
a way to cope with this case? It doesn't seem to be accounted for in
your plan. The reason is that we use barrier-type caches, which assume
everything behind them is available to the device (either on disk or
in the cache). The block layer would need some way to replay I/Os, in
order, from the last barrier, because some of them might have been
lost from the cache.

Provided we have write-through caches (not a given), the lower-layer
error handling will mostly take care of repeating the lost but
unacknowledged I/O, as long as you preserve the queue. So I agree that
part can work, but the big thing is having a write-through cache.

James
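
To make the replay point concrete, here is a minimal sketch of a host-side
log that keeps every write issued since the last completed flush and
reissues them in order after a reconnect. It is not how the block layer
works today; `Device` and `ReplayQueue` are hypothetical names, the flush
stands in for the barrier, and the device is assumed to honour flushes:

```python
class Device:
    """Hypothetical device: data is durable only after flush()."""

    def __init__(self):
        self.media = {}  # persistent storage: block number -> data
        self.cache = []  # volatile cache, kept as an ordered list of writes

    def write(self, block, data):
        self.cache.append((block, data))

    def flush(self):
        # Apply cached writes in order, so the latest write to a block wins.
        for block, data in self.cache:
            self.media[block] = data
        self.cache.clear()

    def unplug(self):
        # Power loss: everything still in the cache is gone.
        self.cache.clear()


class ReplayQueue:
    """Hypothetical host-side log of writes issued since the last flush."""

    def __init__(self, dev):
        self.dev = dev
        self.since_flush = []  # ordered (block, data) pairs not yet durable

    def write(self, block, data):
        # Remember the write before passing it down to the device.
        self.since_flush.append((block, data))
        self.dev.write(block, data)

    def flush(self):
        # The log can be dropped only once the flush has completed.
        self.dev.flush()
        self.since_flush.clear()

    def replay(self):
        # After a reconnect, reissue everything since the last flush, in order.
        for block, data in self.since_flush:
            self.dev.write(block, data)
        self.flush()


dev = Device()
queue = ReplayQueue(dev)
queue.write(0, b"A")
queue.flush()         # block 0 is durable, the log is empty
queue.write(1, b"B")
queue.write(1, b"C")  # a later write to the same block must win
dev.unplug()          # both writes to block 1 are lost from the cache
queue.replay()        # reissued in order, then flushed
```

The cost is that the host has to hold on to the data for every write until
the next flush completes, and none of it helps if the device acknowledges
flushes it never performed.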