From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail137.messagelabs.com (mail137.messagelabs.com [216.82.249.19])
 by kanga.kvack.org (Postfix) with SMTP id 589226B004D
 for ; Thu, 13 Aug 2009 18:20:34 -0400 (EDT)
Received: by qyk36 with SMTP id 36so953236qyk.12
 for ; Thu, 13 Aug 2009 15:20:39 -0700 (PDT)
MIME-Version: 1.0
In-Reply-To: <87f94c370908131428u75dfe496x1b7d90b94833bf80@mail.gmail.com>
References: <200908122007.43522.ngupta@vflare.org>
 <20090813151312.GA13559@linux.intel.com>
 <20090813162621.GB1915@phenom2.trippelsdorf.de>
 <87f94c370908131115r680a7523w3cdbc78b9e82373c@mail.gmail.com>
 <87f94c370908131428u75dfe496x1b7d90b94833bf80@mail.gmail.com>
Date: Thu, 13 Aug 2009 15:20:39 -0700
Message-ID: <46b8a8850908131520s747e045cnd8db9493e072939d@mail.gmail.com>
Subject: Re: Discard support (was Re: [PATCH] swap: send callback when swap slot is freed)
From: Richard Sharpe
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable
Sender: owner-linux-mm@kvack.org
To: Greg Freemyer
Cc: david@lang.hm, Markus Trippelsdorf , Matthew Wilcox , Hugh Dickins ,
 Nitin Gupta , Ingo Molnar , Peter Zijlstra , linux-kernel@vger.kernel.org,
 linux-mm@kvack.org, linux-scsi@vger.kernel.org, linux-ide@vger.kernel.org,
 Linux RAID
List-ID:

On Thu, Aug 13, 2009 at 2:28 PM, Greg Freemyer wrote:
> On Thu, Aug 13, 2009 at 4:44 PM, wrote:
>> On Thu, 13 Aug 2009, Greg Freemyer wrote:
>>
>>> On Thu, Aug 13, 2009 at 12:33 PM, wrote:
>>>>
>>>> On Thu, 13 Aug 2009, Markus Trippelsdorf wrote:
>>>>
>>>>> On Thu, Aug 13, 2009 at 08:13:12AM -0700, Matthew Wilcox wrote:
>>>>>>
>>>>>> I am planning a complete overhaul of the discard work.  Users can send
>>>>>> down discard requests as frequently as they like.  The block layer will
>>>>>> cache them, and invalidate them if writes come through.
>>>>>> Periodically, the block layer will send down a TRIM or an UNMAP
>>>>>> (depending on the underlying device) and get rid of the blocks that
>>>>>> have remained unwanted in the interim.
>>>>>
>>>>> That is a very good idea. I've tested your original TRIM implementation
>>>>> on my Vertex yesterday and it was awful ;-). The SSD needs hundreds of
>>>>> milliseconds to digest a single TRIM command. And since your
>>>>> implementation sends a TRIM for each extent of each deleted file, the
>>>>> whole system is unusable after a short while.
>>>>> An optimal solution would be to consolidate the discard requests, bundle
>>>>> them and send them to the drive as infrequently as possible.
>>>>
>>>> or queue them up and send them when the drive is idle (you would need to
>>>> keep track to make sure the space isn't re-used)
>>>>
>>>> as an example, if you would consider spinning down a drive you don't hurt
>>>> performance by sending accumulated trim commands.
>>>>
>>>> David Lang
>>>
>>> An alternate approach is for the block layer to maintain its own bitmap
>>> of used/unused sectors / blocks. Unmap commands from the filesystem just
>>> cause the bitmap to be updated.  No other effect.
>>
>> how does the block layer know what blocks are unused by the filesystem?
>>
>> or would it be a case of the filesystem generating discard/trim requests to
>> the block layer so that it can maintain its bitmap, and then the block
>> layer generating the requests to the drive below it?
>>
>> David Lang
>
> Yes, my thought was that the block layer would consume the discard/trim
> requests from the filesystem in realtime to maintain the bitmap, then
> at some later point in time when the system has extra resources it
> would generate the calls down to the lower layers and eventually the
> drive.

Why should the block layer be forced to maintain something that is
probably of use for only a limited number of cases?
For example, the devices I work on already maintain their own mapping of
HOST-visible LBAs to underlying storage, and I suspect that most such
devices do. So, you are duplicating something that we already do, and
there is no way that I am aware of to synchronise the two.

All we really need, I believe, is for the UNMAP requests to come down to
us with writes barriered until we respond. It is a relatively cheap
operation, although writes that are already in the cache and
uncommitted to disk present some issues if an UNMAP request comes down
for recently written blocks.

> I highlight the lower layers because mdraid is also going to have to
> be in the mix if raid5/6 is in use.  i.e., at a minimum it will have to
> adjust the block range to align with the stripe boundaries.
>
> Greg
> --
> To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>

-- 
Regards,
Richard Sharpe

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: email@kvack.org
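
For readers following the thread, here is a minimal sketch of the caching
scheme Matthew Wilcox describes in the quoted text: discard requests are
accumulated and merged, a write invalidates any pending discard for the
blocks it touches, and a periodic flush yields the consolidated ranges that
would go down as TRIM or UNMAP. Nothing here comes from the thread or the
kernel: the class and method names (DiscardCache, discard, write, flush) are
hypothetical, ranges are assumed to be half-open LBA intervals, and Python is
used only for illustration.

```python
class DiscardCache:
    """Hypothetical cache of pending discards, as half-open [start, end) LBA ranges."""

    def __init__(self):
        # Sorted list of (start, end) tuples; never overlapping or adjacent.
        self.pending = []

    def discard(self, start, length):
        """Record a discard, merging it with overlapping or adjacent ranges."""
        new_start, new_end = start, start + length
        merged = []
        for s, e in self.pending:
            if e < new_start or s > new_end:
                # Strictly separate from the new range: keep unchanged.
                merged.append((s, e))
            else:
                # Overlapping or touching: absorb into the new range.
                new_start = min(new_start, s)
                new_end = max(new_end, e)
        merged.append((new_start, new_end))
        merged.sort()
        self.pending = merged

    def write(self, start, length):
        """A write makes blocks wanted again: remove them from the pending set."""
        w_start, w_end = start, start + length
        kept = []
        for s, e in self.pending:
            if e <= w_start or s >= w_end:
                kept.append((s, e))  # untouched by the write
                continue
            if s < w_start:
                kept.append((s, w_start))  # part before the write survives
            if e > w_end:
                kept.append((w_end, e))  # part after the write survives
        self.pending = kept

    def flush(self):
        """Return the consolidated ranges to send to the device and clear the cache."""
        ranges, self.pending = self.pending, []
        return ranges
```

In this sketch, two back-to-back discards of 8 blocks starting at LBA 0 and
LBA 8 collapse into a single range, and a later write to part of that range
splits it again, so only blocks that stayed unwanted reach flush(). It does
not address Richard Sharpe's objection that a device may already keep such a
mapping itself.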