From: Ric Wheeler <rwheeler@redhat.com>
To: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Matthew Wilcox <willy@linux.intel.com>,
	Hugh Dickins <hugh.dickins@tiscali.co.uk>,
	Nitin Gupta <ngupta@vflare.org>, Ingo Molnar <mingo@elte.hu>,
	Peter Zijlstra <peterz@infradead.org>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	linux-scsi@vger.kernel.org, linux-ide@vger.kernel.org
Subject: Re: Discard support (was Re: [PATCH] swap: send callback when swap slot is freed)
Date: Thu, 13 Aug 2009 14:22:43 -0400
Message-ID: <4A8459F3.5060703@redhat.com>
In-Reply-To: <1250178192.3901.54.camel@mulgrave.site>

On 08/13/2009 11:43 AM, James Bottomley wrote:
> On Thu, 2009-08-13 at 08:13 -0700, Matthew Wilcox wrote:
>    
>> On Wed, Aug 12, 2009 at 11:48:27PM +0100, Hugh Dickins wrote:
>>      
>>> But fundamentally, though I can see how this cutdown communication
>>> path is useful to compcache, I'd much rather deal with it by the more
>>> general discard route if we can.  (I'm one of those still puzzled by
>>> the way swap is mixed up with block device in compcache: probably
>>> because I never found time to pay attention when you explained.)
>>>
>>> You're right to question the utility of the current swap discard
>>> placement.  That code is almost a year old, written from a position
>>> of great ignorance, yet only now do we appear to be on the threshold
>>> of having an SSD which really supports TRIM (ah, the Linux ATA TRIM
>>> support seems to have gone missing now, but perhaps it's been
>>> waiting for a reality to check against too - Willy?).
>>>        
>> I am indeed waiting for hardware with TRIM support to appear on my
>> desk before resubmitting the TRIM code.  It'd also be nice to be able to
>> get some performance numbers.
>>
>>      
>>> I won't be surprised if we find that we need to move swap discard
>>> support much closer to swap_free (though I know from trying before
>>> that it's much messier there): in which case, even if we decided to
>>> keep your hotline to compcache (to avoid allocating bios etc.), it
>>> would be better placed alongside.
>>>        
>> It turns out there are a lot of tradeoffs involved with discard, and
>> they're different between TRIM and UNMAP.
>>
>> Let's start with UNMAP.  This SCSI command is used by giant arrays.
>> They want to do Thin Provisioning, so allocate physical storage to virtual
>> LUNs on demand, and want to deallocate it when they get an UNMAP command.
>> They allocate storage in large chunks (hundreds of kilobytes at a time).
>> They only care about discards that enable them to free an entire chunk.
>> The vast majority of users *do not care* about these arrays, because
>> they don't have one, and will never be able to afford one.  We should
>> ignore the desires of these vendors when designing our software.
>>      
>
> Fundamentally, unmap, trim and write_same do similar things, so
> realistically they all map to discard in linux.
>
> Ignoring the desires of the enterprise isn't an option, since they are a
> good base for us.  However, they really do need to step up with a useful
> patch set for discussion that does what they want, so in the interim I'm
> happy with any proposal that doesn't actively damage what the enterprise
> wants to do with trim/write_same.
>    

I definitely agree - UNMAP support and the needs of array users are a 
critical part of the solution.

I would also dispute the contention that this is irrelevant to most 
users - even those of us who don't personally use arrays almost always 
use them indirectly, since major banks, airlines, etc. all use them to 
store our data :-)

>    
>> Solid State Drives are introducing an ATA command called TRIM.  SSDs
>> generally have an internal mapping layer, and due to their low, low seek
>> penalty, will happily remap blocks anywhere on the flash.  They want
>> to know when a block isn't in use any more, so they don't have to copy
>> it around when they want to erase the chunk of storage that it's on.
>> The unfortunate thing about the TRIM command is that it's not NCQ, so
>> all NCQ commands have to finish, then we can send the TRIM command and
>> wait for it to finish, then we can send NCQ commands again.
>>      
>
> That's a bit of a silly protocol oversight ... I assume there's no way
> it can be corrected?
>
>    
>> So TRIM isn't free, and there's a better way for the drive to find
>> out that the contents of a block no longer matter -- write some new
>> data to it.  So if we just swapped a page in, and we're going to swap
>> something else back out again soon, just write it to the same location
>> instead of to a fresh location.  You've saved a command, and you've
>> saved the drive some work, plus you've allowed other users to continue
>> accessing the drive in the meantime.
>>
>> I am planning a complete overhaul of the discard work.  Users can send
>> down discard requests as frequently as they like.  The block layer will
>> cache them, and invalidate them if writes come through.  Periodically,
>> the block layer will send down a TRIM or an UNMAP (depending on the
>> underlying device) and get rid of the blocks that have remained unwanted
>> in the interim.
>>
>> Thoughts on that are welcome.
>>      
>
> What you're basically planning is discard accumulation ... it's
> certainly closer to what the enterprise is looking for, so no objections
> from me.
>
> James
>
>    

This sounds like a good approach to me as well. I think that both the TRIM 
and UNMAP use cases will benefit from coalescing these discard requests.
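
To make the accumulation idea concrete, here is a rough sketch of the
behaviour described above: discards are cached and merged, a write to a
cached range invalidates that part of it, and a periodic flush hands
back whatever is still unwanted. This is only an illustration in Python,
not the proposed block layer code. The class name DiscardAccumulator,
its methods, and the use of half-open [start, end) sector ranges are all
assumptions made for the example.

```python
class DiscardAccumulator:
    """Hypothetical cache of pending discards (not a kernel API).

    Ranges are half-open sector intervals [start, end).
    """

    def __init__(self):
        # Kept sorted, non-overlapping and non-adjacent.
        self.pending = []

    def discard(self, start, end):
        """Record a discard, merging with overlapping or adjacent ranges."""
        if start >= end:
            return
        merged = []
        for s, e in self.pending:
            if e < start or s > end:
                # Disjoint and not touching: keep as-is.
                merged.append((s, e))
            else:
                # Overlapping or adjacent: absorb into the new range.
                start, end = min(s, start), max(e, end)
        merged.append((start, end))
        merged.sort()
        self.pending = merged

    def write(self, start, end):
        """A write makes these sectors live again, so stop tracking them."""
        kept = []
        for s, e in self.pending:
            if e <= start or s >= end:
                kept.append((s, e))
                continue
            if s < start:
                kept.append((s, start))  # part before the write survives
            if e > end:
                kept.append((end, e))    # part after the write survives
        self.pending = kept

    def flush(self):
        """Return the ranges to send as TRIM or UNMAP and clear the cache."""
        ranges, self.pending = self.pending, []
        return ranges
```

Two adjacent discards collapse into one range, and a later write to the
middle of that range splits it, so only the sectors that stayed unwanted
are returned by flush(). A real implementation would also need locking
and a bound on how much it caches, which this sketch leaves out.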

Ric

