linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Andrew Morton <akpm@linux-foundation.org>
To: Shaohua Li <shli@kernel.org>
Cc: linux-mm@kvack.org, riel@redhat.com, minchan@kernel.org,
	kmpark@infradead.org, hughd@google.com, aquini@redhat.com
Subject: Re: [patch 2/4 v6]swap: make swap discard async
Date: Wed, 17 Jul 2013 15:09:13 -0700	[thread overview]
Message-ID: <20130717150913.1286deef1a27bf2d2712e16f@linux-foundation.org> (raw)
In-Reply-To: <20130715204341.GB7925@kernel.org>

On Tue, 16 Jul 2013 04:43:41 +0800 Shaohua Li <shli@kernel.org> wrote:

> swap can do cluster discard for SSD, which is good, but there are some problems
> here:
> 1. swap do the discard just before page reclaim gets a swap entry and writes
> the disk sectors. This is useless for high end SSD, because an overwrite to a
> sector implies a discard to original sector too. A discard + overwrite ==
> overwrite.
> 2. the purpose of doing discard is to improve SSD firmware garbage collection.
> Idealy we should send discard as early as possible, so firmware can do
> something smart. Sending discard just after swap entry is freed is considered
> early compared to sending discard before write. Of course, if workload is
> already bound to gc speed, sending discard earlier or later doesn't make
> difference.
> 3. block discard is a sync API, which will delay scan_swap_map() significantly.
> 4. Write and discard command can be executed parallel in PCIe SSD. Making
> swap discard async can make execution more efficiently.
> 
> This patch makes swap discard async and move discard to where swap entry is
> freed. Discard and write have no dependence now, so above issues can be avoided.
> Idealy we should do discard for any freed sectors, but some SSD discard is very
> slow. This patch still does discard for a whole cluster. 
> 
> My test does a several round of 'mmap, write, unmap', which will trigger a lot
> of swap discard. In a fusionio card, with this patch, the test runtime is
> reduced to 18% of the time without it, so around 5.5x faster.
> 
> ...
>
> +static void swap_do_scheduled_discard(struct swap_info_struct *si)
> +{
> +	struct swap_cluster_info *info;
> +	unsigned int idx;
> +
> +	info = si->cluster_info;
> +
> +	while (!cluster_is_null(&si->discard_cluster_head)) {
> +		idx = cluster_next(&si->discard_cluster_head);
> +
> +		cluster_set_next_flag(&si->discard_cluster_head,
> +						cluster_next(&info[idx]), 0);
> +		if (cluster_next(&si->discard_cluster_tail) == idx) {
> +			cluster_set_null(&si->discard_cluster_head);
> +			cluster_set_null(&si->discard_cluster_tail);
> +		}
> +		spin_unlock(&si->lock);
> +
> +		discard_swap_cluster(si, idx * SWAPFILE_CLUSTER,
> +				SWAPFILE_CLUSTER);
> +
> +		spin_lock(&si->lock);
> +		cluster_set_flag(&info[idx], CLUSTER_FLAG_FREE);

Wait.  How can we do this?  We dropped the spinlock, so `idx' is now
invalid.

> +		if (cluster_is_null(&si->free_cluster_head)) {
> +			cluster_set_next_flag(&si->free_cluster_head,
> +						idx, 0);
> +			cluster_set_next_flag(&si->free_cluster_tail,
> +						idx, 0);
> +		} else {
> +			unsigned int next;
> +
> +			next = cluster_next(&si->free_cluster_tail);
> +			cluster_set_next(&info[next], idx);
> +			cluster_set_next_flag(&si->free_cluster_tail,
> +						idx, 0);

ditto.

> +		}
> +		memset(si->swap_map + idx * SWAPFILE_CLUSTER,
> +				0, SWAPFILE_CLUSTER);

again.

> +	}
> +}
> +
> 
> ...
>
> @@ -331,19 +414,6 @@ static unsigned long scan_swap_map(struc
>  			si->cluster_nr = SWAPFILE_CLUSTER - 1;
>  			goto checks;
>  		}
> -		if (si->flags & SWP_PAGE_DISCARD) {
> -			/*
> -			 * Start range check on racing allocations, in case
> -			 * they overlap the cluster we eventually decide on
> -			 * (we scan without swap_lock to allow preemption).
> -			 * It's hardly conceivable that cluster_nr could be
> -			 * wrapped during our scan, but don't depend on it.
> -			 */
> -			if (si->lowest_alloc)
> -				goto checks;
> -			si->lowest_alloc = si->max;
> -			si->highest_alloc = 0;
> -		}
>  check_cluster:
>  		if (!cluster_is_null(&si->free_cluster_head)) {
>  			offset = cluster_next(&si->free_cluster_head) *
> @@ -351,15 +421,22 @@ check_cluster:
>  			last_in_cluster = offset + SWAPFILE_CLUSTER - 1;
>  			si->cluster_next = offset;
>  			si->cluster_nr = SWAPFILE_CLUSTER - 1;
> -			found_free_cluster = 1;
>  			goto checks;
>  		} else if (si->cluster_info) {
>  			/*
> +			 * we don't have free cluster but have some clusters in
> +			 * discarding, do discard now and reclaim them
> +			 */
> +			if (!cluster_is_null(&si->discard_cluster_head)) {
> +				swap_do_scheduled_discard(si);
> +				goto check_cluster;

Again, swap_do_scheduled_discard() might have dropped the lock.  The
state which scan_swap_map() has copied in from the swap_info_struct is
now invalidated.  `scan_base' and `offset' might have changed. 
si->cluster_nr may have changed.  



> +			}
> +
> +			/*
>  			 * Checking free cluster is fast enough, we can do the
>  			 * check every time
>  			 */
>  			si->cluster_nr = 0;
> -			si->lowest_alloc = 0;
>  			goto checks;
>  		}
> 
> ...
>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2013-07-17 22:09 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-07-15 20:43 Shaohua Li
2013-07-17 22:09 ` Andrew Morton [this message]
2013-07-18 10:37   ` Shaohua Li
2013-07-22 10:05 Shaohua Li

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20130717150913.1286deef1a27bf2d2712e16f@linux-foundation.org \
    --to=akpm@linux-foundation.org \
    --cc=aquini@redhat.com \
    --cc=hughd@google.com \
    --cc=kmpark@infradead.org \
    --cc=linux-mm@kvack.org \
    --cc=minchan@kernel.org \
    --cc=riel@redhat.com \
    --cc=shli@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox