* [PATCH 1/9] swapfile: swapon needs larger size type
2008-11-25 21:35 [PATCH 0/9] swapfile: cleanups and solidstate mods Hugh Dickins
@ 2008-11-25 21:36 ` Hugh Dickins
2008-11-25 21:37 ` [PATCH 2/9] swapfile: remove SWP_ACTIVE mask Hugh Dickins
` (4 subsequent siblings)
5 siblings, 0 replies; 18+ messages in thread
From: Hugh Dickins @ 2008-11-25 21:36 UTC
To: Andrew Morton; +Cc: linux-mm
sys_swapon()'s swapfilesize (better renamed swapfilepages) is declared
as an int, but should be an unsigned long like the maxpages it's compared
against: on 64-bit (with 4kB pages) a swapfile of 2^44 bytes was rejected
with "Swap area shorter than signature indicates".
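The truncation can be reproduced in userspace. This is a sketch, not kernel code: the helper names are hypothetical, 4kB pages are assumed, uint64_t stands in for a 64-bit unsigned long, and the `(int)(uint32_t)` cast imitates how a typical compiler stores a 64-bit page count into an int.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12	/* assumption: 4kB pages */

/* Old logic: the page count is squeezed into an int. */
static int old_rejects(uint64_t i_size, uint64_t maxpages)
{
	/* Keep only the low 32 bits, as a store into a 32-bit int does
	 * (implementation-defined above INT_MAX; gcc and clang wrap). */
	int swapfilesize = (int)(uint32_t)(i_size >> SKETCH_PAGE_SHIFT);

	/* As in C, the int is converted to the unsigned 64-bit type. */
	return swapfilesize && maxpages > (uint64_t)swapfilesize;
}

/* New logic: the page count keeps its full 64-bit width. */
static int new_rejects(uint64_t i_size, uint64_t maxpages)
{
	uint64_t swapfilepages = i_size >> SKETCH_PAGE_SHIFT;

	return swapfilepages && maxpages > swapfilepages;
}
```

With a hypothetical header claiming 2^31 pages and a file of 2^44 + 4096 bytes (2^32 + 1 pages), the old check sees a page count of 1 and rejects the area, while the new one accepts it.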
Signed-off-by: Hugh Dickins <hugh@veritas.com>
---
mkswap needs its own fixes for this: I'll be sending those to Karel.
mm/swapfile.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
--- swapfile0/mm/swapfile.c 2008-11-24 13:27:00.000000000 +0000
+++ swapfile1/mm/swapfile.c 2008-11-25 12:41:17.000000000 +0000
@@ -1452,7 +1452,7 @@ asmlinkage long sys_swapon(const char __
int nr_extents = 0;
sector_t span;
unsigned long maxpages = 1;
- int swapfilesize;
+ unsigned long swapfilepages;
unsigned short *swap_map = NULL;
struct page *page = NULL;
struct inode *inode = NULL;
@@ -1530,7 +1530,7 @@ asmlinkage long sys_swapon(const char __
goto bad_swap;
}
- swapfilesize = i_size_read(inode) >> PAGE_SHIFT;
+ swapfilepages = i_size_read(inode) >> PAGE_SHIFT;
/*
* Read the swap header.
@@ -1607,7 +1607,7 @@ asmlinkage long sys_swapon(const char __
error = -EINVAL;
if (!maxpages)
goto bad_swap;
- if (swapfilesize && maxpages > swapfilesize) {
+ if (swapfilepages && maxpages > swapfilepages) {
printk(KERN_WARNING
"Swap area shorter than signature indicates\n");
goto bad_swap;
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
* [PATCH 2/9] swapfile: remove SWP_ACTIVE mask
2008-11-25 21:35 [PATCH 0/9] swapfile: cleanups and solidstate mods Hugh Dickins
2008-11-25 21:36 ` [PATCH 1/9] swapfile: swapon needs larger size type Hugh Dickins
@ 2008-11-25 21:37 ` Hugh Dickins
2008-11-25 21:37 ` [PATCH 3/9] swapfile: remove surplus whitespace Hugh Dickins
` (3 subsequent siblings)
5 siblings, 0 replies; 18+ messages in thread
From: Hugh Dickins @ 2008-11-25 21:37 UTC
To: Andrew Morton; +Cc: linux-mm
Remove the SWP_ACTIVE mask: it just obscures the SWP_WRITEOK flag.
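The two tests agree as long as SWP_WRITEOK is only ever set on a slot that already has SWP_USED, which is what sys_swapon() arranges before OR-ing in SWP_WRITEOK. A minimal sketch with hypothetical helper names (the flag values are those shown in the patch):

```c
#include <assert.h>

enum {
	SWP_USED     = (1 << 0),
	SWP_WRITEOK  = (1 << 1),
	SWP_ACTIVE   = (SWP_USED | SWP_WRITEOK),	/* the mask being removed */
	SWP_SCANNING = (1 << 8),
};

/* Old test: both bits of the mask must be set. */
static int old_is_active(unsigned long flags)
{
	return (flags & SWP_ACTIVE) == SWP_ACTIVE;
}

/* New test: SWP_WRITEOK alone. */
static int new_is_active(unsigned long flags)
{
	return (flags & SWP_WRITEOK) != 0;
}

/* sys_swapon() now ORs SWP_WRITEOK into flags that already hold SWP_USED,
 * so any other bits in flags are preserved rather than overwritten. */
static unsigned long mark_writeok(unsigned long flags)
{
	assert(flags & SWP_USED);	/* the precondition the simplification relies on */
	return flags | SWP_WRITEOK;
}
```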
Signed-off-by: Hugh Dickins <hugh@veritas.com>
---
include/linux/swap.h | 1 -
mm/swapfile.c | 4 ++--
2 files changed, 2 insertions(+), 3 deletions(-)
--- swapfile1/include/linux/swap.h 2008-11-24 13:27:00.000000000 +0000
+++ swapfile2/include/linux/swap.h 2008-11-25 12:41:19.000000000 +0000
@@ -120,7 +120,6 @@ struct swap_extent {
enum {
SWP_USED = (1 << 0), /* is slot in swap_info[] used? */
SWP_WRITEOK = (1 << 1), /* ok to write to this swap? */
- SWP_ACTIVE = (SWP_USED | SWP_WRITEOK),
/* add others here before... */
SWP_SCANNING = (1 << 8), /* refcount in scan_swap_map */
};
--- swapfile1/mm/swapfile.c 2008-11-25 12:41:17.000000000 +0000
+++ swapfile2/mm/swapfile.c 2008-11-25 12:41:19.000000000 +0000
@@ -1222,7 +1222,7 @@ asmlinkage long sys_swapoff(const char _
spin_lock(&swap_lock);
for (type = swap_list.head; type >= 0; type = swap_info[type].next) {
p = swap_info + type;
- if ((p->flags & SWP_ACTIVE) == SWP_ACTIVE) {
+ if (p->flags & SWP_WRITEOK) {
if (p->swap_file->f_mapping == mapping)
break;
}
@@ -1665,7 +1665,7 @@ asmlinkage long sys_swapon(const char __
else
p->prio = --least_priority;
p->swap_map = swap_map;
- p->flags = SWP_ACTIVE;
+ p->flags |= SWP_WRITEOK;
nr_swap_pages += nr_good_pages;
total_swap_pages += nr_good_pages;
* [PATCH 3/9] swapfile: remove surplus whitespace
2008-11-25 21:35 [PATCH 0/9] swapfile: cleanups and solidstate mods Hugh Dickins
2008-11-25 21:36 ` [PATCH 1/9] swapfile: swapon needs larger size type Hugh Dickins
2008-11-25 21:37 ` [PATCH 2/9] swapfile: remove SWP_ACTIVE mask Hugh Dickins
@ 2008-11-25 21:37 ` Hugh Dickins
2008-11-25 21:39 ` [PATCH 4/9] swapfile: remove v0 SWAP-SPACE message Hugh Dickins
` (2 subsequent siblings)
5 siblings, 0 replies; 18+ messages in thread
From: Hugh Dickins @ 2008-11-25 21:37 UTC
To: Andrew Morton; +Cc: linux-mm
Remove trailing whitespace from swapfile.c, and odd swap_show() alignment.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
---
mm/swapfile.c | 22 +++++++++++-----------
1 file changed, 11 insertions(+), 11 deletions(-)
--- swapfile2/mm/swapfile.c 2008-11-25 12:41:19.000000000 +0000
+++ swapfile3/mm/swapfile.c 2008-11-25 12:41:24.000000000 +0000
@@ -92,7 +92,7 @@ static inline unsigned long scan_swap_ma
unsigned long offset, last_in_cluster;
int latency_ration = LATENCY_LIMIT;
- /*
+ /*
* We try to cluster swap pages by allocating them sequentially
* in swap. Once we've allocated SWAPFILE_CLUSTER pages this
* way, however, we resort to first-free allocation, starting
@@ -269,7 +269,7 @@ bad_nofile:
printk(KERN_ERR "swap_free: %s%08lx\n", Bad_file, entry.val);
out:
return NULL;
-}
+}
static int swap_entry_free(struct swap_info_struct *p, unsigned long offset)
{
@@ -736,10 +736,10 @@ static int try_to_unuse(unsigned int typ
break;
}
- /*
+ /*
* Get a page for the entry, using the existing swap
* cache page if there is one. Otherwise, get a clean
- * page and read the swap into it.
+ * page and read the swap into it.
*/
swap_map = &si->swap_map[i];
entry = swp_entry(type, i);
@@ -1202,7 +1202,7 @@ asmlinkage long sys_swapoff(const char _
char * pathname;
int i, type, prev;
int err;
-
+
if (!capable(CAP_SYS_ADMIN))
return -EPERM;
@@ -1395,12 +1395,12 @@ static int swap_show(struct seq_file *sw
file = ptr->swap_file;
len = seq_path(swap, &file->f_path, " \t\n\\");
seq_printf(swap, "%*s%s\t%u\t%u\t%d\n",
- len < 40 ? 40 - len : 1, " ",
- S_ISBLK(file->f_path.dentry->d_inode->i_mode) ?
+ len < 40 ? 40 - len : 1, " ",
+ S_ISBLK(file->f_path.dentry->d_inode->i_mode) ?
"partition" : "file\t",
- ptr->pages << (PAGE_SHIFT - 10),
- ptr->inuse_pages << (PAGE_SHIFT - 10),
- ptr->prio);
+ ptr->pages << (PAGE_SHIFT - 10),
+ ptr->inuse_pages << (PAGE_SHIFT - 10),
+ ptr->prio);
return 0;
}
@@ -1556,7 +1556,7 @@ asmlinkage long sys_swapon(const char __
error = -EINVAL;
goto bad_swap;
}
-
+
switch (swap_header_version) {
case 1:
printk(KERN_ERR "version 0 swap is no longer supported. "
* [PATCH 4/9] swapfile: remove v0 SWAP-SPACE message
2008-11-25 21:35 [PATCH 0/9] swapfile: cleanups and solidstate mods Hugh Dickins
` (2 preceding siblings ...)
2008-11-25 21:37 ` [PATCH 3/9] swapfile: remove surplus whitespace Hugh Dickins
@ 2008-11-25 21:39 ` Hugh Dickins
2008-11-25 21:40 ` [PATCH 5/9] swapfile: rearrange scan and swap_info Hugh Dickins
2008-11-25 21:44 ` [PATCH 6/9] swapfile: swapon use discard (trim) Hugh Dickins
5 siblings, 0 replies; 18+ messages in thread
From: Hugh Dickins @ 2008-11-25 21:39 UTC
To: Andrew Morton; +Cc: KAMEZAWA Hiroyuki, linux-mm
The kernel has not supported v0 SWAP-SPACE since 2.5.22: I think we can now
safely drop its "version 0 swap is no longer supported" message - just say
"Unable to find swap-space signature" as usual. This removes one level of
indentation from a stretch of sys_swapon().
I'd have liked to be specific, saying "Unable to find SWAPSPACE2 signature",
but it's just too confusing that the version 1 signature shows the number 2.
Irrelevant nearby cleanup: kmap(page) already gives page_address(page).
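A userspace sketch of the simplified signature check, assuming a 4kB page whose last 10 bytes hold the magic and whose version word sits at byte 1024, after the boot bits (as in union swap_header). The function names are hypothetical, and only the version field is byte-swapped here; the patch also swaps last_page, nr_badpages and the bad-page list.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096	/* assumption: 4kB pages */
#define SKETCH_INFO_OFF  1024	/* assumption: version follows 1024 boot bytes */

static uint32_t swab32(uint32_t x)
{
	return (x >> 24) | ((x >> 8) & 0xff00) |
	       ((x << 8) & 0xff0000) | (x << 24);
}

/*
 * Returns 1 for a usable SWAPSPACE2 header with version 1 (in either
 * byte order), or -1 if the signature is missing or the version differs.
 */
static int check_swap_header(const unsigned char *page)
{
	uint32_t version;

	if (memcmp("SWAPSPACE2", page + SKETCH_PAGE_SIZE - 10, 10))
		return -1;	/* "Unable to find swap-space signature" */

	memcpy(&version, page + SKETCH_INFO_OFF, sizeof(version));
	if (swab32(version) == 1)	/* written by an opposite-endian mkswap */
		version = swab32(version);
	return version == 1 ? 1 : -1;	/* "Unable to handle swap header version" */
}
```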
Signed-off-by: Hugh Dickins <hugh@veritas.com>
---
This reindentation clashes with memcg-swap-cgroup-for-remembering-usage.patch
See the [PATCH 0/9] message for two hunks to replace its final hunk.
mm/swapfile.c | 146 +++++++++++++++++++++---------------------------
1 file changed, 65 insertions(+), 81 deletions(-)
--- swapfile3/mm/swapfile.c 2008-11-25 12:41:24.000000000 +0000
+++ swapfile4/mm/swapfile.c 2008-11-25 12:41:26.000000000 +0000
@@ -1447,7 +1447,6 @@ asmlinkage long sys_swapon(const char __
int i, prev;
int error;
union swap_header *swap_header = NULL;
- int swap_header_version;
unsigned int nr_good_pages = 0;
int nr_extents = 0;
sector_t span;
@@ -1544,101 +1543,86 @@ asmlinkage long sys_swapon(const char __
error = PTR_ERR(page);
goto bad_swap;
}
- kmap(page);
- swap_header = page_address(page);
+ swap_header = kmap(page);
- if (!memcmp("SWAP-SPACE",swap_header->magic.magic,10))
- swap_header_version = 1;
- else if (!memcmp("SWAPSPACE2",swap_header->magic.magic,10))
- swap_header_version = 2;
- else {
+ if (memcmp("SWAPSPACE2", swap_header->magic.magic, 10)) {
printk(KERN_ERR "Unable to find swap-space signature\n");
error = -EINVAL;
goto bad_swap;
}
- switch (swap_header_version) {
- case 1:
- printk(KERN_ERR "version 0 swap is no longer supported. "
- "Use mkswap -v1 %s\n", name);
+ /* swap partition endianess hack... */
+ if (swab32(swap_header->info.version) == 1) {
+ swab32s(&swap_header->info.version);
+ swab32s(&swap_header->info.last_page);
+ swab32s(&swap_header->info.nr_badpages);
+ for (i = 0; i < swap_header->info.nr_badpages; i++)
+ swab32s(&swap_header->info.badpages[i]);
+ }
+ /* Check the swap header's sub-version */
+ if (swap_header->info.version != 1) {
+ printk(KERN_WARNING
+ "Unable to handle swap header version %d\n",
+ swap_header->info.version);
error = -EINVAL;
goto bad_swap;
- case 2:
- /* swap partition endianess hack... */
- if (swab32(swap_header->info.version) == 1) {
- swab32s(&swap_header->info.version);
- swab32s(&swap_header->info.last_page);
- swab32s(&swap_header->info.nr_badpages);
- for (i = 0; i < swap_header->info.nr_badpages; i++)
- swab32s(&swap_header->info.badpages[i]);
- }
- /* Check the swap header's sub-version and the size of
- the swap file and bad block lists */
- if (swap_header->info.version != 1) {
- printk(KERN_WARNING
- "Unable to handle swap header version %d\n",
- swap_header->info.version);
- error = -EINVAL;
- goto bad_swap;
- }
+ }
- p->lowest_bit = 1;
- p->cluster_next = 1;
+ p->lowest_bit = 1;
+ p->cluster_next = 1;
- /*
- * Find out how many pages are allowed for a single swap
- * device. There are two limiting factors: 1) the number of
- * bits for the swap offset in the swp_entry_t type and
- * 2) the number of bits in the a swap pte as defined by
- * the different architectures. In order to find the
- * largest possible bit mask a swap entry with swap type 0
- * and swap offset ~0UL is created, encoded to a swap pte,
- * decoded to a swp_entry_t again and finally the swap
- * offset is extracted. This will mask all the bits from
- * the initial ~0UL mask that can't be encoded in either
- * the swp_entry_t or the architecture definition of a
- * swap pte.
- */
- maxpages = swp_offset(pte_to_swp_entry(swp_entry_to_pte(swp_entry(0,~0UL)))) - 1;
- if (maxpages > swap_header->info.last_page)
- maxpages = swap_header->info.last_page;
- p->highest_bit = maxpages - 1;
+ /*
+ * Find out how many pages are allowed for a single swap
+ * device. There are two limiting factors: 1) the number of
+ * bits for the swap offset in the swp_entry_t type and
+ * 2) the number of bits in the a swap pte as defined by
+ * the different architectures. In order to find the
+ * largest possible bit mask a swap entry with swap type 0
+ * and swap offset ~0UL is created, encoded to a swap pte,
+ * decoded to a swp_entry_t again and finally the swap
+ * offset is extracted. This will mask all the bits from
+ * the initial ~0UL mask that can't be encoded in either
+ * the swp_entry_t or the architecture definition of a
+ * swap pte.
+ */
+ maxpages = swp_offset(pte_to_swp_entry(
+ swp_entry_to_pte(swp_entry(0, ~0UL)))) - 1;
+ if (maxpages > swap_header->info.last_page)
+ maxpages = swap_header->info.last_page;
+ p->highest_bit = maxpages - 1;
- error = -EINVAL;
- if (!maxpages)
- goto bad_swap;
- if (swapfilepages && maxpages > swapfilepages) {
- printk(KERN_WARNING
- "Swap area shorter than signature indicates\n");
- goto bad_swap;
- }
- if (swap_header->info.nr_badpages && S_ISREG(inode->i_mode))
- goto bad_swap;
- if (swap_header->info.nr_badpages > MAX_SWAP_BADPAGES)
- goto bad_swap;
+ error = -EINVAL;
+ if (!maxpages)
+ goto bad_swap;
+ if (swapfilepages && maxpages > swapfilepages) {
+ printk(KERN_WARNING
+ "Swap area shorter than signature indicates\n");
+ goto bad_swap;
+ }
+ if (swap_header->info.nr_badpages && S_ISREG(inode->i_mode))
+ goto bad_swap;
+ if (swap_header->info.nr_badpages > MAX_SWAP_BADPAGES)
+ goto bad_swap;
- /* OK, set up the swap map and apply the bad block list */
- swap_map = vmalloc(maxpages * sizeof(short));
- if (!swap_map) {
- error = -ENOMEM;
- goto bad_swap;
- }
+ /* OK, set up the swap map and apply the bad block list */
+ swap_map = vmalloc(maxpages * sizeof(short));
+ if (!swap_map) {
+ error = -ENOMEM;
+ goto bad_swap;
+ }
- error = 0;
- memset(swap_map, 0, maxpages * sizeof(short));
- for (i = 0; i < swap_header->info.nr_badpages; i++) {
- int page_nr = swap_header->info.badpages[i];
- if (page_nr <= 0 || page_nr >= swap_header->info.last_page)
- error = -EINVAL;
- else
- swap_map[page_nr] = SWAP_MAP_BAD;
- }
- nr_good_pages = swap_header->info.last_page -
- swap_header->info.nr_badpages -
- 1 /* header page */;
- if (error)
+ memset(swap_map, 0, maxpages * sizeof(short));
+ for (i = 0; i < swap_header->info.nr_badpages; i++) {
+ int page_nr = swap_header->info.badpages[i];
+ if (page_nr <= 0 || page_nr >= swap_header->info.last_page) {
+ error = -EINVAL;
goto bad_swap;
+ }
+ swap_map[page_nr] = SWAP_MAP_BAD;
}
+ nr_good_pages = swap_header->info.last_page -
+ swap_header->info.nr_badpages -
+ 1 /* header page */;
if (nr_good_pages) {
swap_map[0] = SWAP_MAP_BAD;
* [PATCH 5/9] swapfile: rearrange scan and swap_info
2008-11-25 21:35 [PATCH 0/9] swapfile: cleanups and solidstate mods Hugh Dickins
` (3 preceding siblings ...)
2008-11-25 21:39 ` [PATCH 4/9] swapfile: remove v0 SWAP-SPACE message Hugh Dickins
@ 2008-11-25 21:40 ` Hugh Dickins
2008-11-25 21:44 ` [PATCH 6/9] swapfile: swapon use discard (trim) Hugh Dickins
5 siblings, 0 replies; 18+ messages in thread
From: Hugh Dickins @ 2008-11-25 21:40 UTC
To: Andrew Morton; +Cc: linux-mm
Before making functional changes, rearrange scan_swap_map() to simplify
subsequent diffs. Actually, there is one functional change in there:
leave cluster_nr negative while scanning for a new cluster - resetting
it early increased the likelihood that when we have difficulty finding
a free cluster, another task may come in and try doing exactly the same
- just a waste of cpu.
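The cluster search that scan_swap_map() performs with swap_lock dropped can be sketched on a plain array. This is a simplification under stated assumptions: the function name is hypothetical, the cluster size is shrunk to 4 (SWAPFILE_CLUSTER is 256), and locking, cluster_nr accounting and the cond_resched() latency check are left out.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_CLUSTER 4	/* stands in for SWAPFILE_CLUSTER */

/*
 * Find the first run of SKETCH_CLUSTER free (zero) entries in
 * map[lowest..highest] and return its first offset, or 0 if none.
 * Offset 0 is the header page, so 0 can double as "not found".
 */
static size_t find_free_cluster(const unsigned short *map,
				size_t lowest, size_t highest)
{
	size_t offset = lowest;
	size_t last_in_cluster = offset + SKETCH_CLUSTER - 1;

	assert(lowest >= 1);	/* never hand out the header page */
	for (; last_in_cluster <= highest; offset++) {
		if (map[offset])
			/* slot in use: a run must start after it */
			last_in_cluster = offset + SKETCH_CLUSTER;
		else if (offset == last_in_cluster)
			/* SKETCH_CLUSTER free slots in a row */
			return offset - (SKETCH_CLUSTER - 1);
	}
	return 0;
}
```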
Before making functional changes, rearrange struct swap_info_struct
slightly: flags will be needed as an unsigned long (for wait_on_bit);
next is a good int to pair with prio; and old_block_size is uninteresting,
so shift it to the end.
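The field-ordering argument can be checked with a cut-down pair of structs (hypothetical, not the real swap_info_struct). On an LP64 target, once flags becomes an unsigned long, a lone int between it and a pointer leaves a 4-byte hole, and another int at the end adds tail padding; pairing next with prio avoids both (typically 24 bytes instead of 32 on x86_64).

```c
#include <assert.h>

/* Widened flags, but next left at the end: holes on LP64. */
struct before {
	unsigned long flags;
	int prio;
	void *swap_file;	/* pointer-aligned: 4-byte hole before it */
	int next;		/* followed by 4 bytes of tail padding */
};

/* next moved up to pair with prio: no holes on LP64. */
struct after {
	unsigned long flags;
	int prio;
	int next;
	void *swap_file;
};

/* Reordering can only remove padding, never add it. */
static_assert(sizeof(struct after) <= sizeof(struct before),
	      "pairing next with prio must not grow the struct");
```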
Signed-off-by: Hugh Dickins <hugh@veritas.com>
---
include/linux/swap.h | 8 ++--
mm/swapfile.c | 66 ++++++++++++++++++++++-------------------
2 files changed, 41 insertions(+), 33 deletions(-)
--- swapfile4/include/linux/swap.h 2008-11-25 12:41:19.000000000 +0000
+++ swapfile5/include/linux/swap.h 2008-11-25 12:41:31.000000000 +0000
@@ -133,14 +133,14 @@ enum {
* The in-memory structure used to track swap areas.
*/
struct swap_info_struct {
- unsigned int flags;
+ unsigned long flags;
int prio; /* swap priority */
+ int next; /* next entry on swap list */
struct file *swap_file;
struct block_device *bdev;
struct list_head extent_list;
struct swap_extent *curr_swap_extent;
- unsigned old_block_size;
- unsigned short * swap_map;
+ unsigned short *swap_map;
unsigned int lowest_bit;
unsigned int highest_bit;
unsigned int cluster_next;
@@ -148,7 +148,7 @@ struct swap_info_struct {
unsigned int pages;
unsigned int max;
unsigned int inuse_pages;
- int next; /* next entry on swap list */
+ unsigned int old_block_size;
};
struct swap_list_t {
--- swapfile4/mm/swapfile.c 2008-11-25 12:41:26.000000000 +0000
+++ swapfile5/mm/swapfile.c 2008-11-25 12:41:31.000000000 +0000
@@ -89,7 +89,8 @@ void swap_unplug_io_fn(struct backing_de
static inline unsigned long scan_swap_map(struct swap_info_struct *si)
{
- unsigned long offset, last_in_cluster;
+ unsigned long offset;
+ unsigned long last_in_cluster;
int latency_ration = LATENCY_LIMIT;
/*
@@ -103,10 +104,13 @@ static inline unsigned long scan_swap_ma
*/
si->flags += SWP_SCANNING;
- if (unlikely(!si->cluster_nr)) {
- si->cluster_nr = SWAPFILE_CLUSTER - 1;
- if (si->pages - si->inuse_pages < SWAPFILE_CLUSTER)
- goto lowest;
+ offset = si->cluster_next;
+
+ if (unlikely(!si->cluster_nr--)) {
+ if (si->pages - si->inuse_pages < SWAPFILE_CLUSTER) {
+ si->cluster_nr = SWAPFILE_CLUSTER - 1;
+ goto checks;
+ }
spin_unlock(&swap_lock);
offset = si->lowest_bit;
@@ -118,43 +122,47 @@ static inline unsigned long scan_swap_ma
last_in_cluster = offset + SWAPFILE_CLUSTER;
else if (offset == last_in_cluster) {
spin_lock(&swap_lock);
- si->cluster_next = offset-SWAPFILE_CLUSTER+1;
- goto cluster;
+ offset -= SWAPFILE_CLUSTER - 1;
+ si->cluster_next = offset;
+ si->cluster_nr = SWAPFILE_CLUSTER - 1;
+ goto checks;
}
if (unlikely(--latency_ration < 0)) {
cond_resched();
latency_ration = LATENCY_LIMIT;
}
}
+
+ offset = si->lowest_bit;
spin_lock(&swap_lock);
- goto lowest;
+ si->cluster_nr = SWAPFILE_CLUSTER - 1;
}
- si->cluster_nr--;
-cluster:
- offset = si->cluster_next;
- if (offset > si->highest_bit)
-lowest: offset = si->lowest_bit;
-checks: if (!(si->flags & SWP_WRITEOK))
+checks:
+ if (!(si->flags & SWP_WRITEOK))
goto no_page;
if (!si->highest_bit)
goto no_page;
- if (!si->swap_map[offset]) {
- if (offset == si->lowest_bit)
- si->lowest_bit++;
- if (offset == si->highest_bit)
- si->highest_bit--;
- si->inuse_pages++;
- if (si->inuse_pages == si->pages) {
- si->lowest_bit = si->max;
- si->highest_bit = 0;
- }
- si->swap_map[offset] = 1;
- si->cluster_next = offset + 1;
- si->flags -= SWP_SCANNING;
- return offset;
+ if (offset > si->highest_bit)
+ offset = si->lowest_bit;
+ if (si->swap_map[offset])
+ goto scan;
+
+ if (offset == si->lowest_bit)
+ si->lowest_bit++;
+ if (offset == si->highest_bit)
+ si->highest_bit--;
+ si->inuse_pages++;
+ if (si->inuse_pages == si->pages) {
+ si->lowest_bit = si->max;
+ si->highest_bit = 0;
}
+ si->swap_map[offset] = 1;
+ si->cluster_next = offset + 1;
+ si->flags -= SWP_SCANNING;
+ return offset;
+scan:
spin_unlock(&swap_lock);
while (++offset <= si->highest_bit) {
if (!si->swap_map[offset]) {
@@ -167,7 +175,7 @@ checks: if (!(si->flags & SWP_WRITEOK))
}
}
spin_lock(&swap_lock);
- goto lowest;
+ goto checks;
no_page:
si->flags -= SWP_SCANNING;
* [PATCH 6/9] swapfile: swapon use discard (trim)
2008-11-25 21:35 [PATCH 0/9] swapfile: cleanups and solidstate mods Hugh Dickins
` (4 preceding siblings ...)
2008-11-25 21:40 ` [PATCH 5/9] swapfile: rearrange scan and swap_info Hugh Dickins
@ 2008-11-25 21:44 ` Hugh Dickins
2008-11-25 21:46 ` [PATCH 7/9] swapfile: swap allocation use discard Hugh Dickins
` (3 more replies)
5 siblings, 4 replies; 18+ messages in thread
From: Hugh Dickins @ 2008-11-25 21:44 UTC
To: Andrew Morton
Cc: David Woodhouse, Jens Axboe, Matthew Wilcox, Joern Engel,
James Bottomley, Donjun Shin, Tejun Heo, linux-mm, linux-kernel
When adding swap, all the old data on swap can be forgotten: sys_swapon()
discards all but the header page of the swap partition (or every extent
but the header of the swap file), to give a solidstate swap device the
opportunity to optimize its wear-levelling.
If that succeeds, note SWP_DISCARDABLE for later use, and report it
with a "D" at the right end of the kernel's "Adding ... swap" message.
Perhaps something should be shown in /proc/swaps (swapon -s), but we
have to be more cautious before making any addition to that format.
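The page-to-sector arithmetic in discard_swap() can be sketched as follows, assuming 4kB pages and 512-byte sectors. The extent struct is a hypothetical flattened stand-in for struct swap_extent, and instead of calling blkdev_issue_discard() the sketch just records each range it would discard.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12	/* assumption: 4kB pages, 512-byte sectors */
#define SECTORS_PER_PAGE  (1u << (SKETCH_PAGE_SHIFT - 9))

/* Hypothetical stand-in for struct swap_extent (no list linkage). */
struct extent {
	uint64_t start_page;	/* first swap page this extent covers */
	uint64_t nr_pages;
	uint64_t start_block;	/* first device block, in page units */
};

struct range {
	uint64_t sector;
	uint64_t nr_sectors;
};

/*
 * Convert extents to the sector ranges to discard, skipping the header
 * page (swap page 0). Returns the number of ranges written to out[].
 */
static size_t discard_ranges(const struct extent *se, size_t n,
			     struct range *out)
{
	size_t k = 0;

	for (size_t i = 0; i < n; i++) {
		uint64_t start = se[i].start_block << (SKETCH_PAGE_SHIFT - 9);
		uint64_t nr = se[i].nr_pages << (SKETCH_PAGE_SHIFT - 9);

		if (se[i].start_page == 0) {
			/* Do not discard the swap header page! */
			start += SECTORS_PER_PAGE;
			nr -= SECTORS_PER_PAGE;
			if (!nr)
				continue;
		}
		out[k].sector = start;
		out[k].nr_sectors = nr;
		k++;
	}
	return k;
}
```

For an extent covering swap pages 0-9 at device page 100, the header page's 8 sectors are skipped, leaving sectors 808-879.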
Signed-off-by: Hugh Dickins <hugh@veritas.com>
---
swapfile.c cleanup patches 0-5 just went to linux-mm: patches 6-9
may be of wider interest, so I'm extending the Cc list for them.
include/linux/swap.h | 1 +
mm/swapfile.c | 39 +++++++++++++++++++++++++++++++++++++--
2 files changed, 38 insertions(+), 2 deletions(-)
--- swapfile5/include/linux/swap.h 2008-11-25 12:41:31.000000000 +0000
+++ swapfile6/include/linux/swap.h 2008-11-25 12:41:34.000000000 +0000
@@ -120,6 +120,7 @@ struct swap_extent {
enum {
SWP_USED = (1 << 0), /* is slot in swap_info[] used? */
SWP_WRITEOK = (1 << 1), /* ok to write to this swap? */
+ SWP_DISCARDABLE = (1 << 2), /* blkdev supports discard */
/* add others here before... */
SWP_SCANNING = (1 << 8), /* refcount in scan_swap_map */
};
--- swapfile5/mm/swapfile.c 2008-11-25 12:41:31.000000000 +0000
+++ swapfile6/mm/swapfile.c 2008-11-25 12:41:34.000000000 +0000
@@ -84,6 +84,37 @@ void swap_unplug_io_fn(struct backing_de
up_read(&swap_unplug_sem);
}
+/*
+ * swapon tell device that all the old swap contents can be discarded,
+ * to allow the swap device to optimize its wear-levelling.
+ */
+static int discard_swap(struct swap_info_struct *si)
+{
+ struct swap_extent *se;
+ int err = 0;
+
+ list_for_each_entry(se, &si->extent_list, list) {
+ sector_t start_block = se->start_block << (PAGE_SHIFT - 9);
+ pgoff_t nr_blocks = se->nr_pages << (PAGE_SHIFT - 9);
+
+ if (se->start_page == 0) {
+ /* Do not discard the swap header page! */
+ start_block += 1 << (PAGE_SHIFT - 9);
+ nr_blocks -= 1 << (PAGE_SHIFT - 9);
+ if (!nr_blocks)
+ continue;
+ }
+
+ err = blkdev_issue_discard(si->bdev, start_block,
+ nr_blocks, GFP_KERNEL);
+ if (err)
+ break;
+
+ cond_resched();
+ }
+ return err; /* That will often be -EOPNOTSUPP */
+}
+
#define SWAPFILE_CLUSTER 256
#define LATENCY_LIMIT 256
@@ -1649,6 +1680,9 @@ asmlinkage long sys_swapon(const char __
goto bad_swap;
}
+ if (discard_swap(p) == 0)
+ p->flags |= SWP_DISCARDABLE;
+
mutex_lock(&swapon_mutex);
spin_lock(&swap_lock);
if (swap_flags & SWAP_FLAG_PREFER)
@@ -1662,9 +1696,10 @@ asmlinkage long sys_swapon(const char __
total_swap_pages += nr_good_pages;
printk(KERN_INFO "Adding %uk swap on %s. "
- "Priority:%d extents:%d across:%lluk\n",
+ "Priority:%d extents:%d across:%lluk%s\n",
nr_good_pages<<(PAGE_SHIFT-10), name, p->prio,
- nr_extents, (unsigned long long)span<<(PAGE_SHIFT-10));
+ nr_extents, (unsigned long long)span<<(PAGE_SHIFT-10),
+ (p->flags & SWP_DISCARDABLE) ? " D" : "");
/* insert swap space into swap_list: */
prev = -1;
* [PATCH 7/9] swapfile: swap allocation use discard
2008-11-25 21:44 ` [PATCH 6/9] swapfile: swapon use discard (trim) Hugh Dickins
@ 2008-11-25 21:46 ` Hugh Dickins
2008-12-01 0:29 ` [PATCH 10/9] swapfile: change discard pgoff_t to sector_t Hugh Dickins
2008-11-25 21:46 ` [PATCH 8/9] swapfile: swapon randomize if nonrot Hugh Dickins
` (2 subsequent siblings)
3 siblings, 1 reply; 18+ messages in thread
From: Hugh Dickins @ 2008-11-25 21:46 UTC
To: Andrew Morton
Cc: David Woodhouse, Jens Axboe, Matthew Wilcox, Joern Engel,
James Bottomley, Donjun Shin, Tejun Heo, linux-mm, linux-kernel
When scan_swap_map() finds a free cluster of swap pages to allocate,
discard the old contents of the cluster if the device supports discard.
But don't bother when swap is so fragmented that we allocate single pages.
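Finding the device ranges for a cluster, as discard_swap_cluster() does, can be sketched with an array in place of the kernel's circular extent list. The names are hypothetical; ranges are in page units (the kernel then shifts them by PAGE_SHIFT - 9 before calling blkdev_issue_discard() with GFP_NOIO), and the page range is assumed to lie entirely inside the extents, otherwise the loop would not terminate.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct swap_extent. */
struct extent {
	uint64_t start_page, nr_pages, start_block;
};

/*
 * Split swap pages [start_page, start_page + nr_pages) into per-extent
 * device ranges, starting the search at extent *cur and wrapping around
 * the array. *cur is updated to the extent where the range begins.
 * Returns the number of ranges written.
 */
static size_t map_cluster(const struct extent *se, size_t n, size_t *cur,
			  uint64_t start_page, uint64_t nr_pages,
			  uint64_t out_block[], uint64_t out_nr[])
{
	size_t i = *cur, k = 0;
	int found = 0;

	while (nr_pages) {
		if (se[i].start_page <= start_page &&
		    start_page < se[i].start_page + se[i].nr_pages) {
			uint64_t off = start_page - se[i].start_page;
			uint64_t nr = se[i].nr_pages - off;

			if (nr > nr_pages)
				nr = nr_pages;
			if (!found++)
				*cur = i;	/* remember where the cluster began */
			out_block[k] = se[i].start_block + off;
			out_nr[k++] = nr;
			start_page += nr;
			nr_pages -= nr;
		}
		i = (i + 1) % n;	/* circular, like the extent list */
	}
	return k;
}
```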
Be careful about racing allocations made while we're scanning for
a cluster; and hold up allocations made while we're discarding.
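The trimming applied once a free cluster is found can be isolated as a pure function. In this sketch (the helper name is hypothetical), lowest_alloc and highest_alloc are the bounds of allocations made by racing tasks during the scan, initialised to si->max and 0 as in the patch, and the discard stops just before the first racing allocation inside the cluster. Tasks that allocate while the discard is in flight are held back separately, by waiting on the SWP_DISCARDING bit.

```c
#include <assert.h>

/*
 * Given the cluster [offset, last_in_cluster] and the range of offsets
 * allocated by racing tasks during the scan, return how many pages from
 * offset may be discarded (0 means skip the discard).
 */
static unsigned long discard_len(unsigned long offset,
				 unsigned long last_in_cluster,
				 unsigned long lowest_alloc,
				 unsigned long highest_alloc)
{
	/* A racing allocation overlaps the cluster: stop just before it. */
	if (offset < highest_alloc && lowest_alloc <= last_in_cluster)
		last_in_cluster = lowest_alloc - 1;

	/* As in the patch, a single remaining page is not discarded. */
	return offset < last_in_cluster ? last_in_cluster - offset + 1 : 0;
}
```

With cluster 100-355 and a racing allocation at 200, only pages 100-199 are discarded; with no racing allocation (lowest_alloc still at max), all 256 are.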
Signed-off-by: Hugh Dickins <hugh@veritas.com>
---
include/linux/swap.h | 3 +
mm/swapfile.c | 119 ++++++++++++++++++++++++++++++++++++++++-
2 files changed, 121 insertions(+), 1 deletion(-)
--- swapfile6/include/linux/swap.h 2008-11-25 12:41:34.000000000 +0000
+++ swapfile7/include/linux/swap.h 2008-11-25 12:41:40.000000000 +0000
@@ -121,6 +121,7 @@ enum {
SWP_USED = (1 << 0), /* is slot in swap_info[] used? */
SWP_WRITEOK = (1 << 1), /* ok to write to this swap? */
SWP_DISCARDABLE = (1 << 2), /* blkdev supports discard */
+ SWP_DISCARDING = (1 << 3), /* now discarding a free cluster */
/* add others here before... */
SWP_SCANNING = (1 << 8), /* refcount in scan_swap_map */
};
@@ -144,6 +145,8 @@ struct swap_info_struct {
unsigned short *swap_map;
unsigned int lowest_bit;
unsigned int highest_bit;
+ unsigned int lowest_alloc; /* while preparing discard cluster */
+ unsigned int highest_alloc; /* while preparing discard cluster */
unsigned int cluster_next;
unsigned int cluster_nr;
unsigned int pages;
--- swapfile6/mm/swapfile.c 2008-11-25 12:41:34.000000000 +0000
+++ swapfile7/mm/swapfile.c 2008-11-25 12:41:40.000000000 +0000
@@ -115,14 +115,62 @@ static int discard_swap(struct swap_info
return err; /* That will often be -EOPNOTSUPP */
}
+/*
+ * swap allocation tell device that a cluster of swap can now be discarded,
+ * to allow the swap device to optimize its wear-levelling.
+ */
+static void discard_swap_cluster(struct swap_info_struct *si,
+ pgoff_t start_page, pgoff_t nr_pages)
+{
+ struct swap_extent *se = si->curr_swap_extent;
+ int found_extent = 0;
+
+ while (nr_pages) {
+ struct list_head *lh;
+
+ if (se->start_page <= start_page &&
+ start_page < se->start_page + se->nr_pages) {
+ pgoff_t offset = start_page - se->start_page;
+ sector_t start_block = se->start_block + offset;
+ pgoff_t nr_blocks = se->nr_pages - offset;
+
+ if (nr_blocks > nr_pages)
+ nr_blocks = nr_pages;
+ start_page += nr_blocks;
+ nr_pages -= nr_blocks;
+
+ if (!found_extent++)
+ si->curr_swap_extent = se;
+
+ start_block <<= PAGE_SHIFT - 9;
+ nr_blocks <<= PAGE_SHIFT - 9;
+ if (blkdev_issue_discard(si->bdev, start_block,
+ nr_blocks, GFP_NOIO))
+ break;
+ }
+
+ lh = se->list.next;
+ if (lh == &si->extent_list)
+ lh = lh->next;
+ se = list_entry(lh, struct swap_extent, list);
+ }
+}
+
+static int wait_for_discard(void *word)
+{
+ schedule();
+ return 0;
+}
+
#define SWAPFILE_CLUSTER 256
#define LATENCY_LIMIT 256
static inline unsigned long scan_swap_map(struct swap_info_struct *si)
{
unsigned long offset;
- unsigned long last_in_cluster;
+ unsigned long last_in_cluster = 0;
int latency_ration = LATENCY_LIMIT;
+ int found_free_cluster = 0;
/*
* We try to cluster swap pages by allocating them sequentially
@@ -142,6 +190,19 @@ static inline unsigned long scan_swap_ma
si->cluster_nr = SWAPFILE_CLUSTER - 1;
goto checks;
}
+ if (si->flags & SWP_DISCARDABLE) {
+ /*
+ * Start range check on racing allocations, in case
+ * they overlap the cluster we eventually decide on
+ * (we scan without swap_lock to allow preemption).
+ * It's hardly conceivable that cluster_nr could be
+ * wrapped during our scan, but don't depend on it.
+ */
+ if (si->lowest_alloc)
+ goto checks;
+ si->lowest_alloc = si->max;
+ si->highest_alloc = 0;
+ }
spin_unlock(&swap_lock);
offset = si->lowest_bit;
@@ -156,6 +217,7 @@ static inline unsigned long scan_swap_ma
offset -= SWAPFILE_CLUSTER - 1;
si->cluster_next = offset;
si->cluster_nr = SWAPFILE_CLUSTER - 1;
+ found_free_cluster = 1;
goto checks;
}
if (unlikely(--latency_ration < 0)) {
@@ -167,6 +229,7 @@ static inline unsigned long scan_swap_ma
offset = si->lowest_bit;
spin_lock(&swap_lock);
si->cluster_nr = SWAPFILE_CLUSTER - 1;
+ si->lowest_alloc = 0;
}
checks:
@@ -191,6 +254,60 @@ checks:
si->swap_map[offset] = 1;
si->cluster_next = offset + 1;
si->flags -= SWP_SCANNING;
+
+ if (si->lowest_alloc) {
+ /*
+ * Only set when SWP_DISCARDABLE, and there's a scan
+ * for a free cluster in progress or just completed.
+ */
+ if (found_free_cluster) {
+ /*
+ * To optimize wear-levelling, discard the
+ * old data of the cluster, taking care not to
+ * discard any of its pages that have already
+ * been allocated by racing tasks (offset has
+ * already stepped over any at the beginning).
+ */
+ if (offset < si->highest_alloc &&
+ si->lowest_alloc <= last_in_cluster)
+ last_in_cluster = si->lowest_alloc - 1;
+ si->flags |= SWP_DISCARDING;
+ spin_unlock(&swap_lock);
+
+ if (offset < last_in_cluster)
+ discard_swap_cluster(si, offset,
+ last_in_cluster - offset + 1);
+
+ spin_lock(&swap_lock);
+ si->lowest_alloc = 0;
+ si->flags &= ~SWP_DISCARDING;
+
+ smp_mb(); /* wake_up_bit advises this */
+ wake_up_bit(&si->flags, ilog2(SWP_DISCARDING));
+
+ } else if (si->flags & SWP_DISCARDING) {
+ /*
+ * Delay using pages allocated by racing tasks
+ * until the whole discard has been issued. We
+ * could defer that delay until swap_writepage,
+ * but it's easier to keep this self-contained.
+ */
+ spin_unlock(&swap_lock);
+ wait_on_bit(&si->flags, ilog2(SWP_DISCARDING),
+ wait_for_discard, TASK_UNINTERRUPTIBLE);
+ spin_lock(&swap_lock);
+ } else {
+ /*
+ * Note pages allocated by racing tasks while
+ * scan for a free cluster is in progress, so
+ * that its final discard can exclude them.
+ */
+ if (offset < si->lowest_alloc)
+ si->lowest_alloc = offset;
+ if (offset > si->highest_alloc)
+ si->highest_alloc = offset;
+ }
+ }
return offset;
scan:
* [PATCH 10/9] swapfile: change discard pgoff_t to sector_t
2008-11-25 21:46 ` [PATCH 7/9] swapfile: swap allocation use discard Hugh Dickins
@ 2008-12-01 0:29 ` Hugh Dickins
2008-12-03 0:47 ` Andrew Morton
0 siblings, 1 reply; 18+ messages in thread
From: Hugh Dickins @ 2008-12-01 0:29 UTC
To: Andrew Morton
Cc: David Woodhouse, Jens Axboe, Matthew Wilcox, Joern Engel,
James Bottomley, Donjun Shin, Tejun Heo, linux-mm, linux-kernel
Change pgoff_t nr_blocks in discard_swap() and discard_swap_cluster() to
sector_t: given the constraints on swap offsets (in particular, the 5 bits
of swap type accommodated in the same unsigned long), pgoff_t was actually
safe as is, but it certainly looked worrying when shifted left.
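The "safe as is" argument can be checked numerically. This sketch assumes a 32-bit unsigned long with 5 bits of swap type (leaving 27 bits of offset) and 4kB pages; the helper names are hypothetical. The largest offset, shifted by PAGE_SHIFT - 9 = 3, still fits in 32 bits.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_TYPE_BITS  5	/* swap type bits, per the message above */
#define SKETCH_PAGE_SHIFT 12	/* assumption: 4kB pages */

/* Largest page offset expressible in a 32-bit swap entry. */
static uint32_t max_offset_pages_32(void)
{
	return (UINT32_C(1) << (32 - SKETCH_TYPE_BITS)) - 1;
}

/* The same value in 512-byte sectors, computed entirely in 32 bits. */
static uint32_t max_sectors_32(void)
{
	return max_offset_pages_32() << (SKETCH_PAGE_SHIFT - 9);
}
```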
Signed-off-by: Hugh Dickins <hugh@veritas.com>
---
To follow 9/9 swapfile-swap-allocation-cycle-if-nonrot.patch
mm/swapfile.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
--- swapfile9/mm/swapfile.c 2008-11-26 12:19:00.000000000 +0000
+++ swapfile10/mm/swapfile.c 2008-11-28 20:36:44.000000000 +0000
@@ -96,7 +96,7 @@ static int discard_swap(struct swap_info
list_for_each_entry(se, &si->extent_list, list) {
sector_t start_block = se->start_block << (PAGE_SHIFT - 9);
- pgoff_t nr_blocks = se->nr_pages << (PAGE_SHIFT - 9);
+ sector_t nr_blocks = se->nr_pages << (PAGE_SHIFT - 9);
if (se->start_page == 0) {
/* Do not discard the swap header page! */
@@ -133,7 +133,7 @@ static void discard_swap_cluster(struct
start_page < se->start_page + se->nr_pages) {
pgoff_t offset = start_page - se->start_page;
sector_t start_block = se->start_block + offset;
- pgoff_t nr_blocks = se->nr_pages - offset;
+ sector_t nr_blocks = se->nr_pages - offset;
if (nr_blocks > nr_pages)
nr_blocks = nr_pages;
* Re: [PATCH 10/9] swapfile: change discard pgoff_t to sector_t
2008-12-01 0:29 ` [PATCH 10/9] swapfile: change discard pgoff_t to sector_t Hugh Dickins
@ 2008-12-03 0:47 ` Andrew Morton
2008-12-03 12:52 ` Hugh Dickins
0 siblings, 1 reply; 18+ messages in thread
From: Andrew Morton @ 2008-12-03 0:47 UTC (permalink / raw)
To: Hugh Dickins
Cc: dwmw2, jens.axboe, matthew, joern, James.Bottomley, djshin90,
teheo, linux-mm, linux-kernel
On Mon, 1 Dec 2008 00:29:41 +0000 (GMT)
Hugh Dickins <hugh@veritas.com> wrote:
> Change pgoff_t nr_blocks in discard_swap() and discard_swap_cluster() to
> sector_t: given the constraints on swap offsets (in particular, the 5 bits
> of swap type accommodated in the same unsigned long), pgoff_t was actually
> safe as is, but it certainly looked worrying when shifted left.
>
> Signed-off-by: Hugh Dickins <hugh@veritas.com>
> ---
> To follow 9/9 swapfile-swap-allocation-cycle-if-nonrot.patch
>
> mm/swapfile.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> --- swapfile9/mm/swapfile.c 2008-11-26 12:19:00.000000000 +0000
> +++ swapfile10/mm/swapfile.c 2008-11-28 20:36:44.000000000 +0000
> @@ -96,7 +96,7 @@ static int discard_swap(struct swap_info
>
> list_for_each_entry(se, &si->extent_list, list) {
> sector_t start_block = se->start_block << (PAGE_SHIFT - 9);
> - pgoff_t nr_blocks = se->nr_pages << (PAGE_SHIFT - 9);
> + sector_t nr_blocks = se->nr_pages << (PAGE_SHIFT - 9);
but, but, that didn't change anything? se->nr_pages must be cast to
sector_t?
> if (se->start_page == 0) {
> /* Do not discard the swap header page! */
> @@ -133,7 +133,7 @@ static void discard_swap_cluster(struct
> start_page < se->start_page + se->nr_pages) {
> pgoff_t offset = start_page - se->start_page;
> sector_t start_block = se->start_block + offset;
> - pgoff_t nr_blocks = se->nr_pages - offset;
> + sector_t nr_blocks = se->nr_pages - offset;
>
> if (nr_blocks > nr_pages)
> nr_blocks = nr_pages;
* Re: [PATCH 10/9] swapfile: change discard pgoff_t to sector_t
2008-12-03 0:47 ` Andrew Morton
@ 2008-12-03 12:52 ` Hugh Dickins
0 siblings, 0 replies; 18+ messages in thread
From: Hugh Dickins @ 2008-12-03 12:52 UTC (permalink / raw)
To: Andrew Morton
Cc: dwmw2, jens.axboe, matthew, joern, James.Bottomley, djshin90,
teheo, linux-mm, linux-kernel
On Tue, 2 Dec 2008, Andrew Morton wrote:
> On Mon, 1 Dec 2008 00:29:41 +0000 (GMT)
> Hugh Dickins <hugh@veritas.com> wrote:
>
> > - pgoff_t nr_blocks = se->nr_pages << (PAGE_SHIFT - 9);
> > + sector_t nr_blocks = se->nr_pages << (PAGE_SHIFT - 9);
>
> but, but, that didn't change anything? se->nr_pages must be cast to
> sector_t?
I'm squirming, you are right, thanks for fixing it up.
Hugh
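Andrew's objection can be reproduced in userspace. In C the shift is evaluated in the type of se->nr_pages, so declaring nr_blocks as sector_t only widens a value that may already have wrapped; the cast has to come before the shift. A minimal sketch, assuming a 32-bit pgoff_t and a 64-bit sector_t (as on a 32-bit kernel built with CONFIG_LBD) and 4 KiB pages; model_pgoff_t, model_sector_t and both helpers are hypothetical stand-ins, not kernel code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins: 32-bit pgoff_t, 64-bit sector_t. */
typedef uint32_t model_pgoff_t;
typedef uint64_t model_sector_t;

#define MODEL_PAGE_SHIFT 12	/* assumption: 4 KiB pages */

/* As in the 10/9 patch: shift in 32 bits, then widen the result. */
static model_sector_t nr_sectors_shift_first(model_pgoff_t nr_pages)
{
	model_sector_t nr_blocks = nr_pages << (MODEL_PAGE_SHIFT - 9);
	return nr_blocks;
}

/* As Andrew suggests: widen to sector_t first, then shift. */
static model_sector_t nr_sectors_cast_first(model_pgoff_t nr_pages)
{
	return (model_sector_t)nr_pages << (MODEL_PAGE_SHIFT - 9);
}
```

For a small extent such as 256 pages both helpers return 2048 sectors; with nr_pages = 1 << 30 the first wraps to 0 while the second returns 1 << 33. Hugh's bound on swap offsets is what keeps real extents in the range where the two agree.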
* [PATCH 8/9] swapfile: swapon randomize if nonrot
2008-11-25 21:44 ` [PATCH 6/9] swapfile: swapon use discard (trim) Hugh Dickins
2008-11-25 21:46 ` [PATCH 7/9] swapfile: swap allocation use discard Hugh Dickins
@ 2008-11-25 21:46 ` Hugh Dickins
2008-11-26 1:20 ` Andrew Morton
2008-12-01 0:32 ` [PATCH 11/9] swapfile: let others seed random Hugh Dickins
2008-11-25 21:47 ` [PATCH 9/9] swapfile: swap allocation cycle if nonrot Hugh Dickins
2008-11-26 1:17 ` [PATCH 6/9] swapfile: swapon use discard (trim) Andrew Morton
3 siblings, 2 replies; 18+ messages in thread
From: Hugh Dickins @ 2008-11-25 21:46 UTC (permalink / raw)
To: Andrew Morton
Cc: David Woodhouse, Jens Axboe, Matthew Wilcox, Joern Engel,
James Bottomley, Donjun Shin, Tejun Heo, linux-mm, linux-kernel
Swap allocation has always started from the beginning of the swap area;
but if we're dealing with a solidstate swap device which can only remap
blocks within limited zones, that would sooner wear out the first zone.
Therefore have sys_swapon() test whether the blk_queue is non-rotational,
and if so randomize the cluster_next starting position for allocation.
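The randomized start amounts to picking a slot in [1, highest_bit], as the patch's 1 + (random32() % p->highest_bit) does. A minimal userspace sketch, assuming rand() as a stand-in for the kernel's random32(); random_cluster_next() is a hypothetical helper name. Offset 0 is never picked, since that page holds the swap header.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Pick a starting swap offset in [1, highest_bit].  rand() stands in
 * for random32(); offset 0 (the swap header) is never returned.
 */
static unsigned long random_cluster_next(unsigned long highest_bit)
{
	assert(highest_bit > 0);	/* modulo by zero is undefined */
	return 1 + ((unsigned long)rand() % highest_bit);
}
```

Reducing with % is slightly biased toward low offsets when highest_bit does not divide the generator's range, which seems harmless when the aim is only to avoid always starting in the first zone.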
If blk_queue is nonrot, note SWP_SOLIDSTATE for later use, and report it
with an "SS" at the right end of the kernel's "Adding ... swap" message
(so that if it's both nonrot and discardable, "SSD" will be shown there).
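The suffix logic can be sketched as follows: each flag contributes its own letters, so a device that is both non-rotational and discardable reads "SSD". The flag values are copied from the patch's enum; swap_flags_suffix() is a hypothetical helper that formats only the suffix, not the whole printk line.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Flag values as defined by the patch in include/linux/swap.h. */
enum {
	SWP_DISCARDABLE = (1 << 2),	/* blkdev supports discard */
	SWP_SOLIDSTATE  = (1 << 4),	/* blkdev seeks are cheap */
};

/* Format "SS" for non-rotational, then "D" for discardable. */
static void swap_flags_suffix(unsigned long flags, char *buf, size_t len)
{
	snprintf(buf, len, "%s%s",
		 (flags & SWP_SOLIDSTATE) ? "SS" : "",
		 (flags & SWP_DISCARDABLE) ? "D" : "");
}
```

Note that the patched format string "across:%lluk %s%s\n" always prints the space before the suffix, so an ordinary disk gets a trailing space in its "Adding ... swap" line.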
Perhaps something should be shown in /proc/swaps (swapon -s), but we
have to be more cautious before making any addition to that format.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
---
But how to get my SD card, accessed by USB card reader, reported as NONROT?
include/linux/swap.h | 1 +
mm/swapfile.c | 11 +++++++++--
2 files changed, 10 insertions(+), 2 deletions(-)
--- swapfile7/include/linux/swap.h 2008-11-25 12:41:40.000000000 +0000
+++ swapfile8/include/linux/swap.h 2008-11-25 12:41:42.000000000 +0000
@@ -122,6 +122,7 @@ enum {
SWP_WRITEOK = (1 << 1), /* ok to write to this swap? */
SWP_DISCARDABLE = (1 << 2), /* blkdev supports discard */
SWP_DISCARDING = (1 << 3), /* now discarding a free cluster */
+ SWP_SOLIDSTATE = (1 << 4), /* blkdev seeks are cheap */
/* add others here before... */
SWP_SCANNING = (1 << 8), /* refcount in scan_swap_map */
};
--- swapfile7/mm/swapfile.c 2008-11-25 12:41:40.000000000 +0000
+++ swapfile8/mm/swapfile.c 2008-11-25 12:41:42.000000000 +0000
@@ -16,6 +16,7 @@
#include <linux/namei.h>
#include <linux/shm.h>
#include <linux/blkdev.h>
+#include <linux/random.h>
#include <linux/writeback.h>
#include <linux/proc_fs.h>
#include <linux/seq_file.h>
@@ -1797,6 +1798,11 @@ asmlinkage long sys_swapon(const char __
goto bad_swap;
}
+ if (blk_queue_nonrot(bdev_get_queue(p->bdev))) {
+ p->flags |= SWP_SOLIDSTATE;
+ srandom32((u32)get_seconds());
+ p->cluster_next = 1 + (random32() % p->highest_bit);
+ }
if (discard_swap(p) == 0)
p->flags |= SWP_DISCARDABLE;
@@ -1813,10 +1819,11 @@ asmlinkage long sys_swapon(const char __
total_swap_pages += nr_good_pages;
printk(KERN_INFO "Adding %uk swap on %s. "
- "Priority:%d extents:%d across:%lluk%s\n",
+ "Priority:%d extents:%d across:%lluk %s%s\n",
nr_good_pages<<(PAGE_SHIFT-10), name, p->prio,
nr_extents, (unsigned long long)span<<(PAGE_SHIFT-10),
- (p->flags & SWP_DISCARDABLE) ? " D" : "");
+ (p->flags & SWP_SOLIDSTATE) ? "SS" : "",
+ (p->flags & SWP_DISCARDABLE) ? "D" : "");
/* insert swap space into swap_list: */
prev = -1;
* Re: [PATCH 8/9] swapfile: swapon randomize if nonrot
2008-11-25 21:46 ` [PATCH 8/9] swapfile: swapon randomize if nonrot Hugh Dickins
@ 2008-11-26 1:20 ` Andrew Morton
2008-11-26 3:38 ` Matthew Wilcox
2008-12-01 0:32 ` [PATCH 11/9] swapfile: let others seed random Hugh Dickins
1 sibling, 1 reply; 18+ messages in thread
From: Andrew Morton @ 2008-11-26 1:20 UTC (permalink / raw)
To: Hugh Dickins
Cc: dwmw2, jens.axboe, matthew, joern, James.Bottomley, djshin90,
teheo, linux-mm, linux-kernel
On Tue, 25 Nov 2008 21:46:56 +0000 (GMT)
Hugh Dickins <hugh@veritas.com> wrote:
> But how to get my SD card, accessed by USB card reader, reported as NONROT?
Dunno. udev rules, perhaps?
* Re: [PATCH 8/9] swapfile: swapon randomize if nonrot
2008-11-26 1:20 ` Andrew Morton
@ 2008-11-26 3:38 ` Matthew Wilcox
0 siblings, 0 replies; 18+ messages in thread
From: Matthew Wilcox @ 2008-11-26 3:38 UTC (permalink / raw)
To: Andrew Morton
Cc: Hugh Dickins, dwmw2, jens.axboe, joern, James.Bottomley,
djshin90, teheo, linux-mm, linux-kernel
On Tue, Nov 25, 2008 at 05:20:39PM -0800, Andrew Morton wrote:
> On Tue, 25 Nov 2008 21:46:56 +0000 (GMT)
> Hugh Dickins <hugh@veritas.com> wrote:
>
> > But how to get my SD card, accessed by USB card reader, reported as NONROT?
>
> Dunno. udev rules, perhaps?
I didn't see patch 8/9, but the 'non-rotating' bit is in word 217 of the
inquiry data. Unfortunately, Jeff insisted that we only report the
contents of that bit for devices claiming ATA-8 support, which is
ridiculous as even the Intel SSDs only claim conformance to ATA-7.
I notice that Jens was allowed to ignore Jeff's insane requirement and
doesn't have to check ATA revision at all.
--
Matthew Wilcox Intel Open Source Technology Centre
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours. We can't possibly take such
a retrograde step."
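For reference, the field Matthew describes is word 217 of the ATA IDENTIFY DEVICE data (Nominal Media Rotation Rate): the value 0001h indicates non-rotating media, and 0000h means the rate is not reported. A minimal sketch, assuming the 256-word IDENTIFY buffer has already been read; id_is_nonrot() and its claims_ata8/require_ata8 parameters are hypothetical, modelling the ATA-8 gate under discussion rather than any libata interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define ATA_ID_WORDS		256
#define ID_ROTATION_RATE	217	/* Nominal Media Rotation Rate */

/*
 * Report non-rotating media from IDENTIFY DEVICE data.  When
 * require_ata8 is set, the word is trusted only if the caller says
 * the device claims ATA-8 conformance (hypothetical policy knob).
 */
static bool id_is_nonrot(const uint16_t id[ATA_ID_WORDS],
			 bool claims_ata8, bool require_ata8)
{
	if (require_ata8 && !claims_ata8)
		return false;
	return id[ID_ROTATION_RATE] == 0x0001;
}
```

With require_ata8 set, an SSD that reports 0001h but claims only ATA-7 is treated as rotational, which is the outcome Matthew objects to.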
* [PATCH 11/9] swapfile: let others seed random
2008-11-25 21:46 ` [PATCH 8/9] swapfile: swapon randomize if nonrot Hugh Dickins
2008-11-26 1:20 ` Andrew Morton
@ 2008-12-01 0:32 ` Hugh Dickins
1 sibling, 0 replies; 18+ messages in thread
From: Hugh Dickins @ 2008-12-01 0:32 UTC (permalink / raw)
To: Andrew Morton
Cc: David Woodhouse, Jens Axboe, Matthew Wilcox, Joern Engel,
James Bottomley, Donjun Shin, Tejun Heo, Arjan van de Ven,
Matt Mackall, linux-mm, linux-kernel
Remove the srandom32((u32)get_seconds()) from non-rotational swapon:
there has been a coincidental discussion of earlier randomization; assuming
that goes ahead, let swapon be a client rather than stirring for itself.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
---
To follow 10/9
mm/swapfile.c | 1 -
1 file changed, 1 deletion(-)
--- swapfile10/mm/swapfile.c 2008-11-28 20:36:44.000000000 +0000
+++ swapfile11/mm/swapfile.c 2008-11-28 20:37:16.000000000 +0000
@@ -1842,7 +1842,6 @@ asmlinkage long sys_swapon(const char __
if (blk_queue_nonrot(bdev_get_queue(p->bdev))) {
p->flags |= SWP_SOLIDSTATE;
- srandom32((u32)get_seconds());
p->cluster_next = 1 + (random32() % p->highest_bit);
}
if (discard_swap(p) == 0)
* [PATCH 9/9] swapfile: swap allocation cycle if nonrot
2008-11-25 21:44 ` [PATCH 6/9] swapfile: swapon use discard (trim) Hugh Dickins
2008-11-25 21:46 ` [PATCH 7/9] swapfile: swap allocation use discard Hugh Dickins
2008-11-25 21:46 ` [PATCH 8/9] swapfile: swapon randomize if nonrot Hugh Dickins
@ 2008-11-25 21:47 ` Hugh Dickins
2008-11-26 1:17 ` [PATCH 6/9] swapfile: swapon use discard (trim) Andrew Morton
3 siblings, 0 replies; 18+ messages in thread
From: Hugh Dickins @ 2008-11-25 21:47 UTC (permalink / raw)
To: Andrew Morton
Cc: David Woodhouse, Jens Axboe, Matthew Wilcox, Joern Engel,
James Bottomley, Donjun Shin, Tejun Heo, linux-mm, linux-kernel
Though it does attempt to find free clusters (Andrea), swap allocation has
always restarted its searches from the beginning of the swap area (sct),
to reduce seek times between swap pages by not scattering them all over
the partition.
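The rotational behaviour, searching upward from lowest_bit for a run of SWAPFILE_CLUSTER free slots, can be sketched over a flat array. find_free_cluster() is a hypothetical helper modelled on the "Locate the first empty (unaligned) cluster" loop in scan_swap_map(); it omits locking, latency breaks and the cluster bookkeeping.

```c
#include <assert.h>

/*
 * Return the start of the first run of `cluster` free slots
 * (swap_map[i] == 0) within [lowest, highest], or -1 if none.
 * swap_map must have at least highest + 1 entries.
 */
static long find_free_cluster(const unsigned short *swap_map,
			      unsigned long lowest, unsigned long highest,
			      unsigned long cluster)
{
	unsigned long offset = lowest;
	unsigned long last_in_cluster = offset + cluster - 1;

	assert(cluster > 0);
	for (; last_in_cluster <= highest; offset++) {
		if (swap_map[offset])
			/* slot in use: a free run must start after it */
			last_in_cluster = offset + cluster;
		else if (offset == last_in_cluster)
			return (long)(offset - (cluster - 1));
	}
	return -1;
}
```

Because every search restarts at lowest_bit, allocations stay packed toward the start of the area, which is good for seeks but concentrates wear there.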
But on a solidstate swap device, seeks are cheap, and block remapping
to level the wear may be limited by zones: in that case it's better to
cycle around the whole partition.
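Cycling around the partition means searching upward from the current position and wrapping back to lowest_bit only once the top is passed. A simplified sketch, ignoring locking, clusters and latency breaks; scan_cyclic() is a hypothetical helper, and its wrap loop starts at lowest rather than at lowest + 1 as the patch's while (++offset < scan_base) does.

```c
#include <assert.h>

/*
 * Find a free slot (swap_map[i] == 0) in [lowest, highest], searching
 * upward from base and wrapping to lowest, so successive allocations
 * cycle around the whole area.  Returns -1 if every slot is in use.
 */
static long scan_cyclic(const unsigned short *swap_map,
			unsigned long lowest, unsigned long highest,
			unsigned long base)
{
	unsigned long offset;

	assert(lowest <= base && base <= highest);
	for (offset = base; offset <= highest; offset++)
		if (!swap_map[offset])
			return (long)offset;
	for (offset = lowest; offset < base; offset++)
		if (!swap_map[offset])
			return (long)offset;
	return -1;
}
```
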
Signed-off-by: Hugh Dickins <hugh@veritas.com>
---
mm/swapfile.c | 50 ++++++++++++++++++++++++++++++++++++++++++++----
1 file changed, 46 insertions(+), 4 deletions(-)
--- swapfile8/mm/swapfile.c 2008-11-25 12:41:42.000000000 +0000
+++ swapfile9/mm/swapfile.c 2008-11-25 12:41:44.000000000 +0000
@@ -169,6 +169,7 @@ static int wait_for_discard(void *word)
static inline unsigned long scan_swap_map(struct swap_info_struct *si)
{
unsigned long offset;
+ unsigned long scan_base;
unsigned long last_in_cluster = 0;
int latency_ration = LATENCY_LIMIT;
int found_free_cluster = 0;
@@ -181,10 +182,11 @@ static inline unsigned long scan_swap_ma
* all over the entire swap partition, so that we reduce
* overall disk seek times between swap pages. -- sct
* But we do now try to find an empty cluster. -Andrea
+ * And we let swap pages go all over an SSD partition. Hugh
*/
si->flags += SWP_SCANNING;
- offset = si->cluster_next;
+ scan_base = offset = si->cluster_next;
if (unlikely(!si->cluster_nr--)) {
if (si->pages - si->inuse_pages < SWAPFILE_CLUSTER) {
@@ -206,7 +208,16 @@ static inline unsigned long scan_swap_ma
}
spin_unlock(&swap_lock);
- offset = si->lowest_bit;
+ /*
+ * If seek is expensive, start searching for new cluster from
+ * start of partition, to minimize the span of allocated swap.
+ * But if seek is cheap, search from our current position, so
+ * that swap is allocated from all over the partition: if the
+ * Flash Translation Layer only remaps within limited zones,
+ * we don't want to wear out the first zone too quickly.
+ */
+ if (!(si->flags & SWP_SOLIDSTATE))
+ scan_base = offset = si->lowest_bit;
last_in_cluster = offset + SWAPFILE_CLUSTER - 1;
/* Locate the first empty (unaligned) cluster */
@@ -228,6 +239,27 @@ static inline unsigned long scan_swap_ma
}
offset = si->lowest_bit;
+ last_in_cluster = offset + SWAPFILE_CLUSTER - 1;
+
+ /* Locate the first empty (unaligned) cluster */
+ for (; last_in_cluster < scan_base; offset++) {
+ if (si->swap_map[offset])
+ last_in_cluster = offset + SWAPFILE_CLUSTER;
+ else if (offset == last_in_cluster) {
+ spin_lock(&swap_lock);
+ offset -= SWAPFILE_CLUSTER - 1;
+ si->cluster_next = offset;
+ si->cluster_nr = SWAPFILE_CLUSTER - 1;
+ found_free_cluster = 1;
+ goto checks;
+ }
+ if (unlikely(--latency_ration < 0)) {
+ cond_resched();
+ latency_ration = LATENCY_LIMIT;
+ }
+ }
+
+ offset = scan_base;
spin_lock(&swap_lock);
si->cluster_nr = SWAPFILE_CLUSTER - 1;
si->lowest_alloc = 0;
@@ -239,7 +271,7 @@ checks:
if (!si->highest_bit)
goto no_page;
if (offset > si->highest_bit)
- offset = si->lowest_bit;
+ scan_base = offset = si->lowest_bit;
if (si->swap_map[offset])
goto scan;
@@ -323,8 +355,18 @@ scan:
latency_ration = LATENCY_LIMIT;
}
}
+ offset = si->lowest_bit;
+ while (++offset < scan_base) {
+ if (!si->swap_map[offset]) {
+ spin_lock(&swap_lock);
+ goto checks;
+ }
+ if (unlikely(--latency_ration < 0)) {
+ cond_resched();
+ latency_ration = LATENCY_LIMIT;
+ }
+ }
spin_lock(&swap_lock);
- goto checks;
no_page:
si->flags -= SWP_SCANNING;
* Re: [PATCH 6/9] swapfile: swapon use discard (trim)
2008-11-25 21:44 ` [PATCH 6/9] swapfile: swapon use discard (trim) Hugh Dickins
` (2 preceding siblings ...)
2008-11-25 21:47 ` [PATCH 9/9] swapfile: swap allocation cycle if nonrot Hugh Dickins
@ 2008-11-26 1:17 ` Andrew Morton
2008-11-26 6:02 ` Hugh Dickins
3 siblings, 1 reply; 18+ messages in thread
From: Andrew Morton @ 2008-11-26 1:17 UTC (permalink / raw)
To: Hugh Dickins
Cc: dwmw2, jens.axboe, matthew, joern, James.Bottomley, djshin90,
teheo, linux-mm, linux-kernel
On Tue, 25 Nov 2008 21:44:34 +0000 (GMT)
Hugh Dickins <hugh@veritas.com> wrote:
> When adding swap, all the old data on swap can be forgotten: sys_swapon()
> discard all but the header page of the swap partition (or every extent
> but the header of the swap file), to give a solidstate swap device the
> opportunity to optimize its wear-levelling.
>
> If that succeeds, note SWP_DISCARDABLE for later use, and report it
> with a "D" at the right end of the kernel's "Adding ... swap" message.
> Perhaps something should be shown in /proc/swaps (swapon -s), but we
> have to be more cautious before making any addition to that format.
>
When reading the above text it's a bit hard to tell whether it's
talking about "this is how things are at present" or "this is how
things are after the patch". This is fairly common with Hugh
changelogs.
> ---
> swapfile.c cleanup patches 0-5 just went to linux-mm: patches 6-9
> may be of wider interest, so I'm extending the Cc list for them.
>
> include/linux/swap.h | 1 +
> mm/swapfile.c | 39 +++++++++++++++++++++++++++++++++++++--
> 2 files changed, 38 insertions(+), 2 deletions(-)
>
> --- swapfile5/include/linux/swap.h 2008-11-25 12:41:31.000000000 +0000
> +++ swapfile6/include/linux/swap.h 2008-11-25 12:41:34.000000000 +0000
> @@ -120,6 +120,7 @@ struct swap_extent {
> enum {
> SWP_USED = (1 << 0), /* is slot in swap_info[] used? */
> SWP_WRITEOK = (1 << 1), /* ok to write to this swap? */
> + SWP_DISCARDABLE = (1 << 2), /* blkdev supports discard */
> /* add others here before... */
> SWP_SCANNING = (1 << 8), /* refcount in scan_swap_map */
> };
> --- swapfile5/mm/swapfile.c 2008-11-25 12:41:31.000000000 +0000
> +++ swapfile6/mm/swapfile.c 2008-11-25 12:41:34.000000000 +0000
> @@ -84,6 +84,37 @@ void swap_unplug_io_fn(struct backing_de
> up_read(&swap_unplug_sem);
> }
>
> +/*
> + * swapon tell device that all the old swap contents can be discarded,
> + * to allow the swap device to optimize its wear-levelling.
> + */
> +static int discard_swap(struct swap_info_struct *si)
> +{
> + struct swap_extent *se;
> + int err = 0;
> +
> + list_for_each_entry(se, &si->extent_list, list) {
> + sector_t start_block = se->start_block << (PAGE_SHIFT - 9);
> + pgoff_t nr_blocks = se->nr_pages << (PAGE_SHIFT - 9);
I trust we don't have any shift overflows here.
It's a bit dissonant to see a pgoff_t with "blocks" in its name. But
swap is like that..
> + if (se->start_page == 0) {
> + /* Do not discard the swap header page! */
> + start_block += 1 << (PAGE_SHIFT - 9);
> + nr_blocks -= 1 << (PAGE_SHIFT - 9);
> + if (!nr_blocks)
> + continue;
> + }
> +
> + err = blkdev_issue_discard(si->bdev, start_block,
> + nr_blocks, GFP_KERNEL);
> + if (err)
> + break;
> +
> + cond_resched();
> + }
> + return err; /* That will often be -EOPNOTSUPP */
> +}
> +
> #define SWAPFILE_CLUSTER 256
> #define LATENCY_LIMIT 256
>
> @@ -1649,6 +1680,9 @@ asmlinkage long sys_swapon(const char __
> goto bad_swap;
> }
>
> + if (discard_swap(p) == 0)
> + p->flags |= SWP_DISCARDABLE;
> +
> mutex_lock(&swapon_mutex);
> spin_lock(&swap_lock);
> if (swap_flags & SWAP_FLAG_PREFER)
> @@ -1662,9 +1696,10 @@ asmlinkage long sys_swapon(const char __
> total_swap_pages += nr_good_pages;
>
> printk(KERN_INFO "Adding %uk swap on %s. "
> - "Priority:%d extents:%d across:%lluk\n",
> + "Priority:%d extents:%d across:%lluk%s\n",
> nr_good_pages<<(PAGE_SHIFT-10), name, p->prio,
> - nr_extents, (unsigned long long)span<<(PAGE_SHIFT-10));
> + nr_extents, (unsigned long long)span<<(PAGE_SHIFT-10),
> + (p->flags & SWP_DISCARDABLE) ? " D" : "");
>
> /* insert swap space into swap_list: */
> prev = -1;
* Re: [PATCH 6/9] swapfile: swapon use discard (trim)
2008-11-26 1:17 ` [PATCH 6/9] swapfile: swapon use discard (trim) Andrew Morton
@ 2008-11-26 6:02 ` Hugh Dickins
0 siblings, 0 replies; 18+ messages in thread
From: Hugh Dickins @ 2008-11-26 6:02 UTC (permalink / raw)
To: Andrew Morton
Cc: dwmw2, jens.axboe, matthew, joern, James.Bottomley, djshin90,
teheo, linux-mm, linux-kernel
On Tue, 25 Nov 2008, Andrew Morton wrote:
> On Tue, 25 Nov 2008 21:44:34 +0000 (GMT)
> Hugh Dickins <hugh@veritas.com> wrote:
>
> > When adding swap, all the old data on swap can be forgotten: sys_swapon()
> > discard all but the header page of the swap partition (or every extent
> > but the header of the swap file), to give a solidstate swap device the
> > opportunity to optimize its wear-levelling.
> >
> > If that succeeds, note SWP_DISCARDABLE for later use, and report it
> > with a "D" at the right end of the kernel's "Adding ... swap" message.
> > Perhaps something should be shown in /proc/swaps (swapon -s), but we
> > have to be more cautious before making any addition to that format.
>
> When reading the above text it's a bit hard to tell whether it's
> talking about "this is how things are at present" or "this is how
> things are after the patch". This is fairly common with Hugh
> changelogs.
;) Sorry about that - yes, that's often true. In this case, it's
all talking about how things are after the patch. I think it's that
first sentence which bothers you - "all the old data on swap can be
forgotten". In this case, I'm meaning "it's a good idea to let the
device know that it can forget about all the old data"; but it's easy
to imagine another patch coming from me in which the same sentence
would mean "we've got a terrible data-loss bug, such that all the
data already written to swap gets erased". Let's hope I didn't
implement the latter.
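What discard_swap() computes per extent, converting page units to 512-byte sectors and keeping the header page out of the range, can be sketched as follows. A minimal userspace model, assuming 4 KiB pages and doing the arithmetic in 64 bits throughout; struct model_extent and extent_discard_range() are hypothetical names, and the kernel passes the resulting range to blkdev_issue_discard() rather than returning it.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_PAGE_SHIFT 12	/* assumption: 4 KiB pages */
#define SECTORS_PER_PAGE (1u << (MODEL_PAGE_SHIFT - 9))

/* Hypothetical flattened swap extent, all fields 64-bit. */
struct model_extent {
	uint64_t start_page;	/* first swap offset covered */
	uint64_t nr_pages;	/* pages in this extent */
	uint64_t start_block;	/* first disk block, in page units */
};

/*
 * Compute the sector range to discard for one extent, skipping the
 * swap header page when the extent starts at offset 0.  Returns 0
 * when nothing is left to discard.
 */
static int extent_discard_range(const struct model_extent *se,
				uint64_t *start_sector, uint64_t *nr_sectors)
{
	uint64_t start = se->start_block << (MODEL_PAGE_SHIFT - 9);
	uint64_t nr = se->nr_pages << (MODEL_PAGE_SHIFT - 9);

	/* the extent holding offset 0 always contains the header */
	assert(se->start_page != 0 || se->nr_pages >= 1);
	if (se->start_page == 0) {
		/* Do not discard the swap header page! */
		start += SECTORS_PER_PAGE;
		nr -= SECTORS_PER_PAGE;
	}
	*start_sector = start;
	*nr_sectors = nr;
	return nr != 0;
}
```
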
> > +static int discard_swap(struct swap_info_struct *si)
> > +{
> > + struct swap_extent *se;
> > + int err = 0;
> > +
> > + list_for_each_entry(se, &si->extent_list, list) {
> > + sector_t start_block = se->start_block << (PAGE_SHIFT - 9);
> > + pgoff_t nr_blocks = se->nr_pages << (PAGE_SHIFT - 9);
>
> I trust we don't have any shift overflows here.
>
> It's a bit dissonant to see a pgoff_t with "blocks" in its name. But
> swap is like that..
In fact we don't have a shift overflow there, but you've such a good eye.
I noticed that "pgoff_t nr_blocks" line just as I was about to send off
the patches, and had a little worry about it. By that time I was at
the stage that if I went into the patch and changed a few pgoff_ts
to sector_ts at the last minute, likelihood was I'd screw something
up badly, in one CONFIG combination or another, and if I delayed
it'd be tomorrow.
It would be good to make that change when built and tested,
just for reassurance. There isn't a shift overflow as it stands,
but the reasons are too contingent for my liking: on 64-bit there
isn't an issue because pgoff_t is as big as sector_t; on 32-bit,
it's because a swp_entry_t is an unsigned long, and it has to
contain five bits for the "type" (which of the 30 or 32 swapfiles
is addressed), and the pages-to-sectors shift is less than 5 on
all 32-bit machines for the foreseeable future. Oh, and it also
relies on the fact that by the time we're setting up swap extents,
we've already curtailed the size to what's usable by a swp_entry_t,
if that's less than the size given in the swap header.
So, not actually a bug there, but certainly a source of anxiety,
better eliminated.
Hugh
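Hugh's 32-bit argument can be written as arithmetic the compiler checks: with 5 of the 32 bits of a swp_entry_t taken by the swap type, at most 27 bits remain for the offset, and a pages-to-sectors shift of 3 (4 KiB pages) then needs at most 30 bits. A minimal sketch, assuming 4 KiB pages and the 5-bit type field described above; the enum names are hypothetical, and it does not model the further curtailment at swapon time that Hugh also relies on.

```c
#include <assert.h>

enum {
	ULONG_BITS_32     = 32,		/* unsigned long on 32-bit */
	SWP_TYPE_BITS     = 5,		/* swap type field, per the thread */
	MODEL_PAGE_SHIFT  = 12,		/* assumption: 4 KiB pages */
	OFFSET_BITS       = ULONG_BITS_32 - SWP_TYPE_BITS,	/* 27 */
	SECTOR_SHIFT_BITS = MODEL_PAGE_SHIFT - 9,		/* 3 */
};

/* The shifted page count still fits in a 32-bit pgoff_t. */
static_assert(OFFSET_BITS + SECTOR_SHIFT_BITS <= ULONG_BITS_32,
	      "pages-to-sectors shift fits in 32 bits");

/* Largest page count representable in the offset field. */
static unsigned long long max_nr_pages(void)
{
	return (1ULL << OFFSET_BITS) - 1;
}
```

The check holds exactly when the sector shift is no larger than the number of type bits, which is the "less than 5" condition in Hugh's reply.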