* [RFC][PATCH 0/3] fix memcg to do swap account in right way (avoid swap account leak)
From: KAMEZAWA Hiroyuki @ 2009-05-22 7:57 UTC
To: linux-mm; +Cc: nishimura, balbir, hugh.dickins, akpm, hannes
Thank you all for the comments on the previous patches. This is a new version.
(Please look at it when you have free time; I don't want to annoy anyone ;)
The major difference between this version and the old ones:
- the old ones tried to fix swap handling itself,
- this one fixes memcg's swap accounting.
I like the ideas in this patch set, but it may need more testing.
The central concept of this patch set for fixing memcg's mis-accounting is:
"ignore the reference from swap cache when we uncharge swap."
It consists of the following 3 patches. Patch 1/3 may be of interest even to
people who don't use memcg.
[1/3] Add a SWAP_HAS_CACHE flag to the swap_map[] array.
Add a flag that indicates "there is a swap cache" instead of counting the
swap cache as a reference. With this, we can learn the reference count of a
swap entry without calling find_get_page(swapper_space).
[2/3] Fix memcg to handle references to swap.
There is an issue where all swap references are gone but the entry can't be
freed/uncharged because its swap cache is not on the memcg's LRU.
To fix this, this patch uncharges swap even if a swap cache still exists.
It needs careful testing (and perhaps some fixes), but I think this is a
good way to go.
This patch uncharges the swap account while the swp_entry is still used by
the swap cache, so some more work to reclaim unnecessary swap caches will be
required (at vm_swap_full()).
[3/3] Count the number of swap caches with unused swp_entries.
This patch just counts the number of swap caches whose swp_entry has no
other reference. This counter plus vm_swap_full() will allow us to write a
function that reclaims unused swp_entries.
Any comments are welcome.
Thanks,
-Kame
* [RFC][PATCH 1/3] add SWAP_HAS_CACHE flag to swapmap.
From: KAMEZAWA Hiroyuki @ 2009-05-22 8:01 UTC
To: KAMEZAWA Hiroyuki; +Cc: linux-mm, nishimura, balbir, hugh.dickins, akpm, hannes
Now, in the reference counting at swap_map, there are 2 kinds of references:
- references from the swap cache.
- references from users of the swap entry (page tables etc.).
Then, to find out whether a swap entry can be freed, the following checks
are used:
- check that the swap entry's refcount == 1.
- find_get_page(swapper_space) returning NULL means there is no other
  reference.
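As a simplified sketch (not the exact kernel code; the helper name here is
only illustrative), the old "can this entry be freed?" test is:

	/* sketch of the old scheme: the swap cache holds one ref in swap_map[] */
	int entry_is_freeable(struct swap_info_struct *si, swp_entry_t entry)
	{
		if (si->swap_map[swp_offset(entry)] != 1)
			return 0;	/* other users still reference it */
		/* the single remaining ref may be the swap cache */
		return find_get_page(&swapper_space, entry.val) == NULL;
	}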
On the other hand, the swap-in code does the following operations:
	swap_duplicate()
	  <-- (*) race window
	add_to_swap_cache()
Obviously, (*) above is a race window. But even if the race occurs, the
global LRU will finally reclaim the page when the memory-shrinking code runs.
This has not been a problem... but when memcg is used, this loose control of
swap references is one of the reasons accounting information leaks.
(You can say the memcg design is bad but, IIUC, swap cache itself includes
tons of loose operations, which makes the situation difficult.)
This patch modifies swap_map's reference counting from
	SWAP_MAP_MAX = 0x7fff
	SWAP_MAP_BAD = 0x8000	(unchanged since the Linux 2.4 age)
to
	SWAP_MAP_MAX = 0x7ffe
	SWAP_MAP_BAD = 0x7fff
and adds a flag bit
	SWAP_HAS_CACHE = 0x8000
to indicate that a swap cache exists for the entry.
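With this layout, each swap_map[] entry packs a 15-bit reference count plus
the cache bit. A minimal sketch of how a value decomposes (these helper
names are illustrative only; the patch itself adds swap_has_ref() below):

	#define SWAP_MAP_MASK	0x7fff	/* low 15 bits: reference count */
	#define SWAP_HAS_CACHE	0x8000	/* high bit: a swap cache exists */

	/* illustrative helpers, not part of the patch */
	static inline int swap_refs(unsigned short count)
	{
		return count & SWAP_MAP_MASK;	/* page-table/shmem refs */
	}
	static inline int swap_cached(unsigned short count)
	{
		return !!(count & SWAP_HAS_CACHE);
	}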
This patch adds the following new functions:
- swapcache_prepare()	- called in the swap-in path (via swapin_readahead()).
- swapcache_free()	- called when freeing a swp_entry held by a swap cache.
A fix for memcg will follow this. The contract of swapcache_prepare(), as
consumed by the swap-in path, is sketched below.
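A condensed sketch of the retry loop (mirroring the read_swap_cache_async()
change in the diff; 0 = entry freed, 1 = flag set, -EAGAIN = race):

	do {
		err = swapcache_prepare(entry);
		if (!err)		/* this swap entry was freed under us */
			break;
		if (err == -EAGAIN)	/* raced with another swap-cache op */
			continue;
		/* err == 1: SWAP_HAS_CACHE is ours; add the page */
		err = add_to_swap_cache(new_page, entry,
					gfp_mask & GFP_KERNEL);
	} while (err != -ENOMEM);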
Changelog: v1 -> v2
- modified the logic to use the 0x8000 bit.
- don't add new arguments to the usual functions.
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
include/linux/swap.h | 12 ++-
mm/shmem.c | 2
mm/swap_state.c | 15 ++-
mm/swapfile.c | 192 ++++++++++++++++++++++++++++++++++++++-------------
mm/vmscan.c | 2
5 files changed, 164 insertions(+), 59 deletions(-)
Index: mmotm-2.6.30-May17/include/linux/swap.h
===================================================================
--- mmotm-2.6.30-May17.orig/include/linux/swap.h
+++ mmotm-2.6.30-May17/include/linux/swap.h
@@ -129,8 +129,10 @@ enum {
#define SWAP_CLUSTER_MAX 32
-#define SWAP_MAP_MAX 0x7fff
-#define SWAP_MAP_BAD 0x8000
+#define SWAP_MAP_MAX 0x7ffe
+#define SWAP_HAS_CACHE 0x8000
+#define SWAP_MAP_BAD 0x7fff
+#define SWAP_MAP_MASK 0x7fff
/*
* The in-memory structure used to track swap areas.
@@ -303,6 +305,8 @@ extern swp_entry_t get_swap_page_of_type
extern int swap_duplicate(swp_entry_t);
extern int valid_swaphandles(swp_entry_t, unsigned long *);
extern void swap_free(swp_entry_t);
+extern void swapcache_free(swp_entry_t);
+extern int swapcache_prepare(swp_entry_t);
extern int free_swap_and_cache(swp_entry_t);
extern int swap_type_of(dev_t, sector_t, struct block_device **);
extern unsigned int count_swap_pages(int, int);
@@ -376,6 +380,10 @@ static inline void swap_free(swp_entry_t
{
}
+static inline void swapcache_free(swp_entry_t swp)
+{
+}
+
static inline struct page *swapin_readahead(swp_entry_t swp, gfp_t gfp_mask,
struct vm_area_struct *vma, unsigned long addr)
{
Index: mmotm-2.6.30-May17/mm/swapfile.c
===================================================================
--- mmotm-2.6.30-May17.orig/mm/swapfile.c
+++ mmotm-2.6.30-May17/mm/swapfile.c
@@ -60,6 +60,11 @@ static DEFINE_MUTEX(swapon_mutex);
*/
static DECLARE_RWSEM(swap_unplug_sem);
+static inline int swap_has_ref(unsigned short count)
+{
+ return count & SWAP_MAP_MASK;
+}
+
void swap_unplug_io_fn(struct backing_dev_info *unused_bdi, struct page *page)
{
swp_entry_t entry;
@@ -167,7 +172,8 @@ static int wait_for_discard(void *word)
#define SWAPFILE_CLUSTER 256
#define LATENCY_LIMIT 256
-static inline unsigned long scan_swap_map(struct swap_info_struct *si)
+static inline unsigned long
+scan_swap_map(struct swap_info_struct *si, int cache)
{
unsigned long offset;
unsigned long scan_base;
@@ -285,7 +291,11 @@ checks:
si->lowest_bit = si->max;
si->highest_bit = 0;
}
- si->swap_map[offset] = 1;
+ if (cache)
+ si->swap_map[offset] = SWAP_HAS_CACHE; /* via get_swap_page() */
+ else
+ si->swap_map[offset] = 1; /* via alloc_swap_block() */
+
si->cluster_next = offset + 1;
si->flags -= SWP_SCANNING;
@@ -373,7 +383,11 @@ no_page:
si->flags -= SWP_SCANNING;
return 0;
}
-
+/*
+ * Allocate a swap entry and set the SWAP_HAS_CACHE flag. The caller is
+ * expected to call add_to_swap_cache() next. In general, when a swap
+ * entry is allocated, there is a page to be written out.
+ */
swp_entry_t get_swap_page(void)
{
struct swap_info_struct *si;
@@ -401,7 +415,7 @@ swp_entry_t get_swap_page(void)
continue;
swap_list.next = next;
- offset = scan_swap_map(si);
+ offset = scan_swap_map(si, 1);
if (offset) {
spin_unlock(&swap_lock);
return swp_entry(type, offset);
@@ -415,6 +429,10 @@ noswap:
return (swp_entry_t) {0};
}
+/*
+ * Allocate a swap entry with the swap refcount set to 1. This is not for
+ * allocating swap cache; it is used by kernel/power/... via alloc_swapdev_block().
+ */
swp_entry_t get_swap_page_of_type(int type)
{
struct swap_info_struct *si;
@@ -424,7 +442,7 @@ swp_entry_t get_swap_page_of_type(int ty
si = swap_info + type;
if (si->flags & SWP_WRITEOK) {
nr_swap_pages--;
- offset = scan_swap_map(si);
+ offset = scan_swap_map(si, 0);
if (offset) {
spin_unlock(&swap_lock);
return swp_entry(type, offset);
@@ -470,26 +488,40 @@ bad_nofile:
out:
return NULL;
}
-
-static int swap_entry_free(struct swap_info_struct *p, swp_entry_t ent)
+/*
+ * Returns the remaining refcount of the swap entry. The SWAP_HAS_CACHE
+ * flag bit is returned, too. So, if (return value == SWAP_HAS_CACHE), it's
+ * time to delete the swap cache.
+ */
+static int swap_entry_free(struct swap_info_struct *p,
+ swp_entry_t ent,
+ int cache)
{
unsigned long offset = swp_offset(ent);
int count = p->swap_map[offset];
- if (count < SWAP_MAP_MAX) {
- count--;
- p->swap_map[offset] = count;
- if (!count) {
- if (offset < p->lowest_bit)
- p->lowest_bit = offset;
- if (offset > p->highest_bit)
- p->highest_bit = offset;
- if (p->prio > swap_info[swap_list.next].prio)
- swap_list.next = p - swap_info;
- nr_swap_pages++;
- p->inuse_pages--;
- mem_cgroup_uncharge_swap(ent);
- }
+ if (!cache) {
+ if ((count & SWAP_MAP_MASK) >= SWAP_MAP_MAX)
+ return SWAP_MAP_BAD;
+ VM_BUG_ON(!(count & SWAP_MAP_MASK));
+ count -= 1;
+ } else {
+ /* Even if SWAP_MAP_BAD, we can drop swap cache. */
+ VM_BUG_ON(!(count & SWAP_HAS_CACHE));
+ count &= SWAP_MAP_MASK; /* Drops SWAP_HAS_CACHE bit */
+ }
+
+ p->swap_map[offset] = count;
+ if (!count) {
+ if (offset < p->lowest_bit)
+ p->lowest_bit = offset;
+ if (offset > p->highest_bit)
+ p->highest_bit = offset;
+ if (p->prio > swap_info[swap_list.next].prio)
+ swap_list.next = p - swap_info;
+ nr_swap_pages++;
+ p->inuse_pages--;
+ mem_cgroup_uncharge_swap(ent);
}
return count;
}
@@ -504,7 +536,19 @@ void swap_free(swp_entry_t entry)
p = swap_info_get(entry);
if (p) {
- swap_entry_free(p, entry);
+ swap_entry_free(p, entry, 0);
+ spin_unlock(&swap_lock);
+ }
+}
+
+/* called at freeing swap cache */
+void swapcache_free(swp_entry_t entry)
+{
+ struct swap_info_struct *p;
+
+ p = swap_info_get(entry);
+ if (p) {
+ swap_entry_free(p, entry, 1);
spin_unlock(&swap_lock);
}
}
@@ -521,8 +565,8 @@ static inline int page_swapcount(struct
entry.val = page_private(page);
p = swap_info_get(entry);
if (p) {
- /* Subtract the 1 for the swap cache itself */
- count = p->swap_map[swp_offset(entry)] - 1;
+ /* Ignore reference from swap cache. */
+ count = p->swap_map[swp_offset(entry)] & SWAP_MAP_MASK;
spin_unlock(&swap_lock);
}
return count;
@@ -584,7 +628,7 @@ int free_swap_and_cache(swp_entry_t entr
p = swap_info_get(entry);
if (p) {
- if (swap_entry_free(p, entry) == 1) {
+ if (swap_entry_free(p, entry, 0) == SWAP_HAS_CACHE) {
page = find_get_page(&swapper_space, entry.val);
if (page && !trylock_page(page)) {
page_cache_release(page);
@@ -891,12 +935,13 @@ static unsigned int find_next_to_unuse(s
i = 1;
}
count = si->swap_map[i];
- if (count && count != SWAP_MAP_BAD)
+ if (count && ((count & SWAP_MAP_MASK) != SWAP_MAP_BAD))
break;
}
return i;
}
+
/*
* We completely avoid races by reading each swap page in advance,
* and then search for the process using it. All the necessary
@@ -995,13 +1040,13 @@ static int try_to_unuse(unsigned int typ
*/
shmem = 0;
swcount = *swap_map;
- if (swcount > 1) {
+ if (swap_has_ref(swcount)) {
if (start_mm == &init_mm)
shmem = shmem_unuse(entry, page);
else
retval = unuse_mm(start_mm, entry, page);
}
- if (*swap_map > 1) {
+ if (swap_has_ref(*swap_map)) {
int set_start_mm = (*swap_map >= swcount);
struct list_head *p = &start_mm->mmlist;
struct mm_struct *new_start_mm = start_mm;
@@ -1011,7 +1056,7 @@ static int try_to_unuse(unsigned int typ
atomic_inc(&new_start_mm->mm_users);
atomic_inc(&prev_mm->mm_users);
spin_lock(&mmlist_lock);
- while (*swap_map > 1 && !retval && !shmem &&
+ while (swap_has_ref(*swap_map) && !retval && !shmem &&
(p = p->next) != &start_mm->mmlist) {
mm = list_entry(p, struct mm_struct, mmlist);
if (!atomic_inc_not_zero(&mm->mm_users))
@@ -1057,11 +1102,11 @@ static int try_to_unuse(unsigned int typ
}
/*
- * How could swap count reach 0x7fff when the maximum
- * pid is 0x7fff, and there's no way to repeat a swap
- * page within an mm (except in shmem, where it's the
- * shared object which takes the reference count)?
- * We believe SWAP_MAP_MAX cannot occur in Linux 2.4.
+ * How could the swap count reach 0x7ffe when there's no way
+ * to repeat a swap page within an mm (except in shmem, where
+ * it's the shared object which takes the reference count)?
+ * We believe SWAP_MAP_MAX cannot occur in most cases.
*
* If that's wrong, then we should worry more about
* exit_mmap() and do_munmap() cases described above:
@@ -1069,9 +1114,10 @@ static int try_to_unuse(unsigned int typ
* We know "Undead"s can happen, they're okay, so don't
* report them; but do report if we reset SWAP_MAP_MAX.
*/
- if (*swap_map == SWAP_MAP_MAX) {
+ if ((*swap_map & SWAP_MAP_MASK) == SWAP_MAP_MAX) {
spin_lock(&swap_lock);
- *swap_map = 1;
+ /* just remember we have cache...*/
+ *swap_map = SWAP_HAS_CACHE;
spin_unlock(&swap_lock);
reset_overflow = 1;
}
@@ -1089,7 +1135,8 @@ static int try_to_unuse(unsigned int typ
* pages would be incorrect if swap supported "shared
* private" pages, but they are handled by tmpfs files.
*/
- if ((*swap_map > 1) && PageDirty(page) && PageSwapCache(page)) {
+ if (swap_has_ref(*swap_map) &&
+ PageDirty(page) && PageSwapCache(page)) {
struct writeback_control wbc = {
.sync_mode = WB_SYNC_NONE,
};
@@ -1942,12 +1989,16 @@ void si_swapinfo(struct sysinfo *val)
*
* Note: if swap_map[] reaches SWAP_MAP_MAX the entries are treated as
* "permanent", but will be reclaimed by the next swapoff.
+ *
+ * Returns 1 on success; this operation should never fail under usual
+ * conditions. Calling swap_duplicate() against a freed entry is a bug.
+ * (There may be a swap cache, but we ignore it.)
*/
int swap_duplicate(swp_entry_t entry)
{
struct swap_info_struct * p;
unsigned long offset, type;
- int result = 0;
+ int count, result = 0;
if (is_migration_entry(entry))
return 1;
@@ -1959,17 +2010,21 @@ int swap_duplicate(swp_entry_t entry)
offset = swp_offset(entry);
spin_lock(&swap_lock);
- if (offset < p->max && p->swap_map[offset]) {
- if (p->swap_map[offset] < SWAP_MAP_MAX - 1) {
- p->swap_map[offset]++;
- result = 1;
- } else if (p->swap_map[offset] <= SWAP_MAP_MAX) {
- if (swap_overflow++ < 5)
- printk(KERN_WARNING "swap_dup: swap entry overflow\n");
- p->swap_map[offset] = SWAP_MAP_MAX;
- result = 1;
- }
+ if (offset >= p->max)
+ goto out_unlock;
+
+ count = p->swap_map[offset] & SWAP_MAP_MASK;
+ if (count < SWAP_MAP_MAX - 1) {
+ p->swap_map[offset] += 1;
+ result = 1;
+ } else if (count <= SWAP_MAP_MAX) {
+ if (swap_overflow++ < 5)
+ printk(KERN_WARNING "swap_dup: swap entry overflow\n");
+ /* don't overwrite SWAP_HAS_CACHE flag */
+ p->swap_map[offset] |= SWAP_MAP_MAX;
+ result = 1;
}
+out_unlock:
spin_unlock(&swap_lock);
out:
return result;
@@ -1979,6 +2034,47 @@ bad_file:
goto out;
}
+/*
+ * Set SWAP_HAS_CACHE; this succeeds only when there is no swap cache yet.
+ * The difference from find_get_page(&swapper_space,...) is that this can
+ * also catch entries which are now being added to/deleted from the cache.
+ *
+ * Returns 0 if the swap entry itself has been freed.
+ * Returns 1 on success (SWAP_HAS_CACHE is now set).
+ * Returns -EAGAIN if another swap-cache operation is racing with us.
+ */
+
+int swapcache_prepare(swp_entry_t entry)
+{
+ struct swap_info_struct * p;
+ unsigned long offset, type;
+ int result = 0;
+
+ VM_BUG_ON(is_migration_entry(entry));
+
+ type = swp_type(entry);
+ if (type >= nr_swapfiles)
+ goto bad_file;
+ p = type + swap_info;
+ offset = swp_offset(entry);
+
+ spin_lock(&swap_lock);
+ if (offset < p->max && p->swap_map[offset]) {
+ if (!(p->swap_map[offset] & SWAP_HAS_CACHE)) {
+ p->swap_map[offset] |= SWAP_HAS_CACHE;
+ result = 1;
+ } else
+ result = -EAGAIN;
+ }
+ spin_unlock(&swap_lock);
+out:
+ return result;
+bad_file:
+ printk(KERN_ERR "swapcache_prepare: %s%08lx\n", Bad_file, entry.val);
+ goto out;
+}
+
+
struct swap_info_struct *
get_swap_info_struct(unsigned type)
{
Index: mmotm-2.6.30-May17/mm/swap_state.c
===================================================================
--- mmotm-2.6.30-May17.orig/mm/swap_state.c
+++ mmotm-2.6.30-May17/mm/swap_state.c
@@ -162,11 +162,11 @@ int add_to_swap(struct page *page)
return 1;
case -EEXIST:
/* Raced with "speculative" read_swap_cache_async */
- swap_free(entry);
+ swapcache_free(entry);
continue;
default:
/* -ENOMEM radix-tree allocation failure */
- swap_free(entry);
+ swapcache_free(entry);
return 0;
}
}
@@ -188,8 +188,7 @@ void delete_from_swap_cache(struct page
__delete_from_swap_cache(page);
spin_unlock_irq(&swapper_space.tree_lock);
- mem_cgroup_uncharge_swapcache(page, entry);
- swap_free(entry);
+ swapcache_free(entry);
page_cache_release(page);
}
@@ -293,9 +292,11 @@ struct page *read_swap_cache_async(swp_e
/*
* Swap entry may have been freed since our caller observed it.
*/
- if (!swap_duplicate(entry))
+ err = swapcache_prepare(entry);
+ if (!err) /* this swap is freed */
break;
-
+ if (err == -EAGAIN)/* race with other swap ops, retry. */
+ continue;
/*
* Associate the page with swap entry in the swap cache.
* May fail (-EEXIST) if there is already a page associated
@@ -317,7 +318,7 @@ struct page *read_swap_cache_async(swp_e
}
ClearPageSwapBacked(new_page);
__clear_page_locked(new_page);
- swap_free(entry);
+ swapcache_free(entry);
} while (err != -ENOMEM);
if (new_page)
Index: mmotm-2.6.30-May17/mm/vmscan.c
===================================================================
--- mmotm-2.6.30-May17.orig/mm/vmscan.c
+++ mmotm-2.6.30-May17/mm/vmscan.c
@@ -478,7 +478,7 @@ static int __remove_mapping(struct addre
__delete_from_swap_cache(page);
spin_unlock_irq(&mapping->tree_lock);
mem_cgroup_uncharge_swapcache(page, swap);
- swap_free(swap);
+ swapcache_free(swap);
} else {
__remove_from_page_cache(page);
spin_unlock_irq(&mapping->tree_lock);
Index: mmotm-2.6.30-May17/mm/shmem.c
===================================================================
--- mmotm-2.6.30-May17.orig/mm/shmem.c
+++ mmotm-2.6.30-May17/mm/shmem.c
@@ -1097,7 +1097,7 @@ static int shmem_writepage(struct page *
shmem_swp_unmap(entry);
unlock:
spin_unlock(&info->lock);
- swap_free(swap);
+ swapcache_free(swap);
redirty:
set_page_dirty(page);
if (wbc->for_reclaim)
* [RFC][PATCH 2/3] fix memcg swap account to handle swap ref itself
From: KAMEZAWA Hiroyuki @ 2009-05-22 8:04 UTC
To: KAMEZAWA Hiroyuki; +Cc: linux-mm, nishimura, balbir, hugh.dickins, akpm, hannes
From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Currently, the memory cgroup's swap accounting is uncharged only when a swap
entry is completely freed. That behavior doesn't work well in the "swap is
freed but there is still a swap cache" case: we have to wait until the
global LRU runs and kicks the page out...
This patch modifies memcg's swap accounting to uncharge when the references
to a swap entry go down to 0 even if there is still a swap cache. The swap
account is then effectively decremented.
To do this, I moved the swap cache uncharge code under swap_lock
(by modifying swapcache_free()).
From the system's point of view (i.e. not the cgroup level), the swp_entry
itself is not freed until the global LRU runs, so some clever operation like
vm_swap_full() may still be necessary...
But the swap accounting information itself is uncharged, and memcg will work
fine.
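The decision made while still holding swap_lock boils down to this (a sketch
of the modified swapcache_free() from the diff below):

	ret = swap_entry_free(p, entry, 1);	/* drops SWAP_HAS_CACHE */
	if (page) {
		if (ret)	/* swap refs remain: this is a real swap-out */
			mem_cgroup_uncharge_swapcache(page, entry, 1);
		else		/* we were the last user of the swap entry */
			mem_cgroup_uncharge_swapcache(page, entry, 0);
	}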
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
include/linux/swap.h | 9 +++++----
mm/memcontrol.c | 16 ++++++++++------
mm/shmem.c | 2 +-
mm/swap_state.c | 8 ++++----
mm/swapfile.c | 14 +++++++++++---
mm/vmscan.c | 3 +--
6 files changed, 32 insertions(+), 20 deletions(-)
Index: mmotm-2.6.30-May17/mm/swapfile.c
===================================================================
--- mmotm-2.6.30-May17.orig/mm/swapfile.c
+++ mmotm-2.6.30-May17/mm/swapfile.c
@@ -521,8 +521,9 @@ static int swap_entry_free(struct swap_i
swap_list.next = p - swap_info;
nr_swap_pages++;
p->inuse_pages--;
- mem_cgroup_uncharge_swap(ent);
}
+ if (!swap_has_ref(count))
+ mem_cgroup_uncharge_swap(ent);
return count;
}
@@ -542,13 +543,20 @@ void swap_free(swp_entry_t entry)
}
/* called at freeing swap cache */
-void swapcache_free(swp_entry_t entry)
+void swapcache_free(swp_entry_t entry, struct page *page)
{
struct swap_info_struct *p;
+ int ret;
p = swap_info_get(entry);
if (p) {
- swap_entry_free(p, entry, 1);
+ ret = swap_entry_free(p, entry, 1);
+ if (page) {
+ if (ret) /* swap still remains */
+ mem_cgroup_uncharge_swapcache(page, entry, 1);
+ else /* this was the last user of swap entry */
+ mem_cgroup_uncharge_swapcache(page, entry, 0);
+ }
spin_unlock(&swap_lock);
}
}
Index: mmotm-2.6.30-May17/include/linux/swap.h
===================================================================
--- mmotm-2.6.30-May17.orig/include/linux/swap.h
+++ mmotm-2.6.30-May17/include/linux/swap.h
@@ -305,7 +305,7 @@ extern swp_entry_t get_swap_page_of_type
extern int swap_duplicate(swp_entry_t);
extern int valid_swaphandles(swp_entry_t, unsigned long *);
extern void swap_free(swp_entry_t);
-extern void swapcache_free(swp_entry_t);
+extern void swapcache_free(swp_entry_t, struct page*);
extern int swapcache_prepare(swp_entry_t);
extern int free_swap_and_cache(swp_entry_t);
extern int swap_type_of(dev_t, sector_t, struct block_device **);
@@ -339,10 +339,11 @@ static inline void disable_swap_token(vo
}
#ifdef CONFIG_CGROUP_MEM_RES_CTLR
-extern void mem_cgroup_uncharge_swapcache(struct page *page, swp_entry_t ent);
+extern void
+mem_cgroup_uncharge_swapcache(struct page *page, swp_entry_t ent, int swapout);
#else
static inline void
-mem_cgroup_uncharge_swapcache(struct page *page, swp_entry_t ent)
+mem_cgroup_uncharge_swapcache(struct page *page, swp_entry_t ent, int swapout)
{
}
#endif
@@ -380,7 +381,7 @@ static inline void swap_free(swp_entry_t
{
}
-static inline void swapcache_free(swp_entry_t swp)
+static inline void swapcache_free(swp_entry_t swp, struct page *page)
{
}
Index: mmotm-2.6.30-May17/mm/memcontrol.c
===================================================================
--- mmotm-2.6.30-May17.orig/mm/memcontrol.c
+++ mmotm-2.6.30-May17/mm/memcontrol.c
@@ -1564,18 +1564,22 @@ void mem_cgroup_uncharge_cache_page(stru
* called after __delete_from_swap_cache() and drop "page" account.
* memcg information is recorded to swap_cgroup of "ent"
*/
-void mem_cgroup_uncharge_swapcache(struct page *page, swp_entry_t ent)
+void mem_cgroup_uncharge_swapcache(struct page *page,
+ swp_entry_t ent, int swapout)
{
struct mem_cgroup *memcg;
+ int ctype = MEM_CGROUP_CHARGE_TYPE_SWAPOUT;
- memcg = __mem_cgroup_uncharge_common(page,
- MEM_CGROUP_CHARGE_TYPE_SWAPOUT);
+ /* was the swap entry freed as well? */
+ if (!swapout)
+ ctype = MEM_CGROUP_CHARGE_TYPE_MAPPED;
+ memcg = __mem_cgroup_uncharge_common(page, ctype);
/* record memcg information */
- if (do_swap_account && memcg) {
+ if (do_swap_account && swapout && memcg) {
swap_cgroup_record(ent, css_id(&memcg->css));
mem_cgroup_get(memcg);
}
- if (memcg)
+ if (swapout && memcg)
css_put(&memcg->css);
}
#endif
@@ -1583,7 +1587,7 @@ void mem_cgroup_uncharge_swapcache(struc
#ifdef CONFIG_CGROUP_MEM_RES_CTLR_SWAP
/*
* called from swap_entry_free(). remove record in swap_cgroup and
- * uncharge "memsw" account.
+ * uncharge "memsw" account. we are under swap_lock.
*/
void mem_cgroup_uncharge_swap(swp_entry_t ent)
{
Index: mmotm-2.6.30-May17/mm/shmem.c
===================================================================
--- mmotm-2.6.30-May17.orig/mm/shmem.c
+++ mmotm-2.6.30-May17/mm/shmem.c
@@ -1097,7 +1097,7 @@ static int shmem_writepage(struct page *
shmem_swp_unmap(entry);
unlock:
spin_unlock(&info->lock);
- swapcache_free(swap);
+ swapcache_free(swap, page);
redirty:
set_page_dirty(page);
if (wbc->for_reclaim)
Index: mmotm-2.6.30-May17/mm/swap_state.c
===================================================================
--- mmotm-2.6.30-May17.orig/mm/swap_state.c
+++ mmotm-2.6.30-May17/mm/swap_state.c
@@ -162,11 +162,11 @@ int add_to_swap(struct page *page)
return 1;
case -EEXIST:
/* Raced with "speculative" read_swap_cache_async */
- swapcache_free(entry);
+ swapcache_free(entry, page);
continue;
default:
/* -ENOMEM radix-tree allocation failure */
- swapcache_free(entry);
+ swapcache_free(entry, page);
return 0;
}
}
@@ -188,7 +188,7 @@ void delete_from_swap_cache(struct page
__delete_from_swap_cache(page);
spin_unlock_irq(&swapper_space.tree_lock);
- swapcache_free(entry);
+ swapcache_free(entry, page);
page_cache_release(page);
}
@@ -318,7 +318,7 @@ struct page *read_swap_cache_async(swp_e
}
ClearPageSwapBacked(new_page);
__clear_page_locked(new_page);
- swapcache_free(entry);
+ swapcache_free(entry, new_page);
} while (err != -ENOMEM);
if (new_page)
Index: mmotm-2.6.30-May17/mm/vmscan.c
===================================================================
--- mmotm-2.6.30-May17.orig/mm/vmscan.c
+++ mmotm-2.6.30-May17/mm/vmscan.c
@@ -477,8 +477,7 @@ static int __remove_mapping(struct addre
swp_entry_t swap = { .val = page_private(page) };
__delete_from_swap_cache(page);
spin_unlock_irq(&mapping->tree_lock);
- mem_cgroup_uncharge_swapcache(page, swap);
- swapcache_free(swap);
+ swapcache_free(swap, page);
} else {
__remove_from_page_cache(page);
spin_unlock_irq(&mapping->tree_lock);
* [RFC][PATCH 3/3] count swap caches whose swp_entry can be freed.
From: KAMEZAWA Hiroyuki @ 2009-05-22 8:05 UTC
To: KAMEZAWA Hiroyuki; +Cc: linux-mm, nishimura, balbir, hugh.dickins, akpm, hannes
From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Count the swap entries which are used only by a swap cache, i.e. swap
entries that could be freed immediately.
This counter tells us there is a chance to reclaim swap entries, which may
be good for the mem+swap controller.
(The freeing routine itself is left as homework; one possible shape is
sketched below.)
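As a starting point for that homework, an entirely hypothetical, untested
sketch of such a routine, using the orphan_swap_cache counter added below
(locking against swapoff etc. is elided):

	/* hypothetical: free swap entries held only by the swap cache
	 * (swap_map[] == SWAP_HAS_CACHE, i.e. no real references) */
	static void reclaim_orphan_swap_caches(struct swap_info_struct *si)
	{
		unsigned long offset;

		if (!si->orphan_swap_cache || !vm_swap_full())
			return;
		for (offset = si->lowest_bit; offset <= si->highest_bit; offset++) {
			struct page *page;

			if (si->swap_map[offset] != SWAP_HAS_CACHE)
				continue;
			page = find_get_page(&swapper_space,
					swp_entry(si - swap_info, offset).val);
			if (!page)
				continue;
			if (trylock_page(page)) {
				try_to_free_swap(page);	/* drops the orphan */
				unlock_page(page);
			}
			page_cache_release(page);
		}
	}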
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
include/linux/swap.h | 1 +
mm/swapfile.c | 15 ++++++++++++---
2 files changed, 13 insertions(+), 3 deletions(-)
Index: mmotm-2.6.30-May17/include/linux/swap.h
===================================================================
--- mmotm-2.6.30-May17.orig/include/linux/swap.h
+++ mmotm-2.6.30-May17/include/linux/swap.h
@@ -156,6 +156,7 @@ struct swap_info_struct {
unsigned int max;
unsigned int inuse_pages;
unsigned int old_block_size;
+ unsigned int orphan_swap_cache;
};
struct swap_list_t {
Index: mmotm-2.6.30-May17/mm/swapfile.c
===================================================================
--- mmotm-2.6.30-May17.orig/mm/swapfile.c
+++ mmotm-2.6.30-May17/mm/swapfile.c
@@ -291,9 +291,10 @@ checks:
si->lowest_bit = si->max;
si->highest_bit = 0;
}
- if (cache)
+ if (cache) {
si->swap_map[offset] = SWAP_HAS_CACHE; /* via get_swap_page() */
- else
+ si->orphan_swap_cache++;
+ } else
si->swap_map[offset] = 1; /* via alloc_swap_block() */
si->cluster_next = offset + 1;
@@ -521,9 +522,14 @@ static int swap_entry_free(struct swap_i
swap_list.next = p - swap_info;
nr_swap_pages++;
p->inuse_pages--;
+ if (cache)
+ p->orphan_swap_cache--;
}
- if (!swap_has_ref(count))
+ if (!swap_has_ref(count)) {
mem_cgroup_uncharge_swap(ent);
+ if (count & SWAP_HAS_CACHE)
+ p->orphan_swap_cache++;
+ }
return count;
}
@@ -2022,6 +2028,9 @@ int swap_duplicate(swp_entry_t entry)
goto out_unlock;
count = p->swap_map[offset] & SWAP_MAP_MASK;
+ if (!count && (p->swap_map[offset] & SWAP_HAS_CACHE))
+ p->orphan_swap_cache--; /* the orphan gains a real reference */
+
if (count < SWAP_MAP_MAX - 1) {
p->swap_map[offset] += 1;
result = 1;