* [PATCH -mm 0/6] Per-memcg slab shrinkers
@ 2014-07-28  9:31 Vladimir Davydov
From: Vladimir Davydov @ 2014-07-28  9:31 UTC (permalink / raw)
  To: akpm
  Cc: hannes, mhocko, glommer, david, viro, gthelen, mgorman, linux-mm,
	linux-kernel

[ It's been a long time since I sent the last version of this set, so
  I'm restarting the versioning. For those who are interested in the
  patch set history, see https://lkml.org/lkml/2014/2/5/358 ]

Hi,

This patch set introduces support for per-memcg slab shrinkers and
implements per-memcg fs (dcache, icache) shrinkers. It was initially
proposed by Glauber Costa.

The idea behind it is to make the list_lru structure per-memcg and put
objects belonging to a particular memcg on the corresponding list. This
way, to turn a shrinker that uses list_lru for organizing its
reclaimable objects into a memcg-aware one, it is enough to initialize
its list_lru as memcg aware.
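
For instance, with this set applied, making an lru-based cache
reclaimable per-memcg boils down to roughly the following (a sketch
with hypothetical obj/isolate_func/isolate_arg; see patches 1, 4, and 5
for the real code):

	list_lru_init(&lru, true);	/* memcg aware */
	...
	list_lru_add(&lru, &obj->lru);	/* goes to the list of the memcg
					   the object is accounted to */
	...
	/* in the shrinker, only sc->memcg's list is counted/walked */
	count = list_lru_shrink_count(&lru, sc);
	freed = list_lru_shrink_walk(&lru, sc, isolate_func, isolate_arg);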

Please note that even with this set, the current kmemcg implementation
has serious flaws that make it unusable in production:

 - Kmem-only reclaim, which would trigger on hitting memory.kmem.limit,
   is not implemented yet. This makes memory.kmem.limit < memory.limit
   setups unusable. We are not quite sure if we really need a separate
   knob for kmem.limit though (see the discussion at
   https://lkml.org/lkml/2014/7/16/412).

 - Since the kmem cache self-destruction patch set was withdrawn due to
   performance concerns (https://lkml.org/lkml/2014/7/15/361), per-memcg
   kmem caches that still have objects at css offline are leaked. I'm
   planning to introduce a shrinker for such caches.

 - Per-memcg arrays of kmem_cache's and list_lru's can only grow and are
   never shrunk. Since the number of offline memcg's hanging around is
   practically unlimited, these arrays may become really huge and cause
   various problems even if nobody is using cgroups right now. I'm
   considering using flex_array's for these arrays so that parts of them
   could be reclaimed under memory pressure.

That's why I still leave CONFIG_MEMCG_KMEM marked as "only for
development/testing".

The patch set is organized as follows:
 - patches 1 and 2 make the list_lru and fs-private shrinker interfaces
   neater and suitable for extension towards per-memcg reclaim;
 - patch 3 introduces the per-memcg slab shrinker core;
 - patch 4 makes list_lru memcg-aware, and patch 5 marks the dcache and
   icache shrinkers as memcg-aware;
 - patch 6 extends the memcg iterator to include offline css's, allowing
   kmem reclaim from dead css's.

Thanks,

Vladimir Davydov (6):
  list_lru, shrinkers: introduce list_lru_shrink_{count,walk}
  fs: consolidate {nr,free}_cached_objects args in shrink_control
  vmscan: shrink slab on memcg pressure
  list_lru: add per-memcg lists
  fs: make shrinker memcg aware
  memcg: iterator: do not skip offline css

 fs/dcache.c                |   14 ++-
 fs/gfs2/main.c             |    2 +-
 fs/gfs2/quota.c            |    6 +-
 fs/inode.c                 |    7 +-
 fs/internal.h              |    7 +-
 fs/super.c                 |   45 ++++----
 fs/xfs/xfs_buf.c           |    9 +-
 fs/xfs/xfs_qm.c            |    9 +-
 fs/xfs/xfs_super.c         |    7 +-
 include/linux/fs.h         |    6 +-
 include/linux/list_lru.h   |   82 +++++++++-----
 include/linux/memcontrol.h |   64 +++++++++++
 include/linux/shrinker.h   |   10 +-
 mm/list_lru.c              |  132 +++++++++++++++++++----
 mm/memcontrol.c            |  258 ++++++++++++++++++++++++++++++++++++++++----
 mm/vmscan.c                |   94 ++++++++++++----
 mm/workingset.c            |    9 +-
 17 files changed, 615 insertions(+), 146 deletions(-)

-- 
1.7.10.4


* [PATCH -mm 1/6] list_lru, shrinkers: introduce list_lru_shrink_{count,walk}
@ 2014-07-28  9:31 ` Vladimir Davydov
From: Vladimir Davydov @ 2014-07-28  9:31 UTC (permalink / raw)
  To: akpm
  Cc: hannes, mhocko, glommer, david, viro, gthelen, mgorman, linux-mm,
	linux-kernel

NUMA aware slab shrinkers use the list_lru structure to distribute
objects coming from different NUMA nodes to different lists. Whenever
such a shrinker needs to count or scan objects from a particular node,
it issues commands like this:

        count = list_lru_count_node(lru, sc->nid);
        freed = list_lru_walk_node(lru, sc->nid, isolate_func,
                                   isolate_arg, &sc->nr_to_scan);

where sc is an instance of the shrink_control structure passed to it
from vmscan.

To simplify this, let's add special list_lru functions to be used by
shrinkers, list_lru_shrink_count() and list_lru_shrink_walk(), which
consolidate the nid and nr_to_scan arguments in the shrink_control
structure.

This will also allow us to avoid patching shrinkers that use list_lru
when we make shrink_slab() per-memcg - all we will have to do is extend
the shrink_control structure to include the target memcg and make
list_lru_shrink_{count,walk} handle this appropriately.
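
With the new helpers the snippet above becomes simply:

        count = list_lru_shrink_count(lru, sc);
        freed = list_lru_shrink_walk(lru, sc, isolate_func,
                                     isolate_arg);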

Suggested-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
---
 fs/dcache.c              |   14 ++++++--------
 fs/gfs2/quota.c          |    6 +++---
 fs/inode.c               |    7 +++----
 fs/internal.h            |    7 +++----
 fs/super.c               |   22 ++++++++++------------
 fs/xfs/xfs_buf.c         |    7 +++----
 fs/xfs/xfs_qm.c          |    7 +++----
 include/linux/list_lru.h |   16 ++++++++++++++++
 mm/workingset.c          |    6 +++---
 9 files changed, 50 insertions(+), 42 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index b7e8b20f797b..2c4337076488 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -913,24 +913,22 @@ dentry_lru_isolate(struct list_head *item, spinlock_t *lru_lock, void *arg)
 /**
  * prune_dcache_sb - shrink the dcache
  * @sb: superblock
- * @nr_to_scan : number of entries to try to free
- * @nid: which node to scan for freeable entities
+ * @sc: shrink control, passed to list_lru_shrink_walk()
  *
- * Attempt to shrink the superblock dcache LRU by @nr_to_scan entries. This is
- * done when we need more memory an called from the superblock shrinker
+ * Attempt to shrink the superblock dcache LRU by @sc->nr_to_scan entries. This
+ * is done when we need more memory and called from the superblock shrinker
  * function.
  *
  * This function may fail to free any resources if all the dentries are in
  * use.
  */
-long prune_dcache_sb(struct super_block *sb, unsigned long nr_to_scan,
-		     int nid)
+long prune_dcache_sb(struct super_block *sb, struct shrink_control *sc)
 {
 	LIST_HEAD(dispose);
 	long freed;
 
-	freed = list_lru_walk_node(&sb->s_dentry_lru, nid, dentry_lru_isolate,
-				       &dispose, &nr_to_scan);
+	freed = list_lru_shrink_walk(&sb->s_dentry_lru, sc,
+				     dentry_lru_isolate, &dispose);
 	shrink_dentry_list(&dispose);
 	return freed;
 }
diff --git a/fs/gfs2/quota.c b/fs/gfs2/quota.c
index 64b29f7f6b4c..6292d79fc340 100644
--- a/fs/gfs2/quota.c
+++ b/fs/gfs2/quota.c
@@ -171,8 +171,8 @@ static unsigned long gfs2_qd_shrink_scan(struct shrinker *shrink,
 	if (!(sc->gfp_mask & __GFP_FS))
 		return SHRINK_STOP;
 
-	freed = list_lru_walk_node(&gfs2_qd_lru, sc->nid, gfs2_qd_isolate,
-				   &dispose, &sc->nr_to_scan);
+	freed = list_lru_shrink_walk(&gfs2_qd_lru, sc,
+				     gfs2_qd_isolate, &dispose);
 
 	gfs2_qd_dispose(&dispose);
 
@@ -182,7 +182,7 @@ static unsigned long gfs2_qd_shrink_scan(struct shrinker *shrink,
 static unsigned long gfs2_qd_shrink_count(struct shrinker *shrink,
 					  struct shrink_control *sc)
 {
-	return vfs_pressure_ratio(list_lru_count_node(&gfs2_qd_lru, sc->nid));
+	return vfs_pressure_ratio(list_lru_shrink_count(&gfs2_qd_lru, sc));
 }
 
 struct shrinker gfs2_qd_shrinker = {
diff --git a/fs/inode.c b/fs/inode.c
index 5938f3928944..89b4d6f41020 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -748,14 +748,13 @@ inode_lru_isolate(struct list_head *item, spinlock_t *lru_lock, void *arg)
  * to trim from the LRU. Inodes to be freed are moved to a temporary list and
  * then are freed outside inode_lock by dispose_list().
  */
-long prune_icache_sb(struct super_block *sb, unsigned long nr_to_scan,
-		     int nid)
+long prune_icache_sb(struct super_block *sb, struct shrink_control *sc)
 {
 	LIST_HEAD(freeable);
 	long freed;
 
-	freed = list_lru_walk_node(&sb->s_inode_lru, nid, inode_lru_isolate,
-				       &freeable, &nr_to_scan);
+	freed = list_lru_shrink_walk(&sb->s_inode_lru, sc,
+				     inode_lru_isolate, &freeable);
 	dispose_list(&freeable);
 	return freed;
 }
diff --git a/fs/internal.h b/fs/internal.h
index 465742407466..3db5f6e41cd7 100644
--- a/fs/internal.h
+++ b/fs/internal.h
@@ -14,6 +14,7 @@ struct file_system_type;
 struct linux_binprm;
 struct path;
 struct mount;
+struct shrink_control;
 
 /*
  * block_dev.c
@@ -107,8 +108,7 @@ extern int open_check_o_direct(struct file *f);
  * inode.c
  */
 extern spinlock_t inode_sb_list_lock;
-extern long prune_icache_sb(struct super_block *sb, unsigned long nr_to_scan,
-			    int nid);
+extern long prune_icache_sb(struct super_block *sb, struct shrink_control *sc);
 extern void inode_add_lru(struct inode *inode);
 
 /*
@@ -125,8 +125,7 @@ extern int invalidate_inodes(struct super_block *, bool);
  */
 extern struct dentry *__d_alloc(struct super_block *, const struct qstr *);
 extern int d_set_mounted(struct dentry *dentry);
-extern long prune_dcache_sb(struct super_block *sb, unsigned long nr_to_scan,
-			    int nid);
+extern long prune_dcache_sb(struct super_block *sb, struct shrink_control *sc);
 
 /*
  * read_write.c
diff --git a/fs/super.c b/fs/super.c
index 872b26bf06dd..b4f5679d0d8c 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -78,27 +78,27 @@ static unsigned long super_cache_scan(struct shrinker *shrink,
 	if (sb->s_op->nr_cached_objects)
 		fs_objects = sb->s_op->nr_cached_objects(sb, sc->nid);
 
-	inodes = list_lru_count_node(&sb->s_inode_lru, sc->nid);
-	dentries = list_lru_count_node(&sb->s_dentry_lru, sc->nid);
+	inodes = list_lru_shrink_count(&sb->s_inode_lru, sc);
+	dentries = list_lru_shrink_count(&sb->s_dentry_lru, sc);
 	total_objects = dentries + inodes + fs_objects + 1;
 
 	/* proportion the scan between the caches */
 	dentries = mult_frac(sc->nr_to_scan, dentries, total_objects);
 	inodes = mult_frac(sc->nr_to_scan, inodes, total_objects);
+	fs_objects = mult_frac(sc->nr_to_scan, fs_objects, total_objects);
 
 	/*
 	 * prune the dcache first as the icache is pinned by it, then
 	 * prune the icache, followed by the filesystem specific caches
 	 */
-	freed = prune_dcache_sb(sb, dentries, sc->nid);
-	freed += prune_icache_sb(sb, inodes, sc->nid);
+	sc->nr_to_scan = dentries;
+	freed = prune_dcache_sb(sb, sc);
+	sc->nr_to_scan = inodes;
+	freed += prune_icache_sb(sb, sc);
 
-	if (fs_objects) {
-		fs_objects = mult_frac(sc->nr_to_scan, fs_objects,
-								total_objects);
+	if (fs_objects)
 		freed += sb->s_op->free_cached_objects(sb, fs_objects,
 						       sc->nid);
-	}
 
 	drop_super(sb);
 	return freed;
@@ -124,10 +124,8 @@ static unsigned long super_cache_count(struct shrinker *shrink,
 		total_objects = sb->s_op->nr_cached_objects(sb,
 						 sc->nid);
 
-	total_objects += list_lru_count_node(&sb->s_dentry_lru,
-						 sc->nid);
-	total_objects += list_lru_count_node(&sb->s_inode_lru,
-						 sc->nid);
+	total_objects += list_lru_shrink_count(&sb->s_dentry_lru, sc);
+	total_objects += list_lru_shrink_count(&sb->s_inode_lru, sc);
 
 	total_objects = vfs_pressure_ratio(total_objects);
 	return total_objects;
diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
index a6dc83e70ece..1a5e178fd8d0 100644
--- a/fs/xfs/xfs_buf.c
+++ b/fs/xfs/xfs_buf.c
@@ -1572,10 +1572,9 @@ xfs_buftarg_shrink_scan(
 					struct xfs_buftarg, bt_shrinker);
 	LIST_HEAD(dispose);
 	unsigned long		freed;
-	unsigned long		nr_to_scan = sc->nr_to_scan;
 
-	freed = list_lru_walk_node(&btp->bt_lru, sc->nid, xfs_buftarg_isolate,
-				       &dispose, &nr_to_scan);
+	freed = list_lru_shrink_walk(&btp->bt_lru, sc,
+				     xfs_buftarg_isolate, &dispose);
 
 	while (!list_empty(&dispose)) {
 		struct xfs_buf *bp;
@@ -1594,7 +1593,7 @@ xfs_buftarg_shrink_count(
 {
 	struct xfs_buftarg	*btp = container_of(shrink,
 					struct xfs_buftarg, bt_shrinker);
-	return list_lru_count_node(&btp->bt_lru, sc->nid);
+	return list_lru_shrink_count(&btp->bt_lru, sc);
 }
 
 void
diff --git a/fs/xfs/xfs_qm.c b/fs/xfs/xfs_qm.c
index ba284f6469db..76640cd73a23 100644
--- a/fs/xfs/xfs_qm.c
+++ b/fs/xfs/xfs_qm.c
@@ -618,7 +618,6 @@ xfs_qm_shrink_scan(
 	struct xfs_qm_isolate	isol;
 	unsigned long		freed;
 	int			error;
-	unsigned long		nr_to_scan = sc->nr_to_scan;
 
 	if ((sc->gfp_mask & (__GFP_FS|__GFP_WAIT)) != (__GFP_FS|__GFP_WAIT))
 		return 0;
@@ -626,8 +625,8 @@ xfs_qm_shrink_scan(
 	INIT_LIST_HEAD(&isol.buffers);
 	INIT_LIST_HEAD(&isol.dispose);
 
-	freed = list_lru_walk_node(&qi->qi_lru, sc->nid, xfs_qm_dquot_isolate, &isol,
-					&nr_to_scan);
+	freed = list_lru_shrink_walk(&qi->qi_lru, sc,
+				     xfs_qm_dquot_isolate, &isol);
 
 	error = xfs_buf_delwri_submit(&isol.buffers);
 	if (error)
@@ -652,7 +651,7 @@ xfs_qm_shrink_count(
 	struct xfs_quotainfo	*qi = container_of(shrink,
 					struct xfs_quotainfo, qi_shrinker);
 
-	return list_lru_count_node(&qi->qi_lru, sc->nid);
+	return list_lru_shrink_count(&qi->qi_lru, sc);
 }
 
 /*
diff --git a/include/linux/list_lru.h b/include/linux/list_lru.h
index f3434533fbf8..f500a2e39b13 100644
--- a/include/linux/list_lru.h
+++ b/include/linux/list_lru.h
@@ -9,6 +9,7 @@
 
 #include <linux/list.h>
 #include <linux/nodemask.h>
+#include <linux/shrinker.h>
 
 /* list_lru_walk_cb has to always return one of those */
 enum lru_status {
@@ -81,6 +82,13 @@ bool list_lru_del(struct list_lru *lru, struct list_head *item);
  * Callers that want such a guarantee need to provide an outer lock.
  */
 unsigned long list_lru_count_node(struct list_lru *lru, int nid);
+
+static inline unsigned long list_lru_shrink_count(struct list_lru *lru,
+						  struct shrink_control *sc)
+{
+	return list_lru_count_node(lru, sc->nid);
+}
+
 static inline unsigned long list_lru_count(struct list_lru *lru)
 {
 	long count = 0;
@@ -120,6 +128,14 @@ unsigned long list_lru_walk_node(struct list_lru *lru, int nid,
 				 unsigned long *nr_to_walk);
 
 static inline unsigned long
+list_lru_shrink_walk(struct list_lru *lru, struct shrink_control *sc,
+		     list_lru_walk_cb isolate, void *cb_arg)
+{
+	return list_lru_walk_node(lru, sc->nid, isolate, cb_arg,
+				  &sc->nr_to_scan);
+}
+
+static inline unsigned long
 list_lru_walk(struct list_lru *lru, list_lru_walk_cb isolate,
 	      void *cb_arg, unsigned long nr_to_walk)
 {
diff --git a/mm/workingset.c b/mm/workingset.c
index f7216fa7da27..d4fa7fb10a52 100644
--- a/mm/workingset.c
+++ b/mm/workingset.c
@@ -275,7 +275,7 @@ static unsigned long count_shadow_nodes(struct shrinker *shrinker,
 
 	/* list_lru lock nests inside IRQ-safe mapping->tree_lock */
 	local_irq_disable();
-	shadow_nodes = list_lru_count_node(&workingset_shadow_nodes, sc->nid);
+	shadow_nodes = list_lru_shrink_count(&workingset_shadow_nodes, sc);
 	local_irq_enable();
 
 	pages = node_present_pages(sc->nid);
@@ -376,8 +376,8 @@ static unsigned long scan_shadow_nodes(struct shrinker *shrinker,
 
 	/* list_lru lock nests inside IRQ-safe mapping->tree_lock */
 	local_irq_disable();
-	ret =  list_lru_walk_node(&workingset_shadow_nodes, sc->nid,
-				  shadow_lru_isolate, NULL, &sc->nr_to_scan);
+	ret =  list_lru_shrink_walk(&workingset_shadow_nodes, sc,
+				    shadow_lru_isolate, NULL);
 	local_irq_enable();
 	return ret;
 }
-- 
1.7.10.4


* [PATCH -mm 2/6] fs: consolidate {nr,free}_cached_objects args in shrink_control
@ 2014-07-28  9:31 ` Vladimir Davydov
From: Vladimir Davydov @ 2014-07-28  9:31 UTC (permalink / raw)
  To: akpm
  Cc: hannes, mhocko, glommer, david, viro, gthelen, mgorman, linux-mm,
	linux-kernel

We are going to make FS shrinkers memcg-aware. To achieve that, we will
have to pass the memcg to scan to the nr_cached_objects and
free_cached_objects VFS methods, which currently take only the NUMA node
to scan. Since the shrink_control structure already holds the node, and
the memcg to scan will be added to it when we introduce memcg-aware
vmscan, let us consolidate the methods' arguments in this structure to
keep things clean.
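
With this patch applied the two methods take the shrink_control
directly:

	long (*nr_cached_objects)(struct super_block *,
				  struct shrink_control *);
	long (*free_cached_objects)(struct super_block *,
				    struct shrink_control *);

so a filesystem implementing them reads the node to scan and the number
of objects to free from the shrink_control it is given (see the XFS
hunk below).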

Suggested-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
---
 fs/super.c         |   12 ++++++------
 fs/xfs/xfs_super.c |    7 +++----
 include/linux/fs.h |    6 ++++--
 3 files changed, 13 insertions(+), 12 deletions(-)

diff --git a/fs/super.c b/fs/super.c
index b4f5679d0d8c..1f34321e15b4 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -76,7 +76,7 @@ static unsigned long super_cache_scan(struct shrinker *shrink,
 		return SHRINK_STOP;
 
 	if (sb->s_op->nr_cached_objects)
-		fs_objects = sb->s_op->nr_cached_objects(sb, sc->nid);
+		fs_objects = sb->s_op->nr_cached_objects(sb, sc);
 
 	inodes = list_lru_shrink_count(&sb->s_inode_lru, sc);
 	dentries = list_lru_shrink_count(&sb->s_dentry_lru, sc);
@@ -96,9 +96,10 @@ static unsigned long super_cache_scan(struct shrinker *shrink,
 	sc->nr_to_scan = inodes;
 	freed += prune_icache_sb(sb, sc);
 
-	if (fs_objects)
-		freed += sb->s_op->free_cached_objects(sb, fs_objects,
-						       sc->nid);
+	if (fs_objects) {
+		sc->nr_to_scan = fs_objects;
+		freed += sb->s_op->free_cached_objects(sb, sc);
+	}
 
 	drop_super(sb);
 	return freed;
@@ -121,8 +122,7 @@ static unsigned long super_cache_count(struct shrinker *shrink,
 	 * s_op->nr_cached_objects().
 	 */
 	if (sb->s_op && sb->s_op->nr_cached_objects)
-		total_objects = sb->s_op->nr_cached_objects(sb,
-						 sc->nid);
+		total_objects = sb->s_op->nr_cached_objects(sb, sc);
 
 	total_objects += list_lru_shrink_count(&sb->s_dentry_lru, sc);
 	total_objects += list_lru_shrink_count(&sb->s_inode_lru, sc);
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index 986c5577c4e9..0df5f4d7150f 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -1521,7 +1521,7 @@ xfs_fs_mount(
 static long
 xfs_fs_nr_cached_objects(
 	struct super_block	*sb,
-	int			nid)
+	struct shrink_control	*sc)
 {
 	return xfs_reclaim_inodes_count(XFS_M(sb));
 }
@@ -1529,10 +1529,9 @@ xfs_fs_nr_cached_objects(
 static long
 xfs_fs_free_cached_objects(
 	struct super_block	*sb,
-	long			nr_to_scan,
-	int			nid)
+	struct shrink_control	*sc)
 {
-	return xfs_reclaim_inodes_nr(XFS_M(sb), nr_to_scan);
+	return xfs_reclaim_inodes_nr(XFS_M(sb), sc->nr_to_scan);
 }
 
 static const struct super_operations xfs_super_operations = {
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 26b4970e9fb8..6193236aca16 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1563,8 +1563,10 @@ struct super_operations {
 			       loff_t);
 #endif
 	int (*bdev_try_to_free_page)(struct super_block*, struct page*, gfp_t);
-	long (*nr_cached_objects)(struct super_block *, int);
-	long (*free_cached_objects)(struct super_block *, long, int);
+	long (*nr_cached_objects)(struct super_block *,
+				  struct shrink_control *);
+	long (*free_cached_objects)(struct super_block *,
+				    struct shrink_control *);
 };
 
 /*
-- 
1.7.10.4


* [PATCH -mm 3/6] vmscan: shrink slab on memcg pressure
@ 2014-07-28  9:31 ` Vladimir Davydov
From: Vladimir Davydov @ 2014-07-28  9:31 UTC (permalink / raw)
  To: akpm
  Cc: hannes, mhocko, glommer, david, viro, gthelen, mgorman, linux-mm,
	linux-kernel

This patch makes the direct reclaim path shrink slab not only under
global memory pressure, but also when we hit the user memory limit of a
memcg. To achieve that, it makes shrink_slab() walk over the memcg
hierarchy and run shrinkers marked as memcg-aware on the target memcg
and all its descendants. The memcg to scan is passed in the
shrink_control structure; memcg-unaware shrinkers are still called only
under global memory pressure, with memcg=NULL. It is up to each
shrinker how to organize the objects it is responsible for so as to
achieve per-memcg reclaim.
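
After this patch shrink_control looks roughly like this (gfp_mask and
nr_to_scan omitted for brevity):

	struct shrink_control {
		...
		/* shrink from these nodes */
		nodemask_t nodes_to_scan;

		/* shrink from this memory cgroup hierarchy (if not NULL) */
		struct mem_cgroup *target_mem_cgroup;

		/* current node being shrunk (for NUMA aware shrinkers) */
		int nid;

		/* current memcg being shrunk (for memcg aware shrinkers) */
		struct mem_cgroup *memcg;
	};

target_mem_cgroup is set by vmscan to the root of the hierarchy being
reclaimed from (NULL on global pressure), while memcg is set by
shrink_slab() to the memcg currently being shrunk as it walks the
hierarchy.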

Note that we do not intend to implement true per-memcg per-node
reclaim. Most memcgs are small and typically confined to one or two
NUMA nodes by external means, so they do not need the scalability that
NUMA-aware shrinkers provide. We therefore do per-node shrinking only
for the global lists (memcg=NULL), while per-memcg lists are always
scanned exactly once, with nid=0, irrespective of the nodemask.

Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
---
 include/linux/memcontrol.h |   22 +++++++++++
 include/linux/shrinker.h   |   10 ++++-
 mm/memcontrol.c            |   45 ++++++++++++++++++++-
 mm/vmscan.c                |   94 ++++++++++++++++++++++++++++++++++----------
 4 files changed, 149 insertions(+), 22 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index e0752d204d9e..d0f3d8f0990c 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -68,6 +68,9 @@ void mem_cgroup_migrate(struct page *oldpage, struct page *newpage,
 struct lruvec *mem_cgroup_zone_lruvec(struct zone *, struct mem_cgroup *);
 struct lruvec *mem_cgroup_page_lruvec(struct page *, struct zone *);
 
+unsigned long mem_cgroup_zone_reclaimable_pages(struct zone *zone,
+						struct mem_cgroup *memcg);
+
 bool __mem_cgroup_same_or_subtree(const struct mem_cgroup *root_memcg,
 				  struct mem_cgroup *memcg);
 bool task_in_mem_cgroup(struct task_struct *task,
@@ -251,6 +254,12 @@ static inline struct lruvec *mem_cgroup_page_lruvec(struct page *page,
 	return &zone->lruvec;
 }
 
+static inline unsigned long mem_cgroup_zone_reclaimable_pages(struct zone *zone,
+							struct mem_cgroup *)
+{
+	return 0;
+}
+
 static inline struct mem_cgroup *try_get_mem_cgroup_from_page(struct page *page)
 {
 	return NULL;
@@ -421,6 +430,9 @@ static inline bool memcg_kmem_enabled(void)
 	return static_key_false(&memcg_kmem_enabled_key);
 }
 
+bool memcg_kmem_is_active(struct mem_cgroup *memcg);
+bool memcg_kmem_should_reclaim(struct mem_cgroup *memcg);
+
 /*
  * In general, we'll do everything in our power to not incur in any overhead
  * for non-memcg users for the kmem functions. Not even a function call, if we
@@ -554,6 +566,16 @@ static inline bool memcg_kmem_enabled(void)
 	return false;
 }
 
+static inline bool memcg_kmem_is_active(struct mem_cgroup *memcg)
+{
+	return false;
+}
+
+static inline bool memcg_kmem_should_reclaim(struct mem_cgroup *memcg)
+{
+	return false;
+}
+
 static inline bool
 memcg_kmem_newpage_charge(gfp_t gfp, struct mem_cgroup **memcg, int order)
 {
diff --git a/include/linux/shrinker.h b/include/linux/shrinker.h
index 68c097077ef0..ab79b174bfbe 100644
--- a/include/linux/shrinker.h
+++ b/include/linux/shrinker.h
@@ -20,8 +20,15 @@ struct shrink_control {
 
 	/* shrink from these nodes */
 	nodemask_t nodes_to_scan;
+
+	/* shrink from this memory cgroup hierarchy (if not NULL) */
+	struct mem_cgroup *target_mem_cgroup;
+
 	/* current node being shrunk (for NUMA aware shrinkers) */
 	int nid;
+
+	/* current memcg being shrunk (for memcg aware shrinkers) */
+	struct mem_cgroup *memcg;
 };
 
 #define SHRINK_STOP (~0UL)
@@ -63,7 +70,8 @@ struct shrinker {
 #define DEFAULT_SEEKS 2 /* A good number if you don't know better. */
 
 /* Flags */
-#define SHRINKER_NUMA_AWARE (1 << 0)
+#define SHRINKER_NUMA_AWARE	(1 << 0)
+#define SHRINKER_MEMCG_AWARE	(1 << 1)
 
 extern int register_shrinker(struct shrinker *);
 extern void unregister_shrinker(struct shrinker *);
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index b7c9a202dee9..6a96a3994692 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -391,7 +391,7 @@ static inline void memcg_kmem_set_active(struct mem_cgroup *memcg)
 	set_bit(KMEM_ACCOUNTED_ACTIVE, &memcg->kmem_account_flags);
 }
 
-static bool memcg_kmem_is_active(struct mem_cgroup *memcg)
+bool memcg_kmem_is_active(struct mem_cgroup *memcg)
 {
 	return test_bit(KMEM_ACCOUNTED_ACTIVE, &memcg->kmem_account_flags);
 }
@@ -1409,6 +1409,34 @@ void mem_cgroup_update_lru_size(struct lruvec *lruvec, enum lru_list lru,
 	VM_BUG_ON((long)(*lru_size) < 0);
 }
 
+unsigned long mem_cgroup_zone_reclaimable_pages(struct zone *zone,
+						struct mem_cgroup *memcg)
+{
+	int nid = zone_to_nid(zone);
+	int zid = zone_idx(zone);
+	unsigned long nr = 0;
+	unsigned int lru_mask;
+	struct mem_cgroup *iter;
+
+	lru_mask = LRU_ALL_FILE;
+	if (do_swap_account)
+		lru_mask |= LRU_ALL_ANON;
+
+	iter = memcg;
+	do {
+		struct mem_cgroup_per_zone *mz;
+		enum lru_list lru;
+
+		mz = &iter->nodeinfo[nid]->zoneinfo[zid];
+		for_each_lru(lru) {
+			if (BIT(lru) & lru_mask)
+				nr += mz->lru_size[lru];
+		}
+	} while ((iter = mem_cgroup_iter(memcg, iter, NULL)) != NULL);
+
+	return nr;
+}
+
 /*
  * Checks whether given mem is same or in the root_mem_cgroup's
  * hierarchy subtree
@@ -2886,6 +2914,21 @@ static void memcg_uncharge_kmem(struct mem_cgroup *memcg, u64 size)
 		css_put(&memcg->css);
 }
 
+bool memcg_kmem_should_reclaim(struct mem_cgroup *memcg)
+{
+	struct mem_cgroup *iter;
+
+	iter = memcg;
+	do {
+		if (memcg_kmem_is_active(iter)) {
+			mem_cgroup_iter_break(memcg, iter);
+			return true;
+		}
+	} while ((iter = mem_cgroup_iter(memcg, iter, NULL)) != NULL);
+
+	return false;
+}
+
 /*
  * helper for acessing a memcg's index. It will be used as an index in the
  * child cache array in kmem_cache, and also to derive its name. This function
diff --git a/mm/vmscan.c b/mm/vmscan.c
index d698f4f7b0f2..290dd18a6959 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -340,6 +340,33 @@ shrink_slab_node(struct shrink_control *shrinkctl, struct shrinker *shrinker,
 	return freed;
 }
 
+static unsigned long
+run_shrinker(struct shrink_control *shrinkctl, struct shrinker *shrinker,
+	     unsigned long nr_pages_scanned, unsigned long lru_pages)
+{
+	unsigned long freed = 0;
+
+	/*
+	 * Since most memory cgroups are small and typically confined to a
+	 * single NUMA node or two by external means and therefore do not need
+	 * the scalability NUMA aware shrinkers provide, we implement per node
+	 * shrinking only for the global list.
+	 */
+	if (!(shrinker->flags & SHRINKER_NUMA_AWARE) ||
+	    shrinkctl->memcg) {
+		shrinkctl->nid = 0;
+		return shrink_slab_node(shrinkctl, shrinker,
+					nr_pages_scanned, lru_pages);
+	}
+
+	for_each_node_mask(shrinkctl->nid, shrinkctl->nodes_to_scan) {
+		if (node_online(shrinkctl->nid))
+			freed += shrink_slab_node(shrinkctl, shrinker,
+						  nr_pages_scanned, lru_pages);
+	}
+	return freed;
+}
+
 /*
  * Call the shrink functions to age shrinkable caches
  *
@@ -381,20 +408,34 @@ unsigned long shrink_slab(struct shrink_control *shrinkctl,
 	}
 
 	list_for_each_entry(shrinker, &shrinker_list, list) {
-		if (!(shrinker->flags & SHRINKER_NUMA_AWARE)) {
-			shrinkctl->nid = 0;
-			freed += shrink_slab_node(shrinkctl, shrinker,
-					nr_pages_scanned, lru_pages);
+		/*
+		 * Call memcg-unaware shrinkers only on global pressure.
+		 */
+		if (!(shrinker->flags & SHRINKER_MEMCG_AWARE)) {
+			if (!shrinkctl->target_mem_cgroup) {
+				shrinkctl->memcg = NULL;
+				freed += run_shrinker(shrinkctl, shrinker,
+						nr_pages_scanned, lru_pages);
+			}
 			continue;
 		}
 
-		for_each_node_mask(shrinkctl->nid, shrinkctl->nodes_to_scan) {
-			if (node_online(shrinkctl->nid))
-				freed += shrink_slab_node(shrinkctl, shrinker,
+		/*
+		 * For memcg-aware shrinkers iterate over the target memcg
+		 * hierarchy and run the shrinker on each kmem-active memcg
+		 * found in the hierarchy.
+		 */
+		shrinkctl->memcg = shrinkctl->target_mem_cgroup;
+		do {
+			if (!shrinkctl->memcg ||
+			    memcg_kmem_is_active(shrinkctl->memcg))
+				freed += run_shrinker(shrinkctl, shrinker,
 						nr_pages_scanned, lru_pages);
-
-		}
+		} while ((shrinkctl->memcg =
+			  mem_cgroup_iter(shrinkctl->target_mem_cgroup,
+					  shrinkctl->memcg, NULL)) != NULL);
 	}
+
 	up_read(&shrinker_rwsem);
 out:
 	cond_resched();
@@ -2369,6 +2410,7 @@ static bool shrink_zones(struct zonelist *zonelist, struct scan_control *sc)
 	gfp_t orig_mask;
 	struct shrink_control shrink = {
 		.gfp_mask = sc->gfp_mask,
+		.target_mem_cgroup = sc->target_mem_cgroup,
 	};
 	enum zone_type requested_highidx = gfp_zone(sc->gfp_mask);
 	bool reclaimable = false;
@@ -2388,17 +2430,22 @@ static bool shrink_zones(struct zonelist *zonelist, struct scan_control *sc)
 					gfp_zone(sc->gfp_mask), sc->nodemask) {
 		if (!populated_zone(zone))
 			continue;
+
+		if (global_reclaim(sc) &&
+		    !cpuset_zone_allowed_hardwall(zone, GFP_KERNEL))
+			continue;
+
+		lru_pages += global_reclaim(sc) ?
+				zone_reclaimable_pages(zone) :
+				mem_cgroup_zone_reclaimable_pages(zone,
+						sc->target_mem_cgroup);
+		node_set(zone_to_nid(zone), shrink.nodes_to_scan);
+
 		/*
 		 * Take care memory controller reclaiming has small influence
 		 * to global LRU.
 		 */
 		if (global_reclaim(sc)) {
-			if (!cpuset_zone_allowed_hardwall(zone, GFP_KERNEL))
-				continue;
-
-			lru_pages += zone_reclaimable_pages(zone);
-			node_set(zone_to_nid(zone), shrink.nodes_to_scan);
-
 			if (sc->priority != DEF_PRIORITY &&
 			    !zone_reclaimable(zone))
 				continue;	/* Let kswapd poll it */
@@ -2446,12 +2493,11 @@ static bool shrink_zones(struct zonelist *zonelist, struct scan_control *sc)
 	}
 
 	/*
-	 * Don't shrink slabs when reclaiming memory from over limit cgroups
-	 * but do shrink slab at least once when aborting reclaim for
-	 * compaction to avoid unevenly scanning file/anon LRU pages over slab
-	 * pages.
+	 * Shrink slabs at least once when aborting reclaim for compaction
+	 * to avoid unevenly scanning file/anon LRU pages over slab pages.
 	 */
-	if (global_reclaim(sc)) {
+	if (global_reclaim(sc) ||
+	    memcg_kmem_should_reclaim(sc->target_mem_cgroup)) {
 		shrink_slab(&shrink, sc->nr_scanned, lru_pages);
 		if (reclaim_state) {
 			sc->nr_reclaimed += reclaim_state->reclaimed_slab;
@@ -2753,6 +2799,7 @@ unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *memcg,
 	struct zonelist *zonelist;
 	unsigned long nr_reclaimed;
 	int nid;
+	struct reclaim_state reclaim_state;
 	struct scan_control sc = {
 		.nr_to_reclaim = SWAP_CLUSTER_MAX,
 		.gfp_mask = (gfp_mask & GFP_RECLAIM_MASK) |
@@ -2773,6 +2820,10 @@ unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *memcg,
 
 	zonelist = NODE_DATA(nid)->node_zonelists;
 
+	lockdep_set_current_reclaim_state(sc.gfp_mask);
+	reclaim_state.reclaimed_slab = 0;
+	current->reclaim_state = &reclaim_state;
+
 	trace_mm_vmscan_memcg_reclaim_begin(0,
 					    sc.may_writepage,
 					    sc.gfp_mask);
@@ -2781,6 +2832,9 @@ unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *memcg,
 
 	trace_mm_vmscan_memcg_reclaim_end(nr_reclaimed);
 
+	current->reclaim_state = NULL;
+	lockdep_clear_current_reclaim_state();
+
 	return nr_reclaimed;
 }
 #endif
-- 
1.7.10.4


* [PATCH -mm 4/6] list_lru: add per-memcg lists
@ 2014-07-28  9:31 ` Vladimir Davydov
From: Vladimir Davydov @ 2014-07-28  9:31 UTC (permalink / raw)
  To: akpm
  Cc: hannes, mhocko, glommer, david, viro, gthelen, mgorman, linux-mm,
	linux-kernel

There are several FS shrinkers, including super_block::s_shrink, that
keep reclaimable objects in the list_lru structure. Hence, to turn them
into memcg-aware shrinkers, it is enough to make list_lru per-memcg.

This patch does the trick. It adds an array of LRU lists to the list_lru
structure, one for each kmem-active memcg, and dispatches every item
addition or removal operation to the list corresponding to the memcg the
item is accounted to.

To make a list_lru user memcg-aware, it is enough to pass
memcg_aware=true to list_lru_init(); everything else is done
automatically.
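
For example, making the dcache lru of a super block memcg-aware (which
patch 5 does) is just:

	if (list_lru_init(&s->s_dentry_lru, true))	/* memcg aware */
		goto fail;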

Note that this patch removes VM_BUG_ON(!current->mm) from
memcg_{stop,resume}_kmem_account. This is because these functions may
be invoked from memcg_register_list_lru() while mounting filesystems
during early init, where we don't have ->mm yet. Calling them from
kernel threads won't hurt anyway.

Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
---
 fs/gfs2/main.c             |    2 +-
 fs/super.c                 |    4 +-
 fs/xfs/xfs_buf.c           |    2 +-
 fs/xfs/xfs_qm.c            |    2 +-
 include/linux/list_lru.h   |   86 ++++++++++--------
 include/linux/memcontrol.h |   42 +++++++++
 mm/list_lru.c              |  132 +++++++++++++++++++++++-----
 mm/memcontrol.c            |  208 ++++++++++++++++++++++++++++++++++++++++----
 mm/workingset.c            |    3 +-
 9 files changed, 402 insertions(+), 79 deletions(-)

diff --git a/fs/gfs2/main.c b/fs/gfs2/main.c
index 82b6ac829656..fb51e99a0281 100644
--- a/fs/gfs2/main.c
+++ b/fs/gfs2/main.c
@@ -84,7 +84,7 @@ static int __init init_gfs2_fs(void)
 	if (error)
 		return error;
 
-	error = list_lru_init(&gfs2_qd_lru);
+	error = list_lru_init(&gfs2_qd_lru, false);
 	if (error)
 		goto fail_lru;
 
diff --git a/fs/super.c b/fs/super.c
index 1f34321e15b4..477102d59c7e 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -187,9 +187,9 @@ static struct super_block *alloc_super(struct file_system_type *type, int flags)
 	INIT_HLIST_BL_HEAD(&s->s_anon);
 	INIT_LIST_HEAD(&s->s_inodes);
 
-	if (list_lru_init(&s->s_dentry_lru))
+	if (list_lru_init(&s->s_dentry_lru, false))
 		goto fail;
-	if (list_lru_init(&s->s_inode_lru))
+	if (list_lru_init(&s->s_inode_lru, false))
 		goto fail;
 
 	init_rwsem(&s->s_umount);
diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
index 1a5e178fd8d0..405ff6044a60 100644
--- a/fs/xfs/xfs_buf.c
+++ b/fs/xfs/xfs_buf.c
@@ -1669,7 +1669,7 @@ xfs_alloc_buftarg(
 	if (xfs_setsize_buftarg_early(btp, bdev))
 		goto error;
 
-	if (list_lru_init(&btp->bt_lru))
+	if (list_lru_init(&btp->bt_lru, false))
 		goto error;
 
 	btp->bt_shrinker.count_objects = xfs_buftarg_shrink_count;
diff --git a/fs/xfs/xfs_qm.c b/fs/xfs/xfs_qm.c
index 76640cd73a23..cb7267297783 100644
--- a/fs/xfs/xfs_qm.c
+++ b/fs/xfs/xfs_qm.c
@@ -670,7 +670,7 @@ xfs_qm_init_quotainfo(
 
 	qinf = mp->m_quotainfo = kmem_zalloc(sizeof(xfs_quotainfo_t), KM_SLEEP);
 
-	error = list_lru_init(&qinf->qi_lru);
+	error = list_lru_init(&qinf->qi_lru, false);
 	if (error)
 		goto out_free_qinf;
 
diff --git a/include/linux/list_lru.h b/include/linux/list_lru.h
index f500a2e39b13..cf1e73825431 100644
--- a/include/linux/list_lru.h
+++ b/include/linux/list_lru.h
@@ -11,6 +11,8 @@
 #include <linux/nodemask.h>
 #include <linux/shrinker.h>
 
+struct list_lru;
+
 /* list_lru_walk_cb has to always return one of those */
 enum lru_status {
 	LRU_REMOVED,		/* item removed from list */
@@ -29,16 +31,50 @@ struct list_lru_node {
 	long			nr_items;
 } ____cacheline_aligned_in_smp;
 
+struct memcg_list_lru_params {
+	/* list_lru which this struct is for */
+	struct list_lru		*owner;
+
+	/* list node for connecting to the list of all memcg-aware lrus */
+	struct list_head	list;
+
+	struct rcu_head		rcu_head;
+
+	/* array of per-memcg lrus, indexed by mem_cgroup->kmemcg_id */
+	struct list_lru_node	*node[0];
+};
+
 struct list_lru {
 	struct list_lru_node	*node;
 	nodemask_t		active_nodes;
+#ifdef CONFIG_MEMCG_KMEM
+	struct memcg_list_lru_params *memcg_params;
+#endif
+#ifdef CONFIG_DEBUG_LOCK_ALLOC
+	struct lock_class_key	*key;
+#endif
 };
 
+#ifdef CONFIG_MEMCG_KMEM
+static inline bool list_lru_memcg_aware(struct list_lru *lru)
+{
+	return !!lru->memcg_params;
+}
+#else
+static inline bool list_lru_memcg_aware(struct list_lru *lru)
+{
+	return false;
+}
+#endif
+
+void list_lru_node_init(struct list_lru_node *n, struct list_lru *lru);
+
 void list_lru_destroy(struct list_lru *lru);
-int list_lru_init_key(struct list_lru *lru, struct lock_class_key *key);
-static inline int list_lru_init(struct list_lru *lru)
+int list_lru_init_key(struct list_lru *lru, bool memcg_aware,
+		      struct lock_class_key *key);
+static inline int list_lru_init(struct list_lru *lru, bool memcg_aware)
 {
-	return list_lru_init_key(lru, NULL);
+	return list_lru_init_key(lru, memcg_aware, NULL);
 }
 
 /**
@@ -76,28 +112,20 @@ bool list_lru_del(struct list_lru *lru, struct list_head *item);
  * list_lru_count_node: return the number of objects currently held by @lru
  * @lru: the lru pointer.
  * @nid: the node id to count from.
+ * @memcg: the memcg to count from
  *
  * Always return a non-negative number, 0 for empty lists. There is no
  * guarantee that the list is not updated while the count is being computed.
  * Callers that want such a guarantee need to provide an outer lock.
  */
-unsigned long list_lru_count_node(struct list_lru *lru, int nid);
+unsigned long list_lru_count_node(struct list_lru *lru,
+				  int nid, struct mem_cgroup *memcg);
+unsigned long list_lru_count(struct list_lru *lru);
 
 static inline unsigned long list_lru_shrink_count(struct list_lru *lru,
 						  struct shrink_control *sc)
 {
-	return list_lru_count_node(lru, sc->nid);
-}
-
-static inline unsigned long list_lru_count(struct list_lru *lru)
-{
-	long count = 0;
-	int nid;
-
-	for_each_node_mask(nid, lru->active_nodes)
-		count += list_lru_count_node(lru, nid);
-
-	return count;
+	return list_lru_count_node(lru, sc->nid, sc->memcg);
 }
 
 typedef enum lru_status
@@ -106,6 +134,7 @@ typedef enum lru_status
  * list_lru_walk_node: walk a list_lru, isolating and disposing freeable items.
  * @lru: the lru pointer.
  * @nid: the node id to scan from.
+ * @memcg: the memcg to scan from.
  * @isolate: callback function that is resposible for deciding what to do with
  *  the item currently being scanned
  * @cb_arg: opaque type that will be passed to @isolate
@@ -123,31 +152,18 @@ typedef enum lru_status
  *
  * Return value: the number of objects effectively removed from the LRU.
  */
-unsigned long list_lru_walk_node(struct list_lru *lru, int nid,
+unsigned long list_lru_walk_node(struct list_lru *lru,
+				 int nid, struct mem_cgroup *memcg,
 				 list_lru_walk_cb isolate, void *cb_arg,
 				 unsigned long *nr_to_walk);
+unsigned long list_lru_walk(struct list_lru *lru, list_lru_walk_cb isolate,
+			    void *cb_arg, unsigned long nr_to_walk);
 
 static inline unsigned long
 list_lru_shrink_walk(struct list_lru *lru, struct shrink_control *sc,
 		     list_lru_walk_cb isolate, void *cb_arg)
 {
-	return list_lru_walk_node(lru, sc->nid, isolate, cb_arg,
-				  &sc->nr_to_scan);
-}
-
-static inline unsigned long
-list_lru_walk(struct list_lru *lru, list_lru_walk_cb isolate,
-	      void *cb_arg, unsigned long nr_to_walk)
-{
-	long isolated = 0;
-	int nid;
-
-	for_each_node_mask(nid, lru->active_nodes) {
-		isolated += list_lru_walk_node(lru, nid, isolate,
-					       cb_arg, &nr_to_walk);
-		if (nr_to_walk <= 0)
-			break;
-	}
-	return isolated;
+	return list_lru_walk_node(lru, sc->nid, sc->memcg,
+				  isolate, cb_arg, &sc->nr_to_scan);
 }
 #endif /* _LRU_LIST_H */
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index d0f3d8f0990c..962e36cb95ae 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -53,6 +53,21 @@ struct mem_cgroup_reclaim_cookie {
 	unsigned int generation;
 };
 
+/*
+ * Iteration constructs for visiting all cgroups (under a tree).  If
+ * loops are exited prematurely (break), mem_cgroup_iter_break() must
+ * be used for reference counting.
+ */
+#define for_each_mem_cgroup_tree(iter, root)		\
+	for (iter = mem_cgroup_iter(root, NULL, NULL);	\
+	     iter != NULL;				\
+	     iter = mem_cgroup_iter(root, iter, NULL))
+
+#define for_each_mem_cgroup(iter)			\
+	for (iter = mem_cgroup_iter(NULL, NULL, NULL);	\
+	     iter != NULL;				\
+	     iter = mem_cgroup_iter(NULL, iter, NULL))
+
 #ifdef CONFIG_MEMCG
 int mem_cgroup_try_charge(struct page *page, struct mm_struct *mm,
 			  gfp_t gfp_mask, struct mem_cgroup **memcgp);
@@ -459,6 +474,12 @@ void memcg_free_cache_params(struct kmem_cache *s);
 int memcg_update_cache_size(struct kmem_cache *s, int num_groups);
 void memcg_update_array_size(int num_groups);
 
+int memcg_register_list_lru(struct list_lru *lru);
+void memcg_unregister_list_lru(struct list_lru *lru);
+struct list_lru_node *memcg_list_lru(struct list_lru *lru,
+				     struct mem_cgroup *memcg);
+struct list_lru_node *memcg_list_lru_from_obj(struct list_lru *lru, void *obj);
+
 struct kmem_cache *
 __memcg_kmem_get_cache(struct kmem_cache *cachep, gfp_t gfp);
 
@@ -611,6 +632,27 @@ memcg_kmem_get_cache(struct kmem_cache *cachep, gfp_t gfp)
 {
 	return cachep;
 }
+
+static inline int memcg_register_list_lru(struct list_lru *lru)
+{
+	return 0;
+}
+
+static inline void memcg_unregister_list_lru(struct list_lru *lru)
+{
+}
+
+static inline struct list_lru_node *memcg_list_lru(struct list_lru *lru,
+						   struct mem_cgroup *memcg)
+{
+	return NULL;
+}
+
+static inline struct list_lru_node *memcg_list_lru_from_obj(struct list_lru *lru,
+							    void *obj)
+{
+	return NULL;
+}
 #endif /* CONFIG_MEMCG_KMEM */
 #endif /* _LINUX_MEMCONTROL_H */
 
diff --git a/mm/list_lru.c b/mm/list_lru.c
index f1a0db194173..b914f0930c67 100644
--- a/mm/list_lru.c
+++ b/mm/list_lru.c
@@ -9,17 +9,24 @@
 #include <linux/mm.h>
 #include <linux/list_lru.h>
 #include <linux/slab.h>
+#include <linux/memcontrol.h>
 
 bool list_lru_add(struct list_lru *lru, struct list_head *item)
 {
-	int nid = page_to_nid(virt_to_page(item));
-	struct list_lru_node *nlru = &lru->node[nid];
+	int nid = -1;
+	struct list_lru_node *nlru;
+
+	nlru = memcg_list_lru_from_obj(lru, item);
+	if (!nlru) {
+		nid = page_to_nid(virt_to_page(item));
+		nlru = &lru->node[nid];
+	}
 
 	spin_lock(&nlru->lock);
 	WARN_ON_ONCE(nlru->nr_items < 0);
 	if (list_empty(item)) {
 		list_add_tail(item, &nlru->list);
-		if (nlru->nr_items++ == 0)
+		if (nlru->nr_items++ == 0 && nid >= 0)
 			node_set(nid, lru->active_nodes);
 		spin_unlock(&nlru->lock);
 		return true;
@@ -31,13 +38,19 @@ EXPORT_SYMBOL_GPL(list_lru_add);
 
 bool list_lru_del(struct list_lru *lru, struct list_head *item)
 {
-	int nid = page_to_nid(virt_to_page(item));
-	struct list_lru_node *nlru = &lru->node[nid];
+	int nid = -1;
+	struct list_lru_node *nlru;
+
+	nlru = memcg_list_lru_from_obj(lru, item);
+	if (!nlru) {
+		nid = page_to_nid(virt_to_page(item));
+		nlru = &lru->node[nid];
+	}
 
 	spin_lock(&nlru->lock);
 	if (!list_empty(item)) {
 		list_del_init(item);
-		if (--nlru->nr_items == 0)
+		if (--nlru->nr_items == 0 && nid >= 0)
 			node_clear(nid, lru->active_nodes);
 		WARN_ON_ONCE(nlru->nr_items < 0);
 		spin_unlock(&nlru->lock);
@@ -48,12 +61,18 @@ bool list_lru_del(struct list_lru *lru, struct list_head *item)
 }
 EXPORT_SYMBOL_GPL(list_lru_del);
 
-unsigned long
-list_lru_count_node(struct list_lru *lru, int nid)
+unsigned long list_lru_count_node(struct list_lru *lru,
+				  int nid, struct mem_cgroup *memcg)
 {
 	unsigned long count = 0;
 	struct list_lru_node *nlru = &lru->node[nid];
 
+	if (memcg) {
+		nlru = memcg_list_lru(lru, memcg);
+		if (!nlru)
+			return 0;
+	}
+
 	spin_lock(&nlru->lock);
 	WARN_ON_ONCE(nlru->nr_items < 0);
 	count += nlru->nr_items;
@@ -63,15 +82,41 @@ list_lru_count_node(struct list_lru *lru, int nid)
 }
 EXPORT_SYMBOL_GPL(list_lru_count_node);
 
-unsigned long
-list_lru_walk_node(struct list_lru *lru, int nid, list_lru_walk_cb isolate,
-		   void *cb_arg, unsigned long *nr_to_walk)
+unsigned long list_lru_count(struct list_lru *lru)
+{
+	long count = 0;
+	int nid;
+
+	for_each_node_mask(nid, lru->active_nodes)
+		count += list_lru_count_node(lru, nid, NULL);
+
+	if (list_lru_memcg_aware(lru)) {
+		struct mem_cgroup *memcg;
+
+		for_each_mem_cgroup(memcg)
+			count += list_lru_count_node(lru, 0, memcg);
+	}
+	return count;
+}
+EXPORT_SYMBOL_GPL(list_lru_count);
+
+unsigned long list_lru_walk_node(struct list_lru *lru,
+				 int nid, struct mem_cgroup *memcg,
+				 list_lru_walk_cb isolate, void *cb_arg,
+				 unsigned long *nr_to_walk)
 {
 
-	struct list_lru_node	*nlru = &lru->node[nid];
+	struct list_lru_node *nlru = &lru->node[nid];
 	struct list_head *item, *n;
 	unsigned long isolated = 0;
 
+	if (memcg) {
+		nlru = memcg_list_lru(lru, memcg);
+		if (!nlru)
+			return 0;
+		nid = -1;
+	}
+
 	spin_lock(&nlru->lock);
 restart:
 	list_for_each_safe(item, n, &nlru->list) {
@@ -90,7 +135,7 @@ restart:
 		case LRU_REMOVED_RETRY:
 			assert_spin_locked(&nlru->lock);
 		case LRU_REMOVED:
-			if (--nlru->nr_items == 0)
+			if (--nlru->nr_items == 0 && nid >= 0)
 				node_clear(nid, lru->active_nodes);
 			WARN_ON_ONCE(nlru->nr_items < 0);
 			isolated++;
@@ -124,7 +169,47 @@ restart:
 }
 EXPORT_SYMBOL_GPL(list_lru_walk_node);
 
-int list_lru_init_key(struct list_lru *lru, struct lock_class_key *key)
+unsigned long list_lru_walk(struct list_lru *lru, list_lru_walk_cb isolate,
+			    void *cb_arg, unsigned long nr_to_walk)
+{
+	long isolated = 0;
+	int nid;
+
+	for_each_node_mask(nid, lru->active_nodes) {
+		isolated += list_lru_walk_node(lru, nid, NULL,
+					isolate, cb_arg, &nr_to_walk);
+		if (nr_to_walk <= 0)
+			break;
+	}
+	if (list_lru_memcg_aware(lru)) {
+		struct mem_cgroup *memcg;
+
+		for_each_mem_cgroup(memcg) {
+			isolated += list_lru_walk_node(lru, 0, memcg,
+						isolate, cb_arg, &nr_to_walk);
+			if (nr_to_walk <= 0) {
+				mem_cgroup_iter_break(NULL, memcg);
+				break;
+			}
+		}
+	}
+	return isolated;
+}
+EXPORT_SYMBOL_GPL(list_lru_walk);
+
+void list_lru_node_init(struct list_lru_node *n, struct list_lru *lru)
+{
+	spin_lock_init(&n->lock);
+#ifdef CONFIG_DEBUG_LOCK_ALLOC
+	if (lru->key)
+		lockdep_set_class(&n->lock, lru->key);
+#endif
+	INIT_LIST_HEAD(&n->list);
+	n->nr_items = 0;
+}
+
+int list_lru_init_key(struct list_lru *lru, bool memcg_aware,
+		      struct lock_class_key *key)
 {
 	int i;
 	size_t size = sizeof(*lru->node) * nr_node_ids;
@@ -133,13 +218,19 @@ int list_lru_init_key(struct list_lru *lru, struct lock_class_key *key)
 	if (!lru->node)
 		return -ENOMEM;
 
+#ifdef CONFIG_DEBUG_LOCK_ALLOC
+	lru->key = key;
+#endif
 	nodes_clear(lru->active_nodes);
-	for (i = 0; i < nr_node_ids; i++) {
-		spin_lock_init(&lru->node[i].lock);
-		if (key)
-			lockdep_set_class(&lru->node[i].lock, key);
-		INIT_LIST_HEAD(&lru->node[i].list);
-		lru->node[i].nr_items = 0;
+	for (i = 0; i < nr_node_ids; i++)
+		list_lru_node_init(&lru->node[i], lru);
+
+#ifdef CONFIG_MEMCG_KMEM
+	lru->memcg_params = NULL;
+#endif
+	if (memcg_aware && memcg_register_list_lru(lru)) {
+		list_lru_destroy(lru);
+		return -ENOMEM;
 	}
 	return 0;
 }
@@ -147,6 +238,7 @@ EXPORT_SYMBOL_GPL(list_lru_init_key);
 
 void list_lru_destroy(struct list_lru *lru)
 {
+	memcg_unregister_list_lru(lru);
 	kfree(lru->node);
 }
 EXPORT_SYMBOL_GPL(list_lru_destroy);
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 6a96a3994692..1030bba4b94f 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1268,21 +1268,6 @@ void mem_cgroup_iter_break(struct mem_cgroup *root,
 		css_put(&prev->css);
 }
 
-/*
- * Iteration constructs for visiting all cgroups (under a tree).  If
- * loops are exited prematurely (break), mem_cgroup_iter_break() must
- * be used for reference counting.
- */
-#define for_each_mem_cgroup_tree(iter, root)		\
-	for (iter = mem_cgroup_iter(root, NULL, NULL);	\
-	     iter != NULL;				\
-	     iter = mem_cgroup_iter(root, iter, NULL))
-
-#define for_each_mem_cgroup(iter)			\
-	for (iter = mem_cgroup_iter(NULL, NULL, NULL);	\
-	     iter != NULL;				\
-	     iter = mem_cgroup_iter(NULL, iter, NULL))
-
 void __mem_cgroup_count_vm_event(struct mm_struct *mm, enum vm_event_item idx)
 {
 	struct mem_cgroup *memcg;
@@ -3140,13 +3125,11 @@ static void memcg_unregister_cache(struct kmem_cache *cachep)
  */
 static inline void memcg_stop_kmem_account(void)
 {
-	VM_BUG_ON(!current->mm);
 	current->memcg_kmem_skip_account++;
 }
 
 static inline void memcg_resume_kmem_account(void)
 {
-	VM_BUG_ON(!current->mm);
 	current->memcg_kmem_skip_account--;
 }
 
@@ -3436,6 +3419,193 @@ void __memcg_kmem_uncharge_pages(struct page *page, int order)
 	VM_BUG_ON_PAGE(mem_cgroup_is_root(memcg), page);
 	memcg_uncharge_kmem(memcg, PAGE_SIZE << order);
 }
+
+/*
+ * List of all memcg-aware list_lrus, linked through
+ * memcg_list_lru_params->list, protected by memcg_slab_mutex.
+ */
+static LIST_HEAD(memcg_list_lrus);
+
+static void memcg_free_list_lru_params(struct memcg_list_lru_params *params,
+				       int size)
+{
+	int i;
+
+	for (i = 0; i < size; i++)
+		kfree(params->node[i]);
+	kfree(params);
+}
+
+static struct memcg_list_lru_params *
+memcg_alloc_list_lru_params(struct list_lru *lru, int size)
+{
+	struct memcg_list_lru_params *params, *old_params;
+	int i, old_size = 0;
+
+	memcg_stop_kmem_account();
+	params = kzalloc(sizeof(*params) + size * sizeof(*params->node),
+			 GFP_KERNEL);
+	if (!params)
+		goto out;
+
+	old_params = lru->memcg_params;
+	if (old_params)
+		old_size = memcg_limited_groups_array_size;
+
+	for (i = old_size; i < size; i++) {
+		struct list_lru_node *n;
+
+		n = kmalloc(sizeof(*n), GFP_KERNEL);
+		if (!n) {
+			memcg_free_list_lru_params(params, size);
+			params = NULL;
+			goto out;
+		}
+		list_lru_node_init(n, lru);
+		params->node[i] = n;
+	}
+
+	if (old_params)
+		memcpy(params->node, old_params->node,
+		       old_size * sizeof(*params->node));
+out:
+	memcg_resume_kmem_account();
+	return params;
+}
+
+int memcg_register_list_lru(struct list_lru *lru)
+{
+	struct memcg_list_lru_params *params;
+
+	if (mem_cgroup_disabled())
+		return 0;
+
+	BUG_ON(lru->memcg_params);
+
+	mutex_lock(&memcg_slab_mutex);
+	params = memcg_alloc_list_lru_params(lru,
+			memcg_limited_groups_array_size);
+	if (!params) {
+		mutex_unlock(&memcg_slab_mutex);
+		return -ENOMEM;
+	}
+	params->owner = lru;
+	list_add(&params->list, &memcg_list_lrus);
+	lru->memcg_params = params;
+	mutex_unlock(&memcg_slab_mutex);
+
+	return 0;
+}
+
+void memcg_unregister_list_lru(struct list_lru *lru)
+{
+	struct memcg_list_lru_params *params = lru->memcg_params;
+
+	if (!params)
+		return;
+
+	BUG_ON(params->owner != lru);
+
+	mutex_lock(&memcg_slab_mutex);
+	list_del(&params->list);
+	memcg_free_list_lru_params(params, memcg_limited_groups_array_size);
+	mutex_unlock(&memcg_slab_mutex);
+
+	lru->memcg_params = NULL;
+}
+
+static int memcg_update_all_list_lrus(int num_groups)
+{
+	struct memcg_list_lru_params *params, *tmp, *new_params;
+	struct list_lru *lru;
+	int new_size;
+
+	lockdep_assert_held(&memcg_slab_mutex);
+
+	if (num_groups <= memcg_limited_groups_array_size)
+		return 0;
+
+	new_size = memcg_caches_array_size(num_groups);
+
+	list_for_each_entry_safe(params, tmp, &memcg_list_lrus, list) {
+		lru = params->owner;
+
+		new_params = memcg_alloc_list_lru_params(lru, new_size);
+		if (!new_params)
+			return -ENOMEM;
+
+		new_params->owner = lru;
+		list_replace(&params->list, &new_params->list);
+
+		rcu_assign_pointer(lru->memcg_params, new_params);
+		kfree_rcu(params, rcu_head);
+	}
+	return 0;
+}
+
+/**
+ * memcg_list_lru: get list_lru node corresponding to memory cgroup
+ * @lru: the list_lru
+ * @memcg: the memory cgroup
+ *
+ * Returns NULL if no node corresponds to @memcg in @lru.
+ */
+struct list_lru_node *memcg_list_lru(struct list_lru *lru,
+				     struct mem_cgroup *memcg)
+{
+	struct memcg_list_lru_params *params;
+	struct list_lru_node *n;
+
+	if (!lru->memcg_params)
+		return NULL;
+	if (!memcg_kmem_is_active(memcg))
+		return NULL;
+
+	rcu_read_lock();
+	params = rcu_dereference(lru->memcg_params);
+	n = params->node[memcg_cache_id(memcg)];
+	rcu_read_unlock();
+
+	return n;
+}
+
+/**
+ * memcg_list_lru_from_obj: get list_lru node corresponding to memory cgroup
+ * which object is accounted to
+ * @lru: the list_lru
+ * @obj: the object ptr
+ *
+ * Return NULL if no node corresponds to the memory cgroup which @obj is
+ * accounted to or if @obj is not accounted to any memory cgroup.
+ *
+ * The object must be allocated from kmem.
+ */
+struct list_lru_node *memcg_list_lru_from_obj(struct list_lru *lru, void *obj)
+{
+	struct mem_cgroup *memcg = NULL;
+	struct kmem_cache *cachep;
+	struct page_cgroup *pc;
+	struct page *page;
+
+	if (!lru->memcg_params)
+		return NULL;
+
+	page = virt_to_head_page(obj);
+	if (PageSlab(page)) {
+		cachep = page->slab_cache;
+		if (!is_root_cache(cachep))
+			memcg = cachep->memcg_params->memcg;
+	} else {
+		/* page allocated with alloc_kmem_pages */
+		pc = lookup_page_cgroup(page);
+		if (PageCgroupUsed(pc))
+			memcg = pc->mem_cgroup;
+	}
+	if (!memcg)
+		return NULL;
+
+	return memcg_list_lru(lru, memcg);
+}
 #else
 static inline void memcg_unregister_all_caches(struct mem_cgroup *memcg)
 {
@@ -4215,7 +4385,9 @@ static int __memcg_activate_kmem(struct mem_cgroup *memcg,
 	 * memcg_params.
 	 */
 	mutex_lock(&memcg_slab_mutex);
-	err = memcg_update_all_caches(memcg_id + 1);
+	err = memcg_update_all_list_lrus(memcg_id + 1);
+	if (!err)
+		err = memcg_update_all_caches(memcg_id + 1);
 	mutex_unlock(&memcg_slab_mutex);
 	if (err)
 		goto out_rmid;
diff --git a/mm/workingset.c b/mm/workingset.c
index d4fa7fb10a52..f8aae7497723 100644
--- a/mm/workingset.c
+++ b/mm/workingset.c
@@ -399,7 +399,8 @@ static int __init workingset_init(void)
 {
 	int ret;
 
-	ret = list_lru_init_key(&workingset_shadow_nodes, &shadow_nodes_key);
+	ret = list_lru_init_key(&workingset_shadow_nodes, false,
+				&shadow_nodes_key);
 	if (ret)
 		goto err;
 	ret = register_shrinker(&workingset_shadow_shrinker);
-- 
1.7.10.4


* [PATCH -mm 5/6] fs: make shrinker memcg aware
@ 2014-07-28  9:31 ` Vladimir Davydov
From: Vladimir Davydov @ 2014-07-28  9:31 UTC (permalink / raw)
  To: akpm
  Cc: hannes, mhocko, glommer, david, viro, gthelen, mgorman, linux-mm,
	linux-kernel

Now, to make any list_lru-based shrinker memcg-aware, all we need to do
is initialize its list_lru as memcg-enabled. Let's do that for the
generic FS shrinker (super_block::s_shrink) and mark it as memcg-aware.

There are other FS-specific shrinkers that use list_lru for storing
objects, such as the XFS and GFS2 dquot cache shrinkers, but since they
reclaim objects that are shared among different cgroups, there is no
point in making them memcg-aware.

Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
---
 fs/super.c |   17 ++++++++++++++---
 1 file changed, 14 insertions(+), 3 deletions(-)

diff --git a/fs/super.c b/fs/super.c
index 477102d59c7e..2e5ed2b51b37 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -34,6 +34,7 @@
 #include <linux/cleancache.h>
 #include <linux/fsnotify.h>
 #include <linux/lockdep.h>
+#include <linux/memcontrol.h>
 #include "internal.h"
 
 
@@ -187,9 +188,9 @@ static struct super_block *alloc_super(struct file_system_type *type, int flags)
 	INIT_HLIST_BL_HEAD(&s->s_anon);
 	INIT_LIST_HEAD(&s->s_inodes);
 
-	if (list_lru_init(&s->s_dentry_lru, false))
+	if (list_lru_init(&s->s_dentry_lru, true))
 		goto fail;
-	if (list_lru_init(&s->s_inode_lru, false))
+	if (list_lru_init(&s->s_inode_lru, true))
 		goto fail;
 
 	init_rwsem(&s->s_umount);
@@ -225,7 +226,7 @@ static struct super_block *alloc_super(struct file_system_type *type, int flags)
 	s->s_shrink.scan_objects = super_cache_scan;
 	s->s_shrink.count_objects = super_cache_count;
 	s->s_shrink.batch = 1024;
-	s->s_shrink.flags = SHRINKER_NUMA_AWARE;
+	s->s_shrink.flags = SHRINKER_NUMA_AWARE | SHRINKER_MEMCG_AWARE;
 	return s;
 
 fail:
@@ -280,6 +281,16 @@ void deactivate_locked_super(struct super_block *s)
 		unregister_shrinker(&s->s_shrink);
 		fs->kill_sb(s);
 
+		/*
+		 * list_lru_destroy() may sleep on memcg-aware lrus. Since
+		 * put_super() calls destroy_super() under a spin lock, we must
+		 * unregister lrus from memcg here to avoid sleeping in atomic
+		 * context. It's safe, because by the time we get here, lrus
+		 * must be empty.
+		 */
+		memcg_unregister_list_lru(&s->s_dentry_lru);
+		memcg_unregister_list_lru(&s->s_inode_lru);
+
 		put_filesystem(fs);
 		put_super(s);
 	} else {
-- 
1.7.10.4


* [PATCH -mm 6/6] memcg: iterator: do not skip offline css
@ 2014-07-28  9:31 ` Vladimir Davydov
From: Vladimir Davydov @ 2014-07-28  9:31 UTC (permalink / raw)
  To: akpm
  Cc: hannes, mhocko, glommer, david, viro, gthelen, mgorman, linux-mm,
	linux-kernel

Even after a memcg has been taken offline, it may still host kmem
objects, which must be shrunk under memory pressure. Hence the memcg
iterator should not skip offline css's; taking a reference with
css_tryget() instead of css_tryget_online() is enough.
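
With this change the hierarchy walk in shrink_slab() introduced by
patch 3,

	shrinkctl->memcg = shrinkctl->target_mem_cgroup;
	do {
		...
	} while ((shrinkctl->memcg =
		  mem_cgroup_iter(shrinkctl->target_mem_cgroup,
				  shrinkctl->memcg, NULL)) != NULL);

also visits memcgs whose css has gone offline but whose kmem caches may
still be populated.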

Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
---
 mm/memcontrol.c |    5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 1030bba4b94f..bf55088f0152 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1100,8 +1100,7 @@ skip_node:
 	 */
 	if (next_css) {
 		if ((next_css == &root->css) ||
-		    ((next_css->flags & CSS_ONLINE) &&
-		     css_tryget_online(next_css)))
+		    css_tryget(next_css))
 			return mem_cgroup_from_css(next_css);
 
 		prev_css = next_css;
@@ -1147,7 +1146,7 @@ mem_cgroup_iter_load(struct mem_cgroup_reclaim_iter *iter,
 		 * would be returned all the time.
 		 */
 		if (position && position != root &&
-		    !css_tryget_online(&position->css))
+		    !css_tryget(&position->css))
 			position = NULL;
 	}
 	return position;
-- 
1.7.10.4
