* [PATCH 1/2] gfp: add __GFP_NOACCOUNT
@ 2015-05-05 9:45 Vladimir Davydov
2015-05-05 9:45 ` [PATCH 2/2] kernfs: do not account ino_ida allocations to memcg Vladimir Davydov
` (2 more replies)
0 siblings, 3 replies; 17+ messages in thread
From: Vladimir Davydov @ 2015-05-05 9:45 UTC (permalink / raw)
To: Andrew Morton
Cc: Johannes Weiner, Michal Hocko, Tejun Heo, Christoph Lameter,
Pekka Enberg, David Rientjes, Joonsoo Kim, Greg Thelen, linux-mm,
cgroups, linux-kernel
Not all kmem allocations should be accounted to memcg. The following
patch gives an example when accounting of a certain type of allocations
to memcg can effectively result in a memory leak. This patch adds the
__GFP_NOACCOUNT flag which if passed to kmalloc and friends will force
the allocation to go through the root cgroup. It will be used by the
next patch.
Note, since in case of kmemleak enabled each kmalloc implies yet another
allocation from the kmemleak_object cache, we add __GFP_NOACCOUNT to
gfp_kmemleak_mask.
Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
---
include/linux/gfp.h | 2 ++
include/linux/memcontrol.h | 4 ++++
mm/kmemleak.c | 3 ++-
3 files changed, 8 insertions(+), 1 deletion(-)
diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index 97a9373e61e8..37c422df2a0f 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -30,6 +30,7 @@ struct vm_area_struct;
#define ___GFP_HARDWALL 0x20000u
#define ___GFP_THISNODE 0x40000u
#define ___GFP_RECLAIMABLE 0x80000u
+#define ___GFP_NOACCOUNT 0x100000u
#define ___GFP_NOTRACK 0x200000u
#define ___GFP_NO_KSWAPD 0x400000u
#define ___GFP_OTHER_NODE 0x800000u
@@ -87,6 +88,7 @@ struct vm_area_struct;
#define __GFP_HARDWALL ((__force gfp_t)___GFP_HARDWALL) /* Enforce hardwall cpuset memory allocs */
#define __GFP_THISNODE ((__force gfp_t)___GFP_THISNODE)/* No fallback, no policies */
#define __GFP_RECLAIMABLE ((__force gfp_t)___GFP_RECLAIMABLE) /* Page is reclaimable */
+#define __GFP_NOACCOUNT ((__force gfp_t)___GFP_NOACCOUNT) /* Don't account to memcg */
#define __GFP_NOTRACK ((__force gfp_t)___GFP_NOTRACK) /* Don't track with kmemcheck */
#define __GFP_NO_KSWAPD ((__force gfp_t)___GFP_NO_KSWAPD)
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 72dff5fb0d0c..6c8918114804 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -463,6 +463,8 @@ memcg_kmem_newpage_charge(gfp_t gfp, struct mem_cgroup **memcg, int order)
if (!memcg_kmem_enabled())
return true;
+ if (gfp & __GFP_NOACCOUNT)
+ return true;
/*
* __GFP_NOFAIL allocations will move on even if charging is not
* possible. Therefore we don't even try, and have this allocation
@@ -522,6 +524,8 @@ memcg_kmem_get_cache(struct kmem_cache *cachep, gfp_t gfp)
{
if (!memcg_kmem_enabled())
return cachep;
+ if (gfp & __GFP_NOACCOUNT)
+ return cachep;
if (gfp & __GFP_NOFAIL)
return cachep;
if (in_interrupt() || (!current->mm) || (current->flags & PF_KTHREAD))
diff --git a/mm/kmemleak.c b/mm/kmemleak.c
index 5405aff5a590..f0fe4f2c1fa7 100644
--- a/mm/kmemleak.c
+++ b/mm/kmemleak.c
@@ -115,7 +115,8 @@
#define BYTES_PER_POINTER sizeof(void *)
/* GFP bitmask for kmemleak internal allocations */
-#define gfp_kmemleak_mask(gfp) (((gfp) & (GFP_KERNEL | GFP_ATOMIC)) | \
+#define gfp_kmemleak_mask(gfp) (((gfp) & (GFP_KERNEL | GFP_ATOMIC | \
+ __GFP_NOACCOUNT)) | \
__GFP_NORETRY | __GFP_NOMEMALLOC | \
__GFP_NOWARN)
--
1.7.10.4
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 17+ messages in thread* [PATCH 2/2] kernfs: do not account ino_ida allocations to memcg 2015-05-05 9:45 [PATCH 1/2] gfp: add __GFP_NOACCOUNT Vladimir Davydov @ 2015-05-05 9:45 ` Vladimir Davydov 2015-05-05 13:45 ` Tejun Heo 2015-05-06 11:59 ` [PATCH 1/2] gfp: add __GFP_NOACCOUNT Michal Hocko 2015-05-06 14:58 ` Michal Hocko 2 siblings, 1 reply; 17+ messages in thread From: Vladimir Davydov @ 2015-05-05 9:45 UTC (permalink / raw) To: Andrew Morton Cc: Johannes Weiner, Michal Hocko, Tejun Heo, Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim, Greg Thelen, linux-mm, cgroups, linux-kernel root->ino_ida is used for kernfs inode number allocations. Since IDA has a layered structure, different IDs can reside on the same layer, which is currently accounted to some memory cgroup. The problem is that each kmem cache of a memory cgroup has its own directory on sysfs (under /sys/fs/kernel/<cache-name>/cgroup). If the inode number of such a directory or any file in it gets allocated from a layer accounted to the cgroup which the cache is created for, the cgroup will get pinned for good, because one has to free all kmem allocations accounted to a cgroup in order to release it and destroy all its kmem caches. That said we must not account layers of ino_ida to any memory cgroup. Since per net init operations may create new sysfs entries directly (e.g. lo device) or indirectly (nf_conntrack creates a new kmem cache per each namespace, which, in turn, creates new sysfs entries), an easy way to reproduce this issue is by creating network namespace(s) from inside a kmem-active memory cgroup. Signed-off-by: Vladimir Davydov <vdavydov@parallels.com> --- fs/kernfs/dir.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c index f131fc23ffc4..fffca9517321 100644 --- a/fs/kernfs/dir.c +++ b/fs/kernfs/dir.c @@ -518,7 +518,14 @@ static struct kernfs_node *__kernfs_new_node(struct kernfs_root *root, if (!kn) goto err_out1; - ret = ida_simple_get(&root->ino_ida, 1, 0, GFP_KERNEL); + /* + * If the ino of the sysfs entry created for a kmem cache gets + * allocated from an ida layer, which is accounted to the memcg that + * owns the cache, the memcg will get pinned forever. So do not account + * ino ida allocations. + */ + ret = ida_simple_get(&root->ino_ida, 1, 0, + GFP_KERNEL | __GFP_NOACCOUNT); if (ret < 0) goto err_out2; kn->ino = ret; -- 1.7.10.4 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 2/2] kernfs: do not account ino_ida allocations to memcg 2015-05-05 9:45 ` [PATCH 2/2] kernfs: do not account ino_ida allocations to memcg Vladimir Davydov @ 2015-05-05 13:45 ` Tejun Heo 2015-05-05 16:04 ` Vladimir Davydov 0 siblings, 1 reply; 17+ messages in thread From: Tejun Heo @ 2015-05-05 13:45 UTC (permalink / raw) To: Vladimir Davydov Cc: Andrew Morton, Johannes Weiner, Michal Hocko, Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim, Greg Thelen, linux-mm, cgroups, linux-kernel On Tue, May 05, 2015 at 12:45:43PM +0300, Vladimir Davydov wrote: > root->ino_ida is used for kernfs inode number allocations. Since IDA has > a layered structure, different IDs can reside on the same layer, which > is currently accounted to some memory cgroup. The problem is that each > kmem cache of a memory cgroup has its own directory on sysfs (under > /sys/fs/kernel/<cache-name>/cgroup). If the inode number of such a > directory or any file in it gets allocated from a layer accounted to the > cgroup which the cache is created for, the cgroup will get pinned for > good, because one has to free all kmem allocations accounted to a cgroup > in order to release it and destroy all its kmem caches. That said we > must not account layers of ino_ida to any memory cgroup. > > Since per net init operations may create new sysfs entries directly > (e.g. lo device) or indirectly (nf_conntrack creates a new kmem cache > per each namespace, which, in turn, creates new sysfs entries), an easy > way to reproduce this issue is by creating network namespace(s) from > inside a kmem-active memory cgroup. > > Signed-off-by: Vladimir Davydov <vdavydov@parallels.com> Man, that's nasty. For the kernfs part, Acked-by: Tejun Heo <tj@kernel.org> Can you please repost this patch w/ Greg KH cc'd? Thanks. -- tejun -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 2/2] kernfs: do not account ino_ida allocations to memcg 2015-05-05 13:45 ` Tejun Heo @ 2015-05-05 16:04 ` Vladimir Davydov 0 siblings, 0 replies; 17+ messages in thread From: Vladimir Davydov @ 2015-05-05 16:04 UTC (permalink / raw) To: Tejun Heo Cc: Andrew Morton, Johannes Weiner, Michal Hocko, Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim, Greg Thelen, linux-mm, cgroups, linux-kernel On Tue, May 05, 2015 at 09:45:21AM -0400, Tejun Heo wrote: > On Tue, May 05, 2015 at 12:45:43PM +0300, Vladimir Davydov wrote: > > root->ino_ida is used for kernfs inode number allocations. Since IDA has > > a layered structure, different IDs can reside on the same layer, which > > is currently accounted to some memory cgroup. The problem is that each > > kmem cache of a memory cgroup has its own directory on sysfs (under > > /sys/fs/kernel/<cache-name>/cgroup). If the inode number of such a > > directory or any file in it gets allocated from a layer accounted to the > > cgroup which the cache is created for, the cgroup will get pinned for > > good, because one has to free all kmem allocations accounted to a cgroup > > in order to release it and destroy all its kmem caches. That said we > > must not account layers of ino_ida to any memory cgroup. > > > > Since per net init operations may create new sysfs entries directly > > (e.g. lo device) or indirectly (nf_conntrack creates a new kmem cache > > per each namespace, which, in turn, creates new sysfs entries), an easy > > way to reproduce this issue is by creating network namespace(s) from > > inside a kmem-active memory cgroup. > > > > Signed-off-by: Vladimir Davydov <vdavydov@parallels.com> > > Man, that's nasty. For the kernfs part, > > Acked-by: Tejun Heo <tj@kernel.org> Thanks. > > Can you please repost this patch w/ Greg KH cc'd? OK. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 1/2] gfp: add __GFP_NOACCOUNT 2015-05-05 9:45 [PATCH 1/2] gfp: add __GFP_NOACCOUNT Vladimir Davydov 2015-05-05 9:45 ` [PATCH 2/2] kernfs: do not account ino_ida allocations to memcg Vladimir Davydov @ 2015-05-06 11:59 ` Michal Hocko 2015-05-06 12:24 ` Vladimir Davydov 2015-05-06 13:16 ` Johannes Weiner 2015-05-06 14:58 ` Michal Hocko 2 siblings, 2 replies; 17+ messages in thread From: Michal Hocko @ 2015-05-06 11:59 UTC (permalink / raw) To: Vladimir Davydov Cc: Andrew Morton, Johannes Weiner, Tejun Heo, Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim, Greg Thelen, linux-mm, cgroups, linux-kernel On Tue 05-05-15 12:45:42, Vladimir Davydov wrote: > Not all kmem allocations should be accounted to memcg. The following > patch gives an example when accounting of a certain type of allocations > to memcg can effectively result in a memory leak. > This patch adds the __GFP_NOACCOUNT flag which if passed to kmalloc > and friends will force the allocation to go through the root > cgroup. It will be used by the next patch. The name of the flag is way too generic. It is not clear that the accounting is KMEMCG related. __GFP_NO_KMEMCG sounds better? I was going to suggest doing per-cache rather than gfp flag and that would actually work just fine for the kmemleak as it uses its own cache already. But the ida_simple_get would be trickier because it doesn't use any special cache and more over only one user seem to have a problem so this doesn't sound like a good fit. So I do not object to opt-out for kmemcg accounting but I really think the name should be changed. > Note, since in case of kmemleak enabled each kmalloc implies yet another > allocation from the kmemleak_object cache, we add __GFP_NOACCOUNT to > gfp_kmemleak_mask. > Signed-off-by: Vladimir Davydov <vdavydov@parallels.com> > --- > include/linux/gfp.h | 2 ++ > include/linux/memcontrol.h | 4 ++++ > mm/kmemleak.c | 3 ++- > 3 files changed, 8 insertions(+), 1 deletion(-) > > diff --git a/include/linux/gfp.h b/include/linux/gfp.h > index 97a9373e61e8..37c422df2a0f 100644 > --- a/include/linux/gfp.h > +++ b/include/linux/gfp.h > @@ -30,6 +30,7 @@ struct vm_area_struct; > #define ___GFP_HARDWALL 0x20000u > #define ___GFP_THISNODE 0x40000u > #define ___GFP_RECLAIMABLE 0x80000u > +#define ___GFP_NOACCOUNT 0x100000u > #define ___GFP_NOTRACK 0x200000u > #define ___GFP_NO_KSWAPD 0x400000u > #define ___GFP_OTHER_NODE 0x800000u > @@ -87,6 +88,7 @@ struct vm_area_struct; > #define __GFP_HARDWALL ((__force gfp_t)___GFP_HARDWALL) /* Enforce hardwall cpuset memory allocs */ > #define __GFP_THISNODE ((__force gfp_t)___GFP_THISNODE)/* No fallback, no policies */ > #define __GFP_RECLAIMABLE ((__force gfp_t)___GFP_RECLAIMABLE) /* Page is reclaimable */ > +#define __GFP_NOACCOUNT ((__force gfp_t)___GFP_NOACCOUNT) /* Don't account to memcg */ > #define __GFP_NOTRACK ((__force gfp_t)___GFP_NOTRACK) /* Don't track with kmemcheck */ > > #define __GFP_NO_KSWAPD ((__force gfp_t)___GFP_NO_KSWAPD) > diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h > index 72dff5fb0d0c..6c8918114804 100644 > --- a/include/linux/memcontrol.h > +++ b/include/linux/memcontrol.h > @@ -463,6 +463,8 @@ memcg_kmem_newpage_charge(gfp_t gfp, struct mem_cgroup **memcg, int order) > if (!memcg_kmem_enabled()) > return true; > > + if (gfp & __GFP_NOACCOUNT) > + return true; > /* > * __GFP_NOFAIL allocations will move on even if charging is not > * possible. Therefore we don't even try, and have this allocation > @@ -522,6 +524,8 @@ memcg_kmem_get_cache(struct kmem_cache *cachep, gfp_t gfp) > { > if (!memcg_kmem_enabled()) > return cachep; > + if (gfp & __GFP_NOACCOUNT) > + return cachep; > if (gfp & __GFP_NOFAIL) > return cachep; > if (in_interrupt() || (!current->mm) || (current->flags & PF_KTHREAD)) > diff --git a/mm/kmemleak.c b/mm/kmemleak.c > index 5405aff5a590..f0fe4f2c1fa7 100644 > --- a/mm/kmemleak.c > +++ b/mm/kmemleak.c > @@ -115,7 +115,8 @@ > #define BYTES_PER_POINTER sizeof(void *) > > /* GFP bitmask for kmemleak internal allocations */ > -#define gfp_kmemleak_mask(gfp) (((gfp) & (GFP_KERNEL | GFP_ATOMIC)) | \ > +#define gfp_kmemleak_mask(gfp) (((gfp) & (GFP_KERNEL | GFP_ATOMIC | \ > + __GFP_NOACCOUNT)) | \ > __GFP_NORETRY | __GFP_NOMEMALLOC | \ > __GFP_NOWARN) > > -- > 1.7.10.4 > -- Michal Hocko SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 1/2] gfp: add __GFP_NOACCOUNT 2015-05-06 11:59 ` [PATCH 1/2] gfp: add __GFP_NOACCOUNT Michal Hocko @ 2015-05-06 12:24 ` Vladimir Davydov 2015-05-06 12:35 ` Michal Hocko 2015-05-06 13:16 ` Johannes Weiner 1 sibling, 1 reply; 17+ messages in thread From: Vladimir Davydov @ 2015-05-06 12:24 UTC (permalink / raw) To: Michal Hocko Cc: Andrew Morton, Johannes Weiner, Tejun Heo, Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim, Greg Thelen, linux-mm, cgroups, linux-kernel On Wed, May 06, 2015 at 01:59:41PM +0200, Michal Hocko wrote: > On Tue 05-05-15 12:45:42, Vladimir Davydov wrote: > > Not all kmem allocations should be accounted to memcg. The following > > patch gives an example when accounting of a certain type of allocations > > to memcg can effectively result in a memory leak. > > > This patch adds the __GFP_NOACCOUNT flag which if passed to kmalloc > > and friends will force the allocation to go through the root > > cgroup. It will be used by the next patch. > > The name of the flag is way too generic. It is not clear that the > accounting is KMEMCG related. __GFP_NO_KMEMCG sounds better? > > I was going to suggest doing per-cache rather than gfp flag and that > would actually work just fine for the kmemleak as it uses its own cache > already. But the ida_simple_get would be trickier because it doesn't use > any special cache and more over only one user seem to have a problem so > this doesn't sound like a good fit. I don't think making this flag per-cache is an option either, but for another reason - it would not be possible to merge such a kmem cache with caches without this flag set. As a result, total memory pressure would increase, even for setups without kmem-active memory cgroups, which does not sound acceptable to me. > > So I do not object to opt-out for kmemcg accounting but I really think > the name should be changed. I named it __GFP_NOACCOUNT to match with __GFP_NOTRACK, which is a very specific flag too (kmemcheck), nevertheless it has a rather generic name. Anyways, what else apart from memcg can account kmem so that we have to mention KMEMCG in the flag name explicitly? Thanks, Vladimir -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 1/2] gfp: add __GFP_NOACCOUNT 2015-05-06 12:24 ` Vladimir Davydov @ 2015-05-06 12:35 ` Michal Hocko 2015-05-06 13:25 ` Vladimir Davydov 0 siblings, 1 reply; 17+ messages in thread From: Michal Hocko @ 2015-05-06 12:35 UTC (permalink / raw) To: Vladimir Davydov Cc: Andrew Morton, Johannes Weiner, Tejun Heo, Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim, Greg Thelen, linux-mm, cgroups, linux-kernel On Wed 06-05-15 15:24:31, Vladimir Davydov wrote: > On Wed, May 06, 2015 at 01:59:41PM +0200, Michal Hocko wrote: > > On Tue 05-05-15 12:45:42, Vladimir Davydov wrote: > > > Not all kmem allocations should be accounted to memcg. The following > > > patch gives an example when accounting of a certain type of allocations > > > to memcg can effectively result in a memory leak. > > > > > This patch adds the __GFP_NOACCOUNT flag which if passed to kmalloc > > > and friends will force the allocation to go through the root > > > cgroup. It will be used by the next patch. > > > > The name of the flag is way too generic. It is not clear that the > > accounting is KMEMCG related. __GFP_NO_KMEMCG sounds better? > > > > I was going to suggest doing per-cache rather than gfp flag and that > > would actually work just fine for the kmemleak as it uses its own cache > > already. But the ida_simple_get would be trickier because it doesn't use > > any special cache and more over only one user seem to have a problem so > > this doesn't sound like a good fit. > > I don't think making this flag per-cache is an option either, but for > another reason - it would not be possible to merge such a kmem cache > with caches without this flag set. As a result, total memory pressure > would increase, even for setups without kmem-active memory cgroups, > which does not sound acceptable to me. I am not sure I see the performance implications here because kmem accounted memcgs would have their copy of the cache anyway, no? Anyway, I guess it would be good to document these reasons in the changelog. > > So I do not object to opt-out for kmemcg accounting but I really think > > the name should be changed. > > I named it __GFP_NOACCOUNT to match with __GFP_NOTRACK, which is a very > specific flag too (kmemcheck), nevertheless it has a rather generic > name. __GFP_NOTRACK is a bad name IMHO as well. One has to go and check the comment to see this is kmemleak related. > Anyways, what else apart from memcg can account kmem so that we have to > mention KMEMCG in the flag name explicitly? NOACCOUNT doesn't imply kmem at all so it is not clear who is in charge of the accounting. I do not insist on __GFP_NO_KMEMCG of course but it sounds quite specific about its meaning and scope. -- Michal Hocko SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 1/2] gfp: add __GFP_NOACCOUNT 2015-05-06 12:35 ` Michal Hocko @ 2015-05-06 13:25 ` Vladimir Davydov 2015-05-06 13:55 ` Michal Hocko 0 siblings, 1 reply; 17+ messages in thread From: Vladimir Davydov @ 2015-05-06 13:25 UTC (permalink / raw) To: Michal Hocko Cc: Andrew Morton, Johannes Weiner, Tejun Heo, Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim, Greg Thelen, linux-mm, cgroups, linux-kernel On Wed, May 06, 2015 at 02:35:41PM +0200, Michal Hocko wrote: > On Wed 06-05-15 15:24:31, Vladimir Davydov wrote: > > On Wed, May 06, 2015 at 01:59:41PM +0200, Michal Hocko wrote: > > > On Tue 05-05-15 12:45:42, Vladimir Davydov wrote: > > > > Not all kmem allocations should be accounted to memcg. The following > > > > patch gives an example when accounting of a certain type of allocations > > > > to memcg can effectively result in a memory leak. > > > > > > > This patch adds the __GFP_NOACCOUNT flag which if passed to kmalloc > > > > and friends will force the allocation to go through the root > > > > cgroup. It will be used by the next patch. > > > > > > The name of the flag is way too generic. It is not clear that the > > > accounting is KMEMCG related. __GFP_NO_KMEMCG sounds better? > > > > > > I was going to suggest doing per-cache rather than gfp flag and that > > > would actually work just fine for the kmemleak as it uses its own cache > > > already. But the ida_simple_get would be trickier because it doesn't use > > > any special cache and more over only one user seem to have a problem so > > > this doesn't sound like a good fit. > > > > I don't think making this flag per-cache is an option either, but for > > another reason - it would not be possible to merge such a kmem cache > > with caches without this flag set. As a result, total memory pressure > > would increase, even for setups without kmem-active memory cgroups, > > which does not sound acceptable to me. > > I am not sure I see the performance implications here because kmem > accounted memcgs would have their copy of the cache anyway, no? It's orthogonal. Suppose there are two *global* kmem caches, A and B, which would normally be merged, i.e. A=B. Then we find out that we don't want to account allocations from A to memcg while still accounting allocations from B. Obviously, cache A can no longer be merged with cache B so we have two different caches instead of the only merged one, even if there are *no* memory cgroups at all. That might result in increased memory consumption due to fragmentation. Although it is not really critical, especially counting that SLAB merging was introduced not long before, the idea that enabling an extra feature, such as memcg, without actually using it, may affect the global behavior does not sound good to me. > Anyway, I guess it would be good to document these reasons in the > changelog. > > > > So I do not object to opt-out for kmemcg accounting but I really think > > > the name should be changed. > > > > I named it __GFP_NOACCOUNT to match with __GFP_NOTRACK, which is a very > > specific flag too (kmemcheck), nevertheless it has a rather generic > > name. > > __GFP_NOTRACK is a bad name IMHO as well. One has to go and check the > comment to see this is kmemleak related. I think it's a good practice to go to its definition and check comments when encountering an unknown symbol anyway. With ctags/cscope it's trivial :-) > > > Anyways, what else apart from memcg can account kmem so that we have to > > mention KMEMCG in the flag name explicitly? > > NOACCOUNT doesn't imply kmem at all so it is not clear who is in charge > of the accounting. IMO it is a benefit. If one day for some reason we want to bypass memcg accounting for some other type of allocation somewhere, we can simply reuse it. > I do not insist on __GFP_NO_KMEMCG of course but it sounds quite > specific about its meaning and scope. There is another argument against __GFP_NO_KMEMCG: it is not yet clear if kmem is going to be accounted separately in the unified cgroup hierarchy. As I mentioned before, it is quite difficult to draw the line between user and kernel memory at times - think of buffer_head or radix_tree_node, which are pinned by user pages and therefore cannot be dropped without reclaiming user memory. That said, chances are high that there will be the only knob, memory.max, to limit all types of memory allocations together, in which case __GFP_NO_KMEMCG will look awkward IMO. We could use __GFP_NO_MEMCG (without 'K'), of course, but again, what else except for memcg does full memory accounting so that we have to mention MEMCG explicitly? Thanks, Vladimir -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 1/2] gfp: add __GFP_NOACCOUNT 2015-05-06 13:25 ` Vladimir Davydov @ 2015-05-06 13:55 ` Michal Hocko 2015-05-06 14:29 ` Vladimir Davydov 0 siblings, 1 reply; 17+ messages in thread From: Michal Hocko @ 2015-05-06 13:55 UTC (permalink / raw) To: Vladimir Davydov Cc: Andrew Morton, Johannes Weiner, Tejun Heo, Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim, Greg Thelen, linux-mm, cgroups, linux-kernel On Wed 06-05-15 16:25:10, Vladimir Davydov wrote: > On Wed, May 06, 2015 at 02:35:41PM +0200, Michal Hocko wrote: > > On Wed 06-05-15 15:24:31, Vladimir Davydov wrote: [...] > > > I don't think making this flag per-cache is an option either, but for > > > another reason - it would not be possible to merge such a kmem cache > > > with caches without this flag set. As a result, total memory pressure > > > would increase, even for setups without kmem-active memory cgroups, > > > which does not sound acceptable to me. > > > > I am not sure I see the performance implications here because kmem > > accounted memcgs would have their copy of the cache anyway, no? > > It's orthogonal. > > Suppose there are two *global* kmem caches, A and B, which would > normally be merged, i.e. A=B. Then we find out that we don't want to > account allocations from A to memcg while still accounting allocations > from B. Obviously, cache A can no longer be merged with cache B so we > have two different caches instead of the only merged one, even if there > are *no* memory cgroups at all. That might result in increased memory > consumption due to fragmentation. Got your point. Thanks for the clarification! > Although it is not really critical, especially counting that SLAB > merging was introduced not long before, the idea that enabling an extra > feature, such as memcg, without actually using it, may affect the global > behavior does not sound good to me. Agreed. > > Anyway, I guess it would be good to document these reasons in the > > changelog. > > > > > > So I do not object to opt-out for kmemcg accounting but I really think > > > > the name should be changed. > > > > > > I named it __GFP_NOACCOUNT to match with __GFP_NOTRACK, which is a very > > > specific flag too (kmemcheck), nevertheless it has a rather generic > > > name. > > > > __GFP_NOTRACK is a bad name IMHO as well. One has to go and check the > > comment to see this is kmemleak related. > > I think it's a good practice to go to its definition and check comments > when encountering an unknown symbol anyway. With ctags/cscope it's > trivial :-) > > > > > > Anyways, what else apart from memcg can account kmem so that we have to > > > mention KMEMCG in the flag name explicitly? > > > > NOACCOUNT doesn't imply kmem at all so it is not clear who is in charge > > of the accounting. > > IMO it is a benefit. If one day for some reason we want to bypass memcg > accounting for some other type of allocation somewhere, we can simply > reuse it. But what if somebody, say a highlevel memory allocator in the kernel, want's to (ab)use this flag for its internal purpose as well? > > I do not insist on __GFP_NO_KMEMCG of course but it sounds quite > > specific about its meaning and scope. > > There is another argument against __GFP_NO_KMEMCG: it is not yet clear > if kmem is going to be accounted separately in the unified cgroup > hierarchy. As I've said, I do not insist on *KMEMCG. __GFP_NO_MEMCG would be generic enough to rule out MEMCG altogether as well. Be it kmem or user memory. -- Michal Hocko SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 1/2] gfp: add __GFP_NOACCOUNT 2015-05-06 13:55 ` Michal Hocko @ 2015-05-06 14:29 ` Vladimir Davydov 2015-05-06 14:46 ` Michal Hocko 0 siblings, 1 reply; 17+ messages in thread From: Vladimir Davydov @ 2015-05-06 14:29 UTC (permalink / raw) To: Michal Hocko Cc: Andrew Morton, Johannes Weiner, Tejun Heo, Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim, Greg Thelen, linux-mm, cgroups, linux-kernel On Wed, May 06, 2015 at 03:55:20PM +0200, Michal Hocko wrote: > On Wed 06-05-15 16:25:10, Vladimir Davydov wrote: > > On Wed, May 06, 2015 at 02:35:41PM +0200, Michal Hocko wrote: [...] > > > NOACCOUNT doesn't imply kmem at all so it is not clear who is in charge > > > of the accounting. > > > > IMO it is a benefit. If one day for some reason we want to bypass memcg > > accounting for some other type of allocation somewhere, we can simply > > reuse it. > > But what if somebody, say a highlevel memory allocator in the kernel, > want's to (ab)use this flag for its internal purpose as well? We won't let him :-) If we take your argument about future (ab)users seriously, we should also consider what will happen if one wants to use e.g. __GFP_HARDWALL, which BTW has a generic name too although it's cpuset-specific. My point is that MEMCG is the only subsystem of the kernel that tries to do full memory accounting, and there is no point in introducing another one, because we already have it. So we have full right to reserve __GFP_NOACCOUNT for our purposes, just like cpuset reserves __GFP_HARDWALL and kmemcheck __GFP_NOTRACK. Any newcomer must take this into account. Thanks, Vladimir -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 1/2] gfp: add __GFP_NOACCOUNT 2015-05-06 14:29 ` Vladimir Davydov @ 2015-05-06 14:46 ` Michal Hocko 0 siblings, 0 replies; 17+ messages in thread From: Michal Hocko @ 2015-05-06 14:46 UTC (permalink / raw) To: Vladimir Davydov Cc: Andrew Morton, Johannes Weiner, Tejun Heo, Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim, Greg Thelen, linux-mm, cgroups, linux-kernel On Wed 06-05-15 17:29:51, Vladimir Davydov wrote: [...] > My point is that MEMCG is the only subsystem of the kernel that tries to > do full memory accounting, and there is no point in introducing another > one, because we already have it. Then I really do not get why the gfp flag cannot be specific about that. Anyway, it doesn't really make much sense to bikeshed about the flag here. So if both you and Johannes agree on the name I will not stand in the way. I will go and check into include/linux/gfp.h anytime I will try to remember the flag name... -- Michal Hocko SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 1/2] gfp: add __GFP_NOACCOUNT 2015-05-06 11:59 ` [PATCH 1/2] gfp: add __GFP_NOACCOUNT Michal Hocko 2015-05-06 12:24 ` Vladimir Davydov @ 2015-05-06 13:16 ` Johannes Weiner 2015-05-06 13:46 ` Michal Hocko 1 sibling, 1 reply; 17+ messages in thread From: Johannes Weiner @ 2015-05-06 13:16 UTC (permalink / raw) To: Michal Hocko Cc: Vladimir Davydov, Andrew Morton, Tejun Heo, Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim, Greg Thelen, linux-mm, cgroups, linux-kernel On Wed, May 06, 2015 at 01:59:41PM +0200, Michal Hocko wrote: > On Tue 05-05-15 12:45:42, Vladimir Davydov wrote: > > Not all kmem allocations should be accounted to memcg. The following > > patch gives an example when accounting of a certain type of allocations > > to memcg can effectively result in a memory leak. > > > This patch adds the __GFP_NOACCOUNT flag which if passed to kmalloc > > and friends will force the allocation to go through the root > > cgroup. It will be used by the next patch. > > The name of the flag is way too generic. It is not clear that the > accounting is KMEMCG related. The memory controller is the (primary) component that accounts physical memory allocations in the kernel, so I don't see how this would be ambiguous in any way. > __GFP_NO_KMEMCG sounds better? I think that's much worse. I would prefer communicating the desired behavior directly instead of having to derive it from a subsystem name. (And KMEMCG should not even be a term, it's all just the memory controller, i.e. memcg.) -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 1/2] gfp: add __GFP_NOACCOUNT 2015-05-06 13:16 ` Johannes Weiner @ 2015-05-06 13:46 ` Michal Hocko 2015-05-06 15:00 ` Johannes Weiner 0 siblings, 1 reply; 17+ messages in thread From: Michal Hocko @ 2015-05-06 13:46 UTC (permalink / raw) To: Johannes Weiner Cc: Vladimir Davydov, Andrew Morton, Tejun Heo, Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim, Greg Thelen, linux-mm, cgroups, linux-kernel On Wed 06-05-15 09:16:22, Johannes Weiner wrote: > On Wed, May 06, 2015 at 01:59:41PM +0200, Michal Hocko wrote: > > On Tue 05-05-15 12:45:42, Vladimir Davydov wrote: > > > Not all kmem allocations should be accounted to memcg. The following > > > patch gives an example when accounting of a certain type of allocations > > > to memcg can effectively result in a memory leak. > > > > > This patch adds the __GFP_NOACCOUNT flag which if passed to kmalloc > > > and friends will force the allocation to go through the root > > > cgroup. It will be used by the next patch. > > > > The name of the flag is way too generic. It is not clear that the > > accounting is KMEMCG related. > > The memory controller is the (primary) component that accounts > physical memory allocations in the kernel, so I don't see how this > would be ambiguous in any way. What if a high-level allocator wants to do some accounting as well? E.g. slab allocator accounts {un}reclaimable pages. It is a different thing because the accounting is per-cache rather than gfp based but I just wanted to point out that accounting is rather a wide term. > > __GFP_NO_KMEMCG sounds better? > > I think that's much worse. I would prefer communicating the desired > behavior directly instead of having to derive it from a subsystem > name. > (And KMEMCG should not even be a term, it's all just the memory > controller, i.e. memcg.) I do not mind __GFP_NO_MEMCG either. -- Michal Hocko SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 1/2] gfp: add __GFP_NOACCOUNT 2015-05-06 13:46 ` Michal Hocko @ 2015-05-06 15:00 ` Johannes Weiner 0 siblings, 0 replies; 17+ messages in thread From: Johannes Weiner @ 2015-05-06 15:00 UTC (permalink / raw) To: Michal Hocko Cc: Vladimir Davydov, Andrew Morton, Tejun Heo, Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim, Greg Thelen, linux-mm, cgroups, linux-kernel On Wed, May 06, 2015 at 03:46:20PM +0200, Michal Hocko wrote: > On Wed 06-05-15 09:16:22, Johannes Weiner wrote: > > On Wed, May 06, 2015 at 01:59:41PM +0200, Michal Hocko wrote: > > > On Tue 05-05-15 12:45:42, Vladimir Davydov wrote: > > > > Not all kmem allocations should be accounted to memcg. The following > > > > patch gives an example when accounting of a certain type of allocations > > > > to memcg can effectively result in a memory leak. > > > > > > > This patch adds the __GFP_NOACCOUNT flag which if passed to kmalloc > > > > and friends will force the allocation to go through the root > > > > cgroup. It will be used by the next patch. > > > > > > The name of the flag is way too generic. It is not clear that the > > > accounting is KMEMCG related. > > > > The memory controller is the (primary) component that accounts > > physical memory allocations in the kernel, so I don't see how this > > would be ambiguous in any way. > > What if a high-level allocator wants to do some accounting as well? > E.g. slab allocator accounts {un}reclaimable pages. It is a different > thing because the accounting is per-cache rather than gfp based but I > just wanted to point out that accounting is rather a wide term. This is a concept we have discussed so many times before. The more generic or fundamental the functionality in its context, the more generic the name. The longer and more specific the name, the more niche its functionality in that context. See schedule(). See open(). Accounting is a broad term, but in the context of a physical memory request I can not think of a more fundamental meaning of "do not account" - without further qualification - then inhibiting what memcg does: accounting the requested memory to the caller. If some allocator would want to modify the accounting of a certain *type* of memory, then that would be more specific, and a flag to accomplish this would be expected to have a longer name. One is a core feature of the VM, the other is a niche aspect of another core feature of the VM. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 1/2] gfp: add __GFP_NOACCOUNT 2015-05-05 9:45 [PATCH 1/2] gfp: add __GFP_NOACCOUNT Vladimir Davydov 2015-05-05 9:45 ` [PATCH 2/2] kernfs: do not account ino_ida allocations to memcg Vladimir Davydov 2015-05-06 11:59 ` [PATCH 1/2] gfp: add __GFP_NOACCOUNT Michal Hocko @ 2015-05-06 14:58 ` Michal Hocko 2015-05-06 16:35 ` [PATCH v2] " Vladimir Davydov 2015-05-06 17:52 ` [PATCH 1/2] " Johannes Weiner 2 siblings, 2 replies; 17+ messages in thread From: Michal Hocko @ 2015-05-06 14:58 UTC (permalink / raw) To: Vladimir Davydov Cc: Andrew Morton, Johannes Weiner, Tejun Heo, Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim, Greg Thelen, linux-mm, cgroups, linux-kernel On Tue 05-05-15 12:45:42, Vladimir Davydov wrote: [...] > diff --git a/include/linux/gfp.h b/include/linux/gfp.h > index 97a9373e61e8..37c422df2a0f 100644 > --- a/include/linux/gfp.h > +++ b/include/linux/gfp.h > @@ -30,6 +30,7 @@ struct vm_area_struct; > #define ___GFP_HARDWALL 0x20000u > #define ___GFP_THISNODE 0x40000u > #define ___GFP_RECLAIMABLE 0x80000u > +#define ___GFP_NOACCOUNT 0x100000u > #define ___GFP_NOTRACK 0x200000u > #define ___GFP_NO_KSWAPD 0x400000u > #define ___GFP_OTHER_NODE 0x800000u > @@ -87,6 +88,7 @@ struct vm_area_struct; > #define __GFP_HARDWALL ((__force gfp_t)___GFP_HARDWALL) /* Enforce hardwall cpuset memory allocs */ > #define __GFP_THISNODE ((__force gfp_t)___GFP_THISNODE)/* No fallback, no policies */ > #define __GFP_RECLAIMABLE ((__force gfp_t)___GFP_RECLAIMABLE) /* Page is reclaimable */ > +#define __GFP_NOACCOUNT ((__force gfp_t)___GFP_NOACCOUNT) /* Don't account to memcg */ The wording suggests that _any_ memcg charge might be skipped by this flag but only kmem part is handled. So either handle the flag in try_charge or, IMO preferably, update the comment here and add WARN_ON{_ONCE}(gfp & __GFP_NOACCOUNT). I do not think we should allow to skip the charge for user pages ATM and warning could tell us about the abuse of the flag. > #define __GFP_NOTRACK ((__force gfp_t)___GFP_NOTRACK) /* Don't track with kmemcheck */ > > #define __GFP_NO_KSWAPD ((__force gfp_t)___GFP_NO_KSWAPD) > diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h > index 72dff5fb0d0c..6c8918114804 100644 > --- a/include/linux/memcontrol.h > +++ b/include/linux/memcontrol.h > @@ -463,6 +463,8 @@ memcg_kmem_newpage_charge(gfp_t gfp, struct mem_cgroup **memcg, int order) > if (!memcg_kmem_enabled()) > return true; > > + if (gfp & __GFP_NOACCOUNT) > + return true; > /* > * __GFP_NOFAIL allocations will move on even if charging is not > * possible. Therefore we don't even try, and have this allocation > @@ -522,6 +524,8 @@ memcg_kmem_get_cache(struct kmem_cache *cachep, gfp_t gfp) > { > if (!memcg_kmem_enabled()) > return cachep; > + if (gfp & __GFP_NOACCOUNT) > + return cachep; > if (gfp & __GFP_NOFAIL) > return cachep; > if (in_interrupt() || (!current->mm) || (current->flags & PF_KTHREAD)) > diff --git a/mm/kmemleak.c b/mm/kmemleak.c > index 5405aff5a590..f0fe4f2c1fa7 100644 > --- a/mm/kmemleak.c > +++ b/mm/kmemleak.c > @@ -115,7 +115,8 @@ > #define BYTES_PER_POINTER sizeof(void *) > > /* GFP bitmask for kmemleak internal allocations */ > -#define gfp_kmemleak_mask(gfp) (((gfp) & (GFP_KERNEL | GFP_ATOMIC)) | \ > +#define gfp_kmemleak_mask(gfp) (((gfp) & (GFP_KERNEL | GFP_ATOMIC | \ > + __GFP_NOACCOUNT)) | \ > __GFP_NORETRY | __GFP_NOMEMALLOC | \ > __GFP_NOWARN) > > -- > 1.7.10.4 > -- Michal Hocko SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 17+ messages in thread
* [PATCH v2] gfp: add __GFP_NOACCOUNT 2015-05-06 14:58 ` Michal Hocko @ 2015-05-06 16:35 ` Vladimir Davydov 2015-05-06 17:52 ` [PATCH 1/2] " Johannes Weiner 1 sibling, 0 replies; 17+ messages in thread From: Vladimir Davydov @ 2015-05-06 16:35 UTC (permalink / raw) To: Andrew Morton Cc: Johannes Weiner, Michal Hocko, Tejun Heo, Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim, Greg Thelen, Greg Kroah-Hartman, linux-mm, cgroups, linux-kernel, stable Not all kmem allocations should be accounted to memcg. The following patch gives an example when accounting of a certain type of allocations to memcg can effectively result in a memory leak. This patch adds the __GFP_NOACCOUNT flag which if passed to kmalloc and friends will force the allocation to go through the root cgroup. It will be used by the next patch. Note, since in case of kmemleak enabled each kmalloc implies yet another allocation from the kmemleak_object cache, we add __GFP_NOACCOUNT to gfp_kmemleak_mask. Alternatively, we could introduce a per kmem cache flag disabling accounting for all allocations of a particular kind, but (a) we would not be able to bypass accounting for kmalloc then and (b) a kmem cache with this flag set could not be merged with a kmem cache without this flag, which would increase the number of global caches and therefore fragmentation even if the memory cgroup controller is not used. Despite its generic name, currently __GFP_NOACCOUNT disables accounting only for kmem allocations while user page allocations are always charged. To catch abusing of this flag, a warning is issued on an attempt of passing it to mem_cgroup_try_charge. Signed-off-by: Vladimir Davydov <vdavydov@parallels.com> Cc: <stable@vger.kernel.org> # 4.0 --- Changes in v2: - explain drawbacks of per kmem cache flag disabling accounting as a possible alternative to a GFP flag in commit message (Michal) - warn if __GFP_NOACCOUNT is passed to mem_cgroup_try_charge (Michal) include/linux/gfp.h | 2 ++ include/linux/memcontrol.h | 4 ++++ mm/kmemleak.c | 3 ++- mm/memcontrol.c | 2 ++ 4 files changed, 10 insertions(+), 1 deletion(-) diff --git a/include/linux/gfp.h b/include/linux/gfp.h index 97a9373e61e8..15928f0647e4 100644 --- a/include/linux/gfp.h +++ b/include/linux/gfp.h @@ -30,6 +30,7 @@ struct vm_area_struct; #define ___GFP_HARDWALL 0x20000u #define ___GFP_THISNODE 0x40000u #define ___GFP_RECLAIMABLE 0x80000u +#define ___GFP_NOACCOUNT 0x100000u #define ___GFP_NOTRACK 0x200000u #define ___GFP_NO_KSWAPD 0x400000u #define ___GFP_OTHER_NODE 0x800000u @@ -87,6 +88,7 @@ struct vm_area_struct; #define __GFP_HARDWALL ((__force gfp_t)___GFP_HARDWALL) /* Enforce hardwall cpuset memory allocs */ #define __GFP_THISNODE ((__force gfp_t)___GFP_THISNODE)/* No fallback, no policies */ #define __GFP_RECLAIMABLE ((__force gfp_t)___GFP_RECLAIMABLE) /* Page is reclaimable */ +#define __GFP_NOACCOUNT ((__force gfp_t)___GFP_NOACCOUNT) /* Don't account to kmemcg */ #define __GFP_NOTRACK ((__force gfp_t)___GFP_NOTRACK) /* Don't track with kmemcheck */ #define __GFP_NO_KSWAPD ((__force gfp_t)___GFP_NO_KSWAPD) diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h index 72dff5fb0d0c..6c8918114804 100644 --- a/include/linux/memcontrol.h +++ b/include/linux/memcontrol.h @@ -463,6 +463,8 @@ memcg_kmem_newpage_charge(gfp_t gfp, struct mem_cgroup **memcg, int order) if (!memcg_kmem_enabled()) return true; + if (gfp & __GFP_NOACCOUNT) + return true; /* * __GFP_NOFAIL allocations will move on even if charging is not * possible. Therefore we don't even try, and have this allocation @@ -522,6 +524,8 @@ memcg_kmem_get_cache(struct kmem_cache *cachep, gfp_t gfp) { if (!memcg_kmem_enabled()) return cachep; + if (gfp & __GFP_NOACCOUNT) + return cachep; if (gfp & __GFP_NOFAIL) return cachep; if (in_interrupt() || (!current->mm) || (current->flags & PF_KTHREAD)) diff --git a/mm/kmemleak.c b/mm/kmemleak.c index 5405aff5a590..f0fe4f2c1fa7 100644 --- a/mm/kmemleak.c +++ b/mm/kmemleak.c @@ -115,7 +115,8 @@ #define BYTES_PER_POINTER sizeof(void *) /* GFP bitmask for kmemleak internal allocations */ -#define gfp_kmemleak_mask(gfp) (((gfp) & (GFP_KERNEL | GFP_ATOMIC)) | \ +#define gfp_kmemleak_mask(gfp) (((gfp) & (GFP_KERNEL | GFP_ATOMIC | \ + __GFP_NOACCOUNT)) | \ __GFP_NORETRY | __GFP_NOMEMALLOC | \ __GFP_NOWARN) diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 14c2f2017e37..41c9e530f1cd 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -5462,6 +5462,8 @@ int mem_cgroup_try_charge(struct page *page, struct mm_struct *mm, unsigned int nr_pages = 1; int ret = 0; + WARN_ON_ONCE(gfp_mask & __GFP_NOACCOUNT); + if (mem_cgroup_disabled()) goto out; -- 1.7.10.4 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 1/2] gfp: add __GFP_NOACCOUNT 2015-05-06 14:58 ` Michal Hocko 2015-05-06 16:35 ` [PATCH v2] " Vladimir Davydov @ 2015-05-06 17:52 ` Johannes Weiner 1 sibling, 0 replies; 17+ messages in thread From: Johannes Weiner @ 2015-05-06 17:52 UTC (permalink / raw) To: Michal Hocko Cc: Vladimir Davydov, Andrew Morton, Tejun Heo, Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim, Greg Thelen, linux-mm, cgroups, linux-kernel On Wed, May 06, 2015 at 04:58:14PM +0200, Michal Hocko wrote: > On Tue 05-05-15 12:45:42, Vladimir Davydov wrote: > [...] > > diff --git a/include/linux/gfp.h b/include/linux/gfp.h > > index 97a9373e61e8..37c422df2a0f 100644 > > --- a/include/linux/gfp.h > > +++ b/include/linux/gfp.h > > @@ -30,6 +30,7 @@ struct vm_area_struct; > > #define ___GFP_HARDWALL 0x20000u > > #define ___GFP_THISNODE 0x40000u > > #define ___GFP_RECLAIMABLE 0x80000u > > +#define ___GFP_NOACCOUNT 0x100000u > > #define ___GFP_NOTRACK 0x200000u > > #define ___GFP_NO_KSWAPD 0x400000u > > #define ___GFP_OTHER_NODE 0x800000u > > @@ -87,6 +88,7 @@ struct vm_area_struct; > > #define __GFP_HARDWALL ((__force gfp_t)___GFP_HARDWALL) /* Enforce hardwall cpuset memory allocs */ > > #define __GFP_THISNODE ((__force gfp_t)___GFP_THISNODE)/* No fallback, no policies */ > > #define __GFP_RECLAIMABLE ((__force gfp_t)___GFP_RECLAIMABLE) /* Page is reclaimable */ > > +#define __GFP_NOACCOUNT ((__force gfp_t)___GFP_NOACCOUNT) /* Don't account to memcg */ > > The wording suggests that _any_ memcg charge might be skipped by this flag > but only kmem part is handled. > > So either handle the flag in try_charge or, IMO preferably, update the > comment here and add WARN_ON{_ONCE}(gfp & __GFP_NOACCOUNT). I do not > think we should allow to skip the charge for user pages ATM and warning > could tell us about the abuse of the flag. Michal, please just stop. There is no reason to warn the user about this whatsoever. If you want to prevent abuse - whatever that means - program your mailreader to flag patches containing __GFP_NOACCOUNT and review them carefully. This eagerness to clutter the code with defensiveness against the rest of the kernel tree and to disrupt the user over every little blip that has nothing to do with them is really getting old at this point. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 17+ messages in thread
end of thread, other threads:[~2015-05-06 17:52 UTC | newest] Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2015-05-05 9:45 [PATCH 1/2] gfp: add __GFP_NOACCOUNT Vladimir Davydov 2015-05-05 9:45 ` [PATCH 2/2] kernfs: do not account ino_ida allocations to memcg Vladimir Davydov 2015-05-05 13:45 ` Tejun Heo 2015-05-05 16:04 ` Vladimir Davydov 2015-05-06 11:59 ` [PATCH 1/2] gfp: add __GFP_NOACCOUNT Michal Hocko 2015-05-06 12:24 ` Vladimir Davydov 2015-05-06 12:35 ` Michal Hocko 2015-05-06 13:25 ` Vladimir Davydov 2015-05-06 13:55 ` Michal Hocko 2015-05-06 14:29 ` Vladimir Davydov 2015-05-06 14:46 ` Michal Hocko 2015-05-06 13:16 ` Johannes Weiner 2015-05-06 13:46 ` Michal Hocko 2015-05-06 15:00 ` Johannes Weiner 2015-05-06 14:58 ` Michal Hocko 2015-05-06 16:35 ` [PATCH v2] " Vladimir Davydov 2015-05-06 17:52 ` [PATCH 1/2] " Johannes Weiner
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox