* [PATCH] mm: memcontrol: fix possible css ref leak on oom @ 2016-05-23 16:02 Vladimir Davydov 2016-05-23 17:44 ` Michal Hocko 2016-05-27 17:36 ` Johannes Weiner 0 siblings, 2 replies; 11+ messages in thread From: Vladimir Davydov @ 2016-05-23 16:02 UTC (permalink / raw) To: Andrew Morton; +Cc: Johannes Weiner, Michal Hocko, linux-mm, linux-kernel mem_cgroup_oom may be invoked multiple times while a process is handling a page fault, in which case current->memcg_in_oom will be overwritten leaking the previously taken css reference. Signed-off-by: Vladimir Davydov <vdavydov@virtuozzo.com> --- mm/memcontrol.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 5b48cd25951b..ef8797d34039 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -1608,7 +1608,7 @@ static void memcg_oom_recover(struct mem_cgroup *memcg) static void mem_cgroup_oom(struct mem_cgroup *memcg, gfp_t mask, int order) { - if (!current->memcg_may_oom) + if (!current->memcg_may_oom || current->memcg_in_oom) return; /* * We are in the middle of the charge context here, so we -- 2.1.4 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] mm: memcontrol: fix possible css ref leak on oom 2016-05-23 16:02 [PATCH] mm: memcontrol: fix possible css ref leak on oom Vladimir Davydov @ 2016-05-23 17:44 ` Michal Hocko 2016-05-24 8:43 ` Vladimir Davydov 2016-05-27 17:36 ` Johannes Weiner 1 sibling, 1 reply; 11+ messages in thread From: Michal Hocko @ 2016-05-23 17:44 UTC (permalink / raw) To: Vladimir Davydov; +Cc: Andrew Morton, Johannes Weiner, linux-mm, linux-kernel On Mon 23-05-16 19:02:10, Vladimir Davydov wrote: > mem_cgroup_oom may be invoked multiple times while a process is handling > a page fault, in which case current->memcg_in_oom will be overwritten > leaking the previously taken css reference. Have you seen this happening? I was under impression that the page fault paths that have oom enabled will not retry allocations. > Signed-off-by: Vladimir Davydov <vdavydov@virtuozzo.com> That being said I do not have anything against the patch. It is a good safety net I am just not sure this might happen right now and so the patch is not stable candidate. After clarification Acked-by: Michal Hocko <mhocko@suse.com> > --- > mm/memcontrol.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c > index 5b48cd25951b..ef8797d34039 100644 > --- a/mm/memcontrol.c > +++ b/mm/memcontrol.c > @@ -1608,7 +1608,7 @@ static void memcg_oom_recover(struct mem_cgroup *memcg) > > static void mem_cgroup_oom(struct mem_cgroup *memcg, gfp_t mask, int order) > { > - if (!current->memcg_may_oom) > + if (!current->memcg_may_oom || current->memcg_in_oom) > return; > /* > * We are in the middle of the charge context here, so we > -- > 2.1.4 -- Michal Hocko SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] mm: memcontrol: fix possible css ref leak on oom 2016-05-23 17:44 ` Michal Hocko @ 2016-05-24 8:43 ` Vladimir Davydov 2016-05-24 8:47 ` Michal Hocko 0 siblings, 1 reply; 11+ messages in thread From: Vladimir Davydov @ 2016-05-24 8:43 UTC (permalink / raw) To: Michal Hocko; +Cc: Andrew Morton, Johannes Weiner, linux-mm, linux-kernel On Mon, May 23, 2016 at 07:44:43PM +0200, Michal Hocko wrote: > On Mon 23-05-16 19:02:10, Vladimir Davydov wrote: > > mem_cgroup_oom may be invoked multiple times while a process is handling > > a page fault, in which case current->memcg_in_oom will be overwritten > > leaking the previously taken css reference. > > Have you seen this happening? I was under impression that the page fault > paths that have oom enabled will not retry allocations. filemap_fault will, for readahead. This is rather unlikely, just like the whole oom scenario, so I haven't faced this leak in production yet, although it's pretty easy to reproduce using a contrived test. However, even if this leak happened on my host, I would probably not notice, because currently we have no clear means of catching css leaks. I'm thinking about adding a file to debugfs containing brief information about all memory cgroups, including dead ones, so that we could at least see how many dead memory cgroups are dangling out there. > > > Signed-off-by: Vladimir Davydov <vdavydov@virtuozzo.com> > > That being said I do not have anything against the patch. It is a good > safety net I am just not sure this might happen right now and so the > patch is not stable candidate. > > After clarification > Acked-by: Michal Hocko <mhocko@suse.com> Thanks. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] mm: memcontrol: fix possible css ref leak on oom 2016-05-24 8:43 ` Vladimir Davydov @ 2016-05-24 8:47 ` Michal Hocko 2016-05-24 9:01 ` Vladimir Davydov 0 siblings, 1 reply; 11+ messages in thread From: Michal Hocko @ 2016-05-24 8:47 UTC (permalink / raw) To: Vladimir Davydov; +Cc: Andrew Morton, Johannes Weiner, linux-mm, linux-kernel On Tue 24-05-16 11:43:19, Vladimir Davydov wrote: > On Mon, May 23, 2016 at 07:44:43PM +0200, Michal Hocko wrote: > > On Mon 23-05-16 19:02:10, Vladimir Davydov wrote: > > > mem_cgroup_oom may be invoked multiple times while a process is handling > > > a page fault, in which case current->memcg_in_oom will be overwritten > > > leaking the previously taken css reference. > > > > Have you seen this happening? I was under impression that the page fault > > paths that have oom enabled will not retry allocations. > > filemap_fault will, for readahead. I thought that the readahead is __GFP_NORETRY so we do not trigger OOM killer. > This is rather unlikely, just like the whole oom scenario, so I haven't > faced this leak in production yet, although it's pretty easy to > reproduce using a contrived test. However, even if this leak happened on > my host, I would probably not notice, because currently we have no clear > means of catching css leaks. I'm thinking about adding a file to debugfs > containing brief information about all memory cgroups, including dead > ones, so that we could at least see how many dead memory cgroups are > dangling out there. Yeah, debugfs interface would make some sense. -- Michal Hocko SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] mm: memcontrol: fix possible css ref leak on oom 2016-05-24 8:47 ` Michal Hocko @ 2016-05-24 9:01 ` Vladimir Davydov 2016-05-24 9:22 ` Michal Hocko 0 siblings, 1 reply; 11+ messages in thread From: Vladimir Davydov @ 2016-05-24 9:01 UTC (permalink / raw) To: Michal Hocko; +Cc: Andrew Morton, Johannes Weiner, linux-mm, linux-kernel On Tue, May 24, 2016 at 10:47:37AM +0200, Michal Hocko wrote: > On Tue 24-05-16 11:43:19, Vladimir Davydov wrote: > > On Mon, May 23, 2016 at 07:44:43PM +0200, Michal Hocko wrote: > > > On Mon 23-05-16 19:02:10, Vladimir Davydov wrote: > > > > mem_cgroup_oom may be invoked multiple times while a process is handling > > > > a page fault, in which case current->memcg_in_oom will be overwritten > > > > leaking the previously taken css reference. > > > > > > Have you seen this happening? I was under impression that the page fault > > > paths that have oom enabled will not retry allocations. > > > > filemap_fault will, for readahead. > > I thought that the readahead is __GFP_NORETRY so we do not trigger OOM > killer. Hmm, interesting. We do allocate readahead pages with __GFP_NORETRY, but we add them to page cache and hence charge with GFP_KERNEL or GFP_NOFS mask, see __do_page_cache_readahaed -> read_pages. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] mm: memcontrol: fix possible css ref leak on oom 2016-05-24 9:01 ` Vladimir Davydov @ 2016-05-24 9:22 ` Michal Hocko 2016-05-24 10:05 ` Vladimir Davydov 0 siblings, 1 reply; 11+ messages in thread From: Michal Hocko @ 2016-05-24 9:22 UTC (permalink / raw) To: Vladimir Davydov; +Cc: Andrew Morton, Johannes Weiner, linux-mm, linux-kernel On Tue 24-05-16 12:01:42, Vladimir Davydov wrote: > On Tue, May 24, 2016 at 10:47:37AM +0200, Michal Hocko wrote: > > On Tue 24-05-16 11:43:19, Vladimir Davydov wrote: > > > On Mon, May 23, 2016 at 07:44:43PM +0200, Michal Hocko wrote: > > > > On Mon 23-05-16 19:02:10, Vladimir Davydov wrote: > > > > > mem_cgroup_oom may be invoked multiple times while a process is handling > > > > > a page fault, in which case current->memcg_in_oom will be overwritten > > > > > leaking the previously taken css reference. > > > > > > > > Have you seen this happening? I was under impression that the page fault > > > > paths that have oom enabled will not retry allocations. > > > > > > filemap_fault will, for readahead. > > > > I thought that the readahead is __GFP_NORETRY so we do not trigger OOM > > killer. > > Hmm, interesting. We do allocate readahead pages with __GFP_NORETRY, but > we add them to page cache and hence charge with GFP_KERNEL or GFP_NOFS > mask, see __do_page_cache_readahaed -> read_pages. I guess we do not want to trigger OOM just because of readahead. What do you think about the following? I will cook up a full patch if this (untested) looks ok. --- diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h index 97354102794d..81363b834900 100644 --- a/include/linux/pagemap.h +++ b/include/linux/pagemap.h @@ -209,10 +209,10 @@ static inline struct page *page_cache_alloc_cold(struct address_space *x) return __page_cache_alloc(mapping_gfp_mask(x)|__GFP_COLD); } -static inline struct page *page_cache_alloc_readahead(struct address_space *x) +static inline gfp_t readahead_gfp_mask(struct address_space *x) { - return __page_cache_alloc(mapping_gfp_mask(x) | - __GFP_COLD | __GFP_NORETRY | __GFP_NOWARN); + return mapping_gfp_mask(x) | + __GFP_COLD | __GFP_NORETRY | __GFP_NOWARN; } typedef int filler_t(void *, struct page *); diff --git a/mm/readahead.c b/mm/readahead.c index 40be3ae0afe3..7431fefe4ede 100644 --- a/mm/readahead.c +++ b/mm/readahead.c @@ -108,7 +108,7 @@ int read_cache_pages(struct address_space *mapping, struct list_head *pages, EXPORT_SYMBOL(read_cache_pages); static int read_pages(struct address_space *mapping, struct file *filp, - struct list_head *pages, unsigned nr_pages) + struct list_head *pages, unsigned nr_pages, gfp_t gfp_mask) { struct blk_plug plug; unsigned page_idx; @@ -126,8 +126,7 @@ static int read_pages(struct address_space *mapping, struct file *filp, for (page_idx = 0; page_idx < nr_pages; page_idx++) { struct page *page = lru_to_page(pages); list_del(&page->lru); - if (!add_to_page_cache_lru(page, mapping, page->index, - mapping_gfp_constraint(mapping, GFP_KERNEL))) { + if (!add_to_page_cache_lru(page, mapping, page->index, gfp_mask)) { mapping->a_ops->readpage(filp, page); } put_page(page); @@ -159,6 +158,7 @@ int __do_page_cache_readahead(struct address_space *mapping, struct file *filp, int page_idx; int ret = 0; loff_t isize = i_size_read(inode); + gfp_t gfp_mask = readahead_gfp_mask(mapping); if (isize == 0) goto out; @@ -180,7 +180,7 @@ int __do_page_cache_readahead(struct address_space *mapping, struct file *filp, if (page && !radix_tree_exceptional_entry(page)) continue; - page = page_cache_alloc_readahead(mapping); + page = __page_cache_alloc(gfp_mask); if (!page) break; page->index = page_offset; @@ -196,7 +196,7 @@ int __do_page_cache_readahead(struct address_space *mapping, struct file *filp, * will then handle the error. */ if (ret) - read_pages(mapping, filp, &page_pool, ret); + read_pages(mapping, filp, &page_pool, ret, gfp_mask); BUG_ON(!list_empty(&page_pool)); out: return ret; -- Michal Hocko SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] mm: memcontrol: fix possible css ref leak on oom 2016-05-24 9:22 ` Michal Hocko @ 2016-05-24 10:05 ` Vladimir Davydov 2016-05-24 11:31 ` Michal Hocko 0 siblings, 1 reply; 11+ messages in thread From: Vladimir Davydov @ 2016-05-24 10:05 UTC (permalink / raw) To: Michal Hocko; +Cc: Andrew Morton, Johannes Weiner, linux-mm, linux-kernel On Tue, May 24, 2016 at 11:22:02AM +0200, Michal Hocko wrote: > On Tue 24-05-16 12:01:42, Vladimir Davydov wrote: > > On Tue, May 24, 2016 at 10:47:37AM +0200, Michal Hocko wrote: > > > On Tue 24-05-16 11:43:19, Vladimir Davydov wrote: > > > > On Mon, May 23, 2016 at 07:44:43PM +0200, Michal Hocko wrote: > > > > > On Mon 23-05-16 19:02:10, Vladimir Davydov wrote: > > > > > > mem_cgroup_oom may be invoked multiple times while a process is handling > > > > > > a page fault, in which case current->memcg_in_oom will be overwritten > > > > > > leaking the previously taken css reference. > > > > > > > > > > Have you seen this happening? I was under impression that the page fault > > > > > paths that have oom enabled will not retry allocations. > > > > > > > > filemap_fault will, for readahead. > > > > > > I thought that the readahead is __GFP_NORETRY so we do not trigger OOM > > > killer. > > > > Hmm, interesting. We do allocate readahead pages with __GFP_NORETRY, but > > we add them to page cache and hence charge with GFP_KERNEL or GFP_NOFS > > mask, see __do_page_cache_readahaed -> read_pages. > > I guess we do not want to trigger OOM just because of readahead. What do I agree this is how it should ideally work. Not sure if anybody would bother in practice. > you think about the following? I will cook up a full patch if this > (untested) looks ok. It won't work for most filesystems as they define custom ->readpages. I wonder if it'd be OK to patch them all not to trigger oom. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] mm: memcontrol: fix possible css ref leak on oom 2016-05-24 10:05 ` Vladimir Davydov @ 2016-05-24 11:31 ` Michal Hocko 0 siblings, 0 replies; 11+ messages in thread From: Michal Hocko @ 2016-05-24 11:31 UTC (permalink / raw) To: Vladimir Davydov; +Cc: Andrew Morton, Johannes Weiner, linux-mm, linux-kernel On Tue 24-05-16 13:05:23, Vladimir Davydov wrote: > On Tue, May 24, 2016 at 11:22:02AM +0200, Michal Hocko wrote: [...] > > you think about the following? I will cook up a full patch if this > > (untested) looks ok. > > It won't work for most filesystems as they define custom ->readpages. I > wonder if it'd be OK to patch them all not to trigger oom. readpages is mostly a wrapper for mpage_readpages so I guess this wouldn't be a big deal. -- Michal Hocko SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] mm: memcontrol: fix possible css ref leak on oom 2016-05-23 16:02 [PATCH] mm: memcontrol: fix possible css ref leak on oom Vladimir Davydov 2016-05-23 17:44 ` Michal Hocko @ 2016-05-27 17:36 ` Johannes Weiner 2016-05-29 9:11 ` Vladimir Davydov 2016-05-30 7:26 ` Michal Hocko 1 sibling, 2 replies; 11+ messages in thread From: Johannes Weiner @ 2016-05-27 17:36 UTC (permalink / raw) To: Vladimir Davydov; +Cc: Andrew Morton, Michal Hocko, linux-mm, linux-kernel On Mon, May 23, 2016 at 07:02:10PM +0300, Vladimir Davydov wrote: > mem_cgroup_oom may be invoked multiple times while a process is handling > a page fault, in which case current->memcg_in_oom will be overwritten > leaking the previously taken css reference. There is a task_in_memcg_oom() check before calling mem_cgroup_oom(). How can this happen? -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] mm: memcontrol: fix possible css ref leak on oom 2016-05-27 17:36 ` Johannes Weiner @ 2016-05-29 9:11 ` Vladimir Davydov 2016-05-30 7:26 ` Michal Hocko 1 sibling, 0 replies; 11+ messages in thread From: Vladimir Davydov @ 2016-05-29 9:11 UTC (permalink / raw) To: Johannes Weiner; +Cc: Andrew Morton, Michal Hocko, linux-mm, linux-kernel On Fri, May 27, 2016 at 01:36:29PM -0400, Johannes Weiner wrote: > On Mon, May 23, 2016 at 07:02:10PM +0300, Vladimir Davydov wrote: > > mem_cgroup_oom may be invoked multiple times while a process is handling > > a page fault, in which case current->memcg_in_oom will be overwritten > > leaking the previously taken css reference. > > There is a task_in_memcg_oom() check before calling mem_cgroup_oom(). > > How can this happen? Oops, I overlooked that check. Scratch this patch then. Sorry for the noise. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] mm: memcontrol: fix possible css ref leak on oom 2016-05-27 17:36 ` Johannes Weiner 2016-05-29 9:11 ` Vladimir Davydov @ 2016-05-30 7:26 ` Michal Hocko 1 sibling, 0 replies; 11+ messages in thread From: Michal Hocko @ 2016-05-30 7:26 UTC (permalink / raw) To: Johannes Weiner; +Cc: Vladimir Davydov, Andrew Morton, linux-mm, linux-kernel On Fri 27-05-16 13:36:29, Johannes Weiner wrote: > On Mon, May 23, 2016 at 07:02:10PM +0300, Vladimir Davydov wrote: > > mem_cgroup_oom may be invoked multiple times while a process is handling > > a page fault, in which case current->memcg_in_oom will be overwritten > > leaking the previously taken css reference. > > There is a task_in_memcg_oom() check before calling mem_cgroup_oom(). > > How can this happen? Ble, I have missed that... Thanks for pointing that out -- Michal Hocko SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2016-05-30 7:26 UTC | newest] Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2016-05-23 16:02 [PATCH] mm: memcontrol: fix possible css ref leak on oom Vladimir Davydov 2016-05-23 17:44 ` Michal Hocko 2016-05-24 8:43 ` Vladimir Davydov 2016-05-24 8:47 ` Michal Hocko 2016-05-24 9:01 ` Vladimir Davydov 2016-05-24 9:22 ` Michal Hocko 2016-05-24 10:05 ` Vladimir Davydov 2016-05-24 11:31 ` Michal Hocko 2016-05-27 17:36 ` Johannes Weiner 2016-05-29 9:11 ` Vladimir Davydov 2016-05-30 7:26 ` Michal Hocko
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox