From: Johannes Weiner <hannes@cmpxchg.org>
To: azurIt <azurit@pobox.sk>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Michal Hocko <mhocko@suse.cz>,
David Rientjes <rientjes@google.com>,
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
linux-mm@kvack.org, cgroups@vger.kernel.org, x86@kernel.org,
linux-arch@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [patch 0/7] improve memcg oom killer robustness v2
Date: Mon, 9 Sep 2013 16:12:38 -0400 [thread overview]
Message-ID: <20130909201238.GH856@cmpxchg.org> (raw)
In-Reply-To: <20130909215917.96932098@pobox.sk>
On Mon, Sep 09, 2013 at 09:59:17PM +0200, azurIt wrote:
> >On Mon, Sep 09, 2013 at 03:10:10PM +0200, azurIt wrote:
> >> >Hi azur,
> >> >
> >> >On Wed, Sep 04, 2013 at 10:18:52AM +0200, azurIt wrote:
> >> >> > CC: "Andrew Morton" <akpm@linux-foundation.org>, "Michal Hocko" <mhocko@suse.cz>, "David Rientjes" <rientjes@google.com>, "KAMEZAWA Hiroyuki" <kamezawa.hiroyu@jp.fujitsu.com>, "KOSAKI Motohiro" <kosaki.motohiro@jp.fujitsu.com>, linux-mm@kvack.org, cgroups@vger.kernel.org, x86@kernel.org, linux-arch@vger.kernel.org, linux-kernel@vger.kernel.org
> >> >> >Hello azur,
> >> >> >
> >> >> >On Mon, Sep 02, 2013 at 12:38:02PM +0200, azurIt wrote:
> >> >> >> >>Hi azur,
> >> >> >> >>
> >> >> >> >>here is the x86-only rollup of the series for 3.2.
> >> >> >> >>
> >> >> >> >>Thanks!
> >> >> >> >>Johannes
> >> >> >> >>---
> >> >> >> >
> >> >> >> >
> >> >> >> >Johannes,
> >> >> >> >
> >> >> >> >unfortunately, one problem arises: I have (again) cgroup which cannot be deleted :( it's a user who had very high memory usage and was reaching his limit very often. Do you need any info which i can gather now?
> >> >> >
> >> >> >Did the OOM killer go off in this group?
> >> >> >
> >> >> >Was there a warning in the syslog ("Fixing unhandled memcg OOM
> >> >> >context")?
> >> >>
> >> >>
> >> >>
> >> >> Ok, i see this message several times in my syslog logs, one of them is also for this unremovable cgroup (but maybe all of them cannot be removed, should i try?). Example of the log is here (don't know where exactly it starts and ends so here is the full kernel log):
> >> >> http://watchdog.sk/lkml/oom_syslog.gz
> >> >There is an unfinished OOM invocation here:
> >> >
> >> > Aug 22 13:15:21 server01 kernel: [1251422.715112] Fixing unhandled memcg OOM context set up from:
> >> > Aug 22 13:15:21 server01 kernel: [1251422.715191] [<ffffffff811105c2>] T.1154+0x622/0x8f0
> >> > Aug 22 13:15:21 server01 kernel: [1251422.715274] [<ffffffff8111153e>] mem_cgroup_cache_charge+0xbe/0xe0
> >> > Aug 22 13:15:21 server01 kernel: [1251422.715357] [<ffffffff810cf31c>] add_to_page_cache_locked+0x4c/0x140
> >> > Aug 22 13:15:21 server01 kernel: [1251422.715443] [<ffffffff810cf432>] add_to_page_cache_lru+0x22/0x50
> >> > Aug 22 13:15:21 server01 kernel: [1251422.715526] [<ffffffff810cfdd3>] find_or_create_page+0x73/0xb0
> >> > Aug 22 13:15:21 server01 kernel: [1251422.715608] [<ffffffff811493ba>] __getblk+0xea/0x2c0
> >> > Aug 22 13:15:21 server01 kernel: [1251422.715692] [<ffffffff8114ca73>] __bread+0x13/0xc0
> >> > Aug 22 13:15:21 server01 kernel: [1251422.715774] [<ffffffff81196968>] ext3_get_branch+0x98/0x140
> >> > Aug 22 13:15:21 server01 kernel: [1251422.715859] [<ffffffff81197557>] ext3_get_blocks_handle+0xd7/0xdc0
> >> > Aug 22 13:15:21 server01 kernel: [1251422.715942] [<ffffffff81198304>] ext3_get_block+0xc4/0x120
> >> > Aug 22 13:15:21 server01 kernel: [1251422.716023] [<ffffffff81155c3a>] do_mpage_readpage+0x38a/0x690
> >> > Aug 22 13:15:21 server01 kernel: [1251422.716107] [<ffffffff81155f8f>] mpage_readpage+0x4f/0x70
> >> > Aug 22 13:15:21 server01 kernel: [1251422.716188] [<ffffffff811973a8>] ext3_readpage+0x28/0x60
> >> > Aug 22 13:15:21 server01 kernel: [1251422.716268] [<ffffffff810cfa48>] filemap_fault+0x308/0x560
> >> > Aug 22 13:15:21 server01 kernel: [1251422.716350] [<ffffffff810ef898>] __do_fault+0x78/0x5a0
> >> > Aug 22 13:15:21 server01 kernel: [1251422.716433] [<ffffffff810f2ab4>] handle_pte_fault+0x84/0x940
> >> >
> >> >__getblk() has this weird loop where it tries to instantiate the page,
> >> >frees memory on failure, then retries. If the memcg goes OOM, the OOM
> >> >path might be entered multiple times and each time leak the memcg
> >> >reference of the respective previous OOM invocation.
> >> >
> >> >There are a few more find_or_create() sites that do not propagate an
> >> >error and it's incredibly hard to find out whether they are even taken
> >> >during a page fault. It's not practical to annotate them all with
> >> >memcg OOM toggles, so let's just catch all OOM contexts at the end of
> >> >handle_mm_fault() and clear them if !VM_FAULT_OOM instead of treating
> >> >this like an error.
> >> >
> >> >azur, here is a patch on top of your modified 3.2. Note that Michal
> >> >might be onto something and we are looking at multiple issues here,
> >> >but the log excert above suggests this fix is required either way.
> >>
> >>
> >>
> >>
> >> Johannes, is this still up to date? Thank you.
> >
> >No, please use the following on top of 3.2 (i.e. full replacement, not
> >incremental to what you have):
>
>
>
> Unfortunately it didn't compile:
>
>
>
>
> LD vmlinux.o
> MODPOST vmlinux.o
> WARNING: modpost: Found 4924 section mismatch(es).
> To see full details build your kernel with:
> 'make CONFIG_DEBUG_SECTION_MISMATCH=y'
> GEN .version
> CHK include/generated/compile.h
> UPD include/generated/compile.h
> CC init/version.o
> LD init/built-in.o
> LD .tmp_vmlinux1
> arch/x86/built-in.o: In function `do_page_fault':
> (.text+0x26a77): undefined reference to `handle_mm_fault'
> mm/built-in.o: In function `fixup_user_fault':
> (.text+0x224d3): undefined reference to `handle_mm_fault'
> mm/built-in.o: In function `__get_user_pages':
> (.text+0x24a0f): undefined reference to `handle_mm_fault'
> make: *** [.tmp_vmlinux1] Error 1
Oops, sorry about that. Must be configuration dependent because it
works for me (and handle_mm_fault is obviously defined).
Do you have warnings earlier in the compilation? You can use make -s
to filter out everything but warnings.
Or send me your configuration so I can try to reproduce it here.
Thanks!
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2013-09-09 20:13 UTC|newest]
Thread overview: 99+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-08-03 16:59 Johannes Weiner
2013-08-03 16:59 ` [patch 1/7] arch: mm: remove obsolete init OOM protection Johannes Weiner
2013-08-06 6:34 ` Vineet Gupta
2013-08-03 16:59 ` [patch 2/7] arch: mm: do not invoke OOM killer on kernel fault OOM Johannes Weiner
2013-08-03 16:59 ` [patch 3/7] arch: mm: pass userspace fault flag to generic fault handler Johannes Weiner
2013-08-05 22:06 ` Andrew Morton
2013-08-05 22:25 ` Johannes Weiner
2013-08-03 16:59 ` [patch 4/7] x86: finish user fault error path with fatal signal Johannes Weiner
2013-08-03 16:59 ` [patch 5/7] mm: memcg: enable memcg OOM killer only for user faults Johannes Weiner
2013-08-05 9:18 ` Michal Hocko
2013-08-03 16:59 ` [patch 6/7] mm: memcg: rework and document OOM waiting and wakeup Johannes Weiner
2013-08-03 17:00 ` [patch 7/7] mm: memcg: do not trap chargers with full callstack on OOM Johannes Weiner
2013-08-05 9:54 ` Michal Hocko
2013-08-05 20:56 ` Johannes Weiner
2013-08-03 17:08 ` [patch 0/7] improve memcg oom killer robustness v2 Johannes Weiner
2013-08-09 9:06 ` azurIt
2013-08-30 19:58 ` azurIt
2013-09-02 10:38 ` azurIt
2013-09-03 20:48 ` Johannes Weiner
2013-09-04 7:53 ` azurIt
2013-09-04 8:18 ` azurIt
2013-09-05 11:54 ` Johannes Weiner
2013-09-05 12:43 ` Michal Hocko
2013-09-05 16:18 ` Johannes Weiner
2013-09-09 12:36 ` Michal Hocko
2013-09-09 12:56 ` Michal Hocko
2013-09-12 12:59 ` Johannes Weiner
2013-09-16 14:03 ` Michal Hocko
2013-09-05 13:24 ` Michal Hocko
2013-09-09 13:10 ` azurIt
2013-09-09 17:28 ` Johannes Weiner
2013-09-09 19:59 ` azurIt
2013-09-09 20:12 ` Johannes Weiner [this message]
2013-09-09 20:18 ` azurIt
2013-09-09 21:08 ` azurIt
2013-09-10 18:13 ` azurIt
2013-09-10 18:37 ` Johannes Weiner
2013-09-10 19:32 ` azurIt
2013-09-10 20:12 ` Johannes Weiner
2013-09-10 21:08 ` azurIt
2013-09-10 21:18 ` Johannes Weiner
2013-09-10 21:32 ` azurIt
2013-09-10 22:03 ` Johannes Weiner
2013-09-11 12:33 ` azurIt
2013-09-11 18:03 ` Johannes Weiner
2013-09-11 18:54 ` azurIt
2013-09-11 19:11 ` Johannes Weiner
2013-09-11 19:41 ` azurIt
2013-09-11 20:04 ` Johannes Weiner
2013-09-14 10:48 ` azurIt
2013-09-16 13:40 ` Michal Hocko
2013-09-16 14:01 ` azurIt
2013-09-16 14:06 ` Michal Hocko
2013-09-16 14:13 ` azurIt
2013-09-16 14:57 ` Michal Hocko
2013-09-16 15:05 ` azurIt
2013-09-16 15:17 ` Johannes Weiner
2013-09-16 15:24 ` azurIt
2013-09-16 15:25 ` Michal Hocko
2013-09-16 15:40 ` azurIt
2013-09-16 20:52 ` azurIt
2013-09-17 0:02 ` Johannes Weiner
2013-09-17 11:15 ` azurIt
2013-09-17 14:10 ` Michal Hocko
2013-09-18 14:03 ` azurIt
2013-09-18 14:24 ` Michal Hocko
2013-09-18 14:33 ` azurIt
2013-09-18 14:42 ` Michal Hocko
2013-09-18 18:02 ` azurIt
2013-09-18 18:36 ` Michal Hocko
[not found] ` <20130918160304.6EDF2729-Rm0zKEqwvD4@public.gmane.org>
2013-09-18 18:04 ` Johannes Weiner
[not found] ` <20130918180455.GD856-druUgvl0LCNAfugRpC6u6w@public.gmane.org>
2013-09-18 18:19 ` Johannes Weiner
2013-09-18 19:55 ` Johannes Weiner
2013-09-18 20:52 ` azurIt
2013-09-25 7:26 ` azurIt
2013-09-26 16:54 ` azurIt
2013-09-26 19:27 ` Johannes Weiner
2013-09-27 2:04 ` azurIt
2013-10-07 11:01 ` azurIt
[not found] ` <20131007130149.5F5482D8-Rm0zKEqwvD4@public.gmane.org>
2013-10-07 19:23 ` Johannes Weiner
2013-10-09 18:44 ` azurIt
2013-10-10 0:14 ` Johannes Weiner
2013-10-10 22:59 ` azurIt
2013-09-17 11:20 ` azurIt
2013-09-16 10:22 ` azurIt
2013-09-04 9:45 ` azurIt
2013-09-04 11:57 ` Michal Hocko
2013-09-04 12:10 ` azurIt
2013-09-04 12:26 ` Michal Hocko
2013-09-04 12:39 ` azurIt
2013-09-05 9:14 ` azurIt
2013-09-05 9:53 ` Michal Hocko
2013-09-05 10:17 ` azurIt
2013-09-05 11:17 ` Michal Hocko
2013-09-05 11:47 ` azurIt
2013-09-05 12:03 ` Michal Hocko
2013-09-05 12:33 ` azurIt
2013-09-05 12:45 ` Michal Hocko
2013-09-05 13:00 ` azurIt
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20130909201238.GH856@cmpxchg.org \
--to=hannes@cmpxchg.org \
--cc=akpm@linux-foundation.org \
--cc=azurit@pobox.sk \
--cc=cgroups@vger.kernel.org \
--cc=kamezawa.hiroyu@jp.fujitsu.com \
--cc=kosaki.motohiro@jp.fujitsu.com \
--cc=linux-arch@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mhocko@suse.cz \
--cc=rientjes@google.com \
--cc=x86@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox