While checking the patches that fixed broken memcg accounting in vmalloc,
I found another issue: a false global OOM triggered by a memcg-limited
user-space task.

I ran vmalloc-eater in a loop inside a memcg-limited LXC container and
checked that it does not consume host memory beyond the assigned limit,
triggers memcg OOM and generates "Memory cgroup out of memory" messages.
Everything was as expected.

However, I was surprised to find rare global OOM messages too.
I set sysctl vm.panic_on_oom to 1, repeated the test and successfully
crashed the node.

Dmesg showed that global OOM was detected on a 16 GB node with ~10 GB of
free memory.

 syz-executor invoked oom-killer: gfp_mask=0x0(), order=0, oom_score_adj=1000
 CPU: 2 PID: 15307 Comm: syz-executor Kdump: loaded Not tainted 5.15.0-rc4+ #55
 Hardware name: Virtuozzo KVM, BIOS 1.11.0-2.vz7.4 04/01/2014
 Call Trace:
  dump_stack_lvl+0x57/0x72
  dump_header+0x4a/0x2c1
  out_of_memory.cold+0xa/0x7e
  pagefault_out_of_memory+0x46/0x60
  exc_page_fault+0x79/0x2b0
  asm_exc_page_fault+0x1e/0x30
 ...
 Mem-Info:
 Node 0 DMA: 0*4kB 0*8kB <...> = 13296kB
 Node 0 DMA32: 705*4kB (UM) <...> = 2586964kB
 Node 0 Normal: 2743*4kB (UME) <...> = 6904828kB
 ...
 4095866 pages RAM
 ...
 Kernel panic - not syncing: Out of memory: system-wide panic_on_oom is enabled

The full dmesg can be found in the attached file.

How could this happen?

A user-space task inside the memcg-limited container generated a page
fault. Its handler do_user_addr_fault() called handle_mm_fault(), which
could not allocate the page because the memcg limit was exceeded and
returned VM_FAULT_OOM. Then do_user_addr_fault() called
pagefault_out_of_memory(), which executed out_of_memory() without a memcg
set, i.e. as a global OOM.

This problem partially depends on one of my recent patches, which disabled
unlimited memory allocation for dying tasks. However, I think the problem
can happen for non-killed tasks too, for example because of the kmem limit.

At present do_user_addr_fault() does not know why page allocation failed,
i.e. whether it was a global or a memcg OOM. I propose to save this
information in a new flag on task_struct. It can be set in case of memcg
restrictions in obj_cgroup_charge_pages() (for memory controller) and in
try_charge_memcg() (for kmem controller). Then it can be used in
mem_cgroup_oom_synchronize(), called inside pagefault_out_of_memory():
in case of memcg-related restrictions it will not trigger a false global
OOM and will return to user space, which will retry the fault or kill the
process if it got a fatal signal.

Thank you,
	Vasily Averin

Vasily Averin (1):
  memcg: prevent false global OOM triggered by memcg limited task.

 include/linux/sched.h |  1 +
 mm/memcontrol.c       | 12 +++++++++---
 2 files changed, 10 insertions(+), 3 deletions(-)

-- 
2.32.0
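
For illustration only, the idea proposed in this letter (remember on the task that a charge failed because of a memcg limit, and consult that in the page-fault OOM path) can be modeled in user space roughly as below. This is a minimal sketch, not the patch itself: struct task_model, struct memcg_model, model_charge(), model_pagefault_oom() and the enum values are all hypothetical names invented for this example, and the real kernel code paths are considerably more involved.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the proposed per-task flag (not task_struct). */
struct task_model {
    bool memcg_charge_failed;   /* set when a charge hits a memcg limit */
    bool fatal_signal_pending;  /* models a pending fatal signal */
};

/* Hypothetical stand-in for a memory cgroup with a hard limit. */
struct memcg_model {
    long usage_pages;
    long limit_pages;
};

/* Possible results of the modeled page-fault OOM decision. */
enum fault_outcome {
    FAULT_RETRY,       /* return to user space, the fault is retried */
    FAULT_KILL_TASK,   /* the task dies because of its fatal signal */
    FAULT_GLOBAL_OOM   /* no memcg involved: treat as a real global OOM */
};

/*
 * Models a charge attempt. On failure it records on the task that the
 * reason was a memcg limit, which is the information the letter says
 * the fault handler currently lacks.
 */
static bool model_charge(struct task_model *t, struct memcg_model *cg,
                         long nr_pages)
{
    if (cg->usage_pages + nr_pages > cg->limit_pages) {
        t->memcg_charge_failed = true;
        return false;
    }
    cg->usage_pages += nr_pages;
    return true;
}

/*
 * Models the decision taken after a fault returned an OOM error.
 * If the failure was memcg-related, the flag is consumed and no global
 * OOM is raised; otherwise the global OOM path is taken.
 */
static enum fault_outcome model_pagefault_oom(struct task_model *t)
{
    if (t->memcg_charge_failed) {
        t->memcg_charge_failed = false;
        return t->fatal_signal_pending ? FAULT_KILL_TASK : FAULT_RETRY;
    }
    return FAULT_GLOBAL_OOM;
}
```

In this model a failed model_charge() followed by model_pagefault_oom() yields FAULT_RETRY (or FAULT_KILL_TASK when a fatal signal is pending), while a fault that reaches the OOM path without the flag set still yields FAULT_GLOBAL_OOM.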