From: Hillf Danton <hdanton@sina.com>
To: kernel test robot
Cc: Hillf Danton, linux-mm, Andrew Morton, Chris Down, Tejun Heo, Roman Gushchin, Shakeel Butt, Minchan Kim, Mel Gorman, linux-kernel
Subject: Re: [memcg] 1fc14cf673: invoked_oom-killer:gfp_mask=0x
Date: Fri, 8 Nov 2019 12:01:02 +0800
Message-Id: <20191108040102.1528-1-hdanton@sina.com>
In-Reply-To: <20191026110745.12956-1-hdanton@sina.com>


Hey Rong

On Thu, 7 Nov 2019 17:02:34 +0800 Rong Chen wrote:
>
> FYI, we noticed the following commit (built with gcc-7):
>

Thanks for your report :)

> commit: 1fc14cf67325190e0075cf3cd5511965499fffb4 ("[RFC v2] memcg: add memcg lru for page reclaiming")
> url: https://github.com/0day-ci/linux/commits/Hillf-Danton/memcg-add-memcg-lru-for-page-reclaiming/20191029-143906
>
>
> in testcase: vm-scalability
> with following parameters:
>
> 	runtime: 300s
> 	test: lru-file-mmap-read
> 	cpufreq_governor: performance
> 	ucode: 0x500002b
>
> test-description: The motivation behind this suite is to exercise functions and regions of the mm/ of the Linux kernel which are of interest to us.
> test-url: https://git.kernel.org/cgit/linux/kernel/git/wfg/vm-scalability.git
>
>
> on test machine: 192 threads Intel(R) Xeon(R) Platinum 9242 CPU @ 2.30GHz with 192G memory
>
> caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):
>
>
> +--------------------------------------------------+------------+------------+
> |                                                  | 8005803a2c | 1fc14cf673 |
> +--------------------------------------------------+------------+------------+
> | boot_successes                                   | 2          | 4          |
> | boot_failures                                    | 11         |            |
> | WARNING:at_fs/iomap/direct-io.c:#iomap_dio_actor | 10         |            |
> | RIP:iomap_dio_actor                              | 10         |            |
> | BUG:kernel_hang_in_boot_stage                    | 1          |            |
> | last_state.OOM                                   | 0          | 4          |
> +--------------------------------------------------+------------+------------+
>
>
> If you fix the issue, kindly add following tag
> Reported-by: kernel test robot
>
>
>
> user :notice: [   51.667771] 2019-11-06 23:56:11 ./usemem --runtime 300 -f /tmp/vm-scalability-tmp/vm-scalability/sparse-lru-file-mmap-read-71 --readonly 22906492245
>
> user :notice: [   51.697549] 2019-11-06 23:56:11 ./usemem --runtime 300 -f /tmp/vm-scalability-tmp/vm-scalability/sparse-lru-file-mmap-read-72 --readonly 22906492245
>
> kern :warn : [   51.715513] usemem invoked oom-killer: gfp_mask=0x400dc0(GFP_KERNEL_ACCOUNT|__GFP_ZERO), order=0, oom_score_adj=0
>
> user :notice: [   51.724161] 2019-11-06 23:56:11 truncate /tmp/vm-scalability-tmp/vm-scalability/sparse-lru-file-mmap-read-73 -s 22906492245
>
> kern :warn : [   51.727992] CPU: 11 PID: 3618 Comm: usemem Not tainted 5.4.0-rc5-00020-g1fc14cf673251 #2
> user :notice: [   51.744101] 2019-11-06 23:56:11 ./usemem --runtime 300 -f /tmp/vm-scalability-tmp/vm-scalability/sparse-lru-file-mmap-read-73 --readonly 22906492245
>
> kern :warn : [   51.752655] Call Trace:
> kern :warn : [   51.752666]  dump_stack+0x5c/0x7b
> user :notice: [   51.771480] 2019-11-06 23:56:11 truncate /tmp/vm-scalability-tmp/vm-scalability/sparse-lru-file-mmap-read-74 -s 22906492245
>
> kern :warn : [   51.775027]  dump_header+0x4a/0x220
> kern :warn : [   51.775029]  oom_kill_process+0xe9/0x130
> kern :warn : [   51.775031]  out_of_memory+0x105/0x510
> kern :warn : [   51.775037]  __alloc_pages_slowpath+0xa3f/0xdb0
> kern :warn : [   51.775040]  __alloc_pages_nodemask+0x2f0/0x340
> kern :warn : [   51.775044]  pte_alloc_one+0x13/0x40
> kern :warn : [   51.775048]  __handle_mm_fault+0xe9d/0xf70
> kern :warn : [   51.775050]  handle_mm_fault+0xdd/0x210
> kern :warn : [   51.775054]  __do_page_fault+0x2f1/0x520
> kern :warn : [   51.775056]  do_page_fault+0x30/0x120
> user :notice: [   51.782517] 2019-11-06 23:56:11 truncate /tmp/vm-scalability-tmp/vm-scalability/sparse-lru-file-mmap-read-75 -s 22906492245
>
> kern :warn : [   51.792048]  page_fault+0x3e/0x50
> kern :warn : [   51.792051] RIP: 0033:0x55c6ced07cfc
> user :notice: [   51.798308] 2019-11-06 23:56:11 ./usemem --runtime 300 -f /tmp/vm-scalability-tmp/vm-scalability/sparse-lru-file-mmap-read-74 --readonly 22906492245
>
> kern :warn : [   51.799413] Code: 00 00 e8 37 f6 ff ff 48 83 c4 08 c3 48 8d 3d 74 23 00 00 e8 56 f6 ff ff bf 01 00 00 00 e8 bc f6 ff ff 85 d2 74 08 48 8d 04 f7 <48> 8b 00 c3 48 8d 04 f7 48 89 30 b8 00 00 00 00 c3 48 89 f8 48 29
> kern :warn : [   51.799415] RSP: 002b:00007ffe889ebfe8 EFLAGS: 00010202
> user :notice: [   51.808045] 2019-11-06 23:56:11 ./usemem --runtime 300 -f /tmp/vm-scalability-tmp/vm-scalability/sparse-lru-file-mmap-read-75 --readonly 22906492245
>
> kern :warn : [   51.809437] RAX: 00007fd4c5400000 RBX: 00000000085cc600 RCX: 0000000000000018
> kern :warn : [   51.809438] RDX: 0000000000000001 RSI: 00000000085cc600 RDI: 00007fd48259d000
> kern :warn : [   51.809440] RBP: 00000000085cc600 R08: 000000005dc2ed1f R09: 00007ffe889ebfa0
> user :notice: [   51.818030] 2019-11-06 23:56:11 truncate /tmp/vm-scalability-tmp/vm-scalability/sparse-lru-file-mmap-read-76 -s 22906492245
>
> kern :warn : [   51.820780] R10: 00007ffe889ebfa0 R11: 0000000000000246 R12: 0000000042e63000
> kern :warn : [   51.820781] R13: 00007fd48259d000 R14: 00007ffe889ec08c R15: 0000000000000001
> kern :warn : [   51.820813] Mem-Info:
> user :notice: [   51.829016] 2019-11-06 23:56:11 ./usemem --runtime 300 -f /tmp/vm-scalability-tmp/vm-scalability/sparse-lru-file-mmap-read-76 --readonly 22906492245
>
> kern :warn : [   51.830751] active_anon:68712 inactive_anon:29360 isolated_anon:0
>                             active_file:497 inactive_file:48481807 isolated_file:32

OOM was triggered by an order-0 request on a machine with 192G of memory,
even in the presence of a fair amount of file pages ... perhaps because
the memcg lru failed to reclaim some dirty pages.

---8<---
Subject: [RFC] memcg: make memcg lru reclaim dirty pages
From: Hillf Danton <hdanton@sina.com>

The memcg lru was added on top of the high work, which is currently
unable to reclaim dirty pages, with the goal of bypassing soft limit
reclaim by hooking into kswapd's logic. Because of dirty pages, the
memcg lru adds the risk of premature oom even for order-0 allocations,
so being able to handle dirty pages is a must-have.

To do that, the memcg lru no longer goes the high-work route but embeds
in kswapd's page-reclaim logic: it hands the reclaimer the victim memcg,
and kswapd takes care of the rest.
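To make the hand-off concrete, here is a minimal standalone sketch of the
flow the patch below implements. This is not kernel code: the memcg type,
the single-linked lru and the field names are simplified stand-ins for
their kernel counterparts (memcg_pinch_lru(), memcg_add_lru(),
parent_mem_cgroup(), page_counter_read()).

	#include <stdio.h>

	struct memcg {
		const char *name;
		long usage;		/* stand-in for page_counter_read(&memcg->memory) */
		long high;		/* stand-in for memcg->high */
		struct memcg *parent;	/* stand-in for parent_mem_cgroup() */
		struct memcg *lru_next;
	};

	static struct memcg *lru_head;

	/* Toy memcg lru: queue a memcg for a later reclaim pass. */
	static void memcg_add_lru(struct memcg *m)
	{
		m->lru_next = lru_head;
		lru_head = m;
	}

	/* Detach the lru head, as memcg_pinch_lru() does in the patch. */
	static struct memcg *memcg_pinch_lru(void)
	{
		struct memcg *m = lru_head;

		if (m)
			lru_head = m->lru_next;
		return m;
	}

	/*
	 * Mirrors mem_cgroup_reclaim_high() below: pinch a victim off
	 * the lru, re-queue the nearest ancestor still above its high
	 * limit, and return the victim for the caller to reclaim from.
	 */
	static struct memcg *mem_cgroup_reclaim_high(void)
	{
		struct memcg *memcg, *victim;

		memcg = victim = memcg_pinch_lru();
		if (!memcg)
			return NULL;

		while ((memcg = memcg->parent))
			if (memcg->usage > memcg->high) {
				memcg_add_lru(memcg);
				break;
			}

		return victim;
	}

	int main(void)
	{
		struct memcg root  = { "root",  900, 500, NULL,  NULL };
		struct memcg child = { "child", 400, 100, &root, NULL };
		struct memcg *victim;

		memcg_add_lru(&child);

		/* kswapd's role: reclaim from whatever victim is handed back. */
		while ((victim = mem_cgroup_reclaim_high()))
			printf("reclaim from %s\n", victim->name);

		return 0;
	}

Compiled standalone, this prints "reclaim from child" then "reclaim from
root": reclaiming the child also queues its over-high ancestor for the
next pass, which is the behaviour the hunk in mm/memcontrol.c adds.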
Signed-off-by: Hillf Danton <hdanton@sina.com>
---

--- b/include/linux/memcontrol.h
+++ d/include/linux/memcontrol.h
@@ -742,7 +742,7 @@ static inline void mod_lruvec_page_state
 	local_irq_restore(flags);
 }
 
-void mem_cgroup_reclaim_high(void);
+struct mem_cgroup *mem_cgroup_reclaim_high(void);
 
 unsigned long mem_cgroup_soft_limit_reclaim(pg_data_t *pgdat, int order,
 						gfp_t gfp_mask,
@@ -1130,8 +1130,9 @@ static inline void __mod_lruvec_slab_sta
 	__mod_node_page_state(page_pgdat(page), idx, val);
 }
 
-static inline void mem_cgroup_reclaim_high(void)
+static inline struct mem_cgroup *mem_cgroup_reclaim_high(void)
 {
+	return NULL;
 }
 
 static inline
--- b/mm/memcontrol.c
+++ d/mm/memcontrol.c
@@ -2362,12 +2362,22 @@ static struct mem_cgroup *memcg_pinch_lr
 	return NULL;
 }
 
-void mem_cgroup_reclaim_high(void)
+struct mem_cgroup *mem_cgroup_reclaim_high(void)
 {
-	struct mem_cgroup *memcg = memcg_pinch_lru();
+	struct mem_cgroup *memcg, *victim;
 
-	if (memcg)
-		schedule_work(&memcg->high_work);
+	memcg = victim = memcg_pinch_lru();
+	if (!memcg)
+		return NULL;
+
+	while ((memcg = parent_mem_cgroup(memcg)))
+		if (page_counter_read(&memcg->memory) > memcg->high) {
+			memcg_memory_event(memcg, MEMCG_HIGH);
+			memcg_add_lru(memcg);
+			break;
+		}
+
+	return victim;
 }
 
 static void reclaim_high(struct mem_cgroup *memcg,
--- b/mm/vmscan.c
+++ d/mm/vmscan.c
@@ -2996,8 +2996,15 @@ static void shrink_zones(struct zonelist
 		if (zone->zone_pgdat == last_pgdat)
 			continue;
 
-		mem_cgroup_reclaim_high();
+		if (true) {
+			struct mem_cgroup *memcg;
+
+			memcg = mem_cgroup_reclaim_high();
+			if (memcg)
+				shrink_node_memcg(zone->zone_pgdat,
+							memcg, sc);
 			continue;
+		}
 
 		/*
 		 * This steals pages from memory cgroups over softlimit
@@ -3693,8 +3700,20 @@ restart:
 	if (sc.priority < DEF_PRIORITY - 2)
 		sc.may_writepage = 1;
 
-	mem_cgroup_reclaim_high();
-	goto soft_limit_reclaim_end;
+	if (true) {
+		struct mem_cgroup *memcg;
+
+		memcg = mem_cgroup_reclaim_high();
+		if (memcg) {
+			unsigned long nr_to_reclaim;
+
+			nr_to_reclaim = sc.nr_to_reclaim;
+			sc.nr_to_reclaim = SWAP_CLUSTER_MAX;
+			shrink_node_memcg(pgdat, memcg, &sc);
+			sc.nr_to_reclaim = nr_to_reclaim;
+		}
+		goto soft_limit_reclaim_end;
+	}
 
 	/* Call soft limit reclaim before calling shrink_node. */
 	sc.nr_scanned = 0;
--

>                             unevictable:259869 dirty:2 writeback:0 unstable:0
>                             slab_reclaimable:130937 slab_unreclaimable:70163
>                             mapped:48488420 shmem:30398 pagetables:97884 bounce:0
>                             free:169055 free_pcp:20966 free_cma:0
> user :notice: [   51.838463] 2019-11-06 23:56:11 truncate /tmp/vm-scalability-tmp/vm-scalability/sparse-lru-file-mmap-read-77 -s 22906492245
>
> kern :warn : [   51.840634] Node 0 active_anon:109476kB inactive_anon:1400kB active_file:76kB inactive_file:47988152kB unevictable:281836kB isolated(anon):0kB isolated(file):0kB mapped:47993516kB dirty:4kB writeback:0kB shmem:1512kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
>
>
> To reproduce:
>
>         git clone https://github.com/intel/lkp-tests.git
>         cd lkp-tests
>         bin/lkp install job.yaml  # job file is attached in this email
>         bin/lkp run     job.yaml
>
>
>
> Thanks,
> Rong Chen