From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.2 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,NICE_REPLY_A,SPF_HELO_NONE, SPF_PASS,URIBL_BLOCKED,USER_AGENT_SANE_1 autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id F313FC433E0 for ; Mon, 1 Feb 2021 12:20:08 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 631D364E9C for ; Mon, 1 Feb 2021 12:20:08 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 631D364E9C Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=suse.cz Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id BE01B6B0070; Mon, 1 Feb 2021 07:20:07 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id B901F6B0071; Mon, 1 Feb 2021 07:20:07 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id A573F6B0073; Mon, 1 Feb 2021 07:20:07 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0010.hostedemail.com [216.40.44.10]) by kanga.kvack.org (Postfix) with ESMTP id 9042B6B0070 for ; Mon, 1 Feb 2021 07:20:07 -0500 (EST) Received: from smtpin20.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay05.hostedemail.com (Postfix) with ESMTP id 5605E181AEF15 for ; Mon, 1 Feb 2021 12:20:07 +0000 (UTC) X-FDA: 77769606054.20.plate62_08002e7275c2 Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin20.hostedemail.com (Postfix) with ESMTP id 2C7BD180C07AF for ; Mon, 1 Feb 2021 12:20:07 +0000 (UTC) X-HE-Tag: plate62_08002e7275c2 X-Filterd-Recvd-Size: 8515 Received: from mx2.suse.de (mx2.suse.de [195.135.220.15]) by imf21.hostedemail.com (Postfix) with ESMTP for ; Mon, 1 Feb 2021 12:20:06 +0000 (UTC) X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.221.27]) by mx2.suse.de (Postfix) with ESMTP id 19DA7ABD5; Mon, 1 Feb 2021 12:20:05 +0000 (UTC) To: Steven Noonan , "linux-mm@kvack.org" References: Cc: Christoph Hellwig , Josef Bacik , Matthew Wilcox From: Vlastimil Babka Subject: Re: "cat /proc/sys/kernel/random/entropy_avail" sometimes causes page allocation failure Message-ID: <592ccd73-9152-0748-a7ce-be663593f8da@suse.cz> Date: Mon, 1 Feb 2021 13:20:04 +0100 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.6.1 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: quoted-printable X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On 2/1/21 12:26 PM, Steven Noonan wrote: > (Please CC me on replies, I'm not subscribed to the list.) >=20 > I started seeing this problem several months ago, but after some Google= searches I concluded that someone had already root caused and fixed it, = and I just needed to wait for a newer kernel to come along: >=20 > https://www.spinics.net/lists/linux-mm/msg226311.html Yeah, that thread mentions Josef's series [1], but I don't see that it ma= de it into git log. What happened to it? Meanwhile Matthew refactored it with "4bd6a7353ee1 ("sysctl: Convert to iter interfaces")" which left the kzal= loc() but changed count to count+1, which would explain the change from order-5= to order-6. In my quick test, strace cat tells me it uses 128k sized read, w= hich is order-5. So miminally there should be kvzalloc(), but it's still somewhat unfortun= ate to do that for reading a few bytes. Also the check for KMALLOC_MAX_SIZE happens before the +1. Meh. Matthew? [1] https://lore.kernel.org/linux-fsdevel/20200813210411.905010-2-josef@toxic= panda.com/ > But it turns out this has not yet been fixed, even in 5.10.10. Also, wh= ile it used to be logged as an order:5 allocation it's now an order:6 all= ocation for some reason. It only happens rarely (anywhere from once per d= ay to once per week), but it's kind of scary to check the kernel log and = see this: >=20 > [569537.153097] cat: page allocation failure: order:6, mode:0x40dc0(GFP= _KERNEL|__GFP_COMP|__GFP_ZERO), nodemask=3D(null),cpuset=3D/,mems_allowed= =3D0 > [569537.153104] CPU: 1 PID: 3153417 Comm: cat Tainted: P S IO = 5.10.10-1-hsw #1 > [569537.153105] Hardware name: ASUSTeK COMPUTER INC. P10S-M WS Series/P= 10S-M WS Series, BIOS 4503 07/23/2019 > [569537.153106] Call Trace: > [569537.153111] dump_stack+0x58/0x6b > [569537.153113] warn_alloc.cold+0x78/0xdc > [569537.153116] ? __alloc_pages_direct_compact+0x15f/0x170 > [569537.153117] __alloc_pages_slowpath.constprop.0+0xc4e/0xc70 > [569537.153120] __alloc_pages_nodemask+0x2de/0x310 > [569537.153121] kmalloc_order+0x25/0x80 > [569537.153122] kmalloc_order_trace+0x17/0x80 > [569537.153125] proc_sys_call_handler+0xb8/0x250 > [569537.153127] new_sync_read+0x10f/0x190 > [569537.153128] vfs_read+0x147/0x1a0 > [569537.153130] ksys_read+0x5f/0xe0 > [569537.153131] do_syscall_64+0x2d/0x70 > [569537.153134] entry_SYSCALL_64_after_hwframe+0x44/0xa9 > [569537.153135] RIP: 0033:0x7fcc21ef92b2 > [569537.153137] Code: c0 e9 b2 fe ff ff 50 48 8d 3d d2 2a 0a 00 e8 e5 e= c 01 00 0f 1f 44 00 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 0f= 05 <48> 3d 00 f0 ff ff 77 56 c3 0f 1f 44 00 00 48 83 ec 28 48 89 5= 4 24 > [569537.153138] RSP: 002b:00007ffcfbcc5c88 EFLAGS: 00000246 ORIG_RAX: 0= 000000000000000 > [569537.153140] RAX: ffffffffffffffda RBX: 0000000000020000 RCX: 00007f= cc21ef92b2 > [569537.153140] RDX: 0000000000020000 RSI: 00007fcc21ff8000 RDI: 000000= 0000000003 > [569537.153141] RBP: 00007fcc21ff8000 R08: 00007fcc21ff7010 R09: 000000= 0000000000 > [569537.153142] R10: 0000000000000022 R11: 0000000000000246 R12: 000055= 6e99b70c00 > [569537.153142] R13: 0000000000000003 R14: 0000000000020000 R15: 000000= 0000020000 > [569537.153144] Mem-Info: > [569537.153146] active_anon:50924 inactive_anon:1715988 isolated_anon:0 > active_file:163427 inactive_file:104041 isolated_file:= 0 > unevictable:3247 dirty:5825 writeback:0 > slab_reclaimable:39566 slab_unreclaimable:1760175 > mapped:49406 shmem:17850 pagetables:10611 bounce:0 > free:1060578 free_pcp:0 free_cma:0 > [569537.153149] Node 0 active_anon:203696kB inactive_anon:6863952kB act= ive_file:653708kB inactive_file:416164kB unevictable:12988kB isolated(ano= n):0kB isolated(file):0kB mapped:197624kB dirty:23300kB writeback:0kB shm= em:71400kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tm= p:0kB kernel_stack:18112kB all_unreclaimable? no > [569537.153151] Node 0 DMA free:11800kB min:24kB low:36kB high:48kB res= erved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB in= active_file:0kB unevictable:0kB writepending:0kB present:15984kB managed:= 15896kB mlocked:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB = free_cma:0kB > [569537.153153] lowmem_reserve[]: 0 1810 64139 64139 64139 > [569537.153156] Node 0 DMA32 free:252476kB min:3208kB low:5076kB high:6= 944kB reserved_highatomic:0KB active_anon:756kB inactive_anon:124844kB ac= tive_file:2120kB inactive_file:180kB unevictable:0kB writepending:0kB pre= sent:1937436kB managed:1871900kB mlocked:0kB pagetables:8kB bounce:0kB fr= ee_pcp:0kB local_pcp:0kB free_cma:0kB > [569537.153158] lowmem_reserve[]: 0 0 62329 62329 62329 > [569537.153160] Node 0 Normal free:3978036kB min:109404kB low:173232kB = high:237060kB reserved_highatomic:2048KB active_anon:202940kB inactive_an= on:6739108kB active_file:651588kB inactive_file:415984kB unevictable:1298= 8kB writepending:23076kB present:65011712kB managed:63831896kB mlocked:11= 176kB pagetables:42436kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0= kB > [569537.153162] lowmem_reserve[]: 0 0 0 0 0 > [569537.153164] Node 0 DMA: 2*4kB (U) 2*8kB (U) 2*16kB (U) 3*32kB (U) 2= *64kB (U) 2*128kB (U) 0*256kB 0*512kB 1*1024kB (U) 1*2048kB (M) 2*4096kB = (M) =3D 11800kB > [569537.153171] Node 0 DMA32: 3738*4kB (UME) 2341*8kB (UME) 1719*16kB (= UME) 943*32kB (UME) 443*64kB (UME) 19*128kB (UME) 108*256kB (UME) 169*512= kB (UM) 8*1024kB (UM) 4*2048kB (UM) 0*4096kB =3D 252704kB > [569537.153179] Node 0 Normal: 81071*4kB (UMEH) 112101*8kB (UMEH) 10110= 8*16kB (UMEH) 34297*32kB (UMEH) 660*64kB (UME) 0*128kB 0*256kB 0*512kB 0*= 1024kB 0*2048kB 0*4096kB =3D 3978564kB > [569537.153185] Node 0 hugepages_total=3D0 hugepages_free=3D0 hugepages= _surp=3D0 hugepages_size=3D1048576kB > [569537.153186] Node 0 hugepages_total=3D64 hugepages_free=3D51 hugepag= es_surp=3D64 hugepages_size=3D2048kB > [569537.153186] 298118 total pagecache pages > [569537.153197] 14242 pages in swap cache > [569537.153198] Swap cache stats: add 165298, delete 151079, find 10932= 118/10944726 > [569537.153199] Free swap =3D 100129528kB > [569537.153199] Total swap =3D 100663288kB > [569537.153200] 16741283 pages RAM > [569537.153201] 0 pages HighMem/MovableOnly > [569537.153201] 311360 pages reserved > [569537.153202] 0 pages hwpoisoned >=20 > I used sysdig to figure out what 'cat' was trying to read when it faile= d, via: >=20 > # sysdig -p "%12user.name %6proc.pid %12proc.name %fd.name %evt.args" '= evt.type=3Dread and evt.failed=3Dtrue and proc.name=3Dcat' >=20 > The relevant one was this: >=20 > nobody 315341 cat /proc/sys/kernel/random/entropy_avail = res=3D-12(ENOMEM) data=3D >=20 > Which I believe is a munin script gathering a bunch of data for graphs = (munin-node runs as "nobody", uses shell scripts for its data gathering, = and one of the plugins tracks entropy_avail, so that would seem to be the= guilty party). >=20 > Any idea why it's trying to do an order:6 allocation for just a simple = read of entropy_avail? Based on the stack trace I'm guessing this isn't o= wned by the "random" driver maintainers, it seems to be a more general is= sue with memory allocations while reading from procfs. >=20