Date: Mon, 21 Jul 2025 18:13:10 +0100
From: Matthew Wilcox
To: Muhammad Usama Anjum
Cc: linux-kernel@vger.kernel.org, gregkh@linuxfoundation.org, Andrew Morton, kernel@collabora.com, linux-mm@kvack.org, linux-fsdevel@vger.kernel.org
Subject: Re: Excessive page cache occupies DMA32 memory
In-Reply-To: <766ef20e-7569-46f3-aa3c-b576e4bab4c6@collabora.com>

On Mon, Jul 21, 2025 at 08:03:12PM +0500, Muhammad Usama Anjum wrote:
> Hello,
>
> When 10-12GB out of a total of 16GB of RAM is being used as page cache
> (active_file + inactive_file) at suspend time, the drivers fail to allocate
> DMA memory at resume, because DMA memory is either occupied by the page
> cache or fragmented.
> Example:
>
> kworker/u33:5: page allocation failure: order:7, mode:0xc04(GFP_NOIO|GFP_DMA32), nodemask=(null),cpuset=/,mems_allowed=0

Just to be clear, this is not a page cache problem. The driver is asking
us to do a 512kB allocation without doing I/O! This is a ridiculous
request that should be expected to fail.

The solution, whatever it may be, is not related to the page cache.
I reject your diagnosis. Almost all of the page cache is clean and
could be dropped (as far as I can tell from the output below).

Now, I'm not too familiar with how the page allocator chooses to fail
this request. Maybe it should be trying harder to drop bits of the page
cache. Maybe it should be doing some compaction. I am not inclined to
go digging on your behalf, because frankly I'm offended by the
suggestion that the page cache is at fault.

Perhaps somebody else will help you, or you can dig into this yourself.

> CPU: 1 UID: 0 PID: 7693 Comm: kworker/u33:5 Not tainted 6.11.11-valve17-1-neptune-611-g027868a0ac03 #1 3843143b92e9da0fa2d3d5f21f51beaed15c7d59
> Hardware name: Valve Galileo/Galileo, BIOS F7G0112 08/01/2024
> Workqueue: mhi_hiprio_wq mhi_pm_st_worker [mhi]
> Call Trace:
> <TASK>
> dump_stack_lvl+0x4e/0x70
> warn_alloc+0x164/0x190
> ? srso_return_thunk+0x5/0x5f
> ? __alloc_pages_direct_compact+0xaf/0x360
> __alloc_pages_slowpath.constprop.0+0xc75/0xd70
> __alloc_pages_noprof+0x321/0x350
> __dma_direct_alloc_pages.isra.0+0x14a/0x290
> dma_direct_alloc+0x70/0x270
> mhi_fw_load_handler+0x126/0x340 [mhi a96cb91daba500cc77f86bad60c1f332dc3babdf]
> mhi_pm_st_worker+0x5e8/0xac0 [mhi a96cb91daba500cc77f86bad60c1f332dc3babdf]
> ? srso_return_thunk+0x5/0x5f
> process_one_work+0x17e/0x330
> worker_thread+0x2ce/0x3f0
> ? __pfx_worker_thread+0x10/0x10
> kthread+0xd2/0x100
> ? __pfx_kthread+0x10/0x10
> ret_from_fork+0x34/0x50
> ? __pfx_kthread+0x10/0x10
> ret_from_fork_asm+0x1a/0x30
> </TASK>
> Mem-Info:
> active_anon:513809 inactive_anon:152 isolated_anon:0
> active_file:359315 inactive_file:2487001 isolated_file:0
> unevictable:637 dirty:19 writeback:0
> slab_reclaimable:160391 slab_unreclaimable:39729
> mapped:175836 shmem:51039 pagetables:4415
> sec_pagetables:0 bounce:0
> kernel_misc_reclaimable:0
> free:125666 free_pcp:0 free_cma:0
> Node 0 active_anon:2055236kB inactive_anon:608kB active_file:1437260kB inactive_file:9948004kB unevictable:2548kB isolated(anon):0kB isolated(file):0kB mapped:703344kB dirty:76kB writeback:0kB shmem:204156kB shmem_thp:0kB shmem_pmdmapped:0kB anon_thp:495616kB writeback_tmp:0kB kernel_stack:9440kB pagetables:17660kB sec_pagetables:0kB all_unreclaimable? no
> Node 0 DMA free:68kB boost:0kB min:68kB low:84kB high:100kB reserved_highatomic:0KB active_anon:8kB inactive_anon:0kB active_file:0kB inactive_file:13232kB unevictable:0kB writepending:0kB present:15992kB managed:15360kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
> lowmem_reserve[]: 0 1808 14772 0 0
> Node 0 DMA32 free:9796kB boost:0kB min:8264kB low:10328kB high:12392kB reserved_highatomic:0KB active_anon:14148kB inactive_anon:88kB active_file:128kB inactive_file:1757192kB unevictable:0kB writepending:0kB present:1935736kB managed:1867440kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
> lowmem_reserve[]: 0 0 12964 0 0
> Node 0 DMA: 5*4kB (U) 0*8kB 1*16kB (U) 1*32kB (U) 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 68kB
> Node 0 DMA32: 103*4kB (UME) 52*8kB (UME) 43*16kB (UME) 58*32kB (UME) 35*64kB (UME) 23*128kB (UME) 5*256kB (ME) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 9836kB
> Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
> Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
> 2897795 total pagecache pages
> 0 pages in swap cache
> Free swap = 8630724kB
> Total swap = 8630776kB
> 3892604 pages RAM
> 0 pages HighMem/MovableOnly
> 101363 pages reserved
> 0 pages cma reserved
> 0 pages hwpoisoned
>
> As you can see above, the ~11 GB of page cache has consumed DMA32 pages,
> leaving only 9.8MB free, and that free memory is heavily fragmented, with
> no contiguous block of 512KB or more. It's hard to reproduce with a test.
> We have received several reports for the v6.11 kernel. As we don't have a
> reliable reproducer yet, we cannot test whether other kernels are also
> affected.
>
> Current mitigations are:
> 1. Pre-allocate buffers in drivers and never free them, even though they
>    are only used during initialization at boot and resume. But this wastes
>    memory, which is unacceptable even if it is just 2-4MB.
> 2. Drop caches at suspend. But this causes latency during suspend and
>    slowness on resume. There is no way to drop only a couple of GB of
>    page cache, which would not take long at suspend time.
>
> Greg dislikes option 1 and rejects it, which is understandable [1]:
> > It should be reclaiming this, as it's just cache, not really used
> > memory.
>
> Would it be reasonable to add a mechanism to limit page cache growth?
> I think there should be some watermark or similar by which we can tell
> the page cache not to grow above it. Or, at suspend, drop only a part of
> the page cache rather than the entire page cache. What other options are
> available?
>
> [1] https://lore.kernel.org/all/2025071722-panther-legwarmer-d2be@gregkh
>
> Thanks,
> Muhammad Usama Anjum