Date: Fri, 1 Dec 2023 12:18:44 +1100
From: Dave Chinner
To: Roman Gushchin
Cc: Kent Overstreet, Qi Zheng, Michal Hocko, Muchun Song, Linux-MM, linux-kernel@vger.kernel.org, Andrew Morton
Subject: Re: [PATCH 2/7] mm: shrinker: Add a .to_text() method for shrinkers

On Thu, Nov 30, 2023 at 11:01:23AM -0800, Roman Gushchin wrote:
> On Wed, Nov 29, 2023 at 10:21:49PM -0500, Kent Overstreet wrote:
> > On Thu, Nov 30, 2023 at 11:09:42AM +0800, Qi Zheng wrote:
> > > For non-bcachefs developers, who
> > > knows what those statistics mean?

For non-mm developers, who knows what those internal mm state
statistics mean?

IOWs, a non-mm developer goes and asks an mm developer to help them
decipher the output to determine what to do next. So why can't an mm
developer go and ask a subsystem developer to tell them what the
shrinker oom-kill output means?

Such a question is a demonstration of an unconscious bias that
prioritises internal mm stuff as far more important than what anyone
else outside core-mm might ever need...

> > > You can use BPF or drgn to traverse in advance to get the address of the
> > > bcachefs shrinker structure, and then during OOM, find the bcachefs
> > > private structure through the shrinker->private_data member, and then
> > > dump the bcachefs private data. Is there any problem with this?
> >
> > No, BPF is not an excuse for improving our OOM/allocation failure
> > reports. BPF/tracing are secondary tools; whenever we're logging
> > information about a problem we should strive to log enough information
> > to debug the issue.
>
> Ok, a simple question then:
> why can't you dump /proc/slabinfo after the OOM?

Taken to its logical conclusion, we arrive at:

	OOM-kill doesn't need to output anything at all except for
	what it killed because we can dump
	/proc/{mem,zone,vmalloc,buddy,slab}info after the OOM....

As it is, even asking such a question shows that you haven't looked
at the OOM kill output for a long time - it already reports the slab
cache usage information for caches that are reclaimable.

That is, if too much accounted slab cache based memory consumption is
detected at OOM-kill, it will call dump_unreclaimable_slab() to dump
all the SLAB_RECLAIM_ACCOUNT caches (i.e. those with shrinkers) to
the console as part of the OOM-kill output.

The problem Kent is trying to address is that this output *isn't
sufficient to debug shrinker based memory reclaim issues*.
It hasn't been for a long time, and so we've all got our own special
debug patches and methods for checking that shrinkers are doing what
they are supposed to. Kent is trying to formalise one of the more
useful general methods for exposing that internal information when
OOM occurs...

Indeed, I can think of several uses for a shrinker->to_text() output
that we simply cannot do right now.

Any shrinker that does garbage collection on something that is not a
pure slab cache (e.g. xfs buffer cache, xfs inode gc subsystem,
graphics memory allocators, binder, etc) has no visibility of the
actual memory being used by the subsystem in the OOM-kill output.
This information isn't in /proc/slabinfo, it's not accounted by a
SLAB_RECLAIM_ACCOUNT cache, and it's not accounted by anything in the
core mm statistics.

e.g. How does anyone other than an XFS expert know that the 500k of
active xfs_buf handles in the slab cache actually pin 15GB of cached
metadata allocated directly from the page allocator, not just the
150MB of slab cache the handles take up?

Another example is that an inode can pin lots of heap memory (e.g.
for in-memory extent lists) and that may not be freeable until the
inode is reclaimed. So while the slab cache might not be excessively
large, we might have a million inodes with a billion cumulative
extents cached in memory and it is the heap memory consumed by the
cached extents that is consuming the 30GB of "missing" kernel memory
that is causing OOM-kills to occur.

How is a user or developer supposed to know when one of these
situations has occurred given the current lack of memory usage
introspection into subsystems?

These are the sorts of situations that shrinker->to_text() would
allow us to enumerate when it is necessary (i.e. at OOM-kill). At any
other time, it just doesn't matter, but when we're at OOM having a
mechanism to report somewhat accurate subsystem memory consumption
would be very useful indeed.
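
As a rough illustration of the kind of reporting hook described
above, here is a userspace mock-up in C. It is only a sketch under
stated assumptions: struct mock_shrinker, struct mock_bufcache,
mock_bufcache_to_text() and mock_oom_report() are hypothetical names
invented for this example, not the kernel's shrinker API, and the
callback signature in the actual patch series may well differ. The
300 byte handle size is also an assumption, chosen so that 500k
handles add up to the 150MB of slab mentioned in the xfs_buf example.

```c
/* Userspace mock-up only: none of these types exist in the kernel. */
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a shrinker that carries a reporting hook. */
struct mock_shrinker {
	const char *name;
	void *private_data;
	/* Formats a report into buf; returns what snprintf() returns. */
	int (*to_text)(const struct mock_shrinker *s, char *buf, size_t len);
};

/* Hypothetical statistics, loosely modelled on the xfs_buf example. */
struct mock_bufcache {
	unsigned long long nr_handles;   /* handle objects in the slab cache */
	unsigned long long handle_bytes; /* assumed size of one handle */
	unsigned long long pinned_bytes; /* page allocator memory the handles pin */
};

/* Report the slab footprint next to the memory the handles pin. */
int mock_bufcache_to_text(const struct mock_shrinker *s, char *buf, size_t len)
{
	const struct mock_bufcache *bc = s->private_data;
	unsigned long long slab_mb = bc->nr_handles * bc->handle_bytes / 1000000ULL;
	unsigned long long pinned_mb = bc->pinned_bytes / 1000000ULL;

	return snprintf(buf, len, "%s: handles=%llu slab=%lluMB pinned=%lluMB",
			s->name, bc->nr_handles, slab_mb, pinned_mb);
}

/* What an OOM report could do: use the hook when the shrinker has one. */
int mock_oom_report(const struct mock_shrinker *s, char *buf, size_t len)
{
	assert(buf != NULL && len > 0);
	if (!s->to_text)
		return snprintf(buf, len, "%s: no to_text hook", s->name);
	return s->to_text(s, buf, len);
}
```

Filling a mock_bufcache with 500000 handles of 300 bytes each and
15000000000 pinned bytes, then calling mock_oom_report(), produces a
line with slab=150MB next to pinned=15000MB. That gap is what the
slab statistics on their own cannot show.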

> Unlike anon memory, slab memory (fs caches in particular) should not be heavily
> affected by killing some userspace task.

Whether tasks get killed or not is completely irrelevant. The issue
is that not all memory that is reclaimed by shrinkers is either pure
slab cache memory or directly accounted as reclaimable to the mm
subsystem....

-Dave.
-- 
Dave Chinner
david@fromorbit.com