Date: Fri, 2 Aug 2024 12:02:21 -0700
From: Alison Schofield
To: Sourav Panda
Cc: corbet@lwn.net, gregkh@linuxfoundation.org, rafael@kernel.org,
 akpm@linux-foundation.org, mike.kravetz@oracle.com, muchun.song@linux.dev,
 rppt@kernel.org, david@redhat.com, rdunlap@infradead.org,
 chenlinxuan@uniontech.com, yang.yang29@zte.com.cn, tomas.mudrunka@gmail.com,
 bhelgaas@google.com, ivan@cloudflare.com, pasha.tatashin@soleen.com,
 yosryahmed@google.com, hannes@cmpxchg.org, shakeelb@google.com,
 kirill.shutemov@linux.intel.com, wangkefeng.wang@huawei.com,
 adobriyan@gmail.com, vbabka@suse.cz, Liam.Howlett@oracle.com,
 surenb@google.com, linux-kernel@vger.kernel.org,
 linux-fsdevel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org,
 willy@infradead.org, weixugc@google.com, David Rientjes,
 nvdimm@lists.linux.dev, linux-cxl@vger.kernel.org, yi.zhang@redhat.com
Subject: Re: [PATCH v13] mm: report per-page metadata information
References: <20240605222751.1406125-1-souravpanda@google.com>
In-Reply-To: <20240605222751.1406125-1-souravpanda@google.com>

++ nvdimm, linux-cxl, Yi Zhang

On Wed, Jun 05, 2024 at 10:27:51PM +0000, Sourav Panda wrote:
> Today, we do not have any observability of per-page metadata
> and how much it takes away from the machine capacity. Thus,
> we want to describe the amount of memory that is going towards
> per-page metadata, which can vary depending on build
> configuration, machine architecture, and system use.
>
> This patch adds 2 fields to /proc/vmstat that can be used as shown
> below:
>
> Accounting per-page metadata allocated by boot-allocator:
>     /proc/vmstat:nr_memmap_boot * PAGE_SIZE
>
> Accounting per-page metadata allocated by buddy-allocator:
>     /proc/vmstat:nr_memmap * PAGE_SIZE
>
> Accounting total per-page metadata allocated on the machine:
>     (/proc/vmstat:nr_memmap_boot +
>      /proc/vmstat:nr_memmap) * PAGE_SIZE
>
> Utility for userspace:
>
> Observability: Describe the amount of memory overhead that is
> going to per-page metadata on the system at any given time since
> this overhead is not currently observable.
>
> Debugging: Tracking the changes or absolute value in struct pages
> can help detect anomalies as they can be correlated with other
> metrics in the machine (e.g., memtotal, number of huge pages,
> etc.).
>
> page_ext overheads: Some kernel features such as page_owner and
> page_table_check that use page_ext can be optionally enabled via
> kernel parameters. Having the total per-page metadata information
> helps users precisely measure impact. Furthermore, page-metadata
> metrics will reflect the amount of struct pages relinquished
> (or overhead reduced) when hugetlbfs pages are reserved, which
> will vary depending on whether hugetlb vmemmap optimization is
> enabled or not.
>
> For background and results see:
> lore.kernel.org/all/20240220214558.3377482-1-souravpanda@google.com
>
> Acked-by: David Rientjes
> Signed-off-by: Sourav Panda
> Reviewed-by: Pasha Tatashin

This patch is causing an Oops in 6.11-rc1 when CONFIG_MEMORY_HOTPLUG
is enabled. Folks hitting it have had success reverting this patch.
Disabling CONFIG_MEMORY_HOTPLUG is not a long-term solution.
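For anyone checking whether a given kernel exposes these counters, the accounting in the quoted commit message can be reproduced from userspace. Below is a minimal C sketch, assuming a kernel that carries this patch so that /proc/vmstat lists nr_memmap_boot and nr_memmap in pages. The counter names are taken from the commit message and may be spelled differently on other kernel versions; the helper names parse_vmstat_line(), read_memmap_pages() and memmap_total_bytes() are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Counter names as given in the quoted commit message (an assumption:
 * other kernel versions may spell them differently). */
#define MEMMAP_BOOT_KEY "nr_memmap_boot"
#define MEMMAP_KEY      "nr_memmap"

/* Match one "name value" line of /proc/vmstat against key.
 * Returns 1 and stores the value on an exact name match, else 0. */
int parse_vmstat_line(const char *line, const char *key,
                      unsigned long long *val)
{
    char name[64];
    unsigned long long v;

    if (sscanf(line, "%63s %llu", name, &v) != 2)
        return 0;
    if (strcmp(name, key) != 0)
        return 0;   /* different counter, e.g. nr_memmap_boot vs nr_memmap */
    *val = v;
    return 1;
}

/* Read both memmap counters (in pages) from a vmstat-format file.
 * Returns 0 if both were found, -1 otherwise (missing file, or a
 * kernel without these counters). */
int read_memmap_pages(const char *path, unsigned long long *boot,
                      unsigned long long *buddy)
{
    char line[256];
    int found = 0;
    FILE *f = fopen(path, "r");

    if (!f)
        return -1;
    while (fgets(line, sizeof(line), f)) {
        if (parse_vmstat_line(line, MEMMAP_BOOT_KEY, boot))
            found |= 1;
        else if (parse_vmstat_line(line, MEMMAP_KEY, buddy))
            found |= 2;
    }
    fclose(f);
    return found == 3 ? 0 : -1;
}

/* (nr_memmap_boot + nr_memmap) * PAGE_SIZE, as in the commit message. */
unsigned long long memmap_total_bytes(unsigned long long boot_pages,
                                      unsigned long long buddy_pages,
                                      unsigned long long page_size)
{
    return (boot_pages + buddy_pages) * page_size;
}
```

On a running system, call read_memmap_pages("/proc/vmstat", &boot, &buddy) from a small main() and pass sysconf(_SC_PAGESIZE) (from <unistd.h>) as the page size to memmap_total_bytes().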
Reported here:
https://lore.kernel.org/linux-cxl/CAHj4cs9Ax1=CoJkgBGP_+sNu6-6=6v=_L-ZBZY0bVLD3wUWZQg@mail.gmail.com/

A bit of detail below; follow the link above for more:

dmesg:
[ 1408.632268] Oops: general protection fault, probably for non-canonical address 0xdffffc0000005650: 0000 [#1] PREEMPT SMP KASAN PTI
[ 1408.644006] KASAN: probably user-memory-access in range [0x000000000002b280-0x000000000002b287]
[ 1408.652699] CPU: 26 UID: 0 PID: 1868 Comm: ndctl Not tainted 6.11.0-rc1 #1
[ 1408.659571] Hardware name: Dell Inc. PowerEdge R640/08HT8T, BIOS 2.20.1 09/13/2023
[ 1408.667136] RIP: 0010:mod_node_page_state+0x2a/0x110
[ 1408.672112] Code: 0f 1f 44 00 00 48 b8 00 00 00 00 00 fc ff df 41 54 55 48 89 fd 48 81 c7 80 b2 02 00 53 48 89 f9 89 d3 48 c1 e9 03 48 83 ec 10 <80> 3c 01 00 0f 85 b8 00 00 00 48 8b bd 80 b2 02 00 41 89 f0 83 ee
[ 1408.690856] RSP: 0018:ffffc900246d7388 EFLAGS: 00010286
[ 1408.696088] RAX: dffffc0000000000 RBX: 00000000fffffe00 RCX: 0000000000005650
[ 1408.703222] RDX: fffffffffffffe00 RSI: 000000000000002f RDI: 000000000002b280
[ 1408.710353] RBP: 0000000000000000 R08: ffff88a06ffcb1c8 R09: 1ffffffff218c681
[ 1408.717486] R10: ffffffff93d922bf R11: ffff88855e790f10 R12: 00000000000003ff
[ 1408.724619] R13: 1ffff920048dae7b R14: ffffea0081e00000 R15: ffffffff90c63408
[ 1408.731750] FS: 00007f753c219200(0000) GS:ffff889bf2a00000(0000) knlGS:0000000000000000
[ 1408.739834] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1408.745581] CR2: 0000559f5902a5a8 CR3: 00000001292f0006 CR4: 00000000007706f0
[ 1408.752713] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1408.759843] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1408.766976] PKRU: 55555554
[ 1408.769690] Call Trace:
[ 1408.772143]
[ 1408.774248] ? die_addr+0x3d/0xa0
[ 1408.777577] ? exc_general_protection+0x150/0x230
[ 1408.782297] ? asm_exc_general_protection+0x22/0x30
[ 1408.787182] ? mod_node_page_state+0x2a/0x110
[ 1408.791548] section_deactivate+0x519/0x780
[ 1408.795740] ? __pfx_section_deactivate+0x10/0x10
[ 1408.800449] __remove_pages+0x6c/0xa0
[ 1408.804119] arch_remove_memory+0x1a/0x70
[ 1408.808141] pageunmap_range+0x2ad/0x5e0
[ 1408.812067] memunmap_pages+0x320/0x5a0
[ 1408.815909] release_nodes+0xd6/0x170
[ 1408.819581] ? lockdep_hardirqs_on+0x78/0x100
[ 1408.823941] devres_release_all+0x106/0x170
[ 1408.828126] ? __pfx_devres_release_all+0x10/0x10
[ 1408.832834] device_unbind_cleanup+0x16/0x1a0
[ 1408.837198] device_release_driver_internal+0x3d5/0x530
[ 1408.842423] ? klist_put+0xf7/0x170
[ 1408.845916] bus_remove_device+0x1ed/0x3f0
[ 1408.850017] device_del+0x33b/0x8c0
[ 1408.853518] ? __pfx_device_del+0x10/0x10
[ 1408.857532] unregister_dev_dax+0x112/0x210
[ 1408.861722] release_nodes+0xd6/0x170
[ 1408.865387] ? lockdep_hardirqs_on+0x78/0x100
[ 1408.869749] devres_release_all+0x106/0x170
[ 1408.873933] ? __pfx_devres_release_all+0x10/0x10
[ 1408.878643] device_unbind_cleanup+0x16/0x1a0
[ 1408.883007] device_release_driver_internal+0x3d5/0x530
[ 1408.888235] ? __pfx_sysfs_kf_write+0x10/0x10
[ 1408.892598] unbind_store+0xdc/0xf0
[ 1408.896093] kernfs_fop_write_iter+0x358/0x530
[ 1408.900539] vfs_write+0x9b2/0xf60
[ 1408.903954] ? __pfx_vfs_write+0x10/0x10
[ 1408.907891] ? __fget_light+0x53/0x1e0
[ 1408.911646] ? __x64_sys_openat+0x11f/0x1e0
[ 1408.915835] ksys_write+0xf1/0x1d0
[ 1408.919249] ? __pfx_ksys_write+0x10/0x10
[ 1408.923264] do_syscall_64+0x8c/0x180
[ 1408.926934] ? __debug_check_no_obj_freed+0x253/0x520
[ 1408.931997] ? __pfx___debug_check_no_obj_freed+0x10/0x10
[ 1408.937405] ? kasan_quarantine_put+0x109/0x220
[ 1408.941944] ? lockdep_hardirqs_on+0x78/0x100
[ 1408.946304] ? kmem_cache_free+0x1a6/0x4c0
[ 1408.950408] ? do_sys_openat2+0x10a/0x160
[ 1408.954424] ? do_sys_openat2+0x10a/0x160
[ 1408.958434] ? __pfx_do_sys_openat2+0x10/0x10
[ 1408.962794] ? lockdep_hardirqs_on+0x78/0x100
[ 1408.967153] ? __pfx___debug_check_no_obj_freed+0x10/0x10
[ 1408.972554] ? __x64_sys_openat+0x11f/0x1e0
[ 1408.976737] ? __pfx___x64_sys_openat+0x10/0x10
[ 1408.981269] ? rcu_is_watching+0x11/0xb0
[ 1408.985204] ? lockdep_hardirqs_on_prepare+0x179/0x400
[ 1408.990351] ? do_syscall_64+0x98/0x180
[ 1408.994191] ? lockdep_hardirqs_on+0x78/0x100
[ 1408.998549] ? do_syscall_64+0x98/0x180
[ 1409.002386] ? do_syscall_64+0x98/0x180
[ 1409.006227] ? lockdep_hardirqs_on+0x78/0x100
[ 1409.010585] ? do_syscall_64+0x98/0x180
[ 1409.014425] ? lockdep_hardirqs_on_prepare+0x179/0x400
[ 1409.019565] ? do_syscall_64+0x98/0x180
[ 1409.023401] ? lockdep_hardirqs_on+0x78/0x100
[ 1409.027763] ? do_syscall_64+0x98/0x180
[ 1409.031600] ? do_syscall_64+0x98/0x180
[ 1409.035439] ? do_syscall_64+0x98/0x180
[ 1409.039281] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 1409.044331] RIP: 0033:0x7f753c0fda57
[ 1409.047911] Code: 0f 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
[ 1409.066655] RSP: 002b:00007ffc19323e28 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1409.074220] RAX: ffffffffffffffda RBX: 0000000000000007 RCX: 00007f753c0fda57
[ 1409.081352] RDX: 0000000000000007 RSI: 0000559f5901f740 RDI: 0000000000000003
[ 1409.088483] RBP: 0000000000000003 R08: 0000000000000000 R09: 00007ffc19323d20
[ 1409.095616] R10: 0000000000000000 R11: 0000000000000246 R12: 0000559f5901f740
[ 1409.102748] R13: 00007ffc19323e90 R14: 00007f753c219120 R15: 0000559f5901fc30
[ 1409.109887]
[ 1409.112082] Modules linked in: kmem device_dax rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace netfs rfkill sunrpc dm_multipath intel_rapl_msr intel_rapl_common intel_uncore_frequency intel_uncore_frequency_common skx_edac skx_edac_common x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm mgag200 rapl cdc_ether iTCO_wdt dell_pc i2c_algo_bit iTCO_vendor_support ipmi_ssif usbnet acpi_power_meter drm_shmem_helper mei_me dell_smbios platform_profile intel_cstate dcdbas wmi_bmof dell_wmi_descriptor intel_uncore pcspkr mii drm_kms_helper i2c_i801 mei i2c_smbus intel_pch_thermal lpc_ich ipmi_si acpi_ipmi dax_pmem ipmi_devintf ipmi_msghandler drm fuse xfs libcrc32c sd_mod sg nd_pmem nd_btt crct10dif_pclmul crc32_pclmul crc32c_intel ahci ghash_clmulni_intel libahci bnxt_en megaraid_sas tg3 libata wmi nfit libnvdimm dm_mirror dm_region_hash dm_log dm_mod
[ 1409.189120] ---[ end trace 0000000000000000 ]---

-- snip
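The first two lines of the trace can be cross-checked by hand. Generic KASAN on x86-64 maps an address to its shadow byte as (addr >> 3) + KASAN_SHADOW_OFFSET, with a default offset of 0xdffffc0000000000; that same constant appears in RAX in the register dump. The minimal user-space sketch below reproduces that arithmetic. The helper names mem_to_shadow() and shadow_to_mem() are made up for illustration; they mirror the formula of the kernel's kasan_mem_to_shadow() but are not kernel code.

```c
#include <stdint.h>

/* Default generic-KASAN shadow offset on x86-64 (assumed here; it
 * matches RAX in the register dump above). */
#define SHADOW_OFFSET 0xdffffc0000000000ULL
/* Each shadow byte describes 8 bytes of memory (1 << 3). */
#define SHADOW_SCALE_SHIFT 3

/* Shadow byte address that KASAN checks before accessing addr. */
uint64_t mem_to_shadow(uint64_t addr)
{
    return (addr >> SHADOW_SCALE_SHIFT) + SHADOW_OFFSET;
}

/* First address of the 8-byte granule that a shadow byte describes. */
uint64_t shadow_to_mem(uint64_t shadow)
{
    return (shadow - SHADOW_OFFSET) << SHADOW_SCALE_SHIFT;
}
```

Applied to the faulting address, shadow_to_mem(0xdffffc0000005650) gives 0x2b280. Because one shadow byte covers 8 bytes, the range is 0x2b280 to 0x2b287, matching both the KASAN line and RDI in the register dump. Shadow addresses for such low addresses fall in the non-canonical hole, which is why the access raised a general protection fault rather than a page fault.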