Date: Fri, 7 Feb 2025 19:14:49 +0900
From: Byungchul Park
To: Gregory Price
Cc: Matthew Wilcox, Hyeonggon Yoo <42.hyeyoo@gmail.com>, lsf-pc@lists.linux-foundation.org, linux-mm@kvack.org, linux-cxl@vger.kernel.org, Honggyu Kim, kernel_team@skhynix.com
Subject: Re: [LSF/MM/BPF TOPIC] Restricting or migrating unmovable kernel allocations from slow tier
Message-ID: <20250207101449.GA35103@system.software.com>

On Fri, Feb 07, 2025 at 03:57:45AM -0500, Gregory Price wrote:
> On Fri, Feb 07, 2025 at 04:20:24PM +0900, Byungchul Park wrote:
> > On Sat, Feb 01, 2025 at 02:04:17PM +0000, Matthew Wilcox wrote:
> > >
> > We can work with from the easiest object
> > e.g. page table
>
> It's more efficient and easier to change page sizes than it is to make
> page tables migratable.

You are misunderstanding. I didn't say 'do not change page sizes', and I
didn't say this is easier than changing page sizes. I said *both*
changing page sizes and making page tables migratable could reduce the
ZONE_NORMAL cost.

> It's also easier to reclaim cold pages eating up significantly more
> memory than the page table (which describes pages at ~8 bytes per page).

Same here. We should keep reclaiming cold pages that eat up memory.
Why would we give up reclaiming cold pages just because page tables
become migratable? I really don't understand why you are trying to pick
only one effort, to the exclusion of the others, for that purpose.

> Also, there's quite a bit of literature that shows page tables landing
> on remote nodes (cross-socket) has negative performance impacts.

Exactly. That's the motivation for suggesting this topic, and that's why
we are asking about kernel object migratability. Of course, we try our
best to place kernel objects in DRAM in the first place. The problem
arises when that becomes impossible. It's a comparison between
'premature reclaim and death (= OOM)' and 'slight performance
degradation'.

> Putting them on CXL makes the problem worse.

No. A higher chance of dying (OOM) is worse.

> > struct page,
>
> `struct page` is a structure that describes a physically addressed page.
>
> It is common to access it by simply doing `pfn_to_page()`, which is a
> fairly simply conversion (bit more complex in sparsemem w/ sections)
>
> This is used in a lockless manner to acquire page references all over
> the kernel.
>
> Making that migratable is... ambitious, to say the least.

Yes. I don't think it's easy.

> > and kernel stack,
>
> The default kernel stack size is like 16kb. You'd need like 100,000
> threads to eat up 1.5GB, and 2048 threads only eats like 32MB.
>
> It's not an interesting amount of memory if you have a 20TB system.

The kernel stack is just an example. We can skip it and look for better
candidates.

> > When it comes to this topic, the most important thing is the collected
> > *direction* from the community so that we can start the work under the
> > *direction*.
> >
>
> My thoughts here are that memory tiering is the wrong tool for the
> problem you are trying to solve.

I think all valid efforts can be pursued at the same time. Is there any
reason that efforts in a tiering environment should be excluded?

Byungchul

> Maybe there's a world in which we propose a ZONE_MEMDESC which is
> exclusively used for `struct page` for a node.
>
> At least then you could design CXL capacities *around* that.
>
> ~Gregory