Date: Mon, 13 Jan 2025 10:01:13 +0200
From: "Kirill A. Shutemov"
To: Mike Rapoport
Cc: "Kirill A. Shutemov", Andrew Morton, Andy Lutomirski, Anton Ivanov,
	Borislav Petkov, Brendan Higgins, Daniel Gomez, Daniel Thompson,
	Dave Hansen, David Gow, Douglas Anderson, Ingo Molnar, Jason Wessel,
	Jiri Kosina, Joe Lawrence, Johannes Berg, Josh Poimboeuf,
	Luis Chamberlain, Mark Rutland, Masami Hiramatsu, Miroslav Benes,
	"H. Peter Anvin", Peter Zijlstra, Petr Mladek, Petr Pavlu, Rae Moar,
	Richard Weinberger, Sami Tolvanen, Shuah Khan, Song Liu,
	Steven Rostedt, Thomas Gleixner,
	kgdb-bugreport@lists.sourceforge.net, kunit-dev@googlegroups.com,
	linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org,
	linux-mm@kvack.org, linux-modules@vger.kernel.org,
	linux-trace-kernel@vger.kernel.org, linux-um@lists.infradead.org,
	live-patching@vger.kernel.org, x86@kernel.org
Subject: Re: [PATCH 3/8] x86/mm/pat: Restore large pages after fragmentation
References: <20241227072825.1288491-1-rppt@kernel.org>
	<20241227072825.1288491-4-rppt@kernel.org>

On Sun, Jan 12, 2025 at 10:54:46AM +0200, Mike Rapoport wrote:
> Hi Kirill,
>
> On Fri, Jan 10, 2025 at 12:36:59PM +0200, Kirill A. Shutemov wrote:
> > On Fri, Dec 27, 2024 at 09:28:20AM +0200, Mike Rapoport wrote:
> > > From: "Kirill A. Shutemov"
> > >
> > > Change of attributes of the pages may lead to fragmentation of direct
> > > mapping over time and performance degradation as result.
> > >
> > > With current code it's one way road: kernel tries to avoid splitting
> > > large pages, but it doesn't restore them back even if page attributes
> > > got compatible again.
> > >
> > > Any change to the mapping may potentially allow to restore large page.
> > >
> > > Hook up into cpa_flush() path to check if there's any pages to be
> > > recovered in PUD_SIZE range around pages we've just touched.
> > >
> > > CPUs don't like[1] to have to have TLB entries of different size for the
> > > same memory, but looks like it's okay as long as these entries have
> > > matching attributes[2]. Therefore it's critical to flush TLB before any
> > > following changes to the mapping.
> > >
> > > Note that we already allow for multiple TLB entries of different sizes
> > > for the same memory now in split_large_page() path. It's not a new
> > > situation.
> > >
> > > set_memory_4k() provides a way to use 4k pages on purpose. Kernel must
> > > not remap such pages as large. Re-use one of software PTE bits to
> > > indicate such pages.
> > >
> > > [1] See Erratum 383 of AMD Family 10h Processors
> > > [2] https://lore.kernel.org/linux-mm/1da1b025-cabc-6f04-bde5-e50830d1ecf0@amd.com/
> > >
> > > [rppt@kernel.org:
> > > * s/restore/collapse/
> > > * update formatting per peterz
> > > * use 'struct ptdesc' instead of 'struct page' for list of page tables to
> > >   be freed
> > > * try to collapse PMD first and if it succeeds move on to PUD as peterz
> > >   suggested
> > > * flush TLB twice: for changes done in the original CPA call and after
> > >   collapsing of large pages
> > > ]
> > >
> > > Link: https://lore.kernel.org/all/20200416213229.19174-1-kirill.shutemov@linux.intel.com
> > > Signed-off-by: Kirill A. Shutemov
> > > Co-developed-by: Mike Rapoport (Microsoft)
> > > Signed-off-by: Mike Rapoport (Microsoft)
> >
> > When I originally attempted this, the patch was dropped because of
> > performance regressions. Was it addressed somehow?
>
> I didn't realize the patch was dropped because of performance regressions,
> so I didn't address it.
>
> Do you remember where did the regressions show up?

https://github.com/zen-kernel/zen-kernel/issues/169

My understanding is that if userspace triggers the set_memory_* codepath
somewhat frequently, we will get a performance hit.

-- 
  Kiryl Shutsemau / Kirill A. Shutemov