Date: Thu, 11 Aug 2022 04:50:44 +0000
From: Hyeonggon Yoo <42.hyeyoo@gmail.com>
To: Aaron Lu
Cc: Dave Hansen, Rick Edgecombe, Song Liu, linux-kernel@vger.kernel.org,
    linux-mm@kvack.org, Mike Rapoport, Hyeonggon Yoo <42.hyeyoo@gmail.com>
Subject: Re: [RFC PATCH 0/4] x86/mm/cpa: merge small mappings whenever possible
References: <20220808145649.2261258-1-aaron.lu@intel.com>
In-Reply-To: <20220808145649.2261258-1-aaron.lu@intel.com>

On Mon, Aug 08, 2022 at 10:56:45PM +0800, Aaron Lu wrote:
> This is an early RFC. While all reviews are welcome, reviewing this code
> now will be a waste of time for the x86 subsystem maintainers. I would,
> however, appreciate a preliminary review from the folks on the to and cc
> list. I'm posting it to the list in case anyone else is interested in
> seeing this early version.
>

Hello Aaron!

+Cc Mike Rapoport, who has been working on the same problem. [1]

There is also an LPC discussion (with a different approach to this problem)
[2], [4] and a performance measurement when all pages are 4K/2M. [3]

[1] https://lore.kernel.org/linux-mm/20220127085608.306306-1-rppt@kernel.org/
[2] https://www.youtube.com/watch?v=egC7ZK4pcnQ
[3] https://lpc.events/event/11/contributions/1127/attachments/922/1792/LPC21%20Direct%20map%20management%20.pdf
[4] https://lwn.net/Articles/894557/

> Dave Hansen: I need your ack before this goes to the maintainers.
>
> Here it goes:
>
> On x86_64, Linux has direct mapping of almost all physical memory. For
> performance reasons, this mapping is usually set as large page like 2M
> or 1G per hardware's capability with read, write and non-execute
> protection.
>
> There are cases where some pages have to change their protection to RO
> and eXecutable, like pages that host module code or bpf prog.
> When these pages' protection are changed, the corresponding large mapping
> that cover these pages will have to be splitted into 4K first and then
> individual 4k page's protection changed accordingly, i.e. unaffected
> pages keep their original protection as RW and NX while affected pages'
> protection changed to RO and X.
>
> There is a problem due to this split: the large mapping will remain
> splitted even after the affected pages' protection are changed back to
> RW and NX, like when the module is unloaded or bpf progs are freed.
> After system runs a long time, there can be more and more large mapping
> being splitted, causing more and more dTLB misses and overall system
> performance getting hurt[1].
>
> For this reason, people tried some techniques to reduce the harm of
> large mapping being splitted, like bpf_prog_pack[2] which packs
> multiple bpf progs into a single page instead of allocating and changing
> one page's protection for each bpf prog. This approach made large
> mapping split happen much fewer.
>
> This patchset addresses this problem in another way: it merges
> splitted mappings back to a large mapping when protections of all entries
> of the splitted small mapping page table become same again, e.g. when the
> page whose protection was changed to RO+X now has its protection changed
> back to RW+NX due to reasons like module unload, bpf prog free, etc. and
> all other entries' protection are also RW+NX.
>

I tried a very similar approach a few months ago (as a toy implementation) [5],
and the biggest obstacle was that you need to be extremely sure that
page->nr_same_prot is ALWAYS correct.

For example, arch/x86/include/asm/kfence.h [6] clears and sets _PAGE_PRESENT
without going through CPA, which can simply break the count.
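
To make the failure mode concrete, here is a minimal user-space sketch
(not kernel code). Everything in it is hypothetical: the struct, the
three-value protection enum, and the helpers table_init(), cpa_set_prot(),
raw_set_prot(), can_merge() and really_uniform(). It also assumes the
counter tracks how many entries are RW+NX, which may not match how
nr_same_prot is defined in the patchset. It only shows how a counter-based
merge check goes stale when an entry is changed outside the accounting path.

```c
#include <assert.h>
#include <stdbool.h>

#define PTRS_PER_PTE 512 /* entries in one 4K page table on x86_64 */

/* Simplified protections; a real PTE carries many more bits. */
enum pg_prot { PG_RW_NX, PG_RO_X, PG_NOT_PRESENT };

struct pte_table {
	enum pg_prot entry[PTRS_PER_PTE];
	int nr_same_prot; /* assumed meaning: number of RW+NX entries */
};

void table_init(struct pte_table *t)
{
	for (int i = 0; i < PTRS_PER_PTE; i++)
		t->entry[i] = PG_RW_NX;
	t->nr_same_prot = PTRS_PER_PTE;
}

/* CPA-like path: every change keeps the counter in sync. */
void cpa_set_prot(struct pte_table *t, int idx, enum pg_prot prot)
{
	assert(idx >= 0 && idx < PTRS_PER_PTE);
	if (t->entry[idx] == PG_RW_NX)
		t->nr_same_prot--;
	if (prot == PG_RW_NX)
		t->nr_same_prot++;
	t->entry[idx] = prot;
}

/* Direct write that bypasses the accounting, like a raw PTE update. */
void raw_set_prot(struct pte_table *t, int idx, enum pg_prot prot)
{
	assert(idx >= 0 && idx < PTRS_PER_PTE);
	t->entry[idx] = prot;
}

/* Merge decision that trusts the counter alone. */
bool can_merge(const struct pte_table *t)
{
	return t->nr_same_prot == PTRS_PER_PTE;
}

/* Ground truth: scan every entry. */
bool really_uniform(const struct pte_table *t)
{
	for (int i = 0; i < PTRS_PER_PTE; i++)
		if (t->entry[i] != PG_RW_NX)
			return false;
	return true;
}
```

If one entry is set to PG_RO_X and then back to PG_RW_NX through
cpa_set_prot(), can_merge() is true again, as intended. After a following
raw_set_prot(&t, 7, PG_NOT_PRESENT), can_merge() still returns true while
really_uniform() returns false, so a merge at that point would silently
drop the not-present entry.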

[5] https://github.com/hygoni/linux/tree/merge-mapping-v1r3
[6] https://elixir.bootlin.com/linux/latest/source/arch/x86/include/asm/kfence.h#L56

I think we may need to hook set_pte/set_pmd/etc. and use proper
synchronization primitives when changing init_mm's page table to take
this approach further.

> One final note is, with features like bpf_prog_pack etc., there can be
> much fewer large mapping split IIUC; also, this patchset can not help
> when the page which has its protection changed keeps in use. So my take
> on this large mapping split problem is: to get the most value of keeping
> large mapping intact, features like bpf_prog_pack is important. This
> patchset can help to further reduce large mapping split when in use page
> that has special protection set finally gets released.
>
> [1]: http://lkml.kernel.org/r/CAPhsuW4eAm9QrAxhZMJu-bmvHnjWjuw86gFZzTHRaMEaeFhAxw@mail.gmail.com
> [2]: https://lore.kernel.org/lkml/20220204185742.271030-1-song@kernel.org/
>
> Aaron Lu (4):
>   x86/mm/cpa: restore global bit when page is present
>   x86/mm/cpa: merge splitted direct mapping when possible
>   x86/mm/cpa: add merge event counter
>   x86/mm/cpa: add a test interface to split direct map
>
>  arch/x86/mm/pat/set_memory.c  | 411 +++++++++++++++++++++++++++++++++-
>  include/linux/mm_types.h      |   6 +
>  include/linux/page-flags.h    |   6 +
>  include/linux/vm_event_item.h |   2 +
>  mm/vmstat.c                   |   2 +
>  5 files changed, 420 insertions(+), 7 deletions(-)
>
> --
> 2.37.1
>