Date: Wed, 11 May 2022 16:50:44 +0900
From: Hyeonggon Yoo <42.hyeyoo@gmail.com>
To: Mike Rapoport
Cc: linux-mm@kvack.org, Andrew Morton, Andy Lutomirski, Dave Hansen, Ira Weiny, Kees Cook, Mike Rapoport, Peter Zijlstra, Rick Edgecombe, Vlastimil Babka, linux-kernel@vger.kernel.org, x86@kernel.org
Subject: Re: [RFC PATCH 0/3] Prototype for direct map awareness in page allocator

On Mon, May 02, 2022 at 09:44:48PM -0700, Mike Rapoport wrote:
> On Sat, Apr 30, 2022 at 01:44:16PM +0000, Hyeonggon Yoo wrote:
> > On Tue, Apr 26, 2022 at 06:21:57PM +0300, Mike Rapoport wrote:
> > > Hello Hyeonggon,
> > >
> > > On Tue, Apr 26, 2022 at 05:54:49PM +0900, Hyeonggon Yoo wrote:
> > > > On Thu, Jan 27, 2022 at 10:56:05AM +0200, Mike Rapoport wrote:
> > > > > From: Mike Rapoport
> > > > >
> > > > > Hi,
> > > > >
> > > > > This is a second attempt to make the page allocator aware of the direct map
> > > > > layout and allow grouping of the pages that must be mapped at PTE level in
> > > > > the direct map.
> > > >
> > > > Hello Mike, it may be a silly question...
> > > >
> > > > Looking at the implementation of set_memory*(), those helpers only split
> > > > PMD/PUD-sized entries. But why not _merge_ them when all entries
> > > > have the same permissions after changing the permission of an entry?
> > > >
> > > > I think grouping __GFP_UNMAPPED allocations would help reduce
> > > > direct map fragmentation, but IMHO merging split entries seems better
> > > > done in those helpers than in the page allocator.
> > >
> > > Maybe. I didn't get as far as to try merging split entries in the direct
> > > map. IIRC, Kirill sent a patch for collapsing huge pages in the direct map
> > > some time ago, but there still was something that had to initiate the
> > > collapse.
> >
> > But in this case the buddy allocator's view of the direct map is quite limited.
> > It cannot merge 2M entries into a 1G entry, as it does not support
> > allocations that big. Also, it cannot merge entries of pages freed during the
> > boot process, as they weren't allocated from the page allocator.
> >
> > And it will become harder when pages in MIGRATE_UNMAPPED are borrowed
> > from another migrate type...
> >
> > So it would be nice if we could efficiently merge mappings in
> > change_page_attr_set(). That approach can handle the cases above.
> >
> > I think in this case grouping allocations and merging mappings
> > should be done separately.
>
> I've added the provision to merge the mappings in __free_one_page() because
> at that spot we know for sure we can replace multiple PTEs with a single
> PMD.
>
> I'm not saying there should be no additional mechanism for collapsing
> direct map pages, but I don't know when and how it should be invoked.

I'm still thinking about a way to accurately track the number of split
pages - tracking them only in the CPA code may be inaccurate when the
kernel page table is changed outside of CPA.
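To make the merge side a bit more concrete, what I'm experimenting with
boils down to a check along these lines before a split PMD range is
collapsed again (just a sketch to illustrate the idea, not the code in
my branch; the helper name is made up):

/*
 * Sketch only: check whether a PTE-mapped, PMD-sized chunk of the
 * direct map can be collapsed back into a single PMD entry. Every
 * PTE must be present, physically contiguous and carry identical
 * protection bits.
 */
static bool can_collapse_to_pmd(pte_t *ptep)
{
	unsigned long pfn = pte_pfn(ptep[0]);
	pgprot_t prot = pte_pgprot(ptep[0]);
	int i;

	/* the range must start on a 2M-aligned physical address */
	if (!pte_present(ptep[0]) || !IS_ALIGNED(pfn, PTRS_PER_PTE))
		return false;

	for (i = 1; i < PTRS_PER_PTE; i++) {
		if (!pte_present(ptep[i]))
			return false;
		/* physically contiguous ... */
		if (pte_pfn(ptep[i]) != pfn + i)
			return false;
		/* ... with exactly the same protections */
		if (pgprot_val(pte_pgprot(ptep[i])) != pgprot_val(prot))
			return false;
	}

	return true;
}

The actual merge then still has to install the PMD, flush the TLB and
free the PTE page, all under pgd_lock - that part is omitted here.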
In case you wonder, my code is available at:
https://github.com/hygoni/linux/tree/merge-mapping-v1r3

It also adds vmstat items:

# cat /proc/vmstat | grep direct_map
direct_map_level2_splits 1079
direct_map_level3_splits 6
direct_map_level1_merges 1079
direct_map_level2_merges 6

Thanks,
Hyeonggon

> > > > For example:
> > > > 1) set_memory_ro() splits 1 RW PMD entry into 511 RW PTE
> > > > entries and 1 RO PTE entry.
> > > >
> > > > 2) before freeing the pages, we call set_memory_rw() and we have
> > > > 512 RW PTE entries. Then we can merge them into 1 RW PMD entry.
> > >
> > > For this we need to check the permissions of all 512 pages to make sure we
> > > can use a PMD entry to map them.
> >
> > Of course that may be slow. Maybe one way to optimize this is using some bits
> > in struct page, something like: each bit of page->direct_map_split (unsigned long)
> > is set when at least one entry in (PTRS_PER_PTE = 512) / (BITS_PER_LONG = 64) = 8
> > entries has special permissions.
> >
> > Then we just need to set the corresponding bit when splitting mappings and
> > iterate over the 8 entries when changing permissions back again (and unset the
> > bit when all 8 entries have the usual permissions). We can decide to merge by
> > checking whether page->direct_map_split is zero.
> >
> > When scanning, the 8 entries would fit into one cacheline.
> >
> > Any other ideas?
> >
> > > Not sure that doing the scan in each set_memory call won't cause an overall
> > > slowdown.
> >
> > I think we can evaluate it by measuring boot time and bpf/module
> > load/unload time.
> >
> > Is there any other workload that is directly affected
> > by the performance of set_memory*()?
> >
> > > > 3) after 2) we can do the same thing for PMD-sized entries
> > > > and merge them into 1 PUD entry if the 512 PMD entries have the
> > > > same permissions.
> > [...]
> > > > > Mike Rapoport (3):
> > > > >   mm/page_alloc: introduce __GFP_UNMAPPED and MIGRATE_UNMAPPED
> > > > >   mm/secretmem: use __GFP_UNMAPPED to allocate pages
> > > > >   EXPERIMENTAL: x86/module: use __GFP_UNMAPPED in module_alloc
> > > > --
> > > > Thanks,
> > > > Hyeonggon
> > >
> > > --
> > > Sincerely yours,
> > > Mike.
>
> --
> Sincerely yours,
> Mike.

--
Thanks,
Hyeonggon
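P.S. To make the page->direct_map_split idea quoted above a bit more
concrete, this is roughly the shape I have in mind (sketch only: the
field does not exist in struct page today, the names are made up, and
the "default permission" check is oversimplified):

/*
 * One bit covers PTRS_PER_PTE / BITS_PER_LONG (= 8 on x86-64) PTEs of
 * a split PMD; a bit stays set while any PTE in its group has
 * non-default permissions, so "all bits clear" means the whole 2M
 * range can be considered for merging without scanning all 512 PTEs.
 */
static void update_split_bitmap(struct page *pte_page, pte_t *ptep,
				unsigned int idx, bool special)
{
	unsigned int per_bit = PTRS_PER_PTE / BITS_PER_LONG;	/* 8 */
	unsigned int group = idx / per_bit;
	unsigned int i;

	if (special) {
		__set_bit(group, &pte_page->direct_map_split);
		return;
	}

	/* only the 8 entries covered by this bit need to be rescanned */
	for (i = group * per_bit; i < (group + 1) * per_bit; i++) {
		/* anything but PAGE_KERNEL counts as "special" here;
		 * a real check would have to be more careful */
		if (pgprot_val(pte_pgprot(ptep[i])) != pgprot_val(PAGE_KERNEL))
			return;
	}

	__clear_bit(group, &pte_page->direct_map_split);
}

/* merging is worth attempting only when no group has special entries */
static bool worth_trying_merge(struct page *pte_page)
{
	return pte_page->direct_map_split == 0;
}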