Date: Mon, 9 Jan 2023 13:24:31 +0000
From: Matthew Wilcox <willy@infradead.org>
To: Yin Fengwei
Cc: linux-mm@kvack.org, akpm@linux-foundation.org, jack@suse.cz,
	hughd@google.com, kirill.shutemov@linux.intel.com, mhocko@suse.com,
	ak@linux.intel.com, aarcange@redhat.com, npiggin@gmail.com,
	mgorman@techsingularity.net, rppt@kernel.org, dave.hansen@intel.com,
	ying.huang@intel.com, tim.c.chen@intel.com
Subject: Re: [RFC PATCH 1/4] mcpage: add size/mask/shift definition for multiple consecutive page
References: <20230109072232.2398464-1-fengwei.yin@intel.com>
 <20230109072232.2398464-2-fengwei.yin@intel.com>
In-Reply-To: <20230109072232.2398464-2-fengwei.yin@intel.com>

On Mon, Jan 09, 2023 at 03:22:29PM +0800, Yin Fengwei wrote:
> The idea of the multiple consecutive page (abbr as "mcpage") is using
> collection of physical contiguous 4K page other than huge page for
> anonymous mapping.

This is what folios are for.  You have an interesting demonstration here
that shows that moving to larger folios for anonymous memory is worth
doing (thank you!) but you're missing several of the advantages of
folios by going off and doing your own thing.

> The size of mcpage can be configured. The default value of 16K size is
> just picked up arbitrarily. User should choose the value according to the
> result of tuning their workload with different mcpage size.

Uh, no.  We don't do these kinds of config options any more (or
boot-time options as you mention later).  The size of a folio allocated
for a given VMA should be adaptive based on observing how the program
is using memory.  There will likely be many different sizes of folio
present in a given VMA.

> To have physical contiguous pages, high order pages is allocated (order
> is calculated according to mcpage size). Then the high order page will
> be split. By doing this, each sub page of mcpage is just normal 4K page.
> The current kernel page management infrastructure is applied to "mc"
> pages without any change.

This is somewhere that you're losing an advantage of folios.  By keeping
all the pages together, they get managed as a single unit.  That shrinks
the length of the LRU list and reduces lock contention.  It also reduces
the number of cache lines which are modified as, eg, we only need to
keep track of one dirty bit for many pages.
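
To make that contrast concrete, here is a very rough sketch of the two
allocation paths (illustrative only, not code from this patch or from
mainline; mcpage_alloc() and anon_folio_alloc() are made-up names, and
the folio side assumes the vma_alloc_folio() helper as it exists today):

#include <linux/gfp.h>
#include <linux/mm.h>

/* mcpage as described above: allocate a high-order page, then split it
 * so every 4K subpage becomes an independent page which the rest of the
 * MM manages separately (own LRU entry, own refcount, own flags). */
static struct page *mcpage_alloc(unsigned int order)
{
	struct page *page = alloc_pages(GFP_HIGHUSER_MOVABLE, order);

	if (!page)
		return NULL;
	split_page(page, order);	/* now 2^order separate order-0 pages */
	return page;
}

/* Folio path: allocate a compound page and keep managing it as one
 * unit -- one LRU entry, one refcount, one dirty bit for all of it. */
static struct folio *anon_folio_alloc(struct vm_area_struct *vma,
				      unsigned long addr, unsigned int order)
{
	return vma_alloc_folio(GFP_HIGHUSER_MOVABLE, order, vma, addr, false);
}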
> To reduce the page fault number, multiple page table entries are populated
> in one page fault with sub pages pfn of mcpage. This also brings a little
> bit cost of memory consumption.

That needs to be done for folios.  It's a long way down my todo list, so
if you wanted to take it on, it would be very much appreciated!
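
In case it helps, a hand-waved sketch of what "map every page of the
folio in one fault" could look like (illustrative only; fault_map_folio()
is a made-up name, and rmap, refcounting, mm counters, the PTL and arch
details are all omitted):

#include <linux/mm.h>
#include <linux/pgtable.h>

/* Fill one PTE per subpage of the folio instead of taking a separate
 * fault for each 4K page.  Assumes the faulting address is aligned to
 * the folio size and vmf->pte points at the first PTE to fill. */
static void fault_map_folio(struct vm_fault *vmf, struct folio *folio)
{
	struct vm_area_struct *vma = vmf->vma;
	unsigned long nr = folio_nr_pages(folio);
	unsigned long i;

	for (i = 0; i < nr; i++) {
		pte_t entry = mk_pte(folio_page(folio, i), vma->vm_page_prot);

		/* rmap, refcount and MM_ANONPAGES accounting omitted */
		set_pte_at(vma->vm_mm, vmf->address + i * PAGE_SIZE,
			   vmf->pte + i, entry);
	}
}

The hard part is everything the comments wave away, plus deciding what
order of folio to allocate in the first place.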