Date: Wed, 24 Jan 2024 11:18:24 -0800 (PST)
From: "Christoph Lameter (Ampere)" <cl@linux.com>
To: Matthew Wilcox
Cc: linux-mm@kvack.org
Subject: Re: Project: Improving the PCP allocator
On Mon, 22 Jan 2024, Matthew Wilcox wrote:

> When we have memdescs, allocating a folio from the buddy is a two step
> process. First we allocate the struct folio from slab, then we ask the
> buddy allocator for 2^n pages, each of which gets its memdesc set to
> point to this folio. It'll be similar for other memory descriptors,
> but let's keep it simple and just talk about folios for now.

I need to catch up on memdescs.

One of the key issues may be the fragmentation that occurs during
alloc/free of folios of different sizes. Maybe we could use an approach
similar to the one the slab allocator uses to defragment.
Allocate larger folios/pages, then break out sub-folios/sizes/components
until the large page is full, and recycle any frees of components within
that page before moving on to the next large page. With that we end up
with a list of per-cpu huge pages that the page allocator serves from,
similar to the per-cpu partial lists in SLUB.

Once a huge page is used up, the page allocator needs to move on to a
huge page that already has a lot of recent frees of smaller fragments.
So something like a partial list could exist in the page allocator as
well, basically sorted by the available space within each huge page.

There is the additional issue of different sizes to break out, so it may
not be as easy as in the SLUB allocator, because different sizes share
one huge page.

Basically this is a move from SLAB-style object management (caching
large lists of small objects without regard to locality, which increases
fragmentation) to a combination of spatial considerations and lists of
large frames. I think this is necessary in order to keep memory as
defragmented as possible.

> I think this could be a huge saving. Consider allocating an order-9 PMD
> sized THP. Today we initialise compound_head in each of the 511 tail
> pages. Since a page is 64 bytes, we touch 32kB of memory! That's 2/3 of
> my CPU's L1 D$, so it's just pushed out a good chunk of my working set.
> And it's all dirty, so it has to get written back.

Right.

> We still need to distinguish between specifically folios (which
> need the folio_prep_large_rmappable() call on allocation and
> folio_undo_large_rmappable() on free) and other compound allocations
> which do not need or want this, but that's touching one/two extra
> cachelines, not 511.

> Do we have a volunteer?

Maybe. I have to think about this, but since I got my hands dirty on the
PCP logic years ago, I may qualify. I need to get my head around the
details and see where this could go.