Date: Wed, 15 Feb 2023 15:13:46 +0000
From: Matthew Wilcox <willy@infradead.org>
To: Ryan Roberts
Cc: Linux-MM <linux-mm@kvack.org>, Catalin Marinas, Mark Rutland,
	Ruben Ayrapetyan
Subject: Re: Folios for anonymous memory
In-Reply-To: <4c991dcb-c5bb-86bb-5a29-05df24429607@arm.com>

On Wed, Feb 15, 2023 at 12:38:13PM +0000, Ryan Roberts wrote:
> Kernel Compilation:
> Speed up due to SW overhead reduction: 6.5%
> Speed up due to HW overhead reduction: 5.0%
> Total speed up: 11.5%
>
> Speedometer 2.0:
> Speed up due to SW overhead reduction: 5.3%
> Speed up due to HW overhead reduction: 5.1%
> Total speed up: 10.4%
>
> Digging into the reasons for the SW-side speedup, it boils down to less
> book-keeping - 4x fewer page faults, and 4x fewer pages to manage
> locks/refcounts/… for, which leads to faster abort and syscall
> handling. I think these phenomena are well understood in the Folio
> context? Although for these workloads, the memory is primarily
> anonymous.

All of that tracks pretty well with what I've found. Although I haven't
been conducting exactly the same experiments, and different hardware is
going to have different properties, it all seems about right.

> I’d like to figure out how to realise some of these benefits in a
> kernel that still maintains a 4K page user ABI. Reading over old
> threads and LWN, and watching Matthew’s talk at OSS last summer, it
> sounds like this is exactly what Folios intend to solve?

Yes, it's exactly what folios are supposed to achieve -- opportunistic
use of larger memory allocations & TLB sizes when the stars align.

> So a few questions:
>
> - I’ve seen folios for anon memory listed as future work; what’s the
> current status? Is anyone looking at this? It’s something that I would
> be interested to take a look at if not (although don’t take that as an
> actual commitment yet!).

There are definitely people _looking_ at it. I don't think anyone's
committed to it, and I don't think there's anyone 50 patches into a
100-patch series to make it work ;-) I think there are a lot of
unanswered questions about how best to do it.

> - My understanding is that as of v6.0, at least, XFS was the only FS
> supporting large folios? Has that picture changed? Is there any
> likelihood of seeing ext4 and f2fs support anytime soon?

We have some progress on that front. In addition to XFS, AFS, EROFS
and tmpfs currently enable support for large folios. I've heard tell
of NFS support coming soon. I'm pretty sure CIFS is looking into it.
The OCFS2 maintainers are interested. You can find the current state
of fs support by grepping for mapping_set_large_folios().
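To make the opt-in concrete, here is a minimal sketch. The
mapping_set_large_folios() call is the real interface; the example_*
name is invented for illustration, and a real filesystem also has to
make sure its address_space operations cope with multi-page folios:

#include <linux/fs.h>
#include <linux/pagemap.h>

/*
 * Sketch: opt an inode's page cache mapping in to large folios.
 * A filesystem would call something like this wherever it sets up
 * a new inode's i_mapping.  Once the flag is set, readahead and
 * friends are free to allocate folios of order > 0 for this mapping.
 */
static void example_fs_init_mapping(struct inode *inode)
{
	mapping_set_large_folios(inode->i_mapping);
}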
People are working on it from the f2fs side:
https://lore.kernel.org/linux-fsdevel/Y5D8wYGpp%2F95ShTV@bombadil.infradead.org/

ext4 is being more conservative. I posted a patch series to convert
ext4 to use order-0 folios instead of pages (enabling large folios
will be more work), but I haven't had any significant responses to it
yet:
https://lore.kernel.org/linux-fsdevel/20230126202415.1682629-1-willy@infradead.org/

> - Matthew mentioned in the talk that he had data showing memory
> fragmentation becoming less of an issue as more users were allocating
> large folios. Is that data or the experimental approach public?

I'm not sure I have data on that front; it's more of an argument from
first principles -- page cache is the easiest form of memory to
reclaim, since it's usually clean. If the filesystems using the page
cache are allocating large folios, it's easier to find larger chunks
of memory. Also, every time a fs tries to allocate a large folio and
fails, it'll poke the compaction code to try to create larger chunks
of memory.

There are also memory allocation patterns to consider. At some point,
all our low-order pools will be empty and we'll have to break up an
order-10 page. If we're allocating individual pages for the
filesystem, we'll happily allocate the first few, but then the radix
tree in which we store the pages will have to allocate a new node from
slab. Slab allocates 28 nodes from an order-2 page allocation (a node
is 576 bytes, so a 16KiB allocation holds 28 of them), so you'll
almost instantly get a case where this order-10 page will never be
reassembled -- unless your system is configured with a movable memory
zone (which segregates slab allocations from page cache allocations),
and my laptop certainly isn't.

I don't want you to get the impression that all the work going on is
targeted at filesystem folios. There's a lot of infrastructure being
converted from pages to folios, and being re-examined at the same time
to be sure it handles arbitrary-order folios correctly. Right now, I'm
working on the architecture support for inserting multiple consecutive
PTEs at the same time (a rough sketch of the idea is appended at the
end of this mail):
https://lore.kernel.org/linux-arch/20230211033948.891959-1-willy@infradead.org/

Thanks for reaching out. We have a Zoom call on alternate Fridays, so
if you're free at 5pm UK time (yes, I know ... trying to fit in both
California and central Europe leads to awkward times for phone calls),
I can send you the meeting details.
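P.S. The promised sketch of the multi-PTE idea. This is illustrative
only, not the interface from the series linked above: a generic
fallback can simply loop over the range with the existing per-PTE
set_pte_at(), and the win comes from architectures overriding the
batched operation to update the hardware page tables more efficiently
(e.g. via something like arm64's contiguous bit). pte_pgprot() is
assumed to be available on the architecture in question.

/*
 * Sketch: set @nr consecutive PTEs mapping @nr consecutive page
 * frames, starting at @addr/@ptep, all with the permissions of @pte.
 */
static void set_ptes_sketch(struct mm_struct *mm, unsigned long addr,
			    pte_t *ptep, pte_t pte, unsigned int nr)
{
	while (nr--) {
		set_pte_at(mm, addr, ptep, pte);
		ptep++;
		addr += PAGE_SIZE;
		/* Advance to the next page frame, keeping permissions. */
		pte = pfn_pte(pte_pfn(pte) + 1, pte_pgprot(pte));
	}
}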