Date: Mon, 22 Jan 2024 16:01:34 +0000
From: Matthew Wilcox
To: linux-mm@kvack.org
Subject: Project: Improving the PCP allocator
As I mentioned here [1], I have Thoughts on how the PCP allocator works in a memdesc world. Unlike my earlier Thoughts on the buddy allocator [2], we can actually make progress towards this one (and see substantial performance improvement, I believe). So it's ripe for someone to pick up.

== With memdescs ==

When we have memdescs, allocating a folio from the buddy is a two-step process. First we allocate the struct folio from slab, then we ask the buddy allocator for 2^n pages, each of which gets its memdesc set to point to this folio.
It'll be similar for other memory descriptors, but let's keep it simple and just talk about folios for now.

Usually when we free folios, it's due to memory pressure (yes, we'll free memory due to truncating a file or processes exiting and freeing their anonymous memory, but that's secondary). That means we're likely to want to allocate a folio again soon. Given that, returning the struct folio to the slab allocator seems like a waste of time. The PCP allocator can hold onto the struct folio as well as the underlying memory and then just hand it back to the next caller of folio_alloc(). This also saves us from having to invent a 'struct pcpdesc' and swap the memdesc pointer from the folio to the pcpdesc.

This implies that we no longer have a single PCP allocator for all types of memory; rather we have one for each memdesc type. I think that's going to be OK, but it might introduce some problems.

== Before memdescs ==

Today we take all comers on the PCP list. __free_pages() calls free_the_page() calls free_unref_page() calls free_unref_page_prepare() calls free_pages_prepare() which undoes all the PageCompound work.

Most multi-page allocations are compound. Slab, file, anon; it's all compound. I propose that we _only_ keep compound memory on the PCP list. Freeing non-compound multi-page memory can either convert it into compound pages before being placed on the PCP list or just hand the memory back to the buddy allocator. Non-compound multi-page allocations can either go straight to buddy or grab from the PCP list and undo the compound nature of the pages.

I think this could be a huge saving. Consider allocating an order-9 PMD-sized THP. Today we initialise compound_head in each of the 511 tail pages. Since a struct page is 64 bytes, we touch 32kB of memory! That's 2/3 of my CPU's L1 D$, so it's just pushed out a good chunk of my working set. And it's all dirty, so it has to get written back.
We still need to distinguish between specifically folios (which need the folio_prep_large_rmappable() call on allocation and folio_undo_large_rmappable() on free) and other compound allocations which do not need or want this, but that's touching one/two extra cachelines, not 511.

Do we have a volunteer?

[1] https://lore.kernel.org/linux-mm/Za2lS-jG1s-HCqbx@casper.infradead.org/
[2] https://lore.kernel.org/linux-mm/ZamnIGxD8_dOJVi6@casper.infradead.org/