Date: Tue, 21 Feb 2023 13:46:20 +0000
From: Matthew Wilcox
To: Pasha Tatashin
Cc: lsf-pc@lists.linux-foundation.org, linux-mm
Subject: Re: [LSF/MM/BPF TOPIC] Single Owner Memory

On Mon, Feb 20, 2023 at 02:10:24PM -0500, Pasha Tatashin wrote:
> Within Google the vast majority of memory, over 90% has a single
> owner. This is because most of the jobs are not multi-process but
> instead multi-threaded. The examples of single owner memory
> allocations are all tcmalloc()/malloc() allocations, and
> mmap(MAP_ANONYMOUS | MAP_PRIVATE) allocations without forks. On the
> other hand, the struct page metadata that is shared for all types of
> memory takes 1.6% of the system memory. It would be reasonable to find
> ways to optimize memory such that the common som case has a reduced
> amount of metadata.
>
> This would be similar to HugeTLB and DAX that are treated as special
> cases, and can release struct pages for the subpages back to the
> system.

DAX can't, unless something's changed recently.  You're referring to
CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP

> The proposal is to discuss a new som driver that would use HugeTLB as
> a source of 2M chunks. When user creates a som memory, i.e.:
>
>     mmap(MAP_ANONYMOUS | MAP_PRIVATE);
>     madvise(mem, length, MADV_DONTFORK);
>
> A vma from the som driver is used instead of regular anon vma.

That's going to be "interesting".  The VMA is already created with
the call to mmap(), and madvise has not traditionally allowed drivers
to replace a VMA.  You might be better off creating a /dev/som and
hacking the malloc libraries to pass an fd from that instead of passing
MAP_ANONYMOUS.

> The discussion should include the following topics:
> - Interaction with folio and the proposed struct page {memdesc}.
> - Handling for migrate_pages() and friends.
> - Handling for FOLL_PIN and FOLL_LONGTERM.
> - What type of madvise() properties the som memory should handle

Obviously once we get to dynamically allocated memdescs, this whole
thing goes away, so I'm not excited about making big changes to the
kernel to support this.

The savings you'll see are 6 pages (24kB) per 2MB allocated (1.2%).
That's not nothing, but it's not huge either.
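
For reference, the percentages in this thread can be checked with a
little arithmetic.  This is only a sketch: the 4KiB base page size and
the 64-byte struct page are assumptions (typical for x86-64) that the
thread does not state, and the helper names are made up for
illustration.  The figure of 6 freed pages per 2MB is taken from the
reply above.

```c
#include <assert.h>

/* Assumed values, not stated in the thread: 4 KiB base pages and a
 * 64-byte struct page, as on a typical x86-64 configuration. */
#define BASE_PAGE_SIZE   4096UL
#define STRUCT_PAGE_SIZE 64UL
#define CHUNK_SIZE       (2UL << 20)   /* one 2MB chunk */

/* Hypothetical helper: number of base pages needed to hold the
 * struct pages that describe one chunk of chunk_bytes. */
static unsigned long vmemmap_pages_per_chunk(unsigned long chunk_bytes)
{
    assert(chunk_bytes % BASE_PAGE_SIZE == 0);

    unsigned long nr_struct_pages = chunk_bytes / BASE_PAGE_SIZE;
    unsigned long meta_bytes = nr_struct_pages * STRUCT_PAGE_SIZE;

    /* Round up to whole base pages. */
    return (meta_bytes + BASE_PAGE_SIZE - 1) / BASE_PAGE_SIZE;
}

/* Hypothetical helper: metadata share of total_bytes, in hundredths
 * of a percent (truncated), to avoid floating point. */
static unsigned long overhead_bp(unsigned long meta_bytes,
                                 unsigned long total_bytes)
{
    return meta_bytes * 10000UL / total_bytes;
}
```

Under these assumptions a 2MB chunk needs 8 pages of struct page, or
about 1.56% of the chunk, which lines up with the roughly 1.6% quoted.
Freeing 6 of those pages recovers 24kB, or about 1.17%, which rounds
to the 1.2% above.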