From: Pasha Tatashin
Date: Tue, 21 Feb 2023 09:37:17 -0500
Subject: Re: [LSF/MM/BPF TOPIC] Single Owner Memory
To: Matthew Wilcox
Cc: lsf-pc@lists.linux-foundation.org, linux-mm

Hey Matthew,

Thank you for looking into this.

On Tue, Feb 21, 2023 at 8:46 AM Matthew Wilcox wrote:
>
> On Mon, Feb 20, 2023 at 02:10:24PM -0500, Pasha Tatashin wrote:
> > Within Google the vast majority of memory, over 90% has a single
> > owner. This is because most of the jobs are not multi-process but
> > instead multi-threaded. The examples of single owner memory
> > allocations are all tcmalloc()/malloc() allocations, and
> > mmap(MAP_ANONYMOUS | MAP_PRIVATE) allocations without forks. On the
> > other hand, the struct page metadata that is shared for all types of
> > memory takes 1.6% of the system memory. It would be reasonable to find
> > ways to optimize memory such that the common som case has a reduced
> > amount of metadata.
> >
> > This would be similar to HugeTLB and DAX that are treated as special
> > cases, and can release struct pages for the subpages back to the
> > system.
>
> DAX can't, unless something's changed recently. You're referring to
> CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP

DAX has a similar optimization:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v6.2&id=e3246d8f52173a798710314a42fea83223036fc8

>
> > The proposal is to discuss a new som driver that would use HugeTLB as
> > a source of 2M chunks. When user creates a som memory, i.e.:
> >
> > mmap(MAP_ANONYMOUS | MAP_PRIVATE);
> > madvise(mem, length, MADV_DONTFORK);
> >
> > A vma from the som driver is used instead of regular anon vma.
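For anyone who wants to try the mmap()/madvise() sequence quoted above, here is a minimal userspace sketch, assuming Linux and glibc. It only demonstrates what the existing syscalls already enforce (a forked child does not receive a MADV_DONTFORK range); it does not model the proposed som driver. The helper names som_alloc() and child_inherits_mapping() are hypothetical and not part of any proposed API.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: map anonymous private memory and exclude it from
 * fork() with MADV_DONTFORK, so no child process can end up sharing it.
 * Returns NULL on failure. */
static void *som_alloc(size_t length)
{
    void *mem = mmap(NULL, length, PROT_READ | PROT_WRITE,
                     MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
    if (mem == MAP_FAILED)
        return NULL;
    if (madvise(mem, length, MADV_DONTFORK) != 0) {
        munmap(mem, length);
        return NULL;
    }
    return mem;
}

/* Hypothetical check: fork and report whether the child still has the
 * first page of `mem` mapped. Returns 1 if mapped, 0 if not, -1 on error.
 * mincore() fails with ENOMEM on an unmapped range, so the child never
 * touches the address and cannot take a SIGSEGV. */
static int child_inherits_mapping(void *mem)
{
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0);

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        unsigned char vec;
        /* Exit status 1: still mapped in the child; 0: not mapped. */
        _exit(mincore(mem, (size_t)page, &vec) == 0 ? 1 : 0);
    }

    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

Calling som_alloc(2UL << 20), writing to the memory, and then calling child_inherits_mapping() on it should report 0, because the VMA is simply not copied into the child.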
>
> That's going to be "interesting". The VMA is already created with
> the call to mmap(), and madvise has not traditionally allowed drivers
> to replace a VMA. You might be better off creating a /dev/som and
> hacking the malloc libraries to pass an fd from that instead of passing
> MAP_ANONYMOUS.

I do not plan to replace the VMA after madvise(); I showed the syscall
sequence only to illustrate how Single Owner Memory can be enforced
today. In the future, we will either need to add a new mmap() flag for
single owner memory, if that proves important, or, as you suggested,
use an ioctl() through /dev/som.

> > The discussion should include the following topics:
> > - Interaction with folio and the proposed struct page {memdesc}.
> > - Handling for migrate_pages() and friends.
> > - Handling for FOLL_PIN and FOLL_LONGTERM.
> > - What type of madvise() properties the som memory should handle
>
> Obviously once we get to dynamically allocated memdescs, this whole
> thing goes away, so I'm not excited about making big changes to the
> kernel to support this.

This is why the changes I have in mind would be mostly localized in a
separate driver and would not alter core mm much. However, even with
memdescs, Single Owner Memory is not currently singled out from the
other memory types (shared, anon, named), so I do not expect memdescs
to provide savings or optimizations for this specific use case.

> The savings you'll see are 6 pages (24kB) per 2MB allocated (1.2%).
> That's not nothing, but it's not huge either.

This depends on the scale: in our fleet, 1.2% savings are huge.

Pasha
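The overhead figures in this thread (struct page at 1.6% of memory, and 6 freed vmemmap pages, or 24kB, per 2MB being about 1.2%) follow from simple arithmetic. A short sketch, assuming 4 KiB base pages and a 64-byte struct page (typical for x86-64, but not stated in the thread); the macro and function names are hypothetical.

```c
#include <assert.h>

/* Assumed sizes, typical for x86-64; the thread does not state them. */
#define BASE_PAGE_SIZE   4096UL         /* 4 KiB base page */
#define STRUCT_PAGE_SIZE   64UL         /* assumed sizeof(struct page) */
#define HUGE_PAGE_SIZE   (2UL << 20)    /* 2 MiB HugeTLB page */

/* Share of all memory spent on struct page metadata, in percent. */
static double struct_page_overhead_pct(void)
{
    return 100.0 * STRUCT_PAGE_SIZE / BASE_PAGE_SIZE;
}

/* Base pages of vmemmap holding the struct pages for one 2 MiB page. */
static unsigned long vmemmap_pages_per_huge_page(void)
{
    unsigned long nr_struct_pages = HUGE_PAGE_SIZE / BASE_PAGE_SIZE;  /* 512 */

    return nr_struct_pages * STRUCT_PAGE_SIZE / BASE_PAGE_SIZE;       /* 8 */
}

/* Savings, as a percent of the 2 MiB page, if `freed` vmemmap pages
 * are released back to the system. */
static double vmemmap_savings_pct(unsigned long freed)
{
    /* Cannot free more vmemmap pages than back one huge page. */
    assert(freed <= vmemmap_pages_per_huge_page());
    return 100.0 * (double)(freed * BASE_PAGE_SIZE) / HUGE_PAGE_SIZE;
}
```

Under these assumptions struct_page_overhead_pct() returns 1.5625 (the ~1.6% above), one 2MB page is described by 8 vmemmap pages, and freeing 6 of them gives vmemmap_savings_pct(6) = 1.171875, i.e. 24kB per 2MB or roughly 1.2%.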