From: David Hildenbrand
Organization: Red Hat
To: Matthew Wilcox, Rongwei Wang
Cc: linux-arch@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, "xuyu@linux.alibaba.com"
Date: Mon, 31 Jul 2023 14:50:59 +0200
Subject: Re: [PATCH RFC v2 0/4] Add support for sharing page tables across processes (Previously mshare)
References: <74fe50d9-9be9-cc97-e550-3ca30aebfd13@linux.alibaba.com>

On 31.07.23 14:25, Matthew Wilcox wrote:
> On Mon, Jul 31, 2023 at 12:35:00PM +0800, Rongwei Wang wrote:
>> Hi Matthew
>>
>> May I ask you another question about mshare under this RFC? I remember you
>> said you will redesign the mshare to per-vma not per-mapping (apologize if
>> remember wrongly) in last time MM alignment session.
>> And I also refer to you
>> to re-code this part in our internal version (based on this RFC). It seems
>> that per VMA will can simplify the structure of pgtable sharing, even
>> doesn't care the different permission of file mapping. these are advantages
>> (maybe) that I can imagine. But IMHO, It seems not a strongly reason to
>> switch per-mapping to per-vma.
>>
>> And I can't imagine other considerations of upstream. Can you share the
>> reason why redesigning in a per-vma way, due to integation with hugetlbfs
>> pgtable sharing or anonymous page sharing?
>
> It was David who wants to make page table sharing be per-VMA. I think
> he is advocating for the wrong approach. In any case, I don't have time
> to work on mshare and Khalid is on leave until September, so I don't
> think anybody is actively working on mshare.

Note that I also don't have any time to look into this, but my comment
essentially was that we should try decoupling page table sharing
(reduced memory consumption, shorter rmap walks) from the
mprotect(PROT_READ) use case.

For page table sharing, I was wondering whether there could be ways to
have that done semi-automatically, similar to how it's done for
hugetlb. There are some clear limitations: mappings < PMD_SIZE won't be
able to benefit.

It's still unclear whether that is a real limitation. Some use cases
were raised (put all user space library mappings into a shared area),
but I realized that these conflict with MAP_PRIVATE requirements of
such areas. Maybe I'm wrong and this is easily resolved. At least it's
not the primary use case that was raised. For the primary use cases
(VMs, databases) that map huge areas, it might not be a limitation.

Regarding mprotect(PROT_READ), my point was that mprotect() is most
probably the wrong tool to use (especially, due to signal handling).
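
To make the signal-handling concern concrete, here is a minimal
user-space sketch. It is only an illustration and not code from the
mshare series: the helper name write_faults_after_mprotect() and the
use of an anonymous MAP_SHARED mapping as a stand-in for a shared area
are assumptions made for the example. It shows that, with
mprotect(PROT_READ), the only way a process learns about a write
attempt is a SIGSEGV that it has to catch itself.

```c
#define _GNU_SOURCE
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf fault_jmp;

/* SIGSEGV handler: jump back to the point saved by sigsetjmp(). */
static void on_segv(int sig)
{
	(void)sig;
	siglongjmp(fault_jmp, 1);
}

/*
 * Hypothetical helper for illustration only.
 * Returns 1 if writing to a PROT_READ mapping raised SIGSEGV,
 * 0 if the write went through, -1 on a setup error.
 */
int write_faults_after_mprotect(void)
{
	long page = sysconf(_SC_PAGESIZE);
	struct sigaction sa, old;
	volatile char *p;
	int faulted;

	if (page <= 0)
		return -1;

	/* Anonymous shared mapping as a stand-in for a shared area. */
	p = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
		 MAP_SHARED | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return -1;
	p[0] = 'a';	/* still writable at this point */

	if (mprotect((void *)p, (size_t)page, PROT_READ) != 0)
		goto fail;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = on_segv;
	sigemptyset(&sa.sa_mask);
	if (sigaction(SIGSEGV, &sa, &old) != 0)
		goto fail;

	if (sigsetjmp(fault_jmp, 1) == 0) {
		p[0] = 'b';	/* write to a read-only page */
		faulted = 0;
	} else {
		faulted = 1;	/* we got here via the signal handler */
	}

	sigaction(SIGSEGV, &old, NULL);	/* restore previous disposition */
	munmap((void *)p, (size_t)page);
	return faulted;

fail:
	munmap((void *)p, (size_t)page);
	return -1;
}
```

Calling write_faults_after_mprotect() on Linux is expected to report a
fault. Note that the SIGSEGV disposition is shared by all threads of
the process, so a real user of this scheme would have to coordinate
its handler with any other SIGSEGV handling in the same process (and
would typically use SA_SIGINFO to find the faulting address in
si_addr).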
Instead, I was suggesting having a way to essentially protect pages in
a shmem file -- and get notified whenever someone wants to write to
such a page, either via the page tables or via write() and friends. We
do have the write-notify infrastructure for filesystems in place that
we might extend/reuse. That mechanism could benefit from shared page
tables by having to do fewer rmap walks.

Again, I don't have time to look into that (just like everybody else,
as it appears) and might miss something important. Just sharing my
thoughts that I raised in the call.

-- 
Cheers,

David / dhildenb