Date: Mon, 31 Jul 2023 18:30:22 +0200
From: David Hildenbrand
Organization: Red Hat
To: Rongwei Wang, Matthew Wilcox
Cc: linux-arch@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, "xuyu@linux.alibaba.com"
Subject: Re: [PATCH RFC v2 0/4] Add support for sharing page tables across processes (Previously mshare)
In-Reply-To: <9faea1cf-d3da-47ff-eb41-adc5bd73e5ca@linux.alibaba.com>
References: <74fe50d9-9be9-cc97-e550-3ca30aebfd13@linux.alibaba.com> <9faea1cf-d3da-47ff-eb41-adc5bd73e5ca@linux.alibaba.com>

On 31.07.23 18:19, Rongwei Wang wrote:
>
> On 2023/7/31 20:50, David Hildenbrand wrote:
>> On 31.07.23 14:25, Matthew Wilcox wrote:
>>> On Mon, Jul 31, 2023 at 12:35:00PM +0800, Rongwei Wang wrote:
>>>> Hi Matthew
>>>>
>>>> May I ask you another question about mshare under this RFC?
>>>> I remember you said you will redesign the mshare to per-vma not
>>>> per-mapping (apologize if remember wrongly) in last time MM alignment
>>>> session. And I also refer to you to re-code this part in our internal
>>>> version (based on this RFC). It seems that per VMA will can simplify
>>>> the structure of pgtable sharing, even doesn't care the different
>>>> permission of file mapping. these are advantages (maybe) that I can
>>>> imagine. But IMHO, It seems not a strongly reason to switch
>>>> per-mapping to per-vma.
>>>>
>>>> And I can't imagine other considerations of upstream. Can you share the
>>>> reason why redesigning in a per-vma way, due to integation with
>>>> hugetlbfs pgtable sharing or anonymous page sharing?
>>>
>>> It was David who wants to make page table sharing be per-VMA. I think
>>> he is advocating for the wrong approach. In any case, I don't have time
>>> to work on mshare and Khalid is on leave until September, so I don't
>>> think anybody is actively working on mshare.
>>
>> Not that I also don't have any time to look into this, but my comment
>> essentially was that we should try decoupling page table sharing
>> (reduce memory consumption, shorter rmap walk) from the
>> mprotect(PROT_READ) use case.
>
> Hi David, Matthew
>
> Thanks for your reply.
>
> Uh, sorry, I can't imagine the relative between decouping page table
> sharing with per-VMA design. And I think mprotect(PROT_READ) has to
> modify all sharing page tables of related tasks. It seems that I miss
> something about per-VMA from your words.

Assume we do do the page table sharing at mmap time, if the flags are
right. Let's focus on the most common case:

    mmap(memfd, PROT_READ | PROT_WRITE, MAP_SHARED)

with the same being done in each and every process.

The original design, doing an mprotect(PROT_READ) in each and every
process, is just absolutely inefficient for protecting a memfd page.
For that case, my thought was that you actually want to write-protect
the pages on the memfd level. So instead of doing mprotect(PROT_READ) in
999 processes, or doing mprotect(PROT_READ) on mshare(), you would have a
memfd feature to protect pages from any write access -- not using virtual
addresses but using an offset in the memfd.

Assume such a (badly imagined) memfd_protect(PROT_READ) would make sure
that:

(1) Any page table mappings of the page are write-protected and
(2) Any write access using the page table mappings triggers write-notify
    and
(3) Any other access -- e.g., write() -- similarly informs memfd.

Without page table sharing, (1) would have to walk all mappings via the
rmap. With page table sharing, it would only have to walk one page table.
But the features would be two separate things.

What memfd would do with that write notification (inject a signal,
something like uffd) would be a different story.

Again, just an idea and maybe complete garbage.

-- 
Cheers,

David / dhildenb