From: David Hildenbrand
Organization: Red Hat
To: Matthew Wilcox
Cc: Khalid Aziz, "Longpeng (Mike, Cloud Infrastructure Service Product Dept.)",
 Steven Sistare, Anthony Yznaga, "linux-kernel@vger.kernel.org",
 "linux-mm@kvack.org", "Gonglei (Arei)"
Subject: Re: [RFC PATCH 0/5] madvise MADV_DOEXEC
Date: Mon, 16 Aug 2021 18:13:20 +0200
Message-ID: <97ed86a0-9fac-3dbc-0f9e-d669484c9485@redhat.com>
References: <88884f55-4991-11a9-d330-5d1ed9d5e688@redhat.com>
 <40bad572-501d-e4cf-80e3-9a8daa98dc7e@redhat.com>
 <3ce1f52f-d84d-49ba-c027-058266e16d81@redhat.com>

On 16.08.21 17:59, Matthew Wilcox wrote:
> On Mon, Aug 16, 2021 at 05:01:44PM +0200, David Hildenbrand wrote:
>> On 16.08.21 16:40, Matthew Wilcox wrote:
>>> On Mon, Aug 16, 2021 at 04:33:09PM +0200, David Hildenbrand wrote:
>>>>>> I did not follow why we have to play games with MAP_PRIVATE, and having
>>>>>> private anonymous pages shared between processes that don't COW, introducing
>>>>>> new syscalls etc.
>>>>>
>>>>> It's not about SHMEM, it's about file-backed pages on regular
>>>>> filesystems.  I don't want to have XFS, ext4 and btrfs all with their
>>>>> own implementations of ARCH_WANT_HUGE_PMD_SHARE.
>>>>
>>>> Let me ask this way: why do we have to play such games with MAP_PRIVATE?
>>>
>>> : Mappings within this address range behave as if they were shared
>>> : between threads, so a write to a MAP_PRIVATE mapping will create a
>>> : page which is shared between all the sharers.
>>>
>>> If so, that's a misunderstanding, because there are no games being played.
>>> What Khalid's saying there is that because the page tables are already
>>> shared for that range of address space, the COW of a MAP_PRIVATE will
>>> create a new page, but that page will be shared between all the sharers.
>>> The second write to a MAP_PRIVATE page (by any of the sharers) will not
>>> create a COW situation.  Just like if all the sharers were threads of
>>> the same process.
>>>
>>
>> It actually seems to be just like I understood it. We'll have multiple
>> processes share anonymous pages writable, even though they are not using
>> shared memory.
>>
>> IMHO, sharing page tables to optimize for something kernel-internal (page
>> table consumption) should be completely transparent to user space. Just like
>> ARCH_WANT_HUGE_PMD_SHARE currently is unless I am missing something
>> important.
>>
>> The VM_MAYSHARE check in want_pmd_share()->vma_shareable() makes me assume
>> that we really only optimize for MAP_SHARED right now, never for
>> MAP_PRIVATE.
>
> It's definitely *not* about being transparent to userspace.  It's about
> giving userspace new functionality where multiple processes can choose
> to share a portion of their address space with each other.  What any
> process changes in that range changes, every sharing process sees.
> mmap(), munmap(), mprotect(), mremap(), everything.

Oh okay, so it's actually much more complicated and complex than I
thought. Thanks for clarifying that! I recall virtiofsd had similar
requirements for sharing memory with the QEMU main process, I might be
wrong.

"existing shared memory area" and your initial page table example made
me assume that we are simply dealing with sharing page tables of MAP_SHARED.

It's actually something like a VMA container that you share between
processes. And whatever VMAs are currently inside that VMA container is
mirrored to other processes. I assume sharing page tables could actually
be an implementation detail, especially when keeping MAP_PRIVATE
(confusing in that context!) and other features that will give you
surprises (uffd) out of the picture.

-- 
Thanks,

David / dhildenb
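
For reference, the MAP_PRIVATE vs. MAP_SHARED contrast discussed above can be
seen with today's fork() semantics. The following is only a rough sketch of
existing behavior, not of the proposed interface: it is written in Python for
brevity, and child_write_visible() is a made-up helper name, not anything from
the patch set. With MAP_PRIVATE the child's write triggers COW and stays
invisible to the parent; with MAP_SHARED both processes see the same page.

```python
import mmap
import os


def child_write_visible(flags: int) -> bool:
    """Return True if a byte written by a forked child is seen by the parent."""
    # fileno=-1 requests anonymous memory; `flags` selects private vs. shared.
    buf = mmap.mmap(-1, mmap.PAGESIZE,
                    flags=flags | mmap.MAP_ANONYMOUS,
                    prot=mmap.PROT_READ | mmap.PROT_WRITE)
    buf[0] = 0
    pid = os.fork()
    if pid == 0:
        # Child: write to the page, then exit without running cleanup handlers.
        buf[0] = 1
        os._exit(0)
    os.waitpid(pid, 0)
    seen = buf[0] == 1
    buf.close()
    return seen


if __name__ == "__main__":
    print("MAP_PRIVATE:", child_write_visible(mmap.MAP_PRIVATE))
    print("MAP_SHARED: ", child_write_visible(mmap.MAP_SHARED))
```

The semantics described in the quoted text would differ from the MAP_PRIVATE
case here: within a shared range, the page created by the first write would be
visible to all sharers, as if they were threads of one process.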