Subject: Re: [RFC PATCH 0/5] madvise MADV_DOEXEC
To: Khalid Aziz, "Longpeng (Mike, Cloud Infrastructure Service Product Dept.)", Matthew Wilcox
Cc: Steven Sistare, Anthony Yznaga, "linux-kernel@vger.kernel.org", "linux-mm@kvack.org", "Gonglei (Arei)"
References: <1595869887-23307-1-git-send-email-anthony.yznaga@oracle.com> <43471cbb-67c6-f189-ef12-0f8302e81b06@oracle.com> <55720e1b39cff0a0f882d8610e7906dc80ea0a01.camel@oracle.com>
From: David Hildenbrand
Organization: Red Hat
Date: Mon, 16 Aug 2021 10:02:22 +0200
In-Reply-To: <55720e1b39cff0a0f882d8610e7906dc80ea0a01.camel@oracle.com>

On 13.08.21 21:49, Khalid Aziz wrote:
> On Tue, 2021-07-13 at 00:57 +0000, Longpeng (Mike, Cloud Infrastructure
> Service Product Dept.)
> wrote:
>> Hi Matthew,
>>
>>> -----Original Message-----
>>> From: Matthew Wilcox [mailto:willy@infradead.org]
>>> Sent: Monday, July 12, 2021 9:30 AM
>>> To: Longpeng (Mike, Cloud Infrastructure Service Product Dept.)
>>> Cc: Steven Sistare; Anthony Yznaga; linux-kernel@vger.kernel.org;
>>> linux-mm@kvack.org; Gonglei (Arei)
>>> Subject: Re: [RFC PATCH 0/5] madvise MADV_DOEXEC
>>>
>>> On Mon, Jul 12, 2021 at 09:05:45AM +0800, Longpeng (Mike, Cloud
>>> Infrastructure Service Product Dept.) wrote:
>>>> Let me describe my use case more clearly (just ignore if you're not
>>>> interested in it):
>>>>
>>>> 1. Prog A mmap()s 4GB of memory (anon or file-mapping); suppose the
>>>> allocated VA range is [0x40000000,0x140000000)
>>>>
>>>> 2. Prog A specifies that [0x48000000,0x50000000) and
>>>> [0x80000000,0x100000000) will be shared with its child.
>>>>
>>>> 3. Prog A fork()s Prog B and then Prog B exec()s a new ELF binary.
>>>>
>>>> 4. Prog B notices the shared ranges (e.g. by input parameters or
>>>> ...) and remaps them to a contiguous VA range.
>>>
>>> This is dangerous. There must be an active step for Prog B to accept
>>> Prog A's ranges into its address space. Otherwise Prog A could almost
>>> completely fill Prog B's address space and so control where Prog B
>>> places its mappings. It could also provoke a latent bug in Prog B if
>>> it doesn't handle address space exhaustion gracefully.
>>>
>>> I had a proposal to handle this. Would it meet your requirements?
>>> https://lore.kernel.org/lkml/20200730152250.GG23808@casper.infradead.org/
>>
>> I noticed your proposal of project Sileby and I think it can meet
>> Steven's requirement, but I'm not sure whether it's suitable for mine
>> because there's no sample code yet. Is it in progress?
>
> Hi Mike,
>
> I am working on refining the ideas from project Sileby. I am also
> working on designing the implementation.
> Since the original concept, the mshare API has evolved further. Here
> is what it looks like:
>
> The mshare API consists of two system calls - mshare() and
> mshare_unlink()
>
> mshare
> ======
>
> int mshare(char *name, void *addr, size_t length, int oflags, mode_t
> mode)
>
> mshare() creates and opens a new, or opens an existing shared memory
> area that will be shared at PTE level. name refers to shared object
> name that exists under /dev/mshare (this is subject to change. There
> might be better ways to manage the names for mshare'd areas). addr is
> the starting address of this shared memory area and length is the size
> of this area. oflags can be one of:
>
> O_RDONLY  opens shared memory area for read only access by everyone
> O_RDWR    opens shared memory area for read and write access
> O_CREAT   creates the named shared memory area if it does not exist
> O_EXCL    If O_CREAT was also specified, and a shared memory area
>           exists with that name, return an error.
>
> mode represents the creation mode for the shared object under
> /dev/mshare.
>
> Return Value
> ------------
>
> mshare() returns a file descriptor. A read from this file descriptor
> returns two long values - (1) starting address, and (2) size of the
> shared memory area.
>
> Notes
> -----
>
> PTEs are shared at pgdir level and hence it imposes the following
> requirements on the address and size given to mshare():
>
> - Starting address must be aligned to pgdir size (512GB on x86_64)
> - Size must be a multiple of pgdir size
> - Any mappings created in this address range at any time become
>   shared automatically
> - Shared address range can have unmapped addresses in it. Any
>   access to an unmapped address will result in SIGBUS
>
> Mappings within this address range behave as if they were shared
> between threads, so a write to a MAP_PRIVATE mapping will create a
> page which is shared between all the sharers. The first process that
The first process that > declares an address range mshare'd can continue to map objects in the > shared area. All other processes that want mshare'd access to this > memory area can do so by calling mshare(). After this call, the > address range given by mshare becomes a shared range in its address > space. Anonymous mappings will be shared and not COWed. Did I understand correctly that you want to share actual page tables=20 between processes and consequently different MMs? That sounds like a=20 very bad idea. --=20 Thanks, David / dhildenb