From: David Hildenbrand
Organization: Red Hat
Date: Wed, 26 Jan 2022 11:16:42 +0100
To: Matthew Wilcox, "Kirill A. Shutemov"
Cc: Khalid Aziz, akpm@linux-foundation.org, longpeng2@huawei.com, arnd@arndb.de, dave.hansen@linux.intel.com, rppt@kernel.org, surenb@google.com, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Peter Xu
Subject: Re: [RFC PATCH 0/6] Add support for shared PTEs across processes

On 26.01.22 05:04, Matthew Wilcox wrote:
> On Tue, Jan 25, 2022 at 06:59:50PM +0000, Matthew Wilcox wrote:
>> On Tue, Jan 25, 2022 at 09:57:05PM +0300, Kirill A. Shutemov wrote:
>>> On Tue, Jan 25, 2022 at 02:09:47PM +0000, Matthew Wilcox wrote:
>>>>> I think zero-API approach (plus madvise() hints to tweak it) is worth
>>>>> considering.
>>>>
>>>> I think the zero-API approach actually misses out on a lot of
>>>> possibilities that the mshare() approach offers. For example, mshare()
>>>> allows you to mmap() many small files in the shared region -- you
>>>> can't do that with zeroAPI.
>>>
>>> Do you consider a use-case for many small files to be common? I would
>>> think that the main consumer of the feature to be mmap of huge files.
>>> And in this case zero enabling burden on userspace side sounds like a
>>> sweet deal.
>>
>> mmap() of huge files is certainly the Oracle use-case. With occasional
>> funny business like mprotect() of a single page in the middle of a 1GB
>> hugepage.
>
> Bill and I were talking about this earlier and realised that this is
> the key point. There's a requirement that when one process mprotects
> a page that it gets protected in all processes. You can't do that
> without *some* API because that's different behaviour than any existing
> API would produce.

A while ago I talked with Peter about an extended uffd (here: WP)
mechanism that would work on fds instead of the process address space.

The rough idea would be to register the uffd (or whatever it would be
called) handler on an fd instead of on the virtual address space of a
single process, and to write-protect pages in that fd. Once anybody
tries to write to such a protected range (write(), mmap, ...), the uffd
handler would fire and user space could handle the event (-> unprotect).

The page cache would have to remember the uffd information ("wp using
uffd"). When (un)protecting pages using this mechanism, all page tables
mapping the page would have to be updated accordingly using the rmap.
At that point, we wouldn't care if it's a single page table (e.g.,
shared, similar to hugetlb) or simply multiple page tables.
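To make the rmap step concrete, here is a toy, single-threaded model in C of fd-level write protection. Everything in it (struct toy_page, struct toy_pte, toy_map(), toy_set_wp(), toy_write(), the fixed-size rmap array) is hypothetical and exists only for illustration; it is not kernel code and not an existing uffd API. It assumes the page cache keeps a per-page "wp using uffd" flag plus a reverse map of the PTEs mapping that page, and that a blocked write invokes a user-space handler which may unprotect.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model only: none of these names exist in the kernel. */
#define MAX_MAPPERS 4 /* max page tables that may map one page in this model */

struct toy_pte {
	bool writable; /* write permission in one process's page table */
};

struct toy_page {
	bool uffd_wp;                      /* "wp using uffd", kept by the page cache */
	struct toy_pte *rmap[MAX_MAPPERS]; /* reverse map: every PTE mapping this page */
	size_t nr_mapped;
};

/* A new page table maps the page; it inherits the fd-level protection. */
static void toy_map(struct toy_page *page, struct toy_pte *pte)
{
	assert(page->nr_mapped < MAX_MAPPERS);
	pte->writable = !page->uffd_wp;
	page->rmap[page->nr_mapped++] = pte;
}

/* (Un)protect at the fd level: record it and walk the rmap to update all PTEs. */
static void toy_set_wp(struct toy_page *page, bool protect)
{
	page->uffd_wp = protect;
	for (size_t i = 0; i < page->nr_mapped; i++)
		page->rmap[i]->writable = !protect;
}

/* Models the user-space handler that receives the WP event. */
typedef void (*toy_wp_handler)(struct toy_page *page);

/*
 * A write to the page: pte == NULL models write(2) on the fd, otherwise a
 * store through that mapping. Returns 0 if the write was not blocked,
 * 1 if the handler fired and resolved it, -1 if it is still protected.
 */
static int toy_write(struct toy_page *page, struct toy_pte *pte,
		     toy_wp_handler handler)
{
	bool blocked = pte ? !pte->writable : page->uffd_wp;

	if (!blocked)
		return 0;
	handler(page); /* user space handles the event */
	blocked = pte ? !pte->writable : page->uffd_wp;
	return blocked ? -1 : 1;
}

/* Example handler: simply unprotect the page ("-> unprotect"). */
static void toy_unprotect_handler(struct toy_page *page)
{
	toy_set_wp(page, false);
}
```

For example, after mapping one page into two PTEs and calling toy_set_wp(&pg, true), both PTEs are read-only; a store through either one calls the handler, which clears the protection in both. Keeping the flag on the page rather than only in each PTE is what lets a page table that maps the page later inherit the protection, regardless of whether the PTEs are shared or per process.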
It's a very rough idea; I just wanted to mention it.

-- 
Thanks,

David / dhildenb