Date: Tue, 27 Sep 2022 23:23:24 +0000
From: Sean Christopherson
To: David Hildenbrand
Cc: "Kirill A. Shutemov", "Kirill A . Shutemov", Paolo Bonzini, Chao Peng,
    kvm@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org,
    linux-fsdevel@vger.kernel.org, linux-api@vger.kernel.org,
    linux-doc@vger.kernel.org, qemu-devel@nongnu.org, Jonathan Corbet,
    Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, Thomas Gleixner,
    Ingo Molnar, Borislav Petkov, x86@kernel.org, "H . Peter Anvin",
    Hugh Dickins, Jeff Layton, "J . Bruce Fields", Andrew Morton, Shuah Khan,
    Mike Rapoport, Steven Price, "Maciej S . Szmigiero", Vlastimil Babka,
    Vishal Annapurve, Yu Zhang, luto@kernel.org, jun.nakajima@intel.com,
    dave.hansen@intel.com, ak@linux.intel.com, aarcange@redhat.com,
    ddutile@redhat.com, dhildenb@redhat.com, Quentin Perret, Michael Roth,
    mhocko@suse.com, Muchun Song, wei.w.wang@intel.com
Subject: Re: [PATCH v8 1/8] mm/memfd: Introduce userspace inaccessible memfd
References: <20220915142913.2213336-1-chao.p.peng@linux.intel.com>
    <20220915142913.2213336-2-chao.p.peng@linux.intel.com>
    <20220923005808.vfltoecttoatgw5o@box.shutemov.name>
    <20220926144854.dyiacztlpx4fkjs5@box.shutemov.name>
    <0a99aa24-599c-cc60-b23b-b77887af3702@redhat.com>
In-Reply-To: <0a99aa24-599c-cc60-b23b-b77887af3702@redhat.com>

On Mon, Sep 26, 2022, David Hildenbrand wrote:
> On 26.09.22 16:48, Kirill A. Shutemov wrote:
> > On Mon, Sep 26, 2022 at 12:35:34PM +0200, David Hildenbrand wrote:
> > > When using DAX, what happens with the shared <->private conversion? Which
> > > "type" is supposed to use dax, which not?
> > >
> > > In other word, I'm missing too many details on the bigger picture of how
> > > this would work at all to see why it makes sense right now to prepare for
> > > that.
> >
> > IIUC, KVM doesn't really care about pages or folios. They need PFN to
> > populate SEPT. Returning page/folio would make KVM do additional steps to
> > extract PFN and one more place to have a bug.
>
> Fair enough. Smells KVM specific, though.

TL;DR: I'm good with either approach, though providing a "struct page" might
avoid refactoring the API in the nearish future.

Playing devil's advocate for a second, the counter argument is that KVM is the
only user for the foreseeable future.

That said, it might make sense to return a "struct page" from the core API and
force KVM to do page_to_pfn(). KVM already does that for HVA-based memory, so
it's not exactly new code.

More importantly, KVM may actually need/want the "struct page" in the
not-too-distant future to support mapping non-refcounted "struct page" memory
into the guest.
The ChromeOS folks have a use case involving virtio-gpu blobs where KVM can get
handed a "struct page" that _isn't_ refcounted[*]. Once the lack of
mmu_notifier integration is fixed, the remaining issue is that KVM doesn't
currently have a way to determine whether or not it holds a reference to the
page. Instead, KVM assumes that if the page is "normal", it's refcounted, e.g.
see kvm_release_pfn_clean().

KVM's current workaround for this is to refuse to map these pages into the
guest, i.e. KVM simply forces its assumption that normal pages are refcounted
to be true. To remove that workaround, the likely solution will be to pass
around a tuple of page+pfn, where "page" is non-NULL if the pfn is a refcounted
"struct page".

At that point, getting handed a "struct page" from the core API would be a good
thing as KVM wouldn't need to probe the PFN to determine whether or not it's a
refcounted page.

Note, I still want the order to be provided by the API so that KVM doesn't need
to run through a bunch of helpers to try and figure out the allowed mapping
size.

[*] https://lore.kernel.org/all/CAD=HUj736L5oxkzeL2JoPV8g1S6Rugy_TquW=PRt73YmFzP6Jw@mail.gmail.com
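
For illustration only, here is a rough userspace sketch of the page+pfn tuple
idea described above. It is not KVM code: "struct mock_page", "struct pfn_ref"
and the two helpers are hypothetical stand-ins invented for this example. It
assumes the provider sets "page" only for refcounted pages and supplies the
mapping order itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's "struct page"; illustration only. */
struct mock_page {
	int refcount;
};

/*
 * Hypothetical page+pfn tuple. "page" is non-NULL only when the pfn is
 * backed by a refcounted page; "order" is supplied by the provider so the
 * consumer does not have to work out the allowed mapping size itself.
 */
struct pfn_ref {
	uint64_t pfn;
	struct mock_page *page;
	unsigned int order;
};

/* Build a tuple, taking a reference only if the page is refcounted. */
static struct pfn_ref pfn_ref_get(uint64_t pfn, struct mock_page *page,
				  unsigned int order)
{
	struct pfn_ref ref = { .pfn = pfn, .page = page, .order = order };

	if (page)
		page->refcount++;
	return ref;
}

/* Drop the reference if one is held; no probing of the pfn is needed. */
static void pfn_ref_put(struct pfn_ref *ref)
{
	if (ref->page) {
		assert(ref->page->refcount > 0);
		ref->page->refcount--;
		ref->page = NULL;
	}
}
```

With a tuple like this, the release path decides whether to drop a reference
purely from "page" being non-NULL, rather than inferring it from whether the
pfn looks like a "normal" page.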