From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 873FAC61DB3 for ; Tue, 10 Jan 2023 09:19:03 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 090768E0002; Tue, 10 Jan 2023 04:19:03 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 067A28E0001; Tue, 10 Jan 2023 04:19:03 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id E71388E0002; Tue, 10 Jan 2023 04:19:02 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0017.hostedemail.com [216.40.44.17]) by kanga.kvack.org (Postfix) with ESMTP id D474D8E0001 for ; Tue, 10 Jan 2023 04:19:02 -0500 (EST) Received: from smtpin04.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay09.hostedemail.com (Postfix) with ESMTP id AFD18822B5 for ; Tue, 10 Jan 2023 09:19:02 +0000 (UTC) X-FDA: 80338340124.04.65C856F Received: from mga09.intel.com (mga09.intel.com [134.134.136.24]) by imf18.hostedemail.com (Postfix) with ESMTP id DA5701C0013 for ; Tue, 10 Jan 2023 09:18:59 +0000 (UTC) Authentication-Results: imf18.hostedemail.com; dkim=pass header.d=intel.com header.s=Intel header.b=NgwlyId1; dmarc=pass (policy=none) header.from=intel.com; spf=none (imf18.hostedemail.com: domain of chao.p.peng@linux.intel.com has no SPF policy when checking 134.134.136.24) smtp.mailfrom=chao.p.peng@linux.intel.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1673342340; h=from:from:sender:reply-to:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=6A8l9LeafzkhjCZMFPdER1tQKzd2tVfkXCVwvAIl6z4=; b=cgPkmHtjQcwLhUkp9Ls6b2SG7jkXYHuHAYW3ndkEvBfLkopYWgPOTLhY5Do850Qc+y+Hc5 hS7tFCZCqWUKdcKOe6zA1rHFGJxQOQf4WxK4CBNcYi5ZzKhDvsmLvy+gqWiGURD+GWWfSj azbFS10AEo9VHG5jxQv4BZhB4tOiavI= ARC-Authentication-Results: i=1; imf18.hostedemail.com; dkim=pass header.d=intel.com header.s=Intel header.b=NgwlyId1; dmarc=pass (policy=none) header.from=intel.com; spf=none (imf18.hostedemail.com: domain of chao.p.peng@linux.intel.com has no SPF policy when checking 134.134.136.24) smtp.mailfrom=chao.p.peng@linux.intel.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1673342340; a=rsa-sha256; cv=none; b=r6XlS72F4Ao46TBDhNAZbMFjp+ba9wvMyVYmmPWzm8ZWgj2hZQggA5VEfxVmN6/BbULaxJ jLJiWFXe1lNZSvM+zsV4F1IpVe2fhcQ9mtpThRWbkaL0DImWChclrAGhjP13leLsJxMMnW dvfGbymWaVhWCuHXc+7FBD+rh3lJQm4= DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1673342340; x=1704878340; h=date:from:to:cc:subject:message-id:reply-to:references: mime-version:in-reply-to; bh=2L0+jVEogDEzAgz7y5b+vzXDvmX8Ax8MX0YUEFpP2YQ=; b=NgwlyId1S+PtoaF2Ogpr2wqUHdNIeR/3fcTOoxqgI9KNy0hmM2t/WLar ylMw4xI/fz6OtMOwceXojpS7oVqzZAuNJCPOt/gwOo89wIVValV9CdJZw i8pyuTVvqZNMUKftOA5MxOxkimYfbpZT3yrTUSyyyPI+6TXQ5ImWV1li+ nc+IOSIInIrFd7oZ8WRscZF26JMI6tVosrfAUp/WCwm6xGDBjtdP9nA2y Uo0f+G9vJZnjYoIIDSUANcL6oCF/FBRIgV77HW1WUI3JVOzvpHpEPQUVw 97QlQylDm3QhDlcNY8KNBYNBtbNyIMgvyHEC8V9Gsm6l9iGgaCI/CpBmS g==; X-IronPort-AV: E=McAfee;i="6500,9779,10585"; a="324343040" X-IronPort-AV: E=Sophos;i="5.96,314,1665471600"; d="scan'208";a="324343040" Received: from orsmga001.jf.intel.com ([10.7.209.18]) by orsmga102.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 10 Jan 2023 01:18:57 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6500,9779,10585"; a="689352645" X-IronPort-AV: E=Sophos;i="5.96,314,1665471600"; d="scan'208";a="689352645" Received: from chaop.bj.intel.com (HELO localhost) ([10.240.193.75]) by orsmga001.jf.intel.com with ESMTP; 10 Jan 2023 01:18:43 -0800 Date: Tue, 10 Jan 2023 17:14:32 +0800 From: Chao Peng To: Sean Christopherson Cc: Jarkko Sakkinen , kvm@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, linux-fsdevel@vger.kernel.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, linux-doc@vger.kernel.org, qemu-devel@nongnu.org, Paolo Bonzini , Jonathan Corbet , Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , Thomas Gleixner , Ingo Molnar , Borislav Petkov , Arnd Bergmann , Naoya Horiguchi , Miaohe Lin , x86@kernel.org, "H . Peter Anvin" , Hugh Dickins , Jeff Layton , "J . Bruce Fields" , Andrew Morton , Shuah Khan , Mike Rapoport , Steven Price , "Maciej S . Szmigiero" , Vlastimil Babka , Vishal Annapurve , Yu Zhang , "Kirill A . Shutemov" , luto@kernel.org, jun.nakajima@intel.com, dave.hansen@intel.com, ak@linux.intel.com, david@redhat.com, aarcange@redhat.com, ddutile@redhat.com, dhildenb@redhat.com, Quentin Perret , tabba@google.com, Michael Roth , mhocko@suse.com, wei.w.wang@intel.com Subject: Re: [PATCH v10 3/9] KVM: Extend the memslot to support fd-based private memory Message-ID: <20230110091432.GA2441264@chaop.bj.intel.com> Reply-To: Chao Peng References: <20221202061347.1070246-1-chao.p.peng@linux.intel.com> <20221202061347.1070246-4-chao.p.peng@linux.intel.com> <20230106094000.GA2297836@chaop.bj.intel.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: X-Rspamd-Queue-Id: DA5701C0013 X-Rspamd-Server: rspam09 X-Rspam-User: X-Stat-Signature: ud5t144q79k8866w86hiefm9hy69gexn X-HE-Tag: 1673342339-587585 X-HE-Meta: U2FsdGVkX1/FOuc4EXoU+1wtEbYPl1x/ao5qj+k8EPd6nf+5AOY3GJRbVzJp+EMI38ku5VdksIBinvTVfiyMVSBnTxSPwJZ/EcfRao9Z+JLuGNWJvrtPYxmgSBcR/8Gipfj9/9IOoCTxTrH3bmnNj1TnLKkZjYnuBF+gjsLh12MLk1kuaId3vlxvOAMTiz/jOCPED4s5wNtKi1iFzxlLczVRp6i9LG6wDLBgqvXdsdny56cK4TSWzFKA8jEHN/blQwNSKJVitwAp3EslJp8ZMs4y9r2/pFKjz8U3JyYLNfBDafPZW74OVwzcl6D/74qpL0OWXwfsSBJ0zQE6c42HnZY2Ubfrmxh9adey80SW7yykaOp/o83JhISNHp2kZMkNgx9r5xvtBcGIQdQI/ka5j6o3+N/gyU5NQ7zwAcw/mzou2O/v4N93m2etfDkl7vXFFq+9coQrNMF2dJu6Q+JGP3qvCB4g4Xr7KUfHFWOgMOePRMu7PkdCETUkp9cAIBjQmjez4XJ/2Z5wVnwc8CyqaRUVJcwmKR54ME1S53gL9SCaIHgXDQJQGB1DfcYooyuK6guHzyU+2WuLLzPWpLxxtoLXn3ECoKxoJwpMYewxMdocmQmHdgs2/PI3xQhfT5A8p2sqHs58ckGqxzekvbN9SwjnqRsYFu1V2/uKW8iNshUi6/TQ0C/QcnaBE3PBswmwnSFXeyQUUAVQFjK2vIvUZO9oB2XQ2AL1zSwjNFYd6yOXBi8Ee3pg0JklXh8lhkUAiVh3h+/G5FYaYI0u52AeNWlCqc2uilfJ5unFB5Vik7Rpx21D9aKHVCWTdXq4vNZdcpV9u40eadq3eZaHImWHZ6x98Sbv4uxWzQXofvdXiK/FL/9XfqppAcvRTt4acYYBifDlvYPeoELg+V9pZvMCgXAD1uyF5VZg1IbsEcDJMG34km5bCgU3Drj8p84fNAN53WN7jBGBAMs= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Mon, Jan 09, 2023 at 07:32:05PM +0000, Sean Christopherson wrote: > On Fri, Jan 06, 2023, Chao Peng wrote: > > On Thu, Jan 05, 2023 at 11:23:01AM +0000, Jarkko Sakkinen wrote: > > > On Fri, Dec 02, 2022 at 02:13:41PM +0800, Chao Peng wrote: > > > > To make future maintenance easy, internally use a binary compatible > > > > alias struct kvm_user_mem_region to handle both the normal and the > > > > '_ext' variants. > > > > > > Feels bit hacky IMHO, and more like a completely new feature than > > > an extension. > > > > > > Why not just add a new ioctl? The commit message does not address > > > the most essential design here. > > > > Yes, people can always choose to add a new ioctl for this kind of change > > and the balance point here is we want to also avoid 'too many ioctls' if > > the functionalities are similar. The '_ext' variant reuses all the > > existing fields in the 'normal' variant and most importantly KVM > > internally can reuse most of the code. I certainly can add some words in > > the commit message to explain this design choice. > > After seeing the userspace side of this, I agree with Jarkko; overloading > KVM_SET_USER_MEMORY_REGION is a hack. E.g. the size validation ends up being > bogus, and userspace ends up abusing unions or implementing kvm_user_mem_region > itself. How is the size validation being bogus? I don't quite follow. Then we will use kvm_userspace_memory_region2 as the KVM internal alias, right? I see similar examples use different functions to handle different versions but it does look easier if we use alias for this function. > > It feels absolutely ridiculous, but I think the best option is to do: > > #define KVM_SET_USER_MEMORY_REGION2 _IOW(KVMIO, 0x49, \ > struct kvm_userspace_memory_region2) Just interesting, is 0x49 a safe number we can use? > > /* for KVM_SET_USER_MEMORY_REGION2 */ > struct kvm_user_mem_region2 { > __u32 slot; > __u32 flags; > __u64 guest_phys_addr; > __u64 memory_size; > __u64 userspace_addr; > __u64 restricted_offset; > __u32 restricted_fd; > __u32 pad1; > __u64 pad2[14]; > } > > And it's consistent with other KVM ioctls(), e.g. KVM_SET_CPUID2. Okay, agree from KVM userspace API perspective this is more consistent with similar existing examples. I see several of them. I think we will also need a CAP_KVM_SET_USER_MEMORY_REGION2 for this new ioctl. > > Regarding the userspace side of things, please include Vishal's selftests in v11, > it's impossible to properly review the uAPI changes without seeing the userspace > side of things. I'm in the process of reviewing Vishal's v2[*], I'll try to > massage it into a set of patches that you can incorporate into your series. Previously I included Vishal's selftests in the github repo, but not include them in this patch series. It's OK for me to incorporate them directly into this series and review together if Vishal is fine. Chao > > [*] https://lore.kernel.org/all/20221205232341.4131240-1-vannapurve@google.com