Date: Tue, 28 Dec 2021 21:48:08 +0000
From: Sean Christopherson
To: Chao Peng
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org,
    linux-fsdevel@vger.kernel.org, qemu-devel@nongnu.org, Paolo Bonzini,
    Jonathan Corbet, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
    Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86@kernel.org,
    "H . Peter Anvin", Hugh Dickins, Jeff Layton, "J . Bruce Fields",
    Andrew Morton, Yu Zhang, "Kirill A . Shutemov", luto@kernel.org,
    john.ji@intel.com, susie.li@intel.com, jun.nakajima@intel.com,
    dave.hansen@intel.com, ak@linux.intel.com, david@redhat.com
Subject: Re: [PATCH v3 kvm/queue 05/16] KVM: Maintain ofs_tree for fast memslot lookup by file offset
References: <20211223123011.41044-1-chao.p.peng@linux.intel.com> <20211223123011.41044-6-chao.p.peng@linux.intel.com> <20211224035418.GA43608@chaop.bj.intel.com>
In-Reply-To: <20211224035418.GA43608@chaop.bj.intel.com>

On Fri, Dec 24, 2021, Chao Peng wrote:
> On Thu, Dec 23, 2021 at 06:02:33PM +0000, Sean Christopherson wrote:
> > On Thu, Dec 23, 2021, Chao Peng wrote:
> > > Similar to hva_tree for hva range, maintain interval tree ofs_tree for
> > > offset range of a fd-based memslot so the lookup by offset range can be
> > > faster when memslot count is high.
> >
> > This won't work.  The hva_tree relies on there being exactly one virtual address
> > space, whereas with private memory, userspace can map multiple files into the
> > guest at different gfns, but with overlapping offsets.
>
> OK, that's the point.
>
> >
> > I also dislike hijacking __kvm_handle_hva_range() in patch 07.
> >
> > KVM also needs to disallow mapping the same file+offset into multiple gfns, which
> > I don't see anywhere in this series.
>
> This can be checked against file+offset overlapping with existing slots
> when registering a new one.
>
> >
> > In other words, there needs to be a 1:1 gfn:file+offset mapping.  Since userspace
> > likely wants to allocate a single file for guest private memory and map it into
> > multiple discontiguous slots, e.g. to skip the PCI hole, the best idea off the top
> > of my head would be to register the notifier on a per-slot basis, not a per-VM
> > basis.  It would require a 'struct kvm *' in 'struct kvm_memory_slot', but that's
> > not a huge deal.
> >
> > That way, KVM's notifier callback already knows the memslot and can compute overlap
> > between the memslot and the range by reversing the math done by kvm_memfd_get_pfn().
> > Then, armed with the gfn and slot, invalidation is just a matter of constructing
> > a struct kvm_gfn_range and invoking kvm_unmap_gfn_range().
>
> KVM is easy but the kernel bits would be difficult, it has to maintain
> fd+offset to memslot mapping because one fd can have multiple memslots,
> it needs to decide which memslot needs to be notified.

No, the kernel side maintains an opaque pointer like it does today; KVM handles
reverse engineering the memslot to get the offset and whatever else it needs.
notify_fallocate() and other callbacks are unchanged, though they probably can
drop the inode.

E.g. likely with bad math and handwaving on the overlap detection:

	int kvm_private_fd_fallocate_range(void *owner, pgoff_t start, pgoff_t end)
	{
		struct kvm_memory_slot *slot = owner;
		struct kvm_gfn_range gfn_range = {
			.slot      = slot,
			.start     = (start - slot->private_offset) >> PAGE_SHIFT,
			.end       = (end - slot->private_offset) >> PAGE_SHIFT,
			.may_block = true,
		};

		if (!has_overlap(slot, start, end))
			return 0;

		gfn_range.end = min(gfn_range.end, slot->base_gfn + slot->npages);

		kvm_unmap_gfn_range(slot->kvm, &gfn_range);
		return 0;
	}
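
The overlap detection that the snippet above hand-waves could be worked out
roughly as follows. This is a standalone userspace sketch, not kernel code and
not the series' implementation. It assumes that private_offset is a
page-aligned byte offset into the file, that start and end are page indices
with end exclusive, and that the forward mapping is
pgoff = (private_offset >> PAGE_SHIFT) + (gfn - base_gfn). The names
struct memslot, struct gfn_range and file_range_to_gfn_range() are
hypothetical stand-ins chosen for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT 12	/* assumption: 4 KiB pages */

typedef uint64_t gfn_t;
typedef uint64_t pgoff_t;	/* userspace stand-in for the kernel type */

/* Hypothetical, trimmed-down stand-in for struct kvm_memory_slot. */
struct memslot {
	gfn_t base_gfn;			/* first guest frame backed by the slot */
	uint64_t npages;		/* slot size in pages */
	uint64_t private_offset;	/* assumed: page-aligned byte offset into the file */
};

/* Hypothetical stand-in for the start/end part of struct kvm_gfn_range. */
struct gfn_range {
	gfn_t start;	/* inclusive */
	gfn_t end;	/* exclusive */
};

/*
 * Translate the file range [start, end), given in page indices, into the
 * gfn range it covers in @slot.  Returns false if the two do not overlap,
 * in which case @out is left untouched.
 */
bool file_range_to_gfn_range(const struct memslot *slot,
			     pgoff_t start, pgoff_t end,
			     struct gfn_range *out)
{
	pgoff_t slot_start = slot->private_offset >> PAGE_SHIFT;
	pgoff_t slot_end = slot_start + slot->npages;

	assert(start <= end);

	/* Clamp the file range to the window of the file backing the slot. */
	if (start < slot_start)
		start = slot_start;
	if (end > slot_end)
		end = slot_end;
	if (start >= end)
		return false;

	/* Reverse the assumed forward math: gfn = base_gfn + (pgoff - slot_start). */
	out->start = slot->base_gfn + (start - slot_start);
	out->end = slot->base_gfn + (end - slot_start);
	return true;
}
```

Clamping in file-offset space before converting means the subtraction can never
underflow, and the resulting range is already bounded by
base_gfn + npages, so no separate min() against the end of the slot is needed.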