Date: Wed, 30 Aug 2023 16:44:51 +0000
In-Reply-To: <30ffe039-c9e2-b996-500d-5e11bf6ea789@linux.intel.com> (message from Binbin Wu on Wed, 30 Aug 2023 23:12:19 +0800)
Subject: Re: [RFC PATCH v11 12/29] KVM: Add KVM_CREATE_GUEST_MEMFD ioctl() for guest-specific backing memory
From: Ackerley Tng
To: Binbin Wu
Cc: seanjc@google.com, kvm@vger.kernel.org,
 linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev,
 linux-mips@vger.kernel.org, linuxppc-dev@lists.ozlabs.org,
 kvm-riscv@lists.infradead.org, linux-riscv@lists.infradead.org,
 linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
 linux-security-module@vger.kernel.org, linux-kernel@vger.kernel.org,
 pbonzini@redhat.com, maz@kernel.org, oliver.upton@linux.dev,
 chenhuacai@kernel.org, mpe@ellerman.id.au, anup@brainfault.org,
 paul.walmsley@sifive.com, palmer@dabbelt.com, aou@eecs.berkeley.edu,
 willy@infradead.org, akpm@linux-foundation.org, paul@paul-moore.com,
 jmorris@namei.org, serge@hallyn.com, chao.p.peng@linux.intel.com,
 tabba@google.com, jarkko@kernel.org, yu.c.zhang@linux.intel.com,
 vannapurve@google.com, mail@maciej.szmigiero.name, vbabka@suse.cz,
 david@redhat.com, qperret@google.com, michael.roth@amd.com,
 wei.w.wang@intel.com, liam.merwick@oracle.com, isaku.yamahata@gmail.com,
 kirill.shutemov@linux.intel.com

Binbin Wu writes:

>>
>>
>> +static long kvm_gmem_allocate(struct inode *inode, loff_t offset, loff_t len)
>> +{
>> +	struct address_space *mapping = inode->i_mapping;
>> +	pgoff_t start, index, end;
>> +	int r;
>> +
>> +	/* Dedicated guest is immutable by default. */
>> +	if (offset + len > i_size_read(inode))
>> +		return -EINVAL;
>> +
>> +	filemap_invalidate_lock_shared(mapping);
>> +
>> +	start = offset >> PAGE_SHIFT;
>> +	end = (offset + len) >> PAGE_SHIFT;
>> +
>> +	r = 0;
>> +	for (index = start; index < end; ) {
>> +		struct folio *folio;
>> +
>> +		if (signal_pending(current)) {
>> +			r = -EINTR;
>> +			break;
>> +		}
>> +
>> +		folio = kvm_gmem_get_folio(inode, index);
>> +		if (!folio) {
>> +			r = -ENOMEM;
>> +			break;
>> +		}
>> +
>> +		index = folio_next_index(folio);
>> +
>> +		folio_unlock(folio);
>> +		folio_put(folio);
> May be a dumb question, why we get the folio and then put it immediately?
> Will it make the folio be released back to the page allocator?
>

I was wondering this too, but it is correct.

In filemap_grab_folio(), the refcount is incremented in three places:

+ When the folio is created in filemap_alloc_folio(), it is given a
  refcount of 1 in

    filemap_alloc_folio() -> folio_alloc() -> __folio_alloc_node() ->
    __folio_alloc() -> __alloc_pages() -> get_page_from_freelist() ->
    prep_new_page() -> post_alloc_hook() -> set_page_refcounted()

+ Then, in filemap_add_folio(), the refcount is incremented twice:

    + The first is from the filemap (1 refcount per page if this is a
      hugepage):

        filemap_add_folio() -> __filemap_add_folio() -> folio_ref_add()

    + The second is a refcount from the lru list

        filemap_add_folio() -> folio_add_lru() -> folio_get() ->
        folio_ref_inc()

In the other path, if the folio exists in the page cache (filemap), the
refcount is also incremented through

    filemap_grab_folio() -> __filemap_get_folio() ->
    filemap_get_entry() -> folio_try_get_rcu()

I believe all the branches in kvm_gmem_get_folio() are taking a refcount
on the folio while the kernel does some work on the folio like clearing
the folio in clear_highpage() or getting the next index, and then when
done, the kernel does folio_put().

This pattern is also used in shmem and hugetlb. :)

I'm not sure whose refcount the folio_put() in kvm_gmem_allocate() is
dropping though:

+ The refcount for the filemap depends on whether this is a hugepage or
  not, but folio_put() strictly drops a refcount of 1.
+ The refcount for the lru list is just 1, but doesn't the page still
  remain in the lru list?

>> +
>> +		/* 64-bit only, wrapping the index should be impossible. */
>> +		if (WARN_ON_ONCE(!index))
>> +			break;
>> +
>> +		cond_resched();
>> +	}
>> +
>> +	filemap_invalidate_unlock_shared(mapping);
>> +
>> +	return r;
>> +}
>> +
>>
>>
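
The get-then-put pattern discussed in the reply above can be illustrated
with a small userspace model. This is only a sketch under stated
assumptions, not kernel code: struct model_folio and every model_*()
function are hypothetical names invented for this example. It assumes the
allocation reference ends up with the caller and that the page cache
holds one reference per page, as described in the message; the lru
reference is deliberately left out to keep the model small.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for struct folio; not a kernel type. */
struct model_folio {
	int refcount;	/* models the folio refcount */
	int nr_pages;	/* 1 for a small folio, more for a hugepage */
	bool in_cache;	/* true once added to the modeled page cache */
};

/* Models allocation: a new folio starts with a refcount of 1. */
static void model_alloc(struct model_folio *f, int nr_pages)
{
	f->refcount = 1;
	f->nr_pages = nr_pages;
	f->in_cache = false;
}

/* Models adding to the page cache: one reference per page (assumption). */
static void model_cache_add(struct model_folio *f)
{
	f->refcount += f->nr_pages;
	f->in_cache = true;
}

/* Models lookup-or-create: either way the caller holds one reference. */
static void model_grab(struct model_folio *f, int nr_pages)
{
	if (f->in_cache) {
		f->refcount++;	/* existing folio: lookup takes a reference */
		return;
	}
	model_alloc(f, nr_pages);
	model_cache_add(f);
}

/* Drops one reference; returns true if it was the last one. */
static bool model_put(struct model_folio *f)
{
	assert(f->refcount > 0);
	return --f->refcount == 0;
}

/* One loop iteration: grab, then put. Returns the remaining refcount. */
static int model_allocate_step(struct model_folio *f, int nr_pages)
{
	bool freed;

	model_grab(f, nr_pages);
	freed = model_put(f);
	assert(!freed);	/* the modeled page cache still holds references */
	return f->refcount;
}
```

Starting from a zero-initialized struct model_folio, each call to
model_allocate_step() leaves only the modeled page cache references
behind, so in this model the put does not free the folio.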