From: lizhe.67@bytedance.com
To: david@redhat.com
Cc: akpm@linux-foundation.org, alex.williamson@redhat.com, kvm@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, peterx@redhat.com, lizhe.67@bytedance.com
Subject: Re: [PATCH v4 3/3] vfio/type1: optimize vfio_unpin_pages_remote() for large folio
Date: Wed, 18 Jun 2025 15:22:11 +0800
Message-ID: <20250618072211.12867-1-lizhe.67@bytedance.com>
In-Reply-To: <20250618061143.6470-1-lizhe.67@bytedance.com>
References: <20250618061143.6470-1-lizhe.67@bytedance.com>

On Wed, 18 Jun 2025 14:11:43 +0800, lizhe.67@bytedance.com wrote:

> > > How do you think of this implementation?
> > >
> > > diff --git a/include/linux/mm.h b/include/linux/mm.h
> > > index 242b05671502..eb91f99ea973 100644
> > > --- a/include/linux/mm.h
> > > +++ b/include/linux/mm.h
> > > @@ -2165,6 +2165,23 @@ static inline long folio_nr_pages(const struct folio *folio)
> > >  	return folio_large_nr_pages(folio);
> > >  }
> > >
> > > +/*
> > > + * folio_remaining_pages - Counts the number of pages from a given
> > > + * start page to the end of the folio.
> > > + *
> > > + * @folio: Pointer to folio
> > > + * @start_page: The starting page from which to begin counting.
> > > + *
> > > + * Returned number includes the provided start page.
> > > + *
> > > + * The caller must ensure that @start_page belongs to @folio.
> > > + */
> > > +static inline unsigned long folio_remaining_pages(struct folio *folio,
> > > +		struct page *start_page)
> > > +{
> > > +	return folio_nr_pages(folio) - folio_page_idx(folio, start_page);
> > > +}
> > > +
> > >  /* Only hugetlbfs can allocate folios larger than MAX_ORDER */
> > >  #ifdef CONFIG_ARCH_HAS_GIGANTIC_PAGE
> > >  #define MAX_FOLIO_NR_PAGES	(1UL << PUD_ORDER)
> > > diff --git a/mm/gup.c b/mm/gup.c
> > > index 15debead5f5b..14ae2e3088b4 100644
> > > --- a/mm/gup.c
> > > +++ b/mm/gup.c
> > > @@ -242,7 +242,7 @@ static inline struct folio *gup_folio_range_next(struct page *start,
> > >
> > >  	if (folio_test_large(folio))
> > >  		nr = min_t(unsigned int, npages - i,
> > > -			   folio_nr_pages(folio) - folio_page_idx(folio, next));
> > > +			   folio_remaining_pages(folio, next));
> > >
> > >  	*ntails = nr;
> > >  	return folio;
> > > diff --git a/mm/page_isolation.c b/mm/page_isolation.c
> > > index b2fc5266e3d2..34e85258060c 100644
> > > --- a/mm/page_isolation.c
> > > +++ b/mm/page_isolation.c
> > > @@ -96,7 +96,7 @@ static struct page *has_unmovable_pages(unsigned long start_pfn, unsigned long e
> > >  				return page;
> > >  			}
> > >
> > > -			skip_pages = folio_nr_pages(folio) - folio_page_idx(folio, page);
> > > +			skip_pages = folio_remaining_pages(folio, page);
> > >  			pfn += skip_pages - 1;
> > >  			continue;
> > >  		}
> > > ---
> >
> > Guess I would have pulled the "min" in there, but passing something like
> > ULONG_MAX for the page_isolation case also looks rather ugly.
>
> Yes, the page_isolation case does not require the 'min' logic. Since
> there are already places in the current kernel code where
> folio_remaining_pages() is used without needing min, we could simply
> create a custom wrapper function based on folio_remaining_pages() only
> in those specific scenarios where min is necessary.
>
> Following this line of thinking, the wrapper function in vfio would
> look something like this.
>
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -801,16 +801,40 @@ static long vfio_pin_pages_remote(struct vfio_dma *dma, unsigned long vaddr,
>  	return pinned;
>  }
>
> +static inline unsigned long vfio_folio_remaining_pages(
> +		struct folio *folio, struct page *start_page,
> +		unsigned long max_pages)
> +{
> +	if (!folio_test_large(folio))
> +		return 1;

The above two lines may no longer be necessary: for a small folio,
folio_nr_pages() returns 1 and folio_page_idx() returns 0, so
folio_remaining_pages() already yields 1.

> +	return min(max_pages,
> +		   folio_remaining_pages(folio, start_page));
> +}

Thanks,
Zhe
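
To illustrate why the folio_test_large() check in the wrapper looks redundant, here is a minimal user-space sketch. It is not kernel code: struct toy_folio, the toy_* function names, and the use of a plain page index instead of a struct page pointer are all assumptions made for this illustration. It models folio_remaining_pages() as "page count minus start index" and compares the wrapper with and without the small-folio special case. The two variants agree only when max_pages is at least 1, which the sketch assumes.

```c
#include <assert.h>

/* Hypothetical stand-in for struct folio: only the page count matters here. */
struct toy_folio {
	unsigned long nr_pages;	/* 1 for a small folio, > 1 for a large one */
};

/* Models folio_remaining_pages(): pages from start_idx to the end, inclusive. */
static unsigned long toy_folio_remaining_pages(const struct toy_folio *folio,
					       unsigned long start_idx)
{
	/* The caller must pass a page that belongs to the folio. */
	assert(start_idx < folio->nr_pages);
	return folio->nr_pages - start_idx;
}

static unsigned long toy_min(unsigned long a, unsigned long b)
{
	return a < b ? a : b;
}

/* Wrapper with the explicit small-folio check, as in the quoted diff. */
static unsigned long toy_remaining_with_check(const struct toy_folio *folio,
					      unsigned long start_idx,
					      unsigned long max_pages)
{
	if (folio->nr_pages == 1)
		return 1;
	return toy_min(max_pages, toy_folio_remaining_pages(folio, start_idx));
}

/*
 * Wrapper without the check. For a small folio the remaining count is
 * 1 - 0 = 1, so min(max_pages, 1) is 1 whenever max_pages >= 1.
 */
static unsigned long toy_remaining_no_check(const struct toy_folio *folio,
					    unsigned long start_idx,
					    unsigned long max_pages)
{
	return toy_min(max_pages, toy_folio_remaining_pages(folio, start_idx));
}
```

Under these assumptions, calling either wrapper on a one-page folio with start index 0 and any max_pages >= 1 gives 1. The variants differ only for max_pages == 0, where the checked version returns 1 and the unchecked one returns 0, so dropping the check is safe only if callers never pass a zero limit.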