From mboxrd@z Thu Jan 1 00:00:00 1970
From: lizhe.67@bytedance.com
To: akpm@linux-foundation.org, david@redhat.com, jgg@ziepe.ca,
	jhubbard@nvidia.com, peterx@redhat.com
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, dev.jain@arm.com,
	muchun.song@linux.dev, lizhe.67@bytedance.com
Subject: [PATCH v3] gup: optimize longterm pin_user_pages() for large folio
Date: Thu, 5 Jun 2025 11:34:30 +0800
Message-ID: <20250605033430.83142-1-lizhe.67@bytedance.com>
X-Mailer: git-send-email 2.45.2
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit

From: Li Zhe

In the current implementation of the longterm pin_user_pages() function,
we invoke the collect_longterm_unpinnable_folios() function. This function
iterates through the list to check whether each folio belongs to the
"longterm unpinnable" category.
The folios in this list essentially correspond to a contiguous region of
user-space addresses, with each folio representing a physical address in
increments of PAGE_SIZE. If this user-space address range is mapped with
large folios, we can optimize the performance of
collect_longterm_unpinnable_folios() by reducing the number of READ_ONCE()
calls made via pofs_get_folio()->page_folio()->_compound_head().

We can also simplify the logic of collect_longterm_unpinnable_folios().
Instead of comparing with prev_folio after calling pofs_get_folio(), we
can check whether the next page is within the same folio.

The performance test results, based on v6.15 and obtained with the
gup_test tool from the kernel source tree, are as follows. For large
folios with pagesize=2M, the get time drops from 14391 us to 4867 us, an
improvement of over 66%. For small folios, we have only observed a very
slight degradation in performance (get time 130538 us -> 131798 us,
about 1%).

Without this patch:

    [root@localhost ~]# ./gup_test -HL -m 8192 -n 512
    TAP version 13
    1..1
    # PIN_LONGTERM_BENCHMARK: Time: get:14391 put:10858 us#
    ok 1 ioctl status 0
    # Totals: pass:1 fail:0 xfail:0 xpass:0 skip:0 error:0
    [root@localhost ~]# ./gup_test -LT -m 8192 -n 512
    TAP version 13
    1..1
    # PIN_LONGTERM_BENCHMARK: Time: get:130538 put:31676 us#
    ok 1 ioctl status 0
    # Totals: pass:1 fail:0 xfail:0 xpass:0 skip:0 error:0

With this patch:

    [root@localhost ~]# ./gup_test -HL -m 8192 -n 512
    TAP version 13
    1..1
    # PIN_LONGTERM_BENCHMARK: Time: get:4867 put:10516 us#
    ok 1 ioctl status 0
    # Totals: pass:1 fail:0 xfail:0 xpass:0 skip:0 error:0
    [root@localhost ~]# ./gup_test -LT -m 8192 -n 512
    TAP version 13
    1..1
    # PIN_LONGTERM_BENCHMARK: Time: get:131798 put:31328 us#
    ok 1 ioctl status 0
    # Totals: pass:1 fail:0 xfail:0 xpass:0 skip:0 error:0

Signed-off-by: Li Zhe
---
Changelogs:

v2->v3:
- Update performance test data based on v6.15.
- Refine the description of the optimization approach in the commit
  message.
- Fix some code formatting issues.
- Fine-tune the conditions for entering the optimization path.

v1->v2:
- Modify some unreliable code.
- Update performance test data.

v2 patch: https://lore.kernel.org/all/20250604031536.9053-1-lizhe.67@bytedance.com/
v1 patch: https://lore.kernel.org/all/20250530092351.32709-1-lizhe.67@bytedance.com/

 mm/gup.c | 37 +++++++++++++++++++++++++++++--------
 1 file changed, 29 insertions(+), 8 deletions(-)

diff --git a/mm/gup.c b/mm/gup.c
index 84461d384ae2..9fbe3592b5fc 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -2317,6 +2317,31 @@ static void pofs_unpin(struct pages_or_folios *pofs)
 		unpin_user_pages(pofs->pages, pofs->nr_entries);
 }
 
+static struct folio *pofs_next_folio(struct folio *folio,
+		struct pages_or_folios *pofs, long *index_ptr)
+{
+	long i = *index_ptr + 1;
+
+	if (!pofs->has_folios && folio_test_large(folio)) {
+		const unsigned long start_pfn = folio_pfn(folio);
+		const unsigned long end_pfn = start_pfn + folio_nr_pages(folio);
+
+		for (; i < pofs->nr_entries; i++) {
+			unsigned long pfn = page_to_pfn(pofs->pages[i]);
+
+			/* Is this page part of this folio? */
+			if (pfn < start_pfn || pfn >= end_pfn)
+				break;
+		}
+	}
+
+	if (unlikely(i == pofs->nr_entries))
+		return NULL;
+	*index_ptr = i;
+
+	return pofs_get_folio(pofs, i);
+}
+
 /*
  * Returns the number of collected folios. Return value is always >= 0.
  */
@@ -2324,16 +2349,12 @@ static void collect_longterm_unpinnable_folios(
 		struct list_head *movable_folio_list,
 		struct pages_or_folios *pofs)
 {
-	struct folio *prev_folio = NULL;
 	bool drain_allow = true;
-	unsigned long i;
-
-	for (i = 0; i < pofs->nr_entries; i++) {
-		struct folio *folio = pofs_get_folio(pofs, i);
+	struct folio *folio;
+	long i = 0;
 
-		if (folio == prev_folio)
-			continue;
-		prev_folio = folio;
+	for (folio = pofs_get_folio(pofs, i); folio;
+	     folio = pofs_next_folio(folio, pofs, &i)) {
 
 		if (folio_is_longterm_pinnable(folio))
 			continue;
-- 
2.20.1