From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <2f44fda6-c20c-4d90-ae83-e650c43a16ff@linux.alibaba.com>
Date: Thu, 27 Mar 2025 19:54:55 +0800
MIME-Version: 1.0
User-Agent: Mozilla Thunderbird
Subject: Re: [PATCH 2/2] mm: mincore: use folio_pte_batch() to batch process large folios
To: Oscar Salvador
Cc: akpm@linux-foundation.org, hughd@google.com, willy@infradead.org,
 david@redhat.com, 21cnbao@gmail.com, ryan.roberts@arm.com, ziy@nvidia.com,
 linux-mm@kvack.org, linux-kernel@vger.kernel.org
References:
 <7ad05bc9299de5d954fb21a2da57f46dd6ec59d0.1742960003.git.baolin.wang@linux.alibaba.com>
From: Baolin Wang
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit

On 2025/3/27 18:49, Oscar Salvador wrote:
> On Wed, Mar 26, 2025 at 11:38:11AM +0800, Baolin Wang wrote:
>> @@ -118,16 +120,31 @@ static int mincore_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
>>  		walk->action = ACTION_AGAIN;
>>  		return 0;
>>  	}
>> -	for (; addr != end; ptep++, addr += PAGE_SIZE) {
>> +	for (; addr != end; ptep += step, addr += step * PAGE_SIZE) {
>>  		pte_t pte = ptep_get(ptep);
>>
>> +		step = 1;
>>  		/* We need to do cache lookup too for pte markers */
>>  		if (pte_none_mostly(pte))
>>  			__mincore_unmapped_range(addr, addr + PAGE_SIZE,
>>  						 vma, vec);
>> -		else if (pte_present(pte))
>> -			*vec = 1;
>> -		else { /* pte is a swap entry */
>> +		else if (pte_present(pte)) {
>> +			if (pte_batch_hint(ptep, pte) > 1) {
>
> AFAIU, you will only batch if the CONT_PTE is set, but that is only true for arm64,
> and so we lose the ability to batch in e.g: x86 when we have contiguous
> entries, right?
>
> So why not have folio_pte_batch take care of it directly without involving
> pte_batch_hint here?

Good question; this was the first approach I tried. However, I found an
obvious performance regression with small folios (where CONT_PTE is not
set). I think the overhead introduced by vm_normal_folio() and
folio_pte_batch() is greater than the gain from batch processing small
folios.

For large folios where CONT_PTE is set, ptep_get() ---> contpte_ptep_get()
wastes a significant amount of CPU time, so using folio_pte_batch() there
clearly improves performance.
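
To make the trade-off concrete, here is a rough user-space model of the
gating idea. It is only a sketch, not the kernel code: struct mock_pte,
MOCK_CONT_PTES, mock_batch_hint(), mock_folio_batch() and
mock_mincore_walk() are made-up stand-ins, the block size of 16 is an
assumption, and non-present entries are simply reported as not resident
instead of being looked up in the page cache or swap cache.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for a PTE; not the kernel's pte_t. */
struct mock_pte {
	bool present;	/* entry maps a page */
	bool cont;	/* models the arm64 CONT_PTE bit */
	int folio;	/* id of the (mock) folio backing this entry */
};

#define MOCK_CONT_PTES 16	/* assumed size of one contiguous block */

/*
 * Models the cheap check: looks only at the entry itself and returns the
 * number of entries left in its contiguous block, or 1 if CONT is clear.
 */
static size_t mock_batch_hint(const struct mock_pte *tbl, size_t idx)
{
	if (!tbl[idx].cont)
		return 1;
	return MOCK_CONT_PTES - (idx % MOCK_CONT_PTES);
}

/*
 * Models the expensive path: scans forward and counts consecutive present
 * entries that belong to the same folio as tbl[idx], up to max entries.
 */
static size_t mock_folio_batch(const struct mock_pte *tbl, size_t idx,
			       size_t max, size_t *slow_calls)
{
	size_t n = 0;

	(*slow_calls)++;
	while (n < max && tbl[idx + n].present &&
	       tbl[idx + n].folio == tbl[idx].folio)
		n++;
	return n;
}

/*
 * Walks n entries, writes one residency byte per entry into vec, and
 * returns how many times the expensive path was taken.
 */
static size_t mock_mincore_walk(const struct mock_pte *tbl, size_t n,
				unsigned char *vec)
{
	size_t slow_calls = 0;
	size_t step;

	for (size_t i = 0; i < n; i += step) {
		step = 1;
		if (!tbl[i].present) {
			vec[i] = 0;	/* simplification: no cache lookup */
			continue;
		}
		/* Only pay for the batch lookup when the cheap hint says so. */
		if (mock_batch_hint(tbl, i) > 1)
			step = mock_folio_batch(tbl, i, n - i, &slow_calls);
		assert(step >= 1);
		for (size_t k = 0; k < step; k++)
			vec[i + k] = 1;
	}
	return slow_calls;
}
```

In this model, entries without the CONT bit never reach
mock_folio_batch(), which mirrors why small folios avoid the extra
overhead, while a whole contiguous block is covered by a single call.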