From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Wed, 8 Feb 2023 02:26:35 +0000
From: Matthew Wilcox <willy@infradead.org>
To: James Houghton
Cc: linux-mm@kvack.org, Vishal Moola, Hugh Dickins, Rik van Riel,
	David Hildenbrand, "Yin, Fengwei"
Subject: Re: Folio mapcount

On Tue, Feb 07, 2023 at 04:35:30PM -0800, James Houghton wrote:
> On Tue, Feb 7, 2023 at 3:35 PM Matthew Wilcox wrote:
> >
> > On Tue, Feb 07, 2023 at 03:27:07PM -0800, James Houghton wrote:
> > > So page_vma_mapped_walk() might have to walk up to HPAGE_PMD_NR-ish
> > > PTEs (if we find a bunch of pte_none() PTEs). Just curious, could that
> > > be any slower than what we currently do (like, incrementing up to
> > > HPAGE_PMD_NR-ish subpage mapcounts)? Or is it not a concern?
> >
> > I think it's faster. Both of these operations work on folio_nr_pages()
> > entries ... but a page table is 8 bytes and a struct page is 64 bytes.
> > From a CPU prefetching point of view, they're both linear scans, but
> > PTEs are 8 times denser.
> >
> > The other factor to consider is how often we do each of these
> > operations. Mapping a folio happens ~once per call to mmap() (even
> > though it's delayed until page fault time). Querying
> > folio_total_mapcount() happens ... less often, I think? Both are going
> > to be quite rare since generally we map the entire folio at once.
>
> Maybe this is a case where we would see a regression: doing PAGE_SIZE
> UFFDIO_CONTINUEs on a THP. Worst case, go from the end of the THP to
> the beginning (ending up with a PTE-mapped THP at the end).
>
> For the i'th PTE we map / i'th UFFDIO_CONTINUE, we have to check
> `folio_nr_pages() - i` PTEs (for most of the iterations anyway). Seems
> like this scales with the square of the size of the folio, so this
> approach would be kind of a non-starter for HugeTLB (with
> high-granularity mapping), I think.
>
> This example isn't completely contrived: if we did post-copy live
> migration with userfaultfd, we might end up doing something like this.
> I'm curious what you think. :)

I think that's a great corner-case to consider. For hugetlb pages, we
know they're PMD/PUD aligned, so _if_ there's a page table present, at
least one page from the folio is already mapped, and we don't need to
look in the page table to find which one. Similarly, since the folio
will occupy the entire PMD/PUD if any part of it is mapped, we don't
need to iterate within it. And contrariwise, if it's p*d_none(), then
definitely none of the pages are mapped.

That perhaps calls for using a different implementation than
page_vma_mapped_walk(), which should be worth it to optimise this case.
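
To put a rough number on James's worst case: a 2MB THP is 512 PTEs, so
the 512 UFFDIO_CONTINUEs would do about 512 * 513 / 2 ~= 131k PTE checks
in total, versus one p*d-level check per call with the alignment
argument above. Completely untested sketch (the helper name is made up,
and it assumes only the generic pgtable helpers), but for a hugetlb
folio that is PMD-sized and PMD-aligned the shortcut is basically:

static bool hugetlb_folio_mapped_here(pmd_t *pmd)
{
	/* p*d_none(): definitely none of the folio's pages are mapped. */
	if (pmd_none(*pmd))
		return false;

	/*
	 * Otherwise a page table (or a leaf mapping) is there, and since
	 * only this folio can occupy the PMD's range, at least one of its
	 * pages is already mapped. A real version would still have to
	 * treat swap/migration entries specially.
	 */
	return true;
}

The general case would still go through something like
page_vma_mapped_walk(), but hugetlb could take this O(1) fast path
instead of scanning up to folio_nr_pages() entries per call.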