Message-ID: <0c446883-7f01-406f-bddd-8e78b989d644@arm.com>
Date: Mon, 4 Dec 2023 11:46:41 +0000
Subject: Re: [PATCH RFC 06/12] mm/gup: Drop folio_fast_pin_allowed() in hugepd processing
To: Christophe Leroy, Peter Xu
Cc: Matthew Wilcox, Christoph Hellwig, "linux-kernel@vger.kernel.org", "linux-mm@kvack.org", Andrea Arcangeli, James Houghton, Lorenzo Stoakes, David Hildenbrand, Vlastimil Babka, John Hubbard, Yang Shi, Rik van Riel, Hugh Dickins, Jason Gunthorpe, Axel Rasmussen, "Kirill A . Shutemov", Andrew Morton, "linuxppc-dev@lists.ozlabs.org", Mike Rapoport, Mike Kravetz
References: <20231116012908.392077-7-peterx@redhat.com> <510adc26-9aed-4745-8807-dba071fadbbe@arm.com> <283da12c-14f1-4255-b3c4-ab933f3373c4@csgroup.eu> <01aad92f-b1e0-4f31-b905-8b1c2012ebab@arm.com> <97c21205-f3e6-4634-82e6-c7bbd81d1835@csgroup.eu>
From: Ryan Roberts
In-Reply-To: <97c21205-f3e6-4634-82e6-c7bbd81d1835@csgroup.eu>

On 04/12/2023 11:25, Christophe Leroy wrote:
>
>
> On 04/12/2023 at 12:11, Ryan Roberts wrote:
>> On 03/12/2023 13:33, Christophe Leroy wrote:
>>>
>>>
>>> On 30/11/2023 at 22:30, Peter Xu wrote:
>>>> On Fri, Nov 24, 2023 at 11:07:51AM -0500, Peter Xu wrote:
>>>>> On Fri, Nov 24, 2023 at 09:06:01AM +0000, Ryan Roberts wrote:
>>>>>> I don't have any micro-benchmarks for GUP though, if that's your question. Is
>>>>>> there an easy-to-use test I can run to get some numbers? I'd be happy to try it out.
>>>>>
>>>>> Thanks Ryan. Then nothing needs to be tested if gup is not yet touched
>>>>> from your side, afaict. I'll see whether I can provide some rough numbers
>>>>> instead in the next post (I'll probably only be able to test it in a VM,
>>>>> though, but hopefully that should still mostly reflect the truth).
>>>>
>>>> An update: I finished a round of 64K cont_pte tests; in the slow gup micro
>>>> benchmark I see a ~15% perf degradation with this patchset applied on a VM
>>>> on top of Apple M1.
>>>>
>>>> Frankly that's even less than I expected, considering not only how slow gup
>>>> THP used to be, but also the fact that it's a tight loop over slow
>>>> gup, which in normal cases shouldn't happen: "present" ptes normally go
>>>> to fast-gup, while !present goes into a fault following it. I assume
>>>> that's why nobody cared about slow gup for THP before.
>>>> I think adding cont_pte support shouldn't be very hard, but that will
>>>> include making the cont_pte idea global just for arm64 and riscv Svnapot.
>>>
>>> Is there any documentation on what cont_pte is? I have always wondered
>>> if it could also fit powerpc 8xx needs?
>>
>> pte_cont() (and pte_mkcont() and pte_mknoncont()) test and manipulate the
>> "contiguous bit" in the arm64 PTE entries. Those helpers are arm64-specific
>> (AFAIK). The contiguous bit is a hint to the HW to tell it that a block of PTEs
>> is mapping a physically contiguous and naturally aligned piece of memory. The
>> HW can use this to coalesce entries in the TLB. When using 4K base pages, the
>> contpte size is 64K (16 PTEs). For 16K base pages, it's 2M (128 PTEs) and for 64K
>> base pages, it's 2M (32 PTEs).
>>
>>>
>>> On powerpc, for 16k pages, we have to define 4 consecutive PTEs. All 4
>>> PTEs are flagged with the SPS bit telling it's a 16k page, but for TLB
>>> misses the HW needs one entry for each 4k fragment.
>>
>> From that description, it sounds like the SPS bit might be similar to the arm64
>> contiguous bit? Although it sounds like you are currently using it in a slightly
>> different way - telling the kernel that the base page is 16K but mapping each 16K
>> page with 4x 4K entries (plus the SPS bit set)?
>
> Yes it's both.
>
> When the base page is 16k, there are 4x 4k entries (with SPS bit set) in
> the page table, and pte_t is a table of 4 'unsigned long'
>
> When the base page is 4k, there is a 16k hugepage size, which is the
> same 4x 4k entries with SPS bit set.
>
> So it looks similar to the contiguous bit.
>
>
> And by extension, the same principle is used for 512k hugepages, the bit
> _PAGE_HUGE is copied by the TLB miss handler into the lower bit of PS,
> PS being as follows:
> - 00 Small (4 Kbyte or 16 Kbyte)
> - 01 512 Kbyte
> - 10 Reserved
> - 11 8 Mbyte
>
> So as PMD size is 4M, 512k pages are 128 identical consecutive entries
> in the page table.
>
> I wish I could have THP with 16k or 512k pages.

Then you have come to the right place! :)

https://lore.kernel.org/linux-mm/20231204102027.57185-1-ryan.roberts@arm.com/

>
> Christophe
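
As a sanity check on the sizes quoted in this thread, here is a minimal sketch that multiplies the number of consecutive page-table entries by the size each entry maps. It is only arithmetic on the figures given above, not kernel code; the names KIB, MIB, CONFIGS, span_bytes() and fmt() are made up for illustration.

```python
KIB = 1024
MIB = 1024 * KIB

# (description, bytes mapped by one page-table entry, consecutive entries)
# All figures are taken from the thread; nothing here is read from a kernel.
CONFIGS = [
    ("arm64 contpte, 4K base pages", 4 * KIB, 16),
    ("arm64 contpte, 16K base pages", 16 * KIB, 128),
    ("arm64 contpte, 64K base pages", 64 * KIB, 32),
    ("powerpc 8xx 16k page (SPS bit)", 4 * KIB, 4),
    ("powerpc 8xx 512k hugepage", 4 * KIB, 128),
]


def span_bytes(entry_size, nr_entries):
    """Return the size of the contiguous range covered by the entries."""
    return entry_size * nr_entries


def fmt(n):
    """Format a byte count as whole mebibytes if possible, else kibibytes."""
    return f"{n // MIB}M" if n % MIB == 0 else f"{n // KIB}K"


for desc, entry_size, nr_entries in CONFIGS:
    span = span_bytes(entry_size, nr_entries)
    print(f"{desc}: {nr_entries} x {fmt(entry_size)} = {fmt(span)}")
```

Both schemes reduce to the same shape, N identical consecutive entries covering one naturally aligned block, which is why the SPS bit and the arm64 contiguous bit look similar.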