From: Ryan Roberts
Date: Mon, 4 Dec 2023 12:02:38 +0000
Subject: Re: [PATCH RFC 06/12] mm/gup: Drop folio_fast_pin_allowed() in hugepd processing
To: Christophe Leroy, Peter Xu
Cc: Matthew Wilcox, Christoph Hellwig, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Andrea Arcangeli, James Houghton, Lorenzo Stoakes, David Hildenbrand, Vlastimil Babka, John Hubbard, Yang Shi, Rik van Riel, Hugh Dickins, Jason Gunthorpe, Axel Rasmussen, Kirill A. Shutemov, Andrew Morton, linuxppc-dev@lists.ozlabs.org, Mike Rapoport, Mike Kravetz
Message-ID: <11601011-07af-4662-8ee4-f98785f75e29@arm.com>
In-Reply-To: <8c7fe945-ee34-4eb6-b466-5707660c7723@csgroup.eu>

On 04/12/2023 11:57, Christophe Leroy wrote:
>
>
> On 04/12/2023 12:46, Ryan Roberts wrote:
>> On 04/12/2023 11:25, Christophe Leroy wrote:
>>>
>>>
>>> On 04/12/2023 12:11, Ryan Roberts wrote:
>>>> On 03/12/2023 13:33, Christophe Leroy wrote:
>>>>>
>>>>>
>>>>> On 30/11/2023 22:30, Peter Xu wrote:
>>>>>> On Fri, Nov 24, 2023 at 11:07:51AM -0500, Peter Xu wrote:
>>>>>>> On Fri, Nov 24, 2023 at 09:06:01AM +0000, Ryan Roberts wrote:
>>>>>>>> I don't have any micro-benchmarks for GUP though, if that's your question. Is
>>>>>>>> there an easy-to-use test I can run to get some numbers? I'd be happy to try it out.
>>>>>>>
>>>>>>> Thanks Ryan. Then nothing needs to be tested if gup is not yet touched
>>>>>>> from your side, afaict. I'll see whether I can provide some rough numbers
>>>>>>> instead in the next post (I'll probably only be able to test it in a VM,
>>>>>>> though, but hopefully that should still mostly reflect the truth).
>>>>>>
>>>>>> An update: I finished a round of 64K cont_pte tests; in the slow gup micro
>>>>>> benchmark I see a ~15% perf degradation with this patchset applied, on a VM
>>>>>> on top of Apple M1.
>>>>>>
>>>>>> Frankly that's even less than I expected, considering not only how slow gup
>>>>>> THP used to be, but also the fact that that's a tight loop over slow
>>>>>> gup, which in normal cases shouldn't happen: "present" ptes normally go
>>>>>> to fast-gup, while !present ones go into a fault following it. I assume
>>>>>> that's why nobody cared about slow gup for THP before. I think adding cont_pte
>>>>>> support shouldn't be very hard, but that will include making the cont_pte idea
>>>>>> global just for arm64 and riscv Svnapot.
>>>>>
>>>>> Is there any documentation on what cont_pte is? I have always wondered
>>>>> if it could also fit the powerpc 8xx's needs.
>>>>
>>>> pte_cont() (and pte_mkcont() and pte_mknoncont()) test and manipulate the
>>>> "contiguous bit" in the arm64 PTE entries. Those helpers are arm64-specific
>>>> (AFAIK). The contiguous bit is a hint to the HW to tell it that a block of PTEs
>>>> is mapping a physically contiguous and naturally aligned piece of memory. The
>>>> HW can use this to coalesce entries in the TLB. When using 4K base pages, the
>>>> contpte size is 64K (16 PTEs). For 16K base pages, it's 2M (128 PTEs), and for
>>>> 64K base pages, it's 2M (32 PTEs).
>>>>
>>>>>
>>>>> On powerpc, for 16k pages, we have to define 4 consecutive PTEs. All 4
>>>>> PTEs are flagged with the SPS bit telling it's a 16k page, but for TLB
>>>>> misses the HW needs one entry for each 4k fragment.
>>>>
>>>> From that description, it sounds like the SPS bit might be similar to the arm64
>>>> contiguous bit? Although it sounds like you are currently using it in a slightly
>>>> different way: telling the kernel that the base page is 16K but mapping each 16K
>>>> page with 4x 4K entries (plus the SPS bit set)?
>>>
>>> Yes, it's both.
>>>
>>> When the base page is 16k, there are 4x 4k entries (with the SPS bit set) in
>>> the page table, and pte_t is a table of 4 'unsigned long'.
>>>
>>> When the base page is 4k, there is a 16k hugepage size, which is the
>>> same 4x 4k entries with the SPS bit set.
>>>
>>> So it looks similar to the contiguous bit.
>>>
>>>
>>> And by extension, the same principle is used for 512k hugepages: the
>>> _PAGE_HUGE bit is copied by the TLB miss handler into the lower bit of PS,
>>> PS being as follows:
>>> - 00 Small (4 Kbyte or 16 Kbyte)
>>> - 01 512 Kbyte
>>> - 10 Reserved
>>> - 11 8 Mbyte
>>>
>>> So as the PMD size is 4M, 512k pages are 128 identical consecutive entries
>>> in the page table.
>>>
>>> I wish I could have THP with 16k or 512k pages.
>>
>> Then you have come to the right place! :)
>>
>> https://lore.kernel.org/linux-mm/20231204102027.57185-1-ryan.roberts@arm.com/
>>
>
> That looks great. That series only modifies core mm/.
> No changes needed in arch? Will it work on powerpc without any
> changes/additions to arch code?

Yes, there are also changes needed in arch; I have a separate series for arm64,
which transparently manages the contiguous bit when it sees appropriate PTEs:

https://lore.kernel.org/linux-arm-kernel/20231204105440.61448-1-ryan.roberts@arm.com/

>
> Well, I'll try it soon.
>
> Christophe