Message-ID: <8bd74d6f-6086-41d2-97ec-98bf1b9cb07e@redhat.com>
Date: Tue, 4 Feb 2025 21:19:16 +0100
Subject: Re: [LSF/MM/BPF TOPIC] HugeTLB generic pagewalk
From: David Hildenbrand
Organization: Red Hat
To: Christophe Leroy, Oscar Salvador, lsf-pc@lists.linux-foundation.org
Cc: Peter Xu, Muchun Song, linux-mm@kvack.org
In-Reply-To: <660f6ee7-f474-4f72-b442-5f048a2ff8bb@csgroup.eu>
References: <4c50a439-e2b8-4f54-ba3d-366d0e2961b2@redhat.com> <660f6ee7-f474-4f72-b442-5f048a2ff8bb@csgroup.eu>

>>
>> commit 0549e76663730235a10395a7af7ad3d3ce6e2402
>> Author: Christophe Leroy
>> Date:   Tue Jul 2 15:51:25 2024 +0200
>>
>>     powerpc/8xx: rework support for 8M pages using contiguous PTE entries
>>
>>     In order to fit better with standard Linux page tables layout, add support
>>     for 8M pages using contiguous PTE entries in a standard page table.  Page
>>     tables will then be populated with 1024 similar entries and two PMD
>>     entries will point to that page table.
>>
>>     The PMD entries also get a flag to tell it is addressing an 8M page, this
>>     is required for the HW tablewalk assistance.
>>
>> Where we are walking a PTE table, but actually there is another PTE table we
>> have to modify in the same go.
>>
>>
>> Very hard to make that non-hugetlb aware, as it's simply completely different
>> compared to ordinary page table walking/modifications today.
>>
>> Maybe there are ideas to tackle that, and I'd be very interested in them.
>>
>
>
> But at least that 8xx change allowed us to get rid of huge page
> directories (hugepd) which was even more painful IIUC.

Yes, don't get me wrong, it was a clear win to get rid of hugepd,
allowing for GUP and folio_walk to work in a non-hugetlb fashion: at
least, when all we want to do is look up which page is mapped at a given
address.

Unfortunately, that's not what all page table walkers do.

>
> Nevertheless, can't we turn that into a standard walk one way or another?
>
> While we walk we reach a PMD entry which is marked as a CONT-PMD, but it
> is not tagged as a leaf entry, so there is a page table below. PMD_SIZE
> is 4M but the page size is 8M so once you've walked the page table
> entirely you know you still have 4M to go so you have to walk the second
> PMD and the page table it points to.

We would somehow have to fake that it is a PMD leaf, and realize that
they both are cont, so we can batch both PMDs.
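To make the sizes in the quoted description concrete, here is a toy
userspace sketch in C. It is not kernel code and does not model real page
table entries. The 4M PMD_SIZE, the 8M page size and the 1024 entries per
page table come from the thread; the 4K base page size is only derived
from those numbers, and every toy_* name is made up for illustration.

```c
#include <assert.h>

/* Toy model only: sizes are from the thread, all names are hypothetical. */
#define TOY_PMD_SIZE     (4UL << 20)   /* one PMD entry maps 4M */
#define TOY_HPAGE_SIZE   (8UL << 20)   /* huge page size: 8M */
#define TOY_PTRS_PER_PTE 1024UL        /* entries in one page table */
#define TOY_PAGE_SIZE    (TOY_PMD_SIZE / TOY_PTRS_PER_PTE)   /* 4K, derived */

/* Number of PMD entries that one 8M page spans. */
static unsigned long toy_pmds_per_hpage(void)
{
	return TOY_HPAGE_SIZE / TOY_PMD_SIZE;
}

/* Number of base-page PTE slots that one 8M page spans. */
static unsigned long toy_ptes_per_hpage(void)
{
	return TOY_HPAGE_SIZE / TOY_PAGE_SIZE;
}

/*
 * Walk [addr, addr + TOY_HPAGE_SIZE) one PMD at a time and count how many
 * PMD entries and PTE slots a walker would pass for a single huge page.
 */
static void toy_walk(unsigned long addr, unsigned long *pmds,
		     unsigned long *ptes)
{
	unsigned long end = addr + TOY_HPAGE_SIZE;

	/* The toy model assumes the huge page is naturally aligned. */
	assert(addr % TOY_HPAGE_SIZE == 0);

	*pmds = 0;
	*ptes = 0;
	while (addr < end) {
		(*pmds)++;                  /* first PMD, then the second one */
		*ptes += TOY_PTRS_PER_PTE;  /* the page table below that PMD */
		addr += TOY_PMD_SIZE;       /* "you still have 4M to go" */
	}
}
```

Calling toy_walk() on an 8M-aligned address counts two PMD entries and
2048 PTE slots, which is the "walk the second PMD and the page table it
points to" step in numbers. Batching both PMDs would mean handling that
whole range as one unit instead of as two independent 4M walks.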
The PTE page table handling is a bit of a pain, though.

... and modifying entries is a bit of a pain as well, unless we can
hide all that somehow in the powerpc pmd setters.

Hm, far from ideal, at least at this stage, because we don't really
support cont-pmd outside of hugetlb, and a lot of page table walkers
must be taught to deal with cont-pmd.

>
> By the way, I don't know if it can help or make things worse, but indeed
> from a HW point of view there is no need to replicate the PTE entry 1024
> times. Here we used a standard page table because it looked more generic
> from the kernel's point of view, but all the HW needs is a single PTE
> located at a page-aligned address. That's what we had when we used huge
> page directories (hugepd). It was even easier because both PMD entries
> were pointing to the same hugepd entry, hence no need for CONT-PTE-like
> management at PTE level.

Ah, I see. I'll have to think about that a bit ... far from trivial.

-- 
Cheers,

David / dhildenb