Message-ID: <6d13a5e9-bdff-435b-ad7a-3a3a550738b0@redhat.com>
Date: Wed, 22 Jan 2025 09:11:08 +0100
Subject: Re: mm: CMA reservations require 32MiB alignment in 16KiB page size kernels instead of 8MiB in 4KiB page size kernel.
From: David Hildenbrand
Organization: Red Hat
To: Zi Yan, Juan Yescas
Cc: Barry Song <21cnbao@gmail.com>, linux-mm@kvack.org, muchun.song@linux.dev, rppt@kernel.org, osalvador@suse.de, akpm@linux-foundation.org, lorenzo.stoakes@oracle.com, Jann Horn, Liam.Howlett@oracle.com, minchan@kernel.org, jaewon31.kim@samsung.com, charante@codeaurora.org, Suren Baghdasaryan, Kalesh Singh, "T.J. Mercier", Isaac Manjarres, iamjoonsoo.kim@lge.com, quic_charante@quicinc.com
References: <463eb421-ac16-435c-b0a0-51a6a92168f6@redhat.com> <8f36d3ca-3a31-4fc4-9eaa-c53ee84bf6e7@redhat.com>

On 22.01.25 03:24, Zi Yan wrote:
> On Tue Jan 21, 2025 at 9:08 PM EST, Juan Yescas wrote:
>> On Mon, Jan 20, 2025 at 9:59 AM David Hildenbrand wrote:
>>>
>>> On 20.01.25 16:29, Zi Yan wrote:
>>>> On Mon Jan 20, 2025 at 3:14 AM EST, David Hildenbrand wrote:
>>>>> On 20.01.25 01:39, Zi Yan wrote:
>>>>>> On Sun Jan 19, 2025 at 6:55 PM EST, Barry Song wrote:
>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> However, with this workaround, we can't use transparent huge pages.
>>>>>>>>>>>
>>>>>>>>>>> Is the CMA_MIN_ALIGNMENT_BYTES requirement alignment only to support huge pages?
>>>>>>>> No. CMA_MIN_ALIGNMENT_BYTES is limited by CMA_MIN_ALIGNMENT_PAGES, which
>>>>>>>> is equal to pageblock size. Enabling THP just bumps the pageblock size.
>>>>>>>
>>
>> Thanks, I can see the initialization in include/linux/pageblock-flags.h
>>
>> #define pageblock_order MIN_T(unsigned int, HUGETLB_PAGE_ORDER, MAX_PAGE_ORDER)
>>
>>>>>>> Currently, THP might be mTHP, which can have a significantly smaller
>>>>>>> size than 32MB. For example, on arm64 systems with a 16KiB page size,
>>>>>>> a 2MB CONT-PTE mTHP is possible.
>>>>>>> Additionally, mTHP relies on the CONFIG_TRANSPARENT_HUGEPAGE configuration.
>>>>>>>
>>>>>>> I wonder if it's possible to enable CONFIG_TRANSPARENT_HUGEPAGE
>>>>>>> without necessarily using 32MiB THP.
>>>>>>> If we use other sizes, such as 64KiB, perhaps a large pageblock size
>>>>>>> wouldn't be necessary?
>>
>> Do you mean with mTHP? We haven't explored that option.
>
> Yes. Unless your applications have special demands for PMD THPs. 2MB
> mTHP should work.
>
>>
>>>>>>
>>>>>> I think this should work by reducing MAX_PAGE_ORDER like Juan did for
>>>>>> the experiment. But MAX_PAGE_ORDER is a macro right now, Kconfig needs
>>>>>> to be changed and kernel needs to be recompiled. Not sure if it is OK
>>>>>> for Juan's use case.
>>>>>
>>
>> The main goal is to reserve only the necessary CMA memory for the
>> drivers, which is usually the same for 4kb and 16kb page size kernels.
>
> Got it. Based on your experiment, you changed MAX_PAGE_ORDER to get the
> minimal CMA alignment size. Can you deploy that kernel to production?
> If yes, you can use mTHP instead of PMD THP and still get the CMA
> alignment you want.
>
>>
>>>>>
>>>>> IIRC, we set pageblock size == THP size because this is the granularity
>>>>> we want to optimize defragmentation for. ("try keep pageblock
>>>>> granularity of the same memory type: movable vs. unmovable")
>>>>
>>>> Right. In past, it is optimized for PMD THP. Now we have mTHP. If user
>>>> does not care about PMD THP (32MB in ARM64 16KB base page case) and mTHP
>>>> (2MB mTHP here) is good enough, reducing pageblock size works.
>>>>
>>>>>
>>>>> However, the buddy already supports having different pagetypes for large
>>>>> allocations.
>>>>
>>>> Right. To be clear, only MIGRATE_UNMOVABLE, MIGRATE_RECLAIMABLE, and
>>>> MIGRATE_MOVABLE can be merged.
>>>
>>> Yes! And a THP cannot span partial MIGRATE_CMA, which would be fine.
>>>
>>>>
>>>>>
>>>>> So we could leave MAX_ORDER alone and try adjusting the pageblock size
>>>>> in these setups. pageblock size is already variable on some
>>>>> architectures IIRC.
>>>>
>>
>> Which values would work for the CMA_MIN_ALIGNMENT_BYTES macro?
>> In the 16KiB page size kernel, I tried these 2 configurations:
>>
>> #define CMA_MIN_ALIGNMENT_BYTES (2048 * CMA_MIN_ALIGNMENT_PAGES)
>>
>> and
>>
>> #define CMA_MIN_ALIGNMENT_BYTES (4096 * CMA_MIN_ALIGNMENT_PAGES)
>>
>> with both of them, the kernel failed to boot.
>
> CMA_MIN_ALIGNMENT_BYTES needs to be PAGE_SIZE * CMA_MIN_ALIGNMENT_PAGES.
> So you need to adjust CMA_MIN_ALIGNMENT_PAGES, which is set by pageblock
> size. pageblock size is determined by pageblock order, which is
> affected by MAX_PAGE_ORDER.

Yes, most importantly we must not exceed MAX_PAGE_ORDER. Going smaller is
the common case.

>
>>
>>>> Making pageblock size a boot time variable? We might want to warn
>>>> sysadmin/user that >pageblock_order THP/mTHP creation will suffer.
>>>
>>> Yes, some way to configure it.
>>>
>>>>
>>>>>
>>>>> We'd only have to check if all of the THP logic can deal with pageblock
>>>>> size < THP size.
>>>>
>>
>> The reason that THP was disabled in my experiment is because this
>> assertion failed
>>
>> mm/huge_memory.c
>> /*
>>  * hugepages can't be allocated by the buddy allocator
>>  */
>> MAYBE_BUILD_BUG_ON(HPAGE_PMD_ORDER > MAX_PAGE_ORDER);
>>
>> when
>>
>> config ARCH_FORCE_MAX_ORDER
>> int
>> .....
>> default "8" if ARM64_16K_PAGES
>>
>
> You can remove that BUILD_BUG_ON and turn on mTHP and see if mTHP works.
>
>>
>>>> Probably yes, pageblock should be independent of THP logic, although
>>>> compaction (used to create THPs) logic is based on pageblock.
>>>
>>> Right. As raised in the past, we need a higher level mechanism that
>>> tries to group pageblocks together during compaction/conversion to limit
>>> fragmentation on a higher level.
>>>
>>> I assume that many use cases would be fine with not using 32MB/512MB
>>> THPs at all for now -- and instead using 2 MB ones. Of course, for very
>>> large installations it might be different.
>>>
>>>>>
>>>>> This issue is even more severe on arm64 with 64k (pageblock = 512MiB).
>>>>
>>
>> I agree, and if ARCH_FORCE_MAX_ORDER is configured to the max value we get:
>>
>> PAGE_SIZE | max MAX_PAGE_ORDER | CMA_MIN_ALIGNMENT_BYTES
>> 4KiB      | 15                 | 4KiB * 32Ki pages = 128MiB
>> 16KiB     | 13                 | 16KiB * 8Ki pages = 128MiB
>> 64KiB     | 13                 | 64KiB * 8Ki pages = 512MiB
>>
>>>> This is also good for virtio-mem, since the offline memory block size
>>>> can also be reduced. I remember you complained about it before.
>>>
>>> Yes, yes, yes! :)
>>>
>
> David's proposal should work in general, but might take a non-trivial
> amount of work:
>
> 1. keep pageblock size always at 4MB for all arch.

My proposal was to leave it unchanged for most archs, but allow for
overriding it on aarch64 as a first step.

s390x is happy with 1MiB, x86 with 2MiB. It's aarch64 that does
questionable things :)

CONFIG_HUGETLB_PAGE_SIZE_VARIABLE already allows for variable
pageblock_order. That whole code likely needs some love, but most of it
should already be there.

In the future, I could imagine just going for a smaller pageblock size
on aarch64, and handling fragmentation avoidance for larger THPs (512
MiB really is close to 1 GiB on x86) differently, not using pageblocks.

-- 
Cheers,

David / dhildenb