From: Ryan Roberts <ryan.roberts@arm.com>
To: Matthew Wilcox, Yu Zhao
Cc: Andrew Morton, "Kirill A. Shutemov", Yin Fengwei, David Hildenbrand,
    Catalin Marinas, Will Deacon, Anshuman Khandual, Yang Shi,
    linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org,
    linux-mm@kvack.org
Subject: Re: [PATCH v2 3/5] mm: Default implementation of arch_wants_pte_order()
Date: Fri, 7 Jul 2023 11:00:37 +0100
Message-ID: <3500d482-f319-4def-e40d-6d0ae2ada85b@arm.com>
References: <20230703135330.1865927-1-ryan.roberts@arm.com> <20230703135330.1865927-4-ryan.roberts@arm.com>

On 06/07/2023 20:33, Matthew Wilcox wrote:
> On Tue, Jul 04, 2023 at 08:07:19PM -0600, Yu Zhao wrote:
>>> - On arm64 when the process has marked the VMA for THP (or when
>>> transparent_hugepage=always) but the VMA does not meet the requirements for a
>>> PMD-sized mapping (or we failed to allocate, ...) then I'd like to map using
>>> contpte. For 4K base pages this is 64K (order-4), for 16K this is 2M (order-7)
>>> and for 64K this is 2M (order-5). The 64K base page case is very important since
>>> the PMD size for that base page is 512MB which is almost impossible to allocate
>>> in practice.
>>
>> Which case (server or client) are you focusing on here? For our client
>> devices, I can confidently say that 64KB has to be after 16KB, if it
>> happens at all. For servers in general, I don't know of any major
>> memory-intensive workloads that are not THP-aware, i.e., I don't think
>> "VMA does not meet the requirements" is a concern.
>
> It sounds like you've done some measurements, and I'd like to understand
> those a bit better. There are a number of factors involved:

I'm not sure if that's a question to me or Yu? I haven't personally done any
measurements for the 64K base page case. But Arm has a partner that is pushing
for this. I'm hoping to see some test results from them posted publicly in the
coming weeks. See [1] for more explanation on the rationale.

[1] https://lore.kernel.org/linux-mm/4d4c45a2-0037-71de-b182-f516fee07e67@arm.com/T/#m8a7c4b71f94224ec3fe6d0a407f48d74c789ba4f

>
> - A larger page size shrinks the length of the LRU list, so systems
>   which see heavy LRU lock contention benefit more
> - A larger page size has more internal fragmentation, so we run out of
>   memory and have to do reclaim more often (and maybe workload which
>   used to fit in DRAM now do not)
> (probably others; i'm not at 100% right now)
>
> I think concerns about "allocating lots of order-2 folios makes it harder
> to allocate order-4 folios" are _probably_ not warranted (without data
> to prove otherwise). All anonymous memory is movable, so our compaction
> code should be able to create larger order folios.
>
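
For anyone checking the sizes quoted above, the arithmetic can be sketched in
a few lines of C. This is only an illustration, not code from the patch
series: the helper names (order_of, contpte_order, pmd_bytes) are made up for
this mail, and pmd_bytes assumes 8-byte page table entries, so that one page
table page holds base_page_bytes / 8 entries.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: log2 of a power-of-two page count, i.e. the order. */
static inline unsigned int order_of(size_t nr_pages)
{
	unsigned int order = 0;

	/* Only power-of-two counts have a well-defined order. */
	assert(nr_pages != 0 && (nr_pages & (nr_pages - 1)) == 0);
	while (nr_pages > 1) {
		nr_pages >>= 1;
		order++;
	}
	return order;
}

/*
 * Hypothetical helper: order of a contpte block of block_bytes when the
 * base page size is base_page_bytes. The block sizes themselves (64K for
 * 4K pages, 2M for 16K and 64K pages) are taken from the quoted mail.
 */
static inline unsigned int contpte_order(size_t base_page_bytes,
					 size_t block_bytes)
{
	assert(base_page_bytes != 0 && block_bytes % base_page_bytes == 0);
	return order_of(block_bytes / base_page_bytes);
}

/*
 * Hypothetical helper: PMD mapping size, assuming 8-byte entries so that
 * one page table page maps (base_page_bytes / 8) base pages.
 */
static inline size_t pmd_bytes(size_t base_page_bytes)
{
	return (base_page_bytes / 8) * base_page_bytes;
}
```

With these, contpte_order(4096, 64 * 1024) gives 4, contpte_order(16384,
2 * 1024 * 1024) gives 7, contpte_order(65536, 2 * 1024 * 1024) gives 5, and
pmd_bytes(65536) gives 512MB, matching the figures in the quoted text.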