Message-ID: <2b403705-a03c-4cfe-8d95-b38dd83fca52@arm.com>
Date: Tue, 7 May 2024 16:53:10 +0100
Subject: Re: [RESEND PATCH] mm: align larger anonymous mappings on THP boundaries
To: Kefeng Wang, David Hildenbrand, Yang Shi
Cc: Matthew Wilcox, Yang Shi, riel@surriel.com, cl@linux.com, akpm@linux-foundation.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Ze Zuo
References: <20231214223423.1133074-1-yang@os.amperecomputing.com> <1e8f5ac7-54ce-433a-ae53-81522b2320e1@arm.com> <1dc9a561-55f7-4d65-8b86-8a40fa0e84f9@arm.com> <6016c0e9-b567-4205-8368-1f1c76184a28@huawei.com> <2c14d9ad-c5a3-4f29-a6eb-633cdf3a5e9e@redhat.com>
From: Ryan Roberts

On 07/05/2024 14:53, Kefeng Wang wrote:
>
>
> On 2024/5/7 19:13, David Hildenbrand wrote:
>>
>>> https://github.com/intel/lmbench/blob/master/src/lat_mem_rd.c#L95
>>>
>>>> suggest. If you want to try something semi-randomly; it might be useful to rule
>>>> out the arm64 contpte feature. I don't see how that would be interacting here if
>>>> mTHP is disabled (is it?). But its new for 6.9 and arm64 only. Disable with
>>>> ARM64_CONTPTE (needs EXPERT) at compile time.
>>> I don't enabled mTHP, so it should be not related about ARM64_CONTPTE,
>>> but will have a try.
>
> After ARM64_CONTPTE disabled, memory read latency is similar with ARM64_CONTPTE
> enabled(default 6.9-rc7), still larger than align anon reverted.

OK, thanks for trying.

Looking at the source for lmbench, it's malloc'ing (512M + 8K) up front and using
that for all sizes. That will presumably be considered "large" by malloc and
will be allocated using mmap. So with the patch, it will be 2M aligned. Without
it, it probably won't. I'm still struggling to understand why not aligning it in
virtual space would make it more performant though...

Is it possible to provide the smaps output for at least that 512M+8K block for
both cases? It might give a bit of a clue.

Do you have traditional (PMD-sized) THP enabled? If it's enabled and unaligned
then the front of the buffer wouldn't be mapped with THP, but if it is aligned,
it will. That could affect it.

>
>>
>> cont-pte can get active if we're just lucky when allocating pages in the right
>> order, correct Ryan?
>>
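
To make the alignment point about the 512M+8K buffer concrete, here is a rough sketch (not taken from lmbench or the kernel) of the arithmetic, plus a small helper for pulling the relevant numbers out of smaps text. It assumes 4K base pages and a 2M PMD size; the function names `thp_eligible_span` and `smaps_fields` are made up for illustration, and the sketch only models which part of the virtual range could be PMD-mapped, not what the kernel actually decides to do.

```python
import re

PMD_SIZE = 2 * 1024 * 1024  # assumption: 2M PMD, i.e. 4K base pages


def thp_eligible_span(start, length, pmd_size=PMD_SIZE):
    """Split [start, start+length) into (head, eligible, tail) byte counts.

    'eligible' is the part made of whole, naturally aligned PMD-sized
    ranges; 'head' and 'tail' are the unaligned remainders at each end.
    """
    end = start + length
    first = -(-start // pmd_size) * pmd_size  # round start up to a PMD boundary
    last = (end // pmd_size) * pmd_size       # round end down to a PMD boundary
    if last <= first:
        return length, 0, 0                   # no whole PMD range fits
    return first - start, last - first, end - last


def smaps_fields(smaps_text, addr, fields=("Size", "AnonHugePages")):
    """Return the chosen fields (in kB) of the smaps entry containing addr."""
    header = re.compile(r"^([0-9a-f]+)-([0-9a-f]+) ")
    inside = False
    found = {}
    for line in smaps_text.splitlines():
        m = header.match(line)
        if m:
            if inside:
                break                         # reached the next mapping
            lo, hi = int(m.group(1), 16), int(m.group(2), 16)
            inside = lo <= addr < hi
        elif inside:
            key, _, rest = line.partition(":")
            if key in fields:
                found[key] = int(rest.split()[0])
    return found


if __name__ == "__main__":
    size = 512 * 1024 * 1024 + 8 * 1024       # the 512M + 8K buffer
    base = 0x7F0000000000                     # hypothetical 2M-aligned address
    print("aligned:  ", thp_eligible_span(base, size))
    print("unaligned:", thp_eligible_span(base + 0x1000, size))
```

To run the parser against a real process, pass it the contents of /proc/<pid>/smaps and any address inside the buffer. Comparing AnonHugePages against Size for that entry with and without the patch is the kind of clue being asked for above.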