From: Kefeng Wang
Date: Tue, 7 May 2024 18:59:31 +0800
Subject: Re: [RESEND PATCH] mm: align larger anonymous mappings on THP boundaries
To: Ryan Roberts, Yang Shi
CC: Matthew Wilcox, Yang Shi, Ze Zuo
Message-ID: <6016c0e9-b567-4205-8368-1f1c76184a28@huawei.com>
In-Reply-To: <1dc9a561-55f7-4d65-8b86-8a40fa0e84f9@arm.com>
References: <20231214223423.1133074-1-yang@os.amperecomputing.com> <1e8f5ac7-54ce-433a-ae53-81522b2320e1@arm.com> <1dc9a561-55f7-4d65-8b86-8a40fa0e84f9@arm.com>

On 2024/5/7 18:08, Ryan Roberts wrote:
> On 07/05/2024 09:25, Kefeng Wang wrote:
>> Hi Ryan, Yang and all,
>>
>> We see another regression on arm64 (no issue on x86) when testing memory
>> latency with lmbench:
>>
>> ./lat_mem_rd -P 1 512M 128
>
> Do you know exactly what this test is doing?

lat_mem_rd measures memory read latency for varying memory sizes and
strides, see https://lmbench.sourceforge.net/man/lat_mem_rd.8.html

>
>>
>> memory latency (smaller is better)
>>
>> MiB       6.9-rc7   6.9-rc7+revert
>
> And what exactly have you reverted? I'm guessing just commit efa7df3e3bb5 ("mm:
> align larger anonymous mappings on THP boundaries")?

Yes, I reverted only efa7df3e3bb5.

>
>> 0.00049   1.539     1.539
>> 0.00098   1.539     1.539
>> 0.00195   1.539     1.539
>> 0.00293   1.539     1.539
>> 0.00391   1.539     1.539
>> 0.00586   1.539     1.539
>> 0.00781   1.539     1.539
>> 0.01172   1.539     1.539
>> 0.01562   1.539     1.539
>> 0.02344   1.539     1.539
>> 0.03125   1.539     1.539
>> 0.04688   1.539     1.539
>> 0.0625    1.540     1.540
>> 0.09375   3.634     3.086
>
> So the first regression is for 96K - I'm guessing that's the mmap size? That
> size shouldn't even be affected by this patch, apart from a few adds and a
> compare which determines the size is too small to do PMD alignment for.

Yes, no anon THP there.

>
>> 0.125     3.874     3.175
>> 0.1875    3.544     3.288
>> 0.25      3.556     3.461
>> 0.375     3.641     3.644
>> 0.5       4.125     3.851
>> 0.75      4.968     4.323
>> 1         5.143     4.686
>> 1.5       5.309     4.957
>> 2         5.370     5.116
>> 3         5.430     5.471
>> 4         5.457     5.671
>> 6         6.100     6.170
>> 8         6.496     6.468
>>
>> -----------------------
>> * L1 cache = 8M, there are no big changes below 8M *
>> * but the latency drops a lot from L2 when this patch is reverted *
>>
>> 12        6.917     6.840
>> 16        7.268     7.077
>> 24        7.536     7.345
>> 32        10.723    9.421
>> 48        14.220    11.350
>> 64        16.253    12.189
>> 96        14.494    12.507
>> 128       14.630    12.560
>> 192       15.402    12.967
>> 256       16.178    12.957
>> 384       15.177    13.346
>> 512       15.235    13.233
>>
>> I quickly checked smaps but didn't find any clues. Any suggestions?
>
> Without knowing exactly what the test does, it's difficult to know what to

The main operation (the memory read) is shown below:

    #define ONE     p = (char **)*p;
    #define FIVE    ONE ONE ONE ONE ONE
    #define TEN     FIVE FIVE
    #define FIFTY   TEN TEN TEN TEN TEN
    #define HUNDRED FIFTY FIFTY

    while (iterations-- > 0) {
        for (i = 0; i < count; ++i) {
            HUNDRED;
        }
    }

https://github.com/intel/lmbench/blob/master/src/lat_mem_rd.c#L95

> suggest. If you want to try something semi-randomly; it might be useful to rule
> out the arm64 contpte feature. I don't see how that would be interacting here if
> mTHP is disabled (is it?). But it's new for 6.9 and arm64 only. Disable with
> ARM64_CONTPTE (needs EXPERT) at compile time.

I haven't enabled mTHP, so this should not be related to ARM64_CONTPTE, but
I will give it a try.