From: Ryan Roberts
Date: Thu, 16 Nov 2023 09:34:56 +0000
Subject: Re: [PATCH v2 01/14] mm: Batch-copy PTE ranges during fork()
To: Andrew Morton
Cc: Catalin Marinas, Will Deacon, Ard Biesheuvel, Marc Zyngier,
 Oliver Upton, James Morse, Suzuki K Poulose, Zenghui Yu,
 Andrey Ryabinin, Alexander Potapenko, Andrey Konovalov, Dmitry Vyukov,
 Vincenzo Frascino, Anshuman Khandual, Matthew Wilcox, Yu Zhao,
 Mark Rutland, David Hildenbrand, Kefeng Wang, John Hubbard, Zi Yan,
 linux-arm-kernel@lists.infradead.org, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org
References: <20231115163018.1303287-1-ryan.roberts@arm.com>
 <20231115163018.1303287-2-ryan.roberts@arm.com>
 <20231115133743.674690dc78041768b79fadd9@linux-foundation.org>
In-Reply-To: <20231115133743.674690dc78041768b79fadd9@linux-foundation.org>

On 15/11/2023 21:37, Andrew Morton wrote:
> On Wed, 15 Nov 2023 16:30:05 +0000 Ryan Roberts wrote:
>
>> However, the primary motivation for this change is to reduce the number
>> of tlb maintenance operations that the arm64 backend has to perform
>> during fork
>
> Do you have a feeling for how much performance improved
> due to this?

The commit log for patch 13 (the one which implements ptep_set_wrprotects() for
arm64) has performance numbers for a fork() microbenchmark with and without the
optimization:

---8<---

I see huge performance regression when PTE_CONT support was added, then the
regression is mostly fixed with the addition of this change. The following shows
regression relative to before PTE_CONT was enabled (bigger negative value is
bigger regression):

|   cpus |   before opt |   after opt |
|-------:|-------------:|------------:|
|      1 |       -10.4% |       -5.2% |
|      8 |       -15.4% |       -3.5% |
|     16 |       -38.7% |       -3.7% |
|     24 |       -57.0% |       -4.4% |
|     32 |       -65.8% |       -5.4% |

---8<---

Note that these numbers were measured on Ampere Altra, where TLBI tends to have
a high cost.

>
> Are there other architectures which might similarly benefit? By
> implementing ptep_set_wrprotects(), it appears. If so, what sort of
> gains might they see?

The rationale for this API is to reduce the cost to arm64 of managing
contpte mappings. If other architectures support contpte mappings, they could
benefit from it for the same reasons that arm64 does. I have a vague
understanding that riscv has a concept similar to arm64's contiguous bit, so
perhaps it is a future candidate. But I'm not familiar with the details of the
riscv feature, so I couldn't say whether it would be likely to see the same
level of perf improvement as arm64.

Thanks,
Ryan