From: David Hildenbrand
Organization: Red Hat
Date: Mon, 17 Jul 2023 19:16:20 +0200
To: Yu Zhao
Cc: Ryan Roberts, Andrew Morton, Matthew Wilcox, "Kirill A. Shutemov",
 Yin Fengwei, Catalin Marinas, Will Deacon, Anshuman Khandual, Yang Shi,
 "Huang, Ying", Zi Yan, Luis Chamberlain,
 linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org,
 linux-mm@kvack.org
Subject: Re: [PATCH v3 3/4] mm: FLEXIBLE_THP for improved performance
References: <20230714160407.4142030-1-ryan.roberts@arm.com>
 <20230714161733.4144503-3-ryan.roberts@arm.com>
 <82c934af-a777-3437-8d87-ff453ad94bfd@redhat.com>

On 17.07.23 19:07, Yu Zhao wrote:
> On Mon, Jul 17, 2023 at 7:06 AM David Hildenbrand wrote:
>>
>> On 14.07.23 19:17, Yu Zhao wrote:
>>> On Fri, Jul 14, 2023 at 10:17 AM Ryan Roberts wrote:
>>>>
>>>> Introduce FLEXIBLE_THP feature, which allows anonymous memory to be
>>>> allocated in large folios of a determined order. All pages of the large
>>>> folio are pte-mapped during the same page fault, significantly reducing
>>>> the number of page faults. The number of per-page operations (e.g. ref
>>>> counting, rmap management, lru list management) is also significantly
>>>> reduced since those ops now become per-folio.
>>>>
>>>> The new behaviour is hidden behind the new FLEXIBLE_THP Kconfig, which
>>>> defaults to disabled for now; the long-term aim is for this to default
>>>> to enabled, but there are some risks around internal fragmentation that
>>>> need to be better understood first.
>>>>
>>>> When enabled, the folio order is determined as follows: for a vma,
>>>> process or system that has explicitly disabled THP, we continue to
>>>> allocate order-0. THP is most likely disabled to avoid any possible
>>>> internal fragmentation, so we honour that request.
>>>>
>>>> Otherwise, the return value of arch_wants_pte_order() is used.
>>>> For vmas that have not explicitly opted in to use transparent hugepages
>>>> (e.g. where thp=madvise and the vma does not have MADV_HUGEPAGE),
>>>> arch_wants_pte_order() is limited by the new cmdline parameter,
>>>> `flexthp_unhinted_max`. This allows for a performance boost without
>>>> requiring any explicit opt-in from the workload, while allowing the
>>>> sysadmin to tune between performance and internal fragmentation.
>>>>
>>>> arch_wants_pte_order() can be overridden by the architecture if desired.
>>>> Some architectures (e.g. arm64) can coalesce TLB entries if a contiguous
>>>> set of ptes maps physically contiguous, naturally aligned memory, so this
>>>> mechanism allows the architecture to optimize as required.
>>>>
>>>> If the preferred order can't be used (e.g. because the folio would
>>>> breach the bounds of the vma, or because ptes in the region are already
>>>> mapped), then we fall back to a suitable lower order: first
>>>> PAGE_ALLOC_COSTLY_ORDER, then order-0.
>>>>
>>>> Signed-off-by: Ryan Roberts
>>>> ---
>>>>  .../admin-guide/kernel-parameters.txt |  10 +
>>>>  mm/Kconfig                            |  10 +
>>>>  mm/memory.c                           | 187 ++++++++++++++++--
>>>>  3 files changed, 190 insertions(+), 17 deletions(-)
>>>>
>>>> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
>>>> index a1457995fd41..405d624e2191 100644
>>>> --- a/Documentation/admin-guide/kernel-parameters.txt
>>>> +++ b/Documentation/admin-guide/kernel-parameters.txt
>>>> @@ -1497,6 +1497,16 @@
>>>>  			See Documentation/admin-guide/sysctl/net.rst for
>>>>  			fb_tunnels_only_for_init_ns
>>>>
>>>> +	flexthp_unhinted_max=
>>>> +			[KNL] Requires CONFIG_FLEXIBLE_THP enabled. The maximum
>>>> +			folio size that will be allocated for an anonymous vma
>>>> +			that has neither explicitly opted in nor out of using
>>>> +			transparent hugepages. The size must be a power-of-2 in
>>>> +			the range [PAGE_SIZE, PMD_SIZE). A larger size improves
>>>> +			performance by reducing page faults, while a smaller
>>>> +			size reduces internal fragmentation. Default: max(64K,
>>>> +			PAGE_SIZE). Format: size[KMG].
>>>> +
>>>
>>> Let's split this parameter into a separate patch.
>>>
>>
>> Just a general comment after stumbling over patch #2: let's not start
>> splitting patches into pieces that don't make any sense on their own;
>> that just makes review a lot harder.
>
> Sorry to hear this -- but there are also non-subjective reasons we
> split patches this way.
>
> Initially we had minimal to no common ground, so we had to divide and
> conquer in the smallest possible steps.
>
> If you look at previous discussions: there was a disagreement on patch
> 2 in v2 -- that's the patch you asked to be squashed into the main
> patch 3. Fortunately we've resolved that. If that disagreement had
> persisted, we would have left patch 2 out rather than let it bog down
> patch 3, which would work identically for all arches except arm and
> could be merged separately.

That all makes sense to me. Squashing it now is most probably the
logical step; the situation was different before.

As I said, it was just a general comment about splitting patches out.

-- 
Cheers,

David / dhildenb