From: David Hildenbrand
Organization: Red Hat
Date: Fri, 7 Jul 2023 13:29:02 +0200
To: Ryan Roberts, "Huang, Ying"
Cc: Andrew Morton, Matthew Wilcox, "Kirill A. Shutemov", Yin Fengwei,
 Yu Zhao, Catalin Marinas, Will Deacon, Anshuman Khandual, Yang Shi,
 linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org,
 linux-mm@kvack.org
Subject: Re: [PATCH v2 4/5] mm: FLEXIBLE_THP for improved performance
Message-ID: <524bacd2-4a47-2b8b-6685-c46e31a01631@redhat.com>
In-Reply-To: <44e60630-5e9d-c8df-ab79-cb0767de680e@arm.com>
References: <20230703135330.1865927-1-ryan.roberts@arm.com>
 <20230703135330.1865927-5-ryan.roberts@arm.com>
 <87edlkgnfa.fsf@yhuang6-desk2.ccr.corp.intel.com>
 <44e60630-5e9d-c8df-ab79-cb0767de680e@arm.com>
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.12.0
Content-Type: text/plain; charset=UTF-8; format=flowed

On 07.07.23 11:52, Ryan Roberts wrote:
> On 07/07/2023 09:01, Huang, Ying wrote:
>> Ryan Roberts writes:
>>
>>> Introduce FLEXIBLE_THP feature, which allows anonymous memory to be
>>> allocated in large folios of a specified order. All pages of the large
>>> folio are pte-mapped during the same page fault, significantly reducing
>>> the number of page faults. The number of per-page operations (e.g. ref
>>> counting, rmap management, lru list management) is also significantly
>>> reduced since those ops now become per-folio.
>>
>> I like the idea to share as much code as possible between large
>> (anonymous) folio and THP. Finally, THP becomes just a special kind of
>> large folio.
>>
>> Although we can use smaller page order for FLEXIBLE_THP, it's hard to
>> avoid internal fragmentation completely. So, I think that finally we
>> will need to provide a mechanism for the users to opt out, e.g.,
>> something like "always madvise never" via
>> /sys/kernel/mm/transparent_hugepage/enabled. I'm not sure whether it's
>> a good idea to reuse the existing interface of THP.
>
> I wouldn't want to tie this to the existing interface, simply because that
> implies that we would want to follow the "always" and "madvise" advice too. That
> means that on a thp=madvise system (which is certainly the case for android and
> other client systems) we would have to disable large anon folios for VMAs that
> haven't explicitly opted in. That breaks the intention that this should be an
> invisible performance boost. I think it's important to set the policy for use of

It will never ever be a completely invisible performance boost, just
like ordinary THP.

Using the exact same existing toggle is the right thing to do. If
someone specifies "never" or "madvise", then do exactly that.

It might make sense to have more modes or additional toggles, but
"madvise=never" means no memory waste.

I remember I raised it already in the past, but you *absolutely* have to
respect the MADV_NOHUGEPAGE flag. There is user space out there (for
example, userfaultfd) that doesn't want the kernel to populate any
additional page tables. So if you have to respect that already, then
also respect MADV_HUGEPAGE, simple.

> THP separately to use of large anon folios.
>
> I could be persuaded on the merits of a new runtime enable/disable interface if
> there is consensus.

There would have to be a very good reason for a completely separate
control.

Bypassing MADV_NOHUGEPAGE or "madvise=never" simply because we add a
"flexible" before the THP sounds broken.

--
Cheers,

David / dhildenb
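
For illustration only, the policy argued for in this reply can be sketched as a small userspace model in C. This is not kernel code and not part of the patch series: the enum names, `parse_thp_enabled()` and `may_use_large_anon_folio()` are hypothetical helpers invented for this sketch. The only thing taken from the real interface is the bracketed format of /sys/kernel/mm/transparent_hugepage/enabled (for example "always [madvise] never"). Treating an unparsable mode like "never" is an assumption of the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical model of the global THP toggle; not a kernel type. */
enum thp_mode { THP_ALWAYS, THP_MADVISE, THP_NEVER, THP_UNKNOWN };

/* Hypothetical per-VMA hint, standing in for MADV_HUGEPAGE/MADV_NOHUGEPAGE. */
enum vma_hint { HINT_NONE, HINT_HUGEPAGE, HINT_NOHUGEPAGE };

/*
 * Parse the sysfs-style string, e.g. "always [madvise] never".
 * The active mode is the word enclosed in square brackets.
 */
static enum thp_mode parse_thp_enabled(const char *s)
{
    const char *open, *close;
    size_t len;

    assert(s != NULL);
    open = strchr(s, '[');
    close = open ? strchr(open, ']') : NULL;
    if (!open || !close)
        return THP_UNKNOWN;

    len = (size_t)(close - open - 1);
    if (len == 6 && strncmp(open + 1, "always", len) == 0)
        return THP_ALWAYS;
    if (len == 7 && strncmp(open + 1, "madvise", len) == 0)
        return THP_MADVISE;
    if (len == 5 && strncmp(open + 1, "never", len) == 0)
        return THP_NEVER;
    return THP_UNKNOWN;
}

/*
 * Model of "use the exact same existing toggle":
 *  - MADV_NOHUGEPAGE always wins, so no extra page tables get populated;
 *  - "never" disables large anon folios;
 *  - "madvise" allows them only where MADV_HUGEPAGE was requested;
 *  - "always" allows them everywhere else.
 * An unknown mode is treated like "never" (assumption of this sketch).
 */
static bool may_use_large_anon_folio(enum thp_mode mode, enum vma_hint hint)
{
    if (hint == HINT_NOHUGEPAGE)
        return false;

    switch (mode) {
    case THP_ALWAYS:
        return true;
    case THP_MADVISE:
        return hint == HINT_HUGEPAGE;
    default:
        return false;
    }
}
```

With this model, a thp=madvise system only gets large anon folios in regions that opted in, and a region marked with MADV_NOHUGEPAGE never gets them, regardless of the global mode.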