From: Muchun Song
Date: Thu, 31 Mar 2022 11:45:29 +0800
Subject: Re: [PATCH v6 4/4] mm: hugetlb_vmemmap: add hugetlb_free_vmemmap sysctl
To: Andrew Morton
Cc: Jonathan Corbet, Mike Kravetz, Luis Chamberlain, Kees Cook, Iurii Zaikin, Oscar Salvador, David Hildenbrand, Masahiro Yamada, Linux Doc Mailing List, LKML, Linux Memory Management List, Xiongchun duan, Muchun Song

On Thu, Mar 31, 2022 at 10:37 AM Andrew Morton wrote:
>
> On Wed, 30 Mar 2022 23:37:45 +0800 Muchun Song wrote:
>
> > We must add "hugetlb_free_vmemmap=on" to boot cmdline and reboot the
> > server to enable the feature of freeing vmemmap pages of HugeTLB
> > pages. Rebooting usually takes a long time. Add a sysctl to enable
> > or disable the feature at runtime without rebooting.
>
> I forget, why did we add the hugetlb_free_vmemmap option in the first
> place? Why not just leave the feature enabled in all cases?

The first reason is that, in the original version, we disabled the PMD
(huge page) mapping of vmemmap pages, which increased the number of page
table pages. So if a user/sysadmin only used a small number of HugeTLB
pages (as a percentage of system memory), they could end up using more
memory with hugetlb_free_vmemmap on than with it off. This tradeoff is
now gone.

The second reason is that this feature adds overhead to the paths that
allocate HugeTLB pages from, and free them to, the buddy system. As
Mike said in [1]:

"
There are still some instances where huge pages are allocated 'on the fly'
instead of being pulled from the pool. Michal pointed out the case of page
migration. It is also possible for someone to use hugetlbfs without
pre-allocating huge pages to the pool. I remember the use case pointed
out in commit 099730d67417. It says,

"I have a hugetlbfs user which is never explicitly allocating huge pages
with 'nr_hugepages'. They only set 'nr_overcommit_hugepages' and then let
the pages be allocated from the buddy allocator at fault time."

In this case, I suspect they were using 'page fault' allocation for
initialization much like someone using /proc/sys/vm/nr_hugepages. So, the
overhead may not be as noticeable.
"

Since workloads differ, we introduced hugetlb_free_vmemmap and expect
users to decide based on their own workloads.

[1] https://patchwork.kernel.org/comment/23752641/

>
> Furthermore, why would anyone want to tweak this at runtime? What is
> the use case? Where is the end-user value in all of this?

If the workload on a server changes later, users need to be able to
adapt this setting at runtime without rebooting the server.

>
> > Disabling requires there is no any optimized HugeTLB page in the
> > system. If you fail to disable it, you can set "nr_hugepages" to 0
> > and then retry.
> >
> > --- a/Documentation/admin-guide/sysctl/vm.rst
> > +++ b/Documentation/admin-guide/sysctl/vm.rst
> > @@ -561,6 +561,20 @@ Change the minimum size of the hugepage pool.
> > See Documentation/admin-guide/mm/hugetlbpage.rst
> >
> >
> > +hugetlb_free_vmemmap
> > +====================
> > +
> > +Enable (set to 1) or disable (set to 0) the feature of optimizing vmemmap
> > +pages associated with each HugeTLB page. Once true, the vmemmap pages of
> > +subsequent allocation of HugeTLB pages from buddy system will be optimized,
> > +whereas already allocated HugeTLB pages will not be optimized. If you fail
> > +to disable this feature, you can set "nr_hugepages" to 0 and then retry
> > +since it is only allowed to be disabled after there is no any optimized
> > +HugeTLB page in the system.
> > +
>
> Pity the poor user who is looking at this and wondering whether it will
> improve or worsen things. If we don't tell them, who will? Are they
> supposed to just experiment?
>
> What can we add here to help them understand whether this might be
> beneficial?
>

My bad. I should explain this in more detail so that users can make
better decisions. Thanks.
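The runtime rules in the proposed vm.rst text can be made concrete with a toy Python model. This is a sketch under stated assumptions, not the kernel's implementation: the class, method, and field names are hypothetical; it assumes no HugeTLB page is in use, so setting nr_hugepages to 0 returns every pool page to the buddy system; and it reports a refused sysctl write as False, because the thread does not say which error the kernel returns.

```python
from dataclasses import dataclass


@dataclass
class HugeTLBModel:
    """Hypothetical toy model of the runtime rules described for the
    hugetlb_free_vmemmap sysctl. Not kernel code."""

    free_vmemmap: bool = False  # current value of the sysctl
    nr_hugepages: int = 0       # pages currently in the HugeTLB pool
    nr_optimized: int = 0       # pool pages whose vmemmap was optimized

    def alloc_from_buddy(self, count: int = 1) -> None:
        # Only pages allocated while the feature is on are optimized;
        # turning it on later does not touch pages already in the pool.
        self.nr_hugepages += count
        if self.free_vmemmap:
            self.nr_optimized += count

    def shrink_pool_to_zero(self) -> None:
        # Models writing 0 to nr_hugepages, assuming no page is in use,
        # so every pool page (optimized or not) goes back to buddy.
        self.nr_hugepages = 0
        self.nr_optimized = 0

    def write_sysctl(self, value: int) -> bool:
        # Enabling always succeeds; disabling is refused while any
        # optimized HugeTLB page remains. Returns True on success.
        if value == 0 and self.nr_optimized > 0:
            return False
        self.free_vmemmap = bool(value)
        return True


def disable_with_retry(m: HugeTLBModel) -> bool:
    # The recovery path the documentation suggests: if disabling
    # fails, set nr_hugepages to 0 and then retry.
    if m.write_sysctl(0):
        return True
    m.shrink_pool_to_zero()
    return m.write_sysctl(0)


if __name__ == "__main__":
    m = HugeTLBModel()
    m.alloc_from_buddy(2)     # allocated while off: not optimized
    m.write_sysctl(1)         # enable at runtime, no reboot
    m.alloc_from_buddy(3)     # allocated while on: optimized
    print(m.nr_optimized)     # 3
    print(m.write_sysctl(0))  # False: optimized pages still exist
    print(disable_with_retry(m), m.free_vmemmap)  # True False
```

The model only captures the ordering rules stated in the documentation text; it says nothing about the memory savings or the allocation overhead that users would need to weigh when deciding whether to turn the feature on.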