Date: Tue, 17 May 2022 17:16:11 +0800
From: Muchun Song
To: Oscar Salvador
Cc: corbet@lwn.net, mike.kravetz@oracle.com, akpm@linux-foundation.org, mcgrof@kernel.org, keescook@chromium.org, yzaikin@google.com, david@redhat.com, masahiroy@kernel.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, duanxiongchun@bytedance.com, smuchun@gmail.com
Subject: Re: [PATCH v12 7/7] mm: hugetlb_vmemmap: add hugetlb_optimize_vmemmap sysctl
References: <20220516102211.41557-1-songmuchun@bytedance.com> <20220516102211.41557-8-songmuchun@bytedance.com>

On Tue, May 17, 2022 at 10:06:51AM +0200, Oscar Salvador wrote:
> On Mon, May 16, 2022 at 06:22:11PM +0800, Muchun Song wrote:
> > We must add hugetlb_free_vmemmap=on (or "off") to the boot cmdline and
> > reboot the server to enable or disable the feature of optimizing vmemmap
> > pages associated with HugeTLB pages. However, rebooting usually takes a
> > long time. So add a sysctl to enable or disable the feature at runtime
> > without rebooting. Why do we need this? There are 3 use cases.
> >
> > 1) The feature of minimizing overhead of struct page associated with each
> > HugeTLB is disabled by default without passing "hugetlb_free_vmemmap=on"
> > to the boot cmdline. When we (ByteDance) deliver the servers to the
> > users who want to enable this feature, they have to configure the grub
> > (change boot cmdline) and reboot the servers, whereas rebooting usually
> > takes a long time (we have thousands of servers). It's a very bad
> > experience for the users. So we need an approach to enable this feature
> > after rebooting. This is a use case in our practical environment.
> >
> > 2) Some use cases are that HugeTLB pages are allocated 'on the fly'
> > instead of being pulled from the HugeTLB pool, those workloads would be
> > affected with this feature enabled.
> > Those workloads could be identified
> > by the characteristic that they never explicitly allocate huge pages
> > with 'nr_hugepages' but only set 'nr_overcommit_hugepages' and then let
> > the pages be allocated from the buddy allocator at fault time. We can
> > confirm it is a real use case from the commit 099730d67417. For those
> > workloads, the page fault time could be ~2x slower than before. We
> > suspect those users want to disable this feature if the system has enabled
> > this before and they don't think the memory savings benefit is enough to
> > make up for the performance drop.
> >
> > 3) If the workload which wants vmemmap pages to be optimized and the
> > workload which wants to set 'nr_overcommit_hugepages' and does not want
> > the extra overhead at fault time when the overcommitted pages are
> > allocated from the buddy allocator are deployed in the same server.
> > The user could enable this feature and set 'nr_hugepages' and
> > 'nr_overcommit_hugepages', then disable the feature. In this case,
> > the overcommitted HugeTLB pages will not encounter the extra overhead
> > at fault time.
>
> I am having issues parsing point 3), especially the first part.
> IIUC, you are saying we have two different kinds of workloads:
>
> - one that wants to have hugetlb vmemmap pages optimized
> - one that wants to allocate hugetlb pages at fault time rather than
>   allocating them via /proc/..., but does not want to suffer the
>   overhead of optimizing the vmemmap pages when faulting them

I need to clarify this second workload: it is the one that does not want to
suffer the overhead of optimizing the vmemmap pages when faulting them,
rather than the one that wants to allocate hugetlb pages at fault time.
It is different from the one in case 2).
This workload usually configures both 'nr_hugepages' and
'nr_overcommit_hugepages'. If it does not want to suffer the overhead of
optimizing the vmemmap pages when faulting pages (which must be
overcommitted pages), it can follow the steps mentioned above.

>
> Then you say the user could enable the optimization and allocate
> those pages via nr_hugepages, and then disable the feature.
> So, when we fault in those pages, the pages are already in the
> pool, right? And are already optimized.
>

I mean the overcommitted pages (which may be allocated at fault time), as
explained above.

Thanks.
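
P.S. For reference, the sequence described in case 3) could be sketched roughly as below. This is only an illustration under some assumptions: the sysctl name is taken from the patch subject and is assumed to appear under /proc/sys/vm alongside 'nr_hugepages' and 'nr_overcommit_hugepages', the helper names (write_vm_sysctl, configure_mixed_workloads) are hypothetical, and writing to the real files requires root on a kernel that carries this series.

```python
from pathlib import Path

# Assumption: the new knob lives next to the existing hugetlb sysctls.
VM_SYSCTL_DIR = Path("/proc/sys/vm")


def write_vm_sysctl(name: str, value: int, root: Path = VM_SYSCTL_DIR) -> None:
    """Write an integer value to a vm sysctl file under `root`."""
    (root / name).write_text(f"{value}\n")


def configure_mixed_workloads(nr_hugepages: int, nr_overcommit: int,
                              root: Path = VM_SYSCTL_DIR):
    """Apply the ordering from case 3) and return the steps performed.

    1. Enable the vmemmap optimization.
    2. Populate the pool, so the pool pages get optimized.
    3. Set the overcommit limit.
    4. Disable the optimization, so that overcommitted pages allocated
       later at fault time skip the extra overhead.
    """
    steps = [
        ("hugetlb_optimize_vmemmap", 1),
        ("nr_hugepages", nr_hugepages),
        ("nr_overcommit_hugepages", nr_overcommit),
        ("hugetlb_optimize_vmemmap", 0),
    ]
    for name, value in steps:
        write_vm_sysctl(name, value, root)
    return steps
```

On a test machine this would be called as configure_mixed_workloads(1024, 256); the page counts here are arbitrary. Passing a different `root` directory lets the ordering be exercised without touching the live system.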