From: Lance Yang
Date: Thu, 14 Mar 2024 22:19:29 +0800
Subject: Re: [PATCH 1/1] mm/khugepaged: reduce process visible downtime by pre-zeroing hugepage
To: David Hildenbrand
Cc: akpm@linux-foundation.org, mhocko@suse.com, zokeefe@google.com, shy828301@gmail.com, xiehuan09@gmail.com, songmuchun@bytedance.com, minchan@kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, libang.li@antgroup.com
References: <20240308074921.45752-1-ioworker0@gmail.com> <75630ba6-79b6-4105-b614-29cfb0331084@redhat.com>

Another thought suggested by Bang Li is that we record which pte is none
in hpage_collapse_scan_pmd. Then, before acquiring the mmap_lock (write
mode), we will pre-zero pages as needed. What do you think?
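
For illustration, the two-phase idea above can be modelled in plain
userspace C. This is only a sketch under stated assumptions, not kernel
code: struct none_map, record_none_ptes(), prezero_none_subpages() and
copy_present_subpages() are hypothetical names, a NULL source pointer
stands in for pte_none(), and the 512 x 4 KiB geometry assumes a 2 MiB
PMD-sized hugepage.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SUBPAGES_PER_HPAGE 512  /* assumption: 2 MiB hugepage, 4 KiB base pages */
#define SUBPAGE_SIZE       4096
#define BITS_PER_WORD      64

/* Hypothetical bookkeeping filled in during the scan phase. */
struct none_map {
    uint64_t bits[SUBPAGES_PER_HPAGE / BITS_PER_WORD];
    unsigned int nr_none;
};

/* Scan phase: src[i] == NULL stands in for pte_none() on entry i. */
static void record_none_ptes(struct none_map *map,
                             const unsigned char *const *src)
{
    memset(map, 0, sizeof(*map));
    for (size_t i = 0; i < SUBPAGES_PER_HPAGE; i++) {
        if (src[i] == NULL) {
            map->bits[i / BITS_PER_WORD] |= UINT64_C(1) << (i % BITS_PER_WORD);
            map->nr_none++;
        }
    }
}

static bool is_none(const struct none_map *map, size_t i)
{
    assert(i < SUBPAGES_PER_HPAGE);
    return (map->bits[i / BITS_PER_WORD] >> (i % BITS_PER_WORD)) & 1;
}

/* Before taking the lock: zero only the subpages recorded as none. */
static void prezero_none_subpages(unsigned char *hpage,
                                  const struct none_map *map)
{
    for (size_t i = 0; i < SUBPAGES_PER_HPAGE; i++)
        if (is_none(map, i))
            memset(hpage + i * SUBPAGE_SIZE, 0, SUBPAGE_SIZE);
}

/*
 * Under the lock: copy present subpages only; subpages recorded as none
 * were already zeroed and are skipped. Returns the number of copies made.
 */
static unsigned int copy_present_subpages(unsigned char *hpage,
                                          const unsigned char *const *src,
                                          const struct none_map *map)
{
    unsigned int copied = 0;

    for (size_t i = 0; i < SUBPAGES_PER_HPAGE; i++) {
        if (is_none(map, i))
            continue;
        memcpy(hpage + i * SUBPAGE_SIZE, src[i], SUBPAGE_SIZE);
        copied++;
    }
    return copied;
}
```

In the kernel the recorded map could only be a hint: PTEs can change
between the scan and the collapse, so it would presumably have to be
re-validated once mmap_lock is held for writing.
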

Thanks,
Lance

On Tue, Mar 12, 2024 at 9:55 PM Lance Yang wrote:
>
> On Tue, Mar 12, 2024 at 9:19 PM David Hildenbrand wrote:
> >
> > On 12.03.24 14:09, Lance Yang wrote:
> > > Hey David,
> > >
> > > Thanks for taking time to review!
> > >
> > > On Tue, Mar 12, 2024 at 12:19 AM David Hildenbrand wrote:
> > >>
> > >> On 08.03.24 08:49, Lance Yang wrote:
> > >>> The patch reduces the process visible downtime during hugepage
> > >>> collapse. This is achieved by pre-zeroing the hugepage before
> > >>> acquiring mmap_lock(write mode) if nr_pte_none >= 256, without
> > >>> affecting the efficiency of khugepaged.
> > >>>
> > >>> On an Intel Core i5 CPU, the process visible downtime during
> > >>> hugepage collapse is as follows:
> > >>>
> > >>> | nr_ptes_none | w/o __GFP_ZERO | w/ __GFP_ZERO | Change  |
> > >>> |--------------|----------------|---------------|---------|
> > >>> | 511          | 233us          | 95us          | -59.21% |
> > >>> | 384          | 376us          | 219us         | -41.20% |
> > >>> | 256          | 421us          | 323us         | -23.28% |
> > >>> | 128          | 523us          | 507us         |  -3.06% |
> > >>>
> > >>> Of course, alloc_charge_hpage() will take longer to run with
> > >>> the __GFP_ZERO flag.
> > >>>
> > >>> | Func                 | w/o __GFP_ZERO | w/ __GFP_ZERO |
> > >>> |----------------------|----------------|---------------|
> > >>> | alloc_charge_hpage   | 198us          | 295us         |
> > >>>
> > >>> But it's not a big deal because it doesn't impact the total
> > >>> time spent by khugepaged in collapsing a hugepage. In fact,
> > >>> it would decrease.
> > >>
> > >> It does look sane to me and not overly complicated.
> > >>
> > >> But, it's an optimization really only when we have quite a bunch of
> > >> pte_none(), possibly repeatedly so that it really makes a difference.
> > >>
> > >> Usually, when we repeatedly collapse that many pte_none() we're just
> > >> wasting a lot of memory and should re-evaluate life choices :)
> > >
> > > Agreed! It seems that the default value of max_pte_none may be set too
> > > high, which could result in the memory wastage issue we're discussing.
> >
> > IIRC, some companies disable it completely (set to 0) because of that.
> >
> > >
> > >>
> > >> So my question is: do we really care about it that much that we care to
> > >> optimize?
> > >
> > > IMO, although it may not be our main concern, reducing the impact of
> > > khugepaged on the process remains crucial. I think that users also prefer
> > > minimal interference from khugepaged.
> >
> > The problem I am having with this is that for the *common* case where we
> > have a small number of pte_none(), we cannot really optimize because we
> > have to perform the copy.
> >
> > So this feels like we're rather optimizing a corner case, and I am not
> > so sure if that is really worth it.
> >
> > Other thoughts?
>
> Another thought is to introduce khugepaged/alloc_zeroed_hpage for THP
> sysfs settings. This would enable users to decide whether to avoid unnecessary
> copies when nr_ptes_none > 0.
>
> >
> > --
> > Cheers,
> >
> > David / dhildenb
> >