From: Yang Shi
Date: Thu, 11 Aug 2022 15:12:06 -0700
Subject: Re: [PATCH v3] mm: add thp_utilization metrics to /proc/thp_utilization
To: Yu Zhao
Cc: "Alex Zhu (Kernel)", Rik van Riel, Kernel Team, linux-mm@kvack.org, willy@infradead.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, Ning Zhang, Miaohe Lin
References: <20220805184016.2926168-1-alexlzhu@fb.com> <0b16dbac6444bfcdfbeb4df4280354839bfe1a8f.camel@fb.com> <1F8B9D85-A735-4832-AD58-CA4BD474248D@fb.com> <868F0874-70E8-4416-B39B-DA74C9D76A40@fb.com> <3195C304-2140-4E5D-890D-AC55653193E5@fb.com>

On Thu, Aug 11, 2022 at 2:55 PM Yu Zhao wrote:
>
> On Thu, Aug 11, 2022 at 1:20 PM Alex Zhu (Kernel) wrote:
> >
> > Hi Yu,
> >
> > I’ve updated your patch set from last year to work with folio and am testing it now. The functionality in split_huge_page() is the same as what I have. Was there any follow up work done later?
>
> Yes, but it won't change the landscape any time soon (see below). So
> please feel free to continue along your current direction.
>
> > If not, I would like to incorporate this into what I have, and then resubmit. Will reference the original patchset. We need this functionality for the shrinker, but even the changes to split_huge_page() by themselves should show some performance improvement when used by the existing deferred_split_huge_page().
>
> SGTM. Thanks!
>
> A side note:
>
> I'm working on a new mode: THP=auto, meaning the kernel will detect
> internal fragmentation of 2MB compound pages to decide whether to map
> them by PMDs or split them under memory pressure. The general workflow
> of this new mode is as follows.

I tend to agree that avoiding THP allocation in the first place is the
preferred way to avoid internal fragmentation. But I have some
questions about your design/implementation:

>
> In the page fault path:
> 1. Compound pages are allocated as usual.
> 2. Each is mapped by 512 consecutive PTEs rather than a PMD.
> 3. There will be more TLB misses but the same number of page faults.
> 4. TLB coalescing can mitigate the performance degradation.

Why not just allocate base pages in the first place? Khugepaged has the
max_ptes_none tunable to detect internal fragmentation. If you are
worried about zero pages, you could add a max_pte_zero tunable. Or did
you investigate whether the new MADV_COLLAPSE would help? It leaves the
decision to userspace.

>
> In khugepaged:
> 1. Check the dirty bit in the PTEs mapping a compound page, to
> determine its utilization.
> 2. Remap compound pages that meet a certain utilization threshold by
> PMDs in place, i.e., no migrations.
>
> In the reclaim path, e.g., MGLRU page table scanning:
> 1. Decide whether compound pages mapped by PTEs should be split based
> on their utilizations and memory pressure, e.g., reclaim priority.
> 2. Clean subpages should be freed directly after split, rather than swapped out.
>
> N.B.
> 1. This workflow relies on the dirty bit rather than examining the content of a page.
> 2. Sampling can be done by periodically switching between a PMD and
> 512 consecutive PTEs.
> 3. It only needs to hold mmap_lock for read because this special mode
> (512 consecutive PTEs) is not considered the split mode.
> 4. Don't hold your breath :)
>
> Other references:
> 1. https://www.usenix.org/system/files/atc20-zhu-weixi_0.pdf
> 2. https://www.usenix.org/system/files/osdi21-hunter.pdf
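
To make the quoted khugepaged and reclaim steps concrete, here is a rough
userspace sketch of the decision they describe. It is a toy model, not
kernel code and not the actual THP=auto implementation: the function
names, the two threshold values, and the list of booleans standing in for
PTE dirty bits are all assumptions made for illustration.

```python
# Toy userspace model of the quoted workflow. Nothing here is a kernel API;
# every name and threshold below is hypothetical.

PTES_PER_THP = 512  # a 2MB compound page mapped by 512 4KB PTEs


def utilization(dirty_bits):
    """Return the fraction of subpages whose PTE dirty bit is set."""
    if len(dirty_bits) != PTES_PER_THP:
        raise ValueError("expected one dirty bit per PTE")
    return sum(1 for bit in dirty_bits if bit) / PTES_PER_THP


def decide(dirty_bits, under_pressure,
           remap_threshold=0.9, split_threshold=0.5):
    """Pick an action for a compound page currently mapped by PTEs.

    remap_threshold and split_threshold are made-up values; the original
    mail only says "a certain utilization threshold".
    """
    used = utilization(dirty_bits)
    if used >= remap_threshold:
        return "remap_pmd"  # well utilized: remap by a PMD in place
    if under_pressure and used < split_threshold:
        return "split"      # poorly utilized under memory pressure
    return "keep_ptes"      # otherwise leave the 512 PTEs alone


def clean_subpages(dirty_bits):
    """Indices of subpages that could be freed directly after a split."""
    return [i for i, bit in enumerate(dirty_bits) if not bit]
```

With 64 dirty subpages out of 512, utilization() gives 0.125, so this
sketch would split the page only under memory pressure and would report
the remaining 448 subpages as clean.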