From: Barry Song <21cnbao@gmail.com>
To: akpm@linux-foundation.org, linux-mm@kvack.org
Cc: cerasuolodomenico@gmail.com, chrisl@kernel.org, david@redhat.com,
	kasong@tencent.com, linux-kernel@vger.kernel.org, peterx@redhat.com,
	ryan.roberts@arm.com, surenb@google.com, v-songbaohua@oppo.com,
	willy@infradead.org, yosryahmed@google.com, yuzhao@google.com,
	corbet@lwn.net
Subject: [PATCH v5 1/4] mm: add per-order mTHP anon_fault_alloc and
 anon_fault_fallback counters
Date: Fri, 12 Apr 2024 19:37:37 +1200
Message-Id: <20240412073740.294272-2-21cnbao@gmail.com>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20240412073740.294272-1-21cnbao@gmail.com>
References: <20240412073740.294272-1-21cnbao@gmail.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit

From: Barry Song <v-songbaohua@oppo.com>

Profiling a system with mTHP enabled is currently difficult because we
have little visibility into how it operates, and presenting the success
rate of mTHP allocations appears to be a pressing need. Recently, I
have had significant difficulty debugging performance improvements and
regressions without these figures. It is crucial for us to understand
the true effectiveness of mTHP in real-world scenarios, especially on
systems with fragmented memory.

This patch establishes the framework for per-order mTHP counters and
begins by introducing the anon_fault_alloc and anon_fault_fallback
counters. Additionally, to maintain consistency with
thp_fault_fallback_charge in /proc/vmstat, it also tracks
anon_fault_fallback_charge when mem_cgroup_charge() fails for an mTHP.
Incorporating additional counters should now be straightforward as
well.

Signed-off-by: Barry Song <v-songbaohua@oppo.com>
Cc: Chris Li <chrisl@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Domenico Cerasuolo <cerasuolodomenico@gmail.com>
Cc: Kairui Song <kasong@tencent.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Peter Xu <peterx@redhat.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Yosry Ahmed <yosryahmed@google.com>
Cc: Yu Zhao <yuzhao@google.com>
---
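A usage sketch for reviewers, not part of the patch: each
/sys/kernel/mm/transparent_hugepage/hugepages-<size>kB directory gains
a "stats" group holding the new counters, so a profiler can poll them
roughly as below. The hugepages-64kB path is an assumption for a system
where that size is enabled; only the "stats" group and attribute names
come from this patch.

/* sketch: poll one per-order mTHP counter from sysfs (path assumed) */
#include <stdio.h>

int main(void)
{
	const char *path = "/sys/kernel/mm/transparent_hugepage/"
			   "hugepages-64kB/stats/anon_fault_alloc";
	unsigned long alloc;
	FILE *f = fopen(path, "r");

	if (!f)
		return 1;	/* kernel lacks this patch or this size */
	if (fscanf(f, "%lu", &alloc) == 1)
		printf("anon_fault_alloc: %lu\n", alloc);
	fclose(f);
	return 0;
}

Comparing anon_fault_alloc against anon_fault_fallback sampled the same
way gives the allocation success rate the changelog above asks for.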
 include/linux/huge_mm.h | 51 ++++++++++++++++++++++++++++++++++
 mm/huge_memory.c        | 61 +++++++++++++++++++++++++++++++++++++++++
 mm/memory.c             |  3 ++
 mm/page_alloc.c         |  4 +++
 4 files changed, 119 insertions(+)
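One design note before the diff: stats[0][__MTHP_STAT_COUNT] below uses
a zero-length (GNU C) first dimension, so the number of rows is fixed
only at allocation time, when hugepage_init() calls __alloc_percpu()
for (PMD_ORDER + 1) rows. A minimal userspace model of the same layout
trick (all names here are illustrative, not from the patch):

/* model of the mthp_stat layout: rows sized at allocation time */
#include <stdio.h>
#include <stdlib.h>

#define ITEM_COUNT 3	/* stands in for __MTHP_STAT_COUNT */

struct stat_rows {
	unsigned long stats[0][ITEM_COUNT];	/* zero-length first dim */
};

int main(void)
{
	int max_order = 9;	/* stands in for PMD_ORDER */
	struct stat_rows *s = calloc(max_order + 1, sizeof(s->stats[0]));

	if (!s)
		return 1;
	s->stats[4][1]++;	/* e.g. one order-4 fallback event */
	printf("order 4, item 1: %lu\n", s->stats[4][1]);
	free(s);
	return 0;
}

The kernel version keeps one such object per CPU, which is why
sum_mthp_stat() below walks the online CPUs to produce the sysfs value.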
diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index e896ca4760f6..c5beb54b97cb 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -264,6 +264,57 @@ unsigned long thp_vma_allowable_orders(struct vm_area_struct *vma,
 					  enforce_sysfs, orders);
 }
 
+enum mthp_stat_item {
+	MTHP_STAT_ANON_FAULT_ALLOC,
+	MTHP_STAT_ANON_FAULT_FALLBACK,
+	MTHP_STAT_ANON_FAULT_FALLBACK_CHARGE,
+	__MTHP_STAT_COUNT
+};
+
+struct mthp_stat {
+	unsigned long stats[0][__MTHP_STAT_COUNT];
+};
+
+extern struct mthp_stat __percpu *mthp_stats;
+
+static inline void count_mthp_stat(int order, enum mthp_stat_item item)
+{
+	if (order <= 0 || order > PMD_ORDER || !mthp_stats)
+		return;
+
+	this_cpu_inc(mthp_stats->stats[order][item]);
+}
+
+static inline void count_mthp_stats(int order, enum mthp_stat_item item, long delta)
+{
+	if (order <= 0 || order > PMD_ORDER || !mthp_stats)
+		return;
+
+	this_cpu_add(mthp_stats->stats[order][item], delta);
+}
+
+/*
+ * Fold the foreign cpu mthp stats into our own.
+ *
+ * This is adding to the stats on one processor
+ * but keeps the global counts constant.
+ */
+static inline void mthp_stats_fold_cpu(int cpu)
+{
+	struct mthp_stat *fold_stat;
+	int i, j;
+
+	if (!mthp_stats)
+		return;
+	fold_stat = per_cpu_ptr(mthp_stats, cpu);
+	for (i = 1; i <= PMD_ORDER; i++) {
+		for (j = 0; j < __MTHP_STAT_COUNT; j++) {
+			count_mthp_stats(i, j, fold_stat->stats[i][j]);
+			fold_stat->stats[i][j] = 0;
+		}
+	}
+}
+
 #define transparent_hugepage_use_zero_page()				\
 	(transparent_hugepage_flags &					\
 	 (1<<TRANSPARENT_HUGEPAGE_USE_ZERO_PAGE_FLAG))
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -526,6 +526,50 @@ static ssize_t thpsize_show(struct kobject *kobj,
 static struct kobj_attribute thpsize_enabled_attr =
 	__ATTR(enabled, 0644, thpsize_show, thpsize_enabled_store);
 
+struct mthp_stat __percpu *mthp_stats;
+
+static unsigned long sum_mthp_stat(int order, enum mthp_stat_item item)
+{
+	unsigned long sum = 0;
+	int cpu;
+
+	cpus_read_lock();
+	for_each_online_cpu(cpu) {
+		struct mthp_stat *this = per_cpu_ptr(mthp_stats, cpu);
+
+		sum += this->stats[order][item];
+	}
+	cpus_read_unlock();
+
+	return sum;
+}
+
+#define DEFINE_MTHP_STAT_ATTR(_name, _index)				\
+static ssize_t _name##_show(struct kobject *kobj,			\
+			struct kobj_attribute *attr, char *buf)		\
+{									\
+	int order = to_thpsize(kobj)->order;				\
+									\
+	return sysfs_emit(buf, "%lu\n", sum_mthp_stat(order, _index));	\
+}									\
+static struct kobj_attribute _name##_attr = __ATTR_RO(_name)
+
+DEFINE_MTHP_STAT_ATTR(anon_fault_alloc, MTHP_STAT_ANON_FAULT_ALLOC);
+DEFINE_MTHP_STAT_ATTR(anon_fault_fallback, MTHP_STAT_ANON_FAULT_FALLBACK);
+DEFINE_MTHP_STAT_ATTR(anon_fault_fallback_charge, MTHP_STAT_ANON_FAULT_FALLBACK_CHARGE);
+
+static struct attribute *stats_attrs[] = {
+	&anon_fault_alloc_attr.attr,
+	&anon_fault_fallback_attr.attr,
+	&anon_fault_fallback_charge_attr.attr,
+	NULL,
+};
+
+static struct attribute_group stats_attr_group = {
+	.name = "stats",
+	.attrs = stats_attrs,
+};
+
 static struct thpsize *thpsize_create(int order, struct kobject *parent)
 {
 	unsigned long size = (PAGE_SIZE << order) / SZ_1K;
@@ -549,6 +593,12 @@ static struct thpsize *thpsize_create(int order, struct kobject *parent)
 		return ERR_PTR(ret);
 	}
 
+	ret = sysfs_create_group(&thpsize->kobj, &stats_attr_group);
+	if (ret) {
+		kobject_put(&thpsize->kobj);
+		return ERR_PTR(ret);
+	}
+
 	thpsize->order = order;
 	return thpsize;
 }
@@ -691,6 +741,11 @@ static int __init hugepage_init(void)
 	 */
 	MAYBE_BUILD_BUG_ON(HPAGE_PMD_ORDER < 2);
 
+	mthp_stats = __alloc_percpu((PMD_ORDER + 1) * sizeof(mthp_stats->stats[0]),
+			sizeof(unsigned long));
+	if (!mthp_stats)
+		return -ENOMEM;
+
 	err = hugepage_init_sysfs(&hugepage_kobj);
 	if (err)
 		goto err_sysfs;
@@ -725,6 +780,8 @@ static int __init hugepage_init(void)
 err_slab:
 	hugepage_exit_sysfs(hugepage_kobj);
 err_sysfs:
+	free_percpu(mthp_stats);
+	mthp_stats = NULL;
 	return err;
 }
 subsys_initcall(hugepage_init);
@@ -880,6 +937,8 @@ static vm_fault_t __do_huge_pmd_anonymous_page(struct vm_fault *vmf,
 		folio_put(folio);
 		count_vm_event(THP_FAULT_FALLBACK);
 		count_vm_event(THP_FAULT_FALLBACK_CHARGE);
+		count_mthp_stat(HPAGE_PMD_ORDER, MTHP_STAT_ANON_FAULT_FALLBACK);
+		count_mthp_stat(HPAGE_PMD_ORDER, MTHP_STAT_ANON_FAULT_FALLBACK_CHARGE);
 		return VM_FAULT_FALLBACK;
 	}
 	folio_throttle_swaprate(folio, gfp);
@@ -929,6 +988,7 @@ static vm_fault_t __do_huge_pmd_anonymous_page(struct vm_fault *vmf,
 		mm_inc_nr_ptes(vma->vm_mm);
 		spin_unlock(vmf->ptl);
 		count_vm_event(THP_FAULT_ALLOC);
+		count_mthp_stat(HPAGE_PMD_ORDER, MTHP_STAT_ANON_FAULT_ALLOC);
 		count_memcg_event_mm(vma->vm_mm, THP_FAULT_ALLOC);
 	}
 
@@ -1050,6 +1110,7 @@ vm_fault_t do_huge_pmd_anonymous_page(struct vm_fault *vmf)
 	folio = vma_alloc_folio(gfp, HPAGE_PMD_ORDER, vma, haddr, true);
 	if (unlikely(!folio)) {
 		count_vm_event(THP_FAULT_FALLBACK);
+		count_mthp_stat(HPAGE_PMD_ORDER, MTHP_STAT_ANON_FAULT_FALLBACK);
 		return VM_FAULT_FALLBACK;
 	}
 	return __do_huge_pmd_anonymous_page(vmf, &folio->page, gfp);
diff --git a/mm/memory.c b/mm/memory.c
index 649a547fe8e3..06048af7cf9a 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4368,6 +4368,7 @@ static struct folio *alloc_anon_folio(struct vm_fault *vmf)
 		folio = vma_alloc_folio(gfp, order, vma, addr, true);
 		if (folio) {
 			if (mem_cgroup_charge(folio, vma->vm_mm, gfp)) {
+				count_mthp_stat(order, MTHP_STAT_ANON_FAULT_FALLBACK_CHARGE);
 				folio_put(folio);
 				goto next;
 			}
@@ -4376,6 +4377,7 @@ static struct folio *alloc_anon_folio(struct vm_fault *vmf)
 			return folio;
 		}
 next:
+		count_mthp_stat(order, MTHP_STAT_ANON_FAULT_FALLBACK);
 		order = next_order(&orders, order);
 	}
 
@@ -4485,6 +4487,7 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf)
 
 	folio_ref_add(folio, nr_pages - 1);
 	add_mm_counter(vma->vm_mm, MM_ANONPAGES, nr_pages);
+	count_mthp_stat(folio_order(folio), MTHP_STAT_ANON_FAULT_ALLOC);
 	folio_add_new_anon_rmap(folio, vma, addr);
 	folio_add_lru_vma(folio, vma);
 setpte:
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index b51becf03d1e..3135b5ca2457 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5840,6 +5840,10 @@ static int page_alloc_cpu_dead(unsigned int cpu)
 	 */
 	vm_events_fold_cpu(cpu);
 
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+	mthp_stats_fold_cpu(cpu);
+#endif
+
 	/*
 	 * Zero the differential counters of the dead processor
 	 * so that the vm statistics are consistent.
-- 
2.34.1