From: Alexander Zhu <alexlzhu@fb.com>
To: linux-mm@kvack.org
Subject: [PATCH v5 2/5] mm: changes to split_huge_page() to free zero filled tail pages
Date: Tue, 25 Oct 2022 17:26:55 -0700
Message-ID: <70465f9747f5856ae2913a4bccd732bcaaa2b699.1666743422.git.alexlzhu@fb.com>
X-Mailer: git-send-email 2.30.2
Content-Type: text/plain
MIME-Version: 1.0
From: Alexander Zhu <alexlzhu@fb.com>

Currently, when /sys/kernel/mm/transparent_hugepage/enabled=always is
set, a large number of transparent hugepages end up almost entirely
zero filled. This has been discussed in a number of previous patchsets,
including:

https://lore.kernel.org/all/20210731063938.1391602-1-yuzhao@google.com/
https://lore.kernel.org/all/1635422215-99394-1-git-send-email-ningzhang@linux.alibaba.com/

Currently, split_huge_page() has no way to identify zero filled pages
within a THP, so these zero pages get remapped and continue to waste
memory. This patch identifies and frees tail pages that are zero filled
in split_huge_page(). That way we avoid mapping these pages back into
page table entries and can free up unused memory within THPs.

This is based on the previously mentioned patchset by Yu Zhao. However,
we chose to free anonymous zero tail pages whenever they are
encountered, instead of only on reclaim or migration.

Signed-off-by: Alexander Zhu <alexlzhu@fb.com>
---
v4 to v5
-split out the split_huge_page() changes into three different patches:
 one for zapping zero pages, one for not remapping zero pages, and one
 for self tests.
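Note: the diff below frees subpages whose mappings were already zapped;
the zero-fill detection itself lands elsewhere in this series. For
readers who want the gist of such a check, here is a minimal sketch,
not part of this patch (is_zero_filled_page() is a hypothetical helper
name), built on the kernel's memchr_inv():

/*
 * Hypothetical sketch, not part of this patch: decide whether a page
 * contains only zero bytes. memchr_inv() returns NULL when every byte
 * in the range equals the given value, so a NULL result means the
 * page is entirely zero filled.
 */
static bool is_zero_filled_page(struct page *page)
{
	void *addr = kmap_local_page(page);	/* map page for CPU access */
	bool zero_filled = !memchr_inv(addr, 0, PAGE_SIZE);

	kunmap_local(addr);
	return zero_filled;
}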
 include/linux/vm_event_item.h |  3 +++
 mm/huge_memory.c              | 37 +++++++++++++++++++++++++++++++++++
 mm/vmstat.c                   |  3 +++
 3 files changed, 43 insertions(+)

diff --git a/include/linux/vm_event_item.h b/include/linux/vm_event_item.h
index 3518dba1e02f..3618b10ddec9 100644
--- a/include/linux/vm_event_item.h
+++ b/include/linux/vm_event_item.h
@@ -111,6 +111,9 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT,
 #ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
 		THP_SPLIT_PUD,
 #endif
+		THP_SPLIT_FREE,
+		THP_SPLIT_UNMAP,
+		THP_SPLIT_REMAP_READONLY_ZERO_PAGE,
 		THP_ZERO_PAGE_ALLOC,
 		THP_ZERO_PAGE_ALLOC_FAILED,
 		THP_SWPOUT,
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 1cc4a5f4791e..363218e8a22a 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2496,6 +2496,8 @@ static void __split_huge_page(struct page *page, struct list_head *list,
 	struct address_space *swap_cache = NULL;
 	unsigned long offset = 0;
 	unsigned int nr = thp_nr_pages(head);
+	LIST_HEAD(pages_to_free);
+	int nr_pages_to_free = 0;
 	int i;
 
 	/* complete memcg works before add pages to LRU */
@@ -2572,6 +2574,34 @@ static void __split_huge_page(struct page *page, struct list_head *list,
 			continue;
 		unlock_page(subpage);
 
+		/*
+		 * If a tail page has only two references left, one inherited
+		 * from the isolation of its head and the other from
+		 * lru_add_page_tail() which we are about to drop, it means this
+		 * tail page was concurrently zapped. Then we can safely free it
+		 * and save page reclaim or migration the trouble of trying it.
+		 */
+		if (list && page_ref_freeze(subpage, 2)) {
+			VM_BUG_ON_PAGE(PageLRU(subpage), subpage);
+			VM_BUG_ON_PAGE(PageCompound(subpage), subpage);
+			VM_BUG_ON_PAGE(page_mapped(subpage), subpage);
+
+			ClearPageActive(subpage);
+			ClearPageUnevictable(subpage);
+			list_move(&subpage->lru, &pages_to_free);
+			nr_pages_to_free++;
+			continue;
+		}
+
+		/*
+		 * If a tail page has only one reference left, it will be freed
+		 * by the call to free_page_and_swap_cache below. Since zero
+		 * subpages are no longer remapped, there will only be one
+		 * reference left in cases outside of reclaim or migration.
+		 */
+		if (page_ref_count(subpage) == 1)
+			nr_pages_to_free++;
+
 		/*
 		 * Subpages may be freed if there wasn't any mapping
 		 * like if add_to_swap() is running on a lru page that
@@ -2581,6 +2611,13 @@ static void __split_huge_page(struct page *page, struct list_head *list,
 		 */
 		free_page_and_swap_cache(subpage);
 	}
+
+	if (!nr_pages_to_free)
+		return;
+
+	mem_cgroup_uncharge_list(&pages_to_free);
+	free_unref_page_list(&pages_to_free);
+	count_vm_events(THP_SPLIT_FREE, nr_pages_to_free);
 }
 
 /* Racy check whether the huge page can be split */
diff --git a/mm/vmstat.c b/mm/vmstat.c
index b2371d745e00..3d802eb6754d 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1359,6 +1359,9 @@ const char * const vmstat_text[] = {
 #ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
 	"thp_split_pud",
 #endif
+	"thp_split_free",
+	"thp_split_unmap",
+	"thp_split_remap_readonly_zero_page",
 	"thp_zero_page_alloc",
 	"thp_zero_page_alloc_failed",
 	"thp_swpout",
-- 
2.30.2
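On a kernel with this patch applied, the new events are exported through
/proc/vmstat, so the effect of freeing zero filled tail pages can be
observed directly. A small userspace sketch in plain C (nothing here is
part of the patch itself) that prints the thp_split_free counter:

/* Userspace sketch: print the thp_split_free line from /proc/vmstat. */
#include <stdio.h>
#include <string.h>

int main(void)
{
	char line[256];
	FILE *f = fopen("/proc/vmstat", "r");

	if (!f) {
		perror("fopen /proc/vmstat");
		return 1;
	}
	while (fgets(line, sizeof(line), f)) {
		/* Each vmstat line is "<name> <count>\n". */
		if (!strncmp(line, "thp_split_free", strlen("thp_split_free")))
			fputs(line, stdout);
	}
	fclose(f);
	return 0;
}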