From: "Alex Zhu (Kernel)"
To: Andrew Morton
CC: "linux-mm@kvack.org", Kernel Team, "linux-kernel@vger.kernel.org"
Subject: Re: [PATCH v3] mm: add thp_utilization metrics to debugfs
Date: Mon, 22 Aug 2022 17:24:22 +0000
Message-ID: <3F4CC516-21B5-4E9F-8427-E54B4D0F2E41@fb.com>
References: <20220818000112.2722201-1-alexlzhu@fb.com> <20220820150712.53ec2dd281dfe894ad3fe2df@linux-foundation.org>
In-Reply-To: <20220820150712.53ec2dd281dfe894ad3fe2df@linux-foundation.org>

> On Aug 20, 2022, at 3:07 PM, Andrew Morton wrote:
>
> On Wed, 17 Aug 2022 17:01:12 -0700 wrote:
>
>> THPs have historically been enabled on a per application
>> basis due to
>> performance increase or decrease depending on how the particular
>> application uses physical memory. When THPs are heavily utilized,
>> application performance improves due to fewer TLB cache misses.
>> It has long been suspected that performance regressions when THP
>> is enabled happens due to heavily underutilized anonymous THPs.
>>
>> Previously there was no way to track how much of a THP is
>> actually being used. With this change, we seek to gain visibility
>> into the utilization of THPs in order to make more intelligent
>> decisions regarding paging.
>>
>> This change introduces a tool that scans through all of physical
>> memory for anonymous THPs and groups them into buckets based
>> on utilization. It also includes an interface under
>> /sys/kernel/debug/thp_utilization.
>>
>> Utilization of a THP is defined as the percentage of nonzero
>> pages in the THP. The worker thread will scan through all
>> of physical memory and obtain utilization of all anonymous
>> THPs. It will gather this information by periodically scanning
>> through all of physical memory for anonymous THPs, group them
>> into buckets based on utilization, and report utilization
>
> I'd like to see sample debugfs output right here in the changelog, for
> reviewers to review.  In some detail.

I should have included that in the description, sorry about that. A sample output:

Utilized[0-50]: 1331 680884
Utilized[51-101]: 9 3983
Utilized[102-152]: 3 1187
Utilized[153-203]: 0 0
Utilized[204-255]: 2 539
Utilized[256-306]: 5 1135
Utilized[307-357]: 1 192
Utilized[358-408]: 0 0
Utilized[409-459]: 1 57
Utilized[460-512]: 400 13
Last Scan Time: 223.98
Last Scan Duration: 70.65

This indicates that there are 1331 THPs that have between 0 and 50
utilized (non zero) pages. In total there are 680884 zero pages in
this utilization bucket. THPs in the [0-50] bucket compose 76% of total
THPs, and are responsible for 99% of total zero pages across all THPs.
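
The 76% and 99% figures can be re-derived from the sample output with a short user-space script. This is only a sketch: it assumes the two numbers on each `Utilized[...]` line are the THP count followed by the zero-page count, as described above, and the names `SAMPLE`, `parse_buckets` and `bucket_shares` are made up for illustration and are not part of the patch.

```python
import re

# Sample output copied from this email; in practice it would be read
# from /sys/kernel/debug/thp_utilization.
SAMPLE = """\
Utilized[0-50]: 1331 680884
Utilized[51-101]: 9 3983
Utilized[102-152]: 3 1187
Utilized[153-203]: 0 0
Utilized[204-255]: 2 539
Utilized[256-306]: 5 1135
Utilized[307-357]: 1 192
Utilized[358-408]: 0 0
Utilized[409-459]: 1 57
Utilized[460-512]: 400 13
Last Scan Time: 223.98
Last Scan Duration: 70.65
"""

# Assumed line format: "Utilized[low-high]: <thp_count> <zero_pages>"
BUCKET_RE = re.compile(r"Utilized\[(\d+)-(\d+)\]:\s+(\d+)\s+(\d+)")


def parse_buckets(text):
    """Return a list of (low, high, thp_count, zero_pages) tuples."""
    return [tuple(int(g) for g in m.groups()) for m in BUCKET_RE.finditer(text)]


def bucket_shares(buckets, index=0):
    """Return one bucket's share of all THPs and of all zero pages."""
    total_thps = sum(b[2] for b in buckets)
    total_zero = sum(b[3] for b in buckets)
    _, _, thps, zero = buckets[index]
    return thps / total_thps, zero / total_zero


buckets = parse_buckets(SAMPLE)
thp_share, zero_share = bucket_shares(buckets)
print(f"[0-50] bucket: {thp_share:.0%} of THPs, {zero_share:.0%} of zero pages")
```

With the sample above, the totals come to 1752 THPs and 687990 zero pages, which is where the 76% and 99% come from.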
In other words, the least utilized THPs are responsible for almost all of the
memory waste when THP is always enabled. Similar results have been observed
across production workloads.

The last two lines indicate the timestamp and duration of the most recent scan
through all of physical memory. Here we see that the last scan took 70.65 seconds.

> And I'd like to see the code commented!  Especially
> thp_utilization_workfn(), thp_util_scan() and thp_scan_next_zone().
> What are their roles and responsibilities?  How long do they take, by
> what means do they scan?
>
> I mean, scanning all of physical memory is a huge task.  How do we
> avoid chewing vast amounts of CPU?  What is the chosen approach and
> what are the tradeoffs?  Why is is done within a kernel thread at all,
> rather than putting the load into the context of the reader of the
> stats (which is more appropriate).  etcetera.  There are many traps,
> tradeoffs and hidden design decisions here.  Please unhide them.
>
> This comment, which is rather a core part of these tradeoffs:
>
> +/*
> + * The number of addresses to scan through on each periodic
> + * run of the scanner that generates /sys/kernel/debug/thp_utilization.
> + */
> +#define THP_UTIL_SCAN_SIZE 256
>
> isn't very helpful.  "number of addresses"?  Does it mean we scan 256
> bytes at a time?  256 pages?  256 hugepages?  Something else?

256 hugepages. We scan through physical memory 2MB at a time, aligned to 2MB.
For the moment, we have observed that scanning 256 PFNs per second with a
kernel thread does not produce any noticeable side effects on x86_64.

> How can any constant make sense when different architectures have
> different [huge]page sizes?  Should it be scaled by pagesize?  And if
> we're going to do that, we should scale it by CPU speed at the same time.

This sounds very interesting. I would be happy to hear any suggestions for how
we can scale this value more systematically.
We initially chose this value to be small enough that there are no noticeable
side effects, with the intent of tuning it later.

> Or bypass all of that and simply scan for a certain amount of *time*,
> rather than scan a certain amount of memory.  After all, chunking up
> the scan time is what we're trying to achieve by chunking up the scan
> amount.  Why not chunk up the scan time directly?
>
> See where I'm going?  I see many hidden assumptions, design decisions
> and tradeoffs here.  Can we please attempt to spell them out and review
> them.

I will send out an RFC patchset including this one, split_huge_page, and the
shrinker later this week. I will add more description and include any
suggestions in that. Thanks!

> Anyway.  There were many review comments on previous versions.  It
> would have been better had those reviewers been cc'ed on this version.
> I'll go into hiding and see what people think.
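
The suggestion quoted above, to chunk the scan by time rather than by amount of memory, could look roughly like the following. This is a user-space model of the idea, not code from the patch: `scan_for_budget`, `FakeClock` and the 0.25 s per-region cost are hypothetical names and numbers chosen for illustration, and a kernel implementation would use kernel time sources and work scheduling instead.

```python
import time


def scan_for_budget(regions, start, budget_s, visit, clock=time.monotonic):
    """Visit regions[start:] until budget_s seconds have elapsed.

    Returns the index to resume from on the next run. At least one
    region is visited per call so the scan always makes progress.
    """
    deadline = clock() + budget_s
    i = start
    while i < len(regions):
        visit(regions[i])
        i += 1
        if clock() >= deadline:
            break
    return i


class FakeClock:
    """Deterministic clock so the example does not depend on real time."""

    def __init__(self):
        self.now = 0.0

    def __call__(self):
        return self.now


clock = FakeClock()
seen = []


def visit(region):
    seen.append(region)
    clock.now += 0.25  # pretend each region costs 0.25 s to scan


regions = list(range(10))  # stand-ins for 2MB-aligned regions
next_index = scan_for_budget(regions, 0, 1.0, visit, clock)
print(next_index, seen)
```

With a time budget the number of regions covered per run adapts to page size and CPU speed on its own, at the cost of a less predictable total scan duration.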