Message-ID: <8ab4a36f-b1c2-549a-0a11-693b3e66c5a9@oracle.com>
Date: Thu, 10 Nov 2022 10:10:35 +0000
Subject: Re: [PATCH v3] mm/hugetlb_vmemmap: remap head page to newly allocated page
To: Muchun Song
Cc: Linux Memory Management List, Muchun Song, Mike Kravetz, Andrew Morton
References: <20221109200623.96867-1-joao.m.martins@oracle.com> <903F9F8D-98A0-4114-8BC2-9738B98C8F23@linux.dev>
From: Joao Martins
In-Reply-To: <903F9F8D-98A0-4114-8BC2-9738B98C8F23@linux.dev>

On
10/11/2022 03:28, Muchun Song wrote:
>> On Nov 10, 2022, at 04:06, Joao Martins wrote:
>>
>> Today with `hugetlb_free_vmemmap=on` the struct page memory that is freed
>> back to the page allocator is as follows: for a 2M hugetlb page it will
>> reuse the first 4K vmemmap page to remap the remaining 7 vmemmap pages, and
>> for a 1G hugetlb page it will remap the remaining 4095 vmemmap pages.
>> Essentially, that means that it breaks the first 4K of a potentially
>> contiguous chunk of memory of 32K (for 2M hugetlb pages) or 16M (for 1G
>> hugetlb pages). For this reason the memory that is freed back to the page
>> allocator cannot be used for hugetlb to allocate huge pages of the same
>> size, but rather only of a smaller huge page size:
>>
>> Trying to assign a 64G node to hugetlb (on a 128G 2-node guest, each node
>> having 64G):
>>
>> * Before allocation:
>> Free pages count per migrate type at order       0      1      2      3      4      5      6      7      8      9     10
>> ...
>> Node    0, zone   Normal, type      Movable    340    100     32     15      1      2      0      0      0      1  15558
>>
>> $ echo 32768 > /sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages
>> $ cat /sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages
>> 31987
>>
>> * After:
>>
>> Node    0, zone   Normal, type      Movable  30893  32006  31515      7      0      0      0      0      0      0      0
>>
>> Notice how the memory freed back is put back into the 4K / 8K / 16K page
>> pools. And it allocates a total of 31987 pages (63974M).
>>
>> To fix this behaviour, rather than remapping the second vmemmap page (thus
>> breaking the contiguous block of memory backing the struct pages),
>> repopulate the first vmemmap page with a new one. We allocate and copy
>> from the currently mapped vmemmap page, and then remap it later on.
>> The same algorithm works if there's a pre-initialized walk::reuse_page
>> and the head page doesn't need to be skipped and instead we remap it
>> when the @addr being changed is the @reuse_addr.
>>
>> The new head page is allocated in vmemmap_remap_free() given that on
>> restore there's no need for functional change. Note that, because right
>> now one hugepage is remapped at a time, only one free 4K page at a time
>> is needed to remap the head page. Should it fail to allocate said new
>> page, it reuses the one that's already mapped just like before. As a
>> result, for every 64G of contiguous hugepages it can give back 1G more
>> of contiguous memory, while needing in total 128M worth of new 4K pages
>> (for 2M hugetlb) or 256K (for 1G hugetlb).
>>
>> After the changes, try to assign a 64G node to hugetlb (on a 128G 2-node
>> guest, each node with 64G):
>>
>> * Before allocation
>> Free pages count per migrate type at order       0      1      2      3      4      5      6      7      8      9     10
>> ...
>> Node    0, zone   Normal, type      Movable      1      1      1      0      0      1      0      0      1      1  15564
>>
>> $ echo 32768 > /sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages
>> $ cat /sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages
>> 32394
>>
>> * After:
>>
>> Node    0, zone   Normal, type      Movable      0     50     97    108     96     81     70     46     18      0      0
>>
>> In the example above, 407 more hugetlb 2M pages are allocated, i.e. 814M out
>> of the 32394 (64788M) allocated. So the memory freed back is indeed being
>> used back in hugetlb and there is no massive accumulation of unused
>> order-0..order-2 pages.
>>
>> Signed-off-by: Joao Martins
>
> Thanks.
>
> Reviewed-by: Muchun Song
>
> A nit below.
>

Thanks

>> ---
>> Changes since v2:
>> Comments from Muchun:
>> * Delete the comment above the tlb flush
>> * Move the head vmemmap page copy into vmemmap_remap_free()
>> * Add and del the new head page to the vmemmap_pages (to be freed
>>   in case of error)
>> * Move the remap of the head like the tail pages in vmemmap_remap_pte()
>>   but special casing only when addr == reuse_addr
>> * Remove the PAGE_SIZE alignment check as the code has the assumption
>>   that start/end are page-aligned (and VM_BUG_ON otherwise).
>> * Adjusted commit message taking the above changes into account.
>> ---
>>  mm/hugetlb_vmemmap.c | 34 +++++++++++++++++++++++++++-------
>>  1 file changed, 27 insertions(+), 7 deletions(-)
>>
>> diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
>> index 7898c2c75e35..f562b3f46410 100644
>> --- a/mm/hugetlb_vmemmap.c
>> +++ b/mm/hugetlb_vmemmap.c
>> @@ -203,12 +203,7 @@ static int vmemmap_remap_range(unsigned long start, unsigned long end,
>>  			return ret;
>>  	} while (pgd++, addr = next, addr != end);
>>
>> -	/*
>> -	 * We only change the mapping of the vmemmap virtual address range
>> -	 * [@start + PAGE_SIZE, end), so we only need to flush the TLB which
>> -	 * belongs to the range.
>> -	 */
>> -	flush_tlb_kernel_range(start + PAGE_SIZE, end);
>> +	flush_tlb_kernel_range(start, end);
>>
>>  	return 0;
>>  }
>> @@ -244,9 +239,16 @@ static void vmemmap_remap_pte(pte_t *pte, unsigned long addr,
>>  	 * to the tail pages.
>>  	 */
>>  	pgprot_t pgprot = PAGE_KERNEL_RO;
>> -	pte_t entry = mk_pte(walk->reuse_page, pgprot);
>>  	struct page *page = pte_page(*pte);
>> +	pte_t entry;
>>
>> +	/* Remapping the head page requires r/w */
>> +	if (unlikely(addr == walk->reuse_addr)) {
>> +		pgprot = PAGE_KERNEL;
>> +		list_del(&walk->reuse_page->lru);
>
> Maybe smp_wmb() should be inserted here to make sure the copied data is visible
> before set_pte_at() like the commit 939de63d35dde45 does.
>

I've added the barrier, plus a comment above it, as it is not immediately
obvious where the copy takes place.
See the snippet below for what I added in v4:

diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
index f562b3f46410..45e93a545dd7 100644
--- a/mm/hugetlb_vmemmap.c
+++ b/mm/hugetlb_vmemmap.c
@@ -246,6 +246,13 @@ static void vmemmap_remap_pte(pte_t *pte, unsigned long addr,
 	if (unlikely(addr == walk->reuse_addr)) {
 		pgprot = PAGE_KERNEL;
 		list_del(&walk->reuse_page->lru);
+
+		/*
+		 * Makes sure that preceding stores to the page contents from
+		 * vmemmap_remap_free() become visible before the set_pte_at()
+		 * write.
+		 */
+		smp_wmb();
 	}

 	entry = mk_pte(walk->reuse_page, pgprot);