From: Roman Gushchin
To: Zi Yan
Cc: Matthew Wilcox, "Kirill A . Shutemov", Andrew Morton, Yang Shi, Michal Hocko, John Hubbard, Ralph Campbell, David Nellans, Jason Gunthorpe, David Rientjes, Vlastimil Babka, David Hildenbrand, Mike Kravetz, Song Liu
Date: Thu, 4 Mar 2021 08:45:07 -0800
Subject: Re: [RFC PATCH v3 00/49] 1GB PUD THP support on x86_64
In-Reply-To: <890DE8FE-DAF6-49A2-8C62-40B6FD593B4A@nvidia.com>
On Thu, Mar 04, 2021 at 11:26:03AM -0500, Zi Yan wrote:
> On 1 Mar 2021, at 20:59, Roman Gushchin wrote:
>
> > On Wed, Feb 24, 2021 at 05:35:36PM -0500, Zi Yan wrote:
> >> From: Zi Yan
> >>
> >> Hi all,
> >>
> >> I have rebased my 1GB PUD THP support patches on v5.11-mmotm-2021-02-18-18-29
> >> and the code is available at
> >> https://github.com/x-y-z/linux-1gb-thp/tree/1gb_thp_v5.11-mmotm-2021-02-18-18-29
> >> if you want to give it a try. The actual 49 patches are not sent out with this
> >> cover letter. :)
> >>
> >> Instead of asking for code review, I would like to discuss the concerns I got
> >> from previous RFCs. I think there are two major ones:
> >>
> >> 1. 1GB page allocation. The current implementation allocates 1GB pages from CMA
> >> regions that are reserved at boot time, like hugetlbfs. The concern with
> >> using CMA is that an educated guess is needed to avoid depleting kernel
> >> memory in case CMA regions are set too large. Recently David Rientjes
> >> proposed using process_madvise() for hugepage collapse, which is an
> >> alternative [1] but might not work for 1GB pages, since there is no way of
> >> _allocating_ a 1GB page into which to collapse pages.
> >> I proposed a similar approach at LSF/MM 2019, generating physically contiguous
> >> memory after pages are allocated [2], which is usable for 1GB THPs. This
> >> approach does in-place huge page promotion and thus does not require page
> >> allocation.
> >
> > Well, I don't think there is an alternative to CMA as of now. Once the memory
> > has been filled at least once, any subsequent activity leading to substantial
> > slab allocations (e.g. running git gc) will fragment the memory, so that there
> > is virtually no chance of finding a contiguous GB.
> >
> > It's possible in theory to reduce the fragmentation on a 1GB scale by grouping
> > non-movable pageblocks, but that seems like a separate project.
>
> My experiments showed that finding contiguous GBs is possible, but I agree that
> CMA is more reliable and 1GB-scale defragmentation should be a separate project.

I actually ran a large-scale experiment (on tens of thousands of machines) over
the last several months. It was about hugetlbfs 1GB pages, but the allocation
mechanism is the same. My goal was to allocate a relatively small number of 1GB
pages (<20% of the total memory). Without CMA, the chances drop to nearly 0% very
quickly after a reboot, and even manual interventions like shutting down all
workloads, dropping caches, calling sync, triggering compaction, etc. do not help
much. Sometimes you can allocate maybe 1-2 pages, but that's about it.

Even with CMA we had to fix a number of additional problems (like sub-optimal
placement of CMA areas, 2MB THP migration, and some ext4 and btrfs page migration
issues) to reach a reasonable success rate of about 95-99%. And it's not 100%
anyway.

The problem with artificial tests is that you're likely experimenting on a
freshly rebooted machine which isn't (and wasn't) doing much. That's a bad model
of the real memory state of a production server.
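The fragmentation argument can be made concrete with a back-of-the-envelope model. This is a toy sketch, not kernel code, and its parameters are illustrative assumptions rather than measurements: x86_64 with 4KB base pages and 2MB pageblocks (so one 1GB range spans 512 pageblocks), and unmovable allocations (e.g. slab) polluting each pageblock independently with probability q. Under those assumptions a 1GB range is allocatable only if none of its 512 pageblocks is polluted, i.e. with probability (1 - q)^512. Real placement is not uniform, since the page allocator groups allocations by mobility, so treat the numbers as qualitative only. The helper names `p_gb_block_clean` and `simulate_free_gbs` are hypothetical.

```python
import random

# Assumption: x86_64, 4KB base pages, 2MB pageblocks -> 1GB / 2MB = 512.
PAGEBLOCKS_PER_GB = 512


def p_gb_block_clean(q: float, blocks_per_gb: int = PAGEBLOCKS_PER_GB) -> float:
    """Probability that a 1GB-aligned range contains no unmovable pageblock,
    assuming each pageblock is independently unmovable with probability q."""
    return (1.0 - q) ** blocks_per_gb


def simulate_free_gbs(total_gb: int, q: float, seed: int = 0,
                      blocks_per_gb: int = PAGEBLOCKS_PER_GB) -> int:
    """Monte Carlo version of the same toy model: count the 1GB ranges
    in which every pageblock stayed movable."""
    rng = random.Random(seed)  # seeded so runs are reproducible
    clean = 0
    for _ in range(total_gb):
        # rng.random() is in [0, 1), so a draw below q marks the
        # pageblock as polluted by an unmovable allocation.
        if all(rng.random() >= q for _ in range(blocks_per_gb)):
            clean += 1
    return clean


if __name__ == "__main__":
    for q in (0.001, 0.01, 0.05):
        print(f"q={q:<6} analytic={p_gb_block_clean(q):.4f} "
              f"simulated={simulate_free_gbs(256, q, seed=1)}/256")
```

In this model, polluting just 1% of pageblocks leaves only about 0.6% of 1GB ranges free, which illustrates why a handful of unmovable pageblocks per gigabyte is enough to make 1GB allocations rare. A CMA area sidesteps the problem because only movable allocations are allowed inside it, so its pageblocks stay migratable.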