From: Zi Yan
To: "Yin, Fengwei"
Cc: Matthew Wilcox, linux-mm@kvack.org, Vishal Moola, Hugh Dickins, Rik van Riel, David Hildenbrand
Subject: Re: Folio mapcount
Date: Fri, 30 Jun 2023 21:17:19 -0400
In-Reply-To: <310c4d8a-e14c-742b-5c6c-018c01ed897e@intel.com>
References: <7DCA075B-1E43-47B1-9402-66C54513D52E@nvidia.com> <310c4d8a-e14c-742b-5c6c-018c01ed897e@intel.com>

On 29 Mar 2023, at 10:02, Yin, Fengwei wrote:

> Hi Matthew,
>
> On 2/9/2023 3:54 AM, Matthew Wilcox wrote:
>> On Wed, Feb 08, 2023 at 02:36:41PM -0500, Zi Yan wrote:
>>> On 7 Feb 2023, at 11:51, Matthew Wilcox wrote:
>>>
>>>> On Tue, Feb 07, 2023 at 11:23:31AM -0500, Zi Yan wrote:
>>>>> On 24 Jan 2023, at 13:13, Matthew Wilcox wrote:
>>>>>
>>>>>> Once we get to the part of the folio journey where we have
>>>>>> one-pointer-per-page, we can't afford to maintain per-page state.
>>>>>> Currently we maintain a per-page mapcount, and that will have to go.
>>>>>> We can maintain extra state for a multi-page folio, but it has to be a
>>>>>> constant amount of extra state no matter how many pages are in the folio.
>>>>>>
>>>>>> My proposal is that we maintain a single mapcount per folio, and its
>>>>>> definition is the number of (vma, page table) tuples which have a
>>>>>> reference to any pages in this folio.
>>>>>
>>>>> How about having two, full_folio_mapcount and partial_folio_mapcount?
>>>>> If partial_folio_mapcount is 0, we can have a fast path without doing
>>>>> anything at page level.
>>>>
>>>> A fast path for what? I don't understand your vision; can you spell it
>>>> out for me?
>>>> My current proposal is here:
>>>
>>> A fast code path for only handling folios as a whole. For cases that
>>> subpages are mapped from a folio, traversing through subpages might be
>>> needed and will be slow. A code separation might be cleaner and makes
>>> folio as a whole handling quicker.
>>
>> To be clear, in this proposal, there is no subpage mapcount. I've got
>> my eye on one struct folio per allocation, so there will be no more
>> tail pages. The proposal has one mapcount, and that's it. I'd be
>> open to saying "OK, we need two mapcounts", but not to anything that
>> needs to scale per number of pages in the folio.
>>
>>> For your proposal, "How many VMAs have one-or-more pages of this folio mapped"
>>> should be the responsibility of rmap. We could add a counter to rmap
>>> instead. It seems that you are mixing page table mapping with virtual
>>> address space (VMA) mapping together.
>>
>> rmap tells you how many VMAs cover this folio. It doesn't tell you
>> how many of those VMAs have actually got any pages from it mapped.
>> It's also rather slower than a simple atomic_read(), so I think
>> you'll have an uphill battle trying to convince people to use rmap
>> for this purpose.
>>
>> I'm not sure what you mean by "add a counter to rmap"? One count
>> per mapped page in the vma?
>>
>>>>
>>>> https://lore.kernel.org/linux-mm/Y+FkV4fBxHlp6FTH@casper.infradead.org/
>>>>
>>>> The three questions we need to be able to answer (in my current
>>>> understanding) are laid out here:
>>>>
>>>> https://lore.kernel.org/linux-mm/Y+HblAN5bM1uYD2f@casper.infradead.org/
>>>
>>> I think we probably need to clarify the definition of "map" in your
>>> questions. Does it mean mapped by page tables or VMAs? When a page
>>> is mapped into a VMA, it can be mapped by one or more page table entries,
>>> but not the other way around, right? Or is shared page table entry merged
>>> now so that more than one VMAs can use a single page table entry to map
>>> a folio?
>>
>> Mapped by page tables, just like today. It'd be quite the change to
>> figure out the mapcount of a page newly brought into the page cache;
>> we'd have to do an rmap walk to see how many mapcounts to give it.
>> I don't think this is a great idea.
>>
>> As far as I know, shared page tables are only supported by hugetlbfs,
>> and I prefer to stick cheese in my ears and pretend they don't exist.
>>
>> To be absolutely concrete about this, my proposal is:
>>
>> Folio brought into page cache has mapcount 0 (whether or not there are any VMAs
>> that cover it)
>> When we take a page fault on one of the pages in it, its mapcount
>> increases from 0 to 1.
>> When we take another page fault on a page in it, we do a pvmw to
>> determine if any pages from this folio are already mapped by this VMA;
>> we see that there is one and we do not increment the mapcount.
>> We partially munmap() so that we need to unmap one of the pages.
>> We remove it from the page tables and call page_remove_rmap().
>> That does another pvmw and sees there's still a page in this folio
>> mapped by this VMA, does not decrement the refcount
>> We truncate() the file smaller than the position of the folio, which
>> causes us to unmap the rest of the folio. The pvmw walk detects no
>> more pages from this folio mapped and we decrement the mapcount.
>>
>> Clear enough?
>
> I thought about this proposal for some time and would like to give it
> a try.
>
> I did a test about getting mapcount with pvmw walk vs folio_mapcount()
> call like:
> 1.
>    while (page_vma_mapped_walk(&pvmw)) {
>        mapcount++;
>    }
>
> 2.
>    mapcount = folio_mapcount(folio);
>
> The pvmw walk is 3X slower than folio_mapcount() call on a Ice Lake
> platform.
>
>
> Also noticed following thing when I read related code:
> 1. If it's entire folio is mapped to VMA, it's not necessary to do
>    pvmw walk. We can just increase mapcount (or decrease mapcount if
>    folio is unmapped from VMA).
>
> 2. The folio refcount update needs be changed to match mapcount
>    change. Otherwise, the #3 question in
>    https://lore.kernel.org/linux-mm/Y+HblAN5bM1uYD2f@casper.infradead.org/
>    can't be answered.
>
> 3. The meaning of lruvec stat of NR_FILE_MAPPED will be changed as
>    we don't track each page mapcount. This info is exposed to user space
>    through meminfo interface.
>
> 4. The new mapcount present how many VMAs the folio map to. So during
>    split_vma/merge_vma operation, we need to update the mapcount if the
>    split/merge happens in the middle of folio.
>
>    Consider following case:
>    A large folio with two cow pages in the middle of it.
>    |-----------------VMA---------------------------|
>    |---folio--|cow page1|cow page2|---folio|
>
>    And the split_vma happens between cow page1/page2
>    |----------VMA1----------| |-----------VMA2-----|
>    |---folio--|cow page1| |cow page2|---folio|
>    | split_vma here
>
>    How do we detect we should update folio mapcount in this case?
>    Or I am just concerning the thing which is not possible to happen?

I also did some study on mapcount and tried to use a single mapcount
instead of the various existing mapcounts. My conclusion is that, from the
kernel's perspective, a single mapcount is enough, but we will still need
per-page mapcount and entire_mapcount for the userspace stats
NR_{ANON,FILE}_MAPPED and NR_ANON_THPS.

In the kernel, almost all code only cares about two things:
1) whether a page/folio has extra pins, by checking if refcount is equal
   to mapcount + extra, and
2) whether a page/folio is mapped multiple times.

A single mapcount can meet these two needs.

But for userspace, to keep NR_{ANON,FILE}_MAPPED and NR_ANON_THPS accurate,
the kernel needs to know when the corresponding mapcount goes from 0 to 1
(increase the counter) and from 1 to 0 (decrease the counter). For
NR_{ANON,FILE}_MAPPED, the counter is increased when a page is first mapped,
either by a PTE or covered by a PMD, and decreased when a page loses its
last mapping from a PTE or PMD.
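
To make the 0 -> 1 and 1 -> 0 bookkeeping concrete, here is a minimal
userspace sketch in C. It is a toy model under my own assumptions, not
kernel code: struct toy_folio, nr_mapped, map_pte(), unmap_pte(), map_pmd()
and unmap_pmd() are hypothetical names, the folio is fixed at 4 pages, and
locking and atomics are ignored. It only shows which state is consulted
when deciding whether to touch the counter.

```c
#include <assert.h>

#define NR_PAGES 4 /* hypothetical folio size, in pages */

/* Toy model only; this is not the kernel's struct folio. */
struct toy_folio {
	int page_mapcount[NR_PAGES]; /* PTE mappings of each page */
	int entire_mapcount;         /* PMD mappings of the whole folio */
};

/* Stands in for NR_{ANON,FILE}_MAPPED: pages mapped at least once. */
static long nr_mapped;

static int page_is_mapped(const struct toy_folio *f, int i)
{
	return f->page_mapcount[i] > 0 || f->entire_mapcount > 0;
}

/* Map page i by PTE; bump the stat only on the page's 0 -> 1 transition. */
static void map_pte(struct toy_folio *f, int i)
{
	assert(i >= 0 && i < NR_PAGES);
	if (!page_is_mapped(f, i))
		nr_mapped++;
	f->page_mapcount[i]++;
}

/* Unmap page i by PTE; drop the stat only on the page's 1 -> 0 transition. */
static void unmap_pte(struct toy_folio *f, int i)
{
	assert(i >= 0 && i < NR_PAGES && f->page_mapcount[i] > 0);
	f->page_mapcount[i]--;
	if (!page_is_mapped(f, i))
		nr_mapped--;
}

/* Map the whole folio by PMD; pages without a PTE mapping become mapped. */
static void map_pmd(struct toy_folio *f)
{
	if (f->entire_mapcount++ == 0)
		for (int i = 0; i < NR_PAGES; i++)
			if (f->page_mapcount[i] == 0)
				nr_mapped++;
}

/* Drop a PMD mapping; pages without a PTE mapping become unmapped. */
static void unmap_pmd(struct toy_folio *f)
{
	assert(f->entire_mapcount > 0);
	if (--f->entire_mapcount == 0)
		for (int i = 0; i < NR_PAGES; i++)
			if (f->page_mapcount[i] == 0)
				nr_mapped--;
}
```

In this model, calling map_pte(&f, 0) and then map_pte(&f, 1) on an empty
folio takes nr_mapped from 0 to 1 to 2. If both faults come from the same
VMA, a single per-folio mapcount as defined earlier in the thread would
read 1 after both calls, so it could not tell the two states apart.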
This means that without per-page mapcount and entire_mapcount, we cannot
get them right. For NR_ANON_THPS, entire_mapcount is needed. A single
mapcount is a mix of per-page mapcount and entire_mapcount, and the kernel
is not able to recover from it the information needed for NR_*.

I wonder if userspace can live without these stats, or with different
counters. NR_ANON_MAPPED is "AnonPages", NR_FILE_MAPPED is "Mapped" or
"file_mapped", and NR_ANON_THPS is "AnonHugePages" or "anon_thp". Can we
instead just count anonymous pages and file pages, regardless of whether
they are mapped? Does userspace really need to know which pages are mapped?
If that change can be made, we can probably have a single mapcount.

BTW, I am not sure pvmw would work for checking per-page or entire
mapcounts, since that means a pvmw walk is needed on every rmap removal to
decide whether to decrease the NR_* counters. That seems expensive.

Let me know if I missed anything. Thanks.

--
Best Regards,
Yan, Zi