From: Liam Howlett
To: Michel Lespinasse
CC: "lsf-pc@lists.linux-foundation.org", "linux-mm@kvack.org"
Subject: Re: [LSF/MM TOPIC] mmap locking topics
Date: Tue, 1 Jun 2021 17:39:22 +0000
Message-ID: <20210601173916.56qowjj3ctnc7lwh@revolver>
References: <20210601044845.GA12713@lespinasse.org>
In-Reply-To: <20210601044845.GA12713@lespinasse.org>

* Michel Lespinasse [210601 00:48]:
> Hi,
>
> I have two MM topics to propose for LSF/MM/BPF 2021,
> both in the area of mmap lock
> performance:
>
>
> I - Speculative page faults
>
> The idea there is to avoid taking the mmap lock during page faults,
> at least for the easier cases. This requires the fault handler to be
> careful to avoid races with mmap writers (and most particularly
> munmap), and when the new page is ready to be inserted into the user
> process, to verify, at the last moment (after taking the page table
> lock), that there has been no race between the fault handler and any
> mmap writers. Such checks can be implemented locally, without hitting
> any global locks, which results in very nice scalability improvements
> when processing concurrent faults.
>
> I think the idea is ready for prime time, and a patchset has been proposed,
> but it is not getting much traction yet. I suspect we will need to discuss
> the idea in person to figure out the next steps.

I agree that the locking should be avoided, especially in this critical
path. I'd like to do this by simplifying the data structures tracking
the VMAs. I feel like adding more tracking and special cases will
further complicate the existing code - which is already overly
complicated.

> II - Fine grained MM locking
>
> A major limitation of the current mmap lock design is that it covers a
> process's entire address space. In threaded applications, it is common
> for threads to issue concurrent requests for non-overlapping parts of
> the process address space - for example, one thread might be mmaping
> new memory while another releases a different range, and a third might
> fault within his own address range too. The current mmap lock design
> does not take the non-overlapping ranges into consideration, and
> consequently serialises the 3 above requests rather than letting them
> proceed in parallel.
>
> There has been a lot of work spent mitigating the problem by reducing
> the mmap lock hold times (for example, dropping the mmap lock during
> page faults that hit disk, or lowering to a read lock during longer
> mmap/munmap/populate operations). But this approach is hitting its
> limits, and I think it would be better to fix the core of the problem
> by making the mmap lock capable of allowing concurrent non-overlapping
> operations.
>
> I would like to propose an approach that:
> - separates the mmap lock into two separate locks, one that is only
>   held for short periods of time to protect mm-wide data structures
>   (including the vma tree), and another that functions as a range lock
>   and can be held for longer periods of time;
> - allows for incremental conversion from the current code to being
>   aware of locking ranges;
>
> I have been maintaining a prototype for this, which has been shared
> with a small set of people. The main holdup is with page fault
> performance; in order to allow non-overlapping writers to proceed
> while some page faults are in progress, the prototype needs to
> maintain a shared structure holding addresses for each pending page
> fault. Updating this shared structure gets very expensive in high
> concurrency page fault benchmarks, though it seems quite unnoticeable
> in macro benchmarks I have looked at.
>

Although locking the entire VMA has caused a bottleneck with the
increased thread count in modern hardware, I do not believe locking a
range of VMAs is the answer. There are currently three data structures
plus the mmap_sem (and sometimes the page table lock), not to mention
the reverse mapping - all to keep track of VMAs.

There are currently three projects with at least five organizations
involved in tackling the mmap semaphore locking issue. It would be
beneficial for all involved to hash out an overall view of where these
solutions should fit into the larger picture.
I am aware of the following projects in this area:
- Replacing the rbtree with the Maple Tree
- Speculative page faults (SPF), as discussed above.
- Range Locking of VMAs, as discussed above.

If anyone has any other projects under development or ideas, please
reply and add them.

Thanks,
Liam
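
Illustration for topic I (not part of the proposal or of any posted
patch): the "verify at the last moment" check can be pictured with a
sequence counter that writers bump around each update. The sketch below
is a userspace toy under stated assumptions. struct toy_mm, toy_shrink(),
toy_spec_begin(), toy_spec_validate() and toy_fault() are hypothetical
names, a single writer at a time is assumed, and the page table lock is
only mentioned in a comment rather than modelled.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for per-mm state; not a kernel structure. */
struct toy_mm {
    atomic_uint  seq;      /* even: stable, odd: a writer is mid-update */
    atomic_ulong vma_end;  /* pretend one mapping covers [0, vma_end) */
};

/*
 * Writer side, standing in for an munmap-like shrink. Assumes writers
 * are already serialised against each other by some other means.
 */
static void toy_shrink(struct toy_mm *mm, unsigned long new_end)
{
    assert(new_end <= atomic_load(&mm->vma_end)); /* shrink only */
    atomic_fetch_add(&mm->seq, 1);        /* odd: update in progress */
    atomic_store(&mm->vma_end, new_end);
    atomic_fetch_add(&mm->seq, 1);        /* even: update complete */
}

/* Take a snapshot; refuse to speculate while a writer is active. */
static bool toy_spec_begin(struct toy_mm *mm, unsigned int *snap)
{
    *snap = atomic_load(&mm->seq);
    return (*snap & 1u) == 0;
}

/* Last-moment check: true only if no writer ran since the snapshot. */
static bool toy_spec_validate(struct toy_mm *mm, unsigned int snap)
{
    return atomic_load(&mm->seq) == snap;
}

/*
 * Returns true if the fault could be completed speculatively.
 * False means the caller should fall back to the locked path.
 */
static bool toy_fault(struct toy_mm *mm, unsigned long addr)
{
    unsigned int snap;

    if (!toy_spec_begin(mm, &snap))
        return false;
    if (addr >= atomic_load(&mm->vma_end))
        return false;
    /* ... the new page would be prepared here, with no mm-wide lock ... */
    /* In the scheme described above, this check follows taking the
     * page table lock; that lock is not modelled in this toy. */
    return toy_spec_validate(mm, snap);
}
```

The check touches only the counter it snapshotted, so a fault that
loses the race simply retries under the lock, and one that wins never
took an mm-wide lock at all.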