From: Ray Fucillo
To: Mike Kravetz
CC: Ray Fucillo, "linux-kernel@vger.kernel.org", linux-mm
Subject: Re: scalability regressions related to hugetlb_fault() changes
Date: Fri, 25 Mar 2022 00:02:08 +0000
Message-ID: <8E9438A4-56BF-4DBF-9424-2161A488352B@intersystems.com>
References: <43faf292-245b-5db5-cce9-369d8fb6bd21@infradead.org>

> On Mar 24, 2022, at 6:41 PM, Mike Kravetz wrote:
>
> I also seem to remember thinking about the possibility of
> avoiding the synchronization if pmd sharing was not possible. That may be
> a relatively easy way to speed things up. Not sure if pmd sharing comes
> into play in your customer environments, my guess would be yes (shared
> mappings ranges more than 1GB in size and aligned to 1GB).

Hi Mike,

This is one very large shared memory segment allocated at database startup. It's common for it to be hundreds of GB. We allocate it with shmget() passing SHM_HUGETLB (when huge pages have been reserved for us). Not sure if that answers...

> Also, do you have any specifics about the regressions your customers are
> seeing? Specifically what paths are holding i_mmap_rwsem in write mode
> for long periods of time. I would expect something related to unmap.
> Truncation can have long hold times especially if there are may shared
> mapping. Always worth checking specifics, but more likely this is a general
> issue.

We've seen the write lock originate from calling shmat(), shmdt() and process exit. We've also seen it from a fork() off of one of the processes that are attached to the shared memory segment. Some evidence suggests that fork is a more costly case. However, while there are some important places where we'd use fork(), it's more unusual because most process creation will vfork() and execv() a new database process (which then attaches with shmat()).
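
For anyone trying to reproduce the pattern described above, here is a minimal sketch of the allocate/attach/detach sequence. It is not the database's actual code. The helper names (round_up_to_huge_pages, create_huge_segment, attach_touch_detach) are hypothetical, and the huge page size is passed in by the caller as an assumption (2 MiB is common on x86-64, but check Hugepagesize in /proc/meminfo). shmget() is expected to fail when no huge pages have been reserved.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <stddef.h>
#include <sys/ipc.h>
#include <sys/shm.h>

#ifndef SHM_HUGETLB
#define SHM_HUGETLB 04000 /* Linux value, used only if the headers hide it */
#endif

/* Hypothetical helper: round a request up to a whole number of huge pages
 * so the segment's footprint is explicit. */
static size_t round_up_to_huge_pages(size_t bytes, size_t huge_page_size)
{
    return (bytes + huge_page_size - 1) / huge_page_size * huge_page_size;
}

/* Hypothetical helper: create a huge-page-backed System V segment.
 * Returns the shmid, or -1 with errno set (for example when no huge
 * pages are reserved or the caller lacks permission). */
static int create_huge_segment(size_t bytes, size_t huge_page_size)
{
    size_t size = round_up_to_huge_pages(bytes, huge_page_size);
    return shmget(IPC_PRIVATE, size, IPC_CREAT | SHM_HUGETLB | 0600);
}

/* Hypothetical helper: attach, touch one byte, detach, and mark the
 * segment for removal. shmat() and shmdt() are two of the calls named
 * in the message above as taking the lock in write mode. */
static int attach_touch_detach(int shmid)
{
    char *p = shmat(shmid, NULL, 0);
    if (p == (void *)-1)
        return -1;
    p[0] = 1;                       /* first touch faults in a huge page */
    int rc = shmdt(p);
    shmctl(shmid, IPC_RMID, NULL);  /* destroyed after the last detach */
    return rc;
}
```

A reproducer would call create_huge_segment() once, then have many processes run attach_touch_detach()-style attach and fault loops concurrently. The single-process sketch only shows the calls involved, not the contention itself.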