Subject: Re: scalability regressions related to hugetlb_fault() changes
From: Randy Dunlap <rdunlap@infradead.org>
To: Ray Fucillo, linux-kernel@vger.kernel.org, linux-mm
Date: Thu, 24 Mar 2022 14:55:07 -0700
Message-ID: <43faf292-245b-5db5-cce9-369d8fb6bd21@infradead.org>

[add linux-mm mailing list]

On 3/24/22 13:12, Ray Fucillo wrote:
> In moving to newer versions of the kernel, our customers have experienced
> dramatic new scalability problems in our database application, InterSystems
> IRIS.
> Our research has narrowed this down to new processes that attach to the
> database's shared memory segment seeing very long delays (in some cases
> ~100ms!) acquiring i_mmap_lock_read() in hugetlb_fault() as they fault in
> the huge page for the first time. The addition of this lock in
> hugetlb_fault() matches the kernel versions where we see this problem.
> It's not just the new process that incurs the delay that is slowed; other
> processes back up behind it if the page fault occurs inside a critical
> section within the database application.
>
> Is there something that can be improved here?
>
> The read locks in hugetlb_fault() contend with write locks that seem to be
> taken in very common application code paths: shmat(), process exit, fork()
> (not vfork()), shmdt(), and presumably others. So hugetlb_fault() contending
> for the read lock turns out to be common. When the system is loaded, there
> will be many new processes faulting in pages that may block the write lock,
> which in turn blocks more readers faulting behind it, and so on... I don't
> think there's any support for shared page tables in hugetlb that would avoid
> these faults altogether.
>
> Switching to 1GB huge pages instead of 2MB is a good mitigation in reducing
> the frequency of faults, but not a complete solution.
>
> Thanks for considering.
>
> Ray

-- 
~Randy
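
For reference, a minimal sketch of the access pattern Ray describes: many
processes attach to a SHM_HUGETLB shared memory segment and first-touch its
pages concurrently, so every first touch goes through hugetlb_fault(). This is
not code from the report; the segment size, process count, and 2MB huge page
size are illustrative assumptions, and the box needs enough pages reserved in
/proc/sys/vm/nr_hugepages for shmget() to succeed.

/* hugetlb-fault-storm.c: hypothetical reproducer sketch, assumptions noted above */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/wait.h>

#define SEG_SIZE  (1UL << 30)   /* 1 GiB segment (assumed size) */
#define NPROC     64            /* concurrently faulting processes (assumed) */
#define STRIDE    (1UL << 21)   /* one touch per 2MB huge page (assumed size) */

int main(void)
{
	/* Hugetlb-backed SysV segment, like a database shared memory segment. */
	int shmid = shmget(IPC_PRIVATE, SEG_SIZE, IPC_CREAT | SHM_HUGETLB | 0600);
	if (shmid < 0) {
		perror("shmget");
		return 1;
	}

	for (int i = 0; i < NPROC; i++) {
		if (fork() == 0) {
			char *p = shmat(shmid, NULL, 0);
			if (p == (void *)-1) {
				perror("shmat");
				_exit(1);
			}
			/* First touch of each huge page goes through hugetlb_fault(). */
			for (size_t off = 0; off < SEG_SIZE; off += STRIDE)
				p[off] = 1;
			shmdt(p);   /* detach/exit paths take the lock for write */
			_exit(0);
		}
	}

	for (int i = 0; i < NPROC; i++)
		wait(NULL);

	shmctl(shmid, IPC_RMID, NULL);
	return 0;
}

Timing the child page-touch loop under load (or sampling the processes with
perf) should show the time spent waiting in hugetlb_fault(); switching the
reservation and STRIDE to 1GB pages reduces the number of faults, which is the
mitigation mentioned above.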