From: Michel Lespinasse
To: Andrew Morton
Cc: linux-mm@kvack.org, Hugh Dickins, Rik van Riel, Peter Zijlstra, Nick Piggin, KOSAKI Motohiro, linux-kernel@vger.kernel.org, Linus Torvalds
Date: Wed, 8 Dec 2010 15:58:21 -0800
Subject: Re: [PATCH 1/6] mlock: only hold mmap_sem in shared mode when faulting in pages

On Wed, Dec 8, 2010 at 3:27 PM, Andrew Morton wrote:
>> Currently mlock() holds mmap_sem in exclusive mode while the pages get
>> faulted in. In the case of a large mlock, this can potentially take a
>> very long time, during which various commands such as 'ps auxw' will
>> block. This makes sysadmins unhappy:
>>
>> real    14m36.232s
>> user    0m0.003s
>> sys     0m0.015s
>> (output from 'time ps auxw' while a 20GB file was being mlocked without
>> being previously preloaded into page cache)
>
> The kernel holds down_write(mmap_sem) for 14m36s?

Yes...

[... patch snipped off ...]
> Am I correct in believing that we'll still hold down_read(mmap_sem) for
> a quarter hour?

Yes. Patch 1/6 changes the long hold time to be in read mode instead of write mode, which is only a band-aid. However, it prepares for patch 5/6, which releases mmap_sem whenever there is contention on it or when blocking on disk reads.

> We don't need to hold mmap_sem at all while faulting in those pages,
> do we?  We could just do
>
>         for (addr = start; addr < end; addr += PAGE_SIZE)
>                 get_user(x, addr);
>
> and voila.  If the pages are in cache and the ptes are set up then that
> will be *vastly* faster than the proposed code.  If the get_user()
> takes a minor fault then it'll be slower.  If it's a major fault then
> the difference probably doesn't matter much.

get_user() wouldn't suffice if the page is already mapped in, because we need to mark it as PageMlocked. We also need to skip IO and PFNMAP regions. I don't think you can make things much simpler than what I ended up with.

> But whatever.  Is this patchset a half-fix, and should we rather be
> looking for a full-fix?

I think the series fully fixes the mlock() and mlockall() cases, which have been the more pressing use cases for us. Even so, there are still cases where we could observe long mmap_sem hold times. Fundamentally, every place that calls get_user_pages() (or do_mmap(), in the mlockall() MCL_FUTURE case) with a large page range may create such problems. From the looks of it, most of these places wouldn't actually care if mmap_sem got dropped in the middle of the operation, but a general fix will have to involve looking at all the call sites to be sure.

-- 
Michel "Walken" Lespinasse
A program is never fully debugged until the last user dies.
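The argument against a bare get_user() loop, together with the lock dropping planned for patch 5/6, can be modelled in a few lines. This is a hypothetical userspace model, not the patch's code. The names fake_vma, fake_page, FAKE_VM_IO, FAKE_VM_PFNMAP and fake_mlock_range are invented, and the flag values are illustrative. Patch 5/6 releases mmap_sem on contention or before blocking on disk reads. This sketch simply counts a "drop" every `batch` pages instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bits; the values are illustrative, not the kernel's. */
#define FAKE_VM_IO      0x1u
#define FAKE_VM_PFNMAP  0x2u
#define MAX_PAGES       16

struct fake_page {
    bool present;   /* already mapped: a get_user() here would not fault */
    bool mlocked;   /* stand-in for the PageMlocked flag */
};

struct fake_vma {
    unsigned int flags;
    size_t npages;
    struct fake_page pages[MAX_PAGES];
};

struct mlock_stats {
    size_t faulted;     /* pages that had to be faulted in */
    size_t marked;      /* pages newly marked mlocked */
    size_t lock_drops;  /* times the modelled mmap_sem was released */
};

/* Skips IO/PFNMAP regions, faults in missing pages, and marks every page
 * mlocked, including pages that were already present. */
static struct mlock_stats fake_mlock_range(struct fake_vma *vmas, size_t nvmas,
                                           size_t batch)
{
    struct mlock_stats st = { 0, 0, 0 };
    size_t since_drop = 0;

    assert(batch > 0);
    for (size_t v = 0; v < nvmas; v++) {
        struct fake_vma *vma = &vmas[v];

        if (vma->flags & (FAKE_VM_IO | FAKE_VM_PFNMAP))
            continue;                       /* skipped, as the thread says */
        for (size_t i = 0; i < vma->npages && i < MAX_PAGES; i++) {
            struct fake_page *pg = &vma->pages[i];

            if (!pg->present) {
                pg->present = true;         /* "fault" the page in */
                st.faulted++;
            }
            if (!pg->mlocked) {
                pg->mlocked = true;         /* a get_user() loop would miss this */
                st.marked++;
            }
            if (++since_drop == batch) {
                /* Real code would up_read(), let waiters in, then
                 * down_read() and look the VMA up again. */
                st.lock_drops++;
                since_drop = 0;
            }
        }
    }
    return st;
}
```

A page that was already present still shows up in `marked`. That is exactly the case where touching it with get_user() would do nothing.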