From: Vlastimil Babka
Subject: [PATCH v6 01/13] mm, THP: don't hold mmap_sem in khugepaged when allocating THP
Date: Mon, 4 Aug 2014 10:55:12 +0200
Message-Id: <1407142524-2025-2-git-send-email-vbabka@suse.cz>
In-Reply-To: <1407142524-2025-1-git-send-email-vbabka@suse.cz>
References: <1407142524-2025-1-git-send-email-vbabka@suse.cz>
To: Andrew Morton, David Rientjes
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Vlastimil Babka, Minchan Kim, Joonsoo Kim, Michal Nazarewicz, Naoya Horiguchi, Christoph Lameter, Rik van Riel, Mel Gorman, Zhang Yanfei

When allocating a huge page for collapsing, khugepaged currently holds mmap_sem
for reading on the mm where collapsing occurs. Afterwards the read lock is
dropped before the write lock is taken on the same mmap_sem.

Holding mmap_sem during the whole huge page allocation is therefore useless,
since the vma needs to be rechecked after taking the write lock anyway.
Furthermore, huge page allocation might involve a rather long sync compaction,
and thus block any mmap_sem writers, which hurts workloads that perform
frequent m(un)map or mprotect operations.

This patch simply releases the read lock before allocating a huge page. It also
deletes an outdated comment that assumed the vma must be stable, as it was
using alloc_hugepage_vma().
This is no longer true since commit 9f1b868a13ac ("mm: thp:
khugepaged: add policy for finding target node").

Signed-off-by: Vlastimil Babka
Cc: Minchan Kim
Acked-by: Mel Gorman
Cc: Joonsoo Kim
Cc: Michal Nazarewicz
Cc: Naoya Horiguchi
Cc: Christoph Lameter
Cc: Rik van Riel
Acked-by: David Rientjes
---
 mm/huge_memory.c | 20 +++++++-------------
 1 file changed, 7 insertions(+), 13 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index d9a21d06..7cfc325 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2319,23 +2319,17 @@ static struct page
 				       int node)
 {
 	VM_BUG_ON_PAGE(*hpage, *hpage);
+
 	/*
-	 * Allocate the page while the vma is still valid and under
-	 * the mmap_sem read mode so there is no memory allocation
-	 * later when we take the mmap_sem in write mode. This is more
-	 * friendly behavior (OTOH it may actually hide bugs) to
-	 * filesystems in userland with daemons allocating memory in
-	 * the userland I/O paths. Allocating memory with the
-	 * mmap_sem in read mode is good idea also to allow greater
-	 * scalability.
+	 * Before allocating the hugepage, release the mmap_sem read lock.
+	 * The allocation can take potentially a long time if it involves
+	 * sync compaction, and we do not need to hold the mmap_sem during
+	 * that. We will recheck the vma after taking it again in write mode.
 	 */
+	up_read(&mm->mmap_sem);
+
 	*hpage = alloc_pages_exact_node(node, alloc_hugepage_gfpmask(
 		khugepaged_defrag(), __GFP_OTHER_NODE), HPAGE_PMD_ORDER);
-	/*
-	 * After allocating the hugepage, release the mmap_sem read lock in
-	 * preparation for taking it in write mode.
-	 */
-	up_read(&mm->mmap_sem);
 	if (unlikely(!*hpage)) {
 		count_vm_event(THP_COLLAPSE_ALLOC_FAILED);
 		*hpage = ERR_PTR(-ENOMEM);
-- 
1.8.4.5