From: Andrew Morton <akpm@linux-foundation.org>
To: Andy Whitcroft <apw@shadowen.org>
Cc: gerald.schaefer@de.ibm.com, linux-kernel@vger.kernel.org,
linux-mm@kvack.org, schwidefsky@de.ibm.com,
heiko.carstens@de.ibm.com, mel@csn.ul.ie
Subject: Re: [PATCH 1/1] allocate structures for reservation tracking in hugetlbfs outside of spinlocks
Date: Thu, 7 Aug 2008 14:38:24 -0700 [thread overview]
Message-ID: <20080807143824.8e0803da.akpm@linux-foundation.org> (raw)
In-Reply-To: <1218140903-9757-1-git-send-email-apw@shadowen.org>
On Thu, 7 Aug 2008 21:28:23 +0100
Andy Whitcroft <apw@shadowen.org> wrote:
> [Andrew, this fixes a problem in the private reservations stack, shown up
> by some testing done by Gerald on s390 with PREEMPT. It fixes an attempt
> at allocation while holding locks. This should be merged up to mainline
> as a bug fix to those patches.]
>
> In the normal case, hugetlbfs reserves hugepages at map time so that the
> pages exist for future faults. A struct file_region is used to track
> when reservations have been consumed and where. These file_regions
> are allocated as necessary with kmalloc() which can sleep with the
> mm->page_table_lock held. This is wrong and triggers may-sleep warning
> when PREEMPT is enabled.
>
> Updates to the underlying file_region are done in two phases. The first
> phase prepares the region for the change, allocating any necessary memory,
> without actually making the change. The second phase actually commits
> the change. This patch makes use of this by checking the reservations
> before the page_table_lock is taken; triggering any necessary allocations.
> This may then be safely repeated within the locks without any allocations
> being required.
>
> Credit to Mel Gorman for diagnosing this failure and initial versions of
> the patch.
>
After applying the patch:
: int hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
: unsigned long address, int write_access)
: {
: pte_t *ptep;
: pte_t entry;
: int ret;
: struct page *pagecache_page = NULL;
: static DEFINE_MUTEX(hugetlb_instantiation_mutex);
: struct hstate *h = hstate_vma(vma);
:
: ptep = huge_pte_alloc(mm, address, huge_page_size(h));
: if (!ptep)
: return VM_FAULT_OOM;
:
: /*
: * Serialize hugepage allocation and instantiation, so that we don't
: * get spurious allocation failures if two CPUs race to instantiate
: * the same page in the page cache.
: */
: mutex_lock(&hugetlb_instantiation_mutex);
: entry = huge_ptep_get(ptep);
: if (huge_pte_none(entry)) {
: ret = hugetlb_no_page(mm, vma, address, ptep, write_access);
: mutex_unlock(&hugetlb_instantiation_mutex);
: return ret;
: }
:
: ret = 0;
:
: /*
: * If we are going to COW the mapping later, we examine the pending
: * reservations for this page now. This will ensure that any
: * allocations necessary to record that reservation occur outside the
: * spinlock. For private mappings, we also lookup the pagecache
: * page now as it is used to determine if a reservation has been
: * consumed.
: */
: if (write_access && !pte_write(entry)) {
: vma_needs_reservation(h, vma, address);
:
: if (!(vma->vm_flags & VM_SHARED))
: pagecache_page = hugetlbfs_pagecache_page(h,
: vma, address);
: }
There's a seeming race window here, where a new page can get
instantiated. But down-read(mmap_sem) plus hugetlb_instantiation_mutex
prevents that, yes?
: spin_lock(&mm->page_table_lock);
: /* Check for a racing update before calling hugetlb_cow */
: if (likely(pte_same(entry, huge_ptep_get(ptep))))
: if (write_access && !pte_write(entry))
: ret = hugetlb_cow(mm, vma, address, ptep, entry,
: pagecache_page);
: spin_unlock(&mm->page_table_lock);
:
: if (pagecache_page) {
: unlock_page(pagecache_page);
: put_page(pagecache_page);
: }
:
: mutex_unlock(&hugetlb_instantiation_mutex);
:
: return ret;
: }
:
:
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2008-08-07 21:38 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2008-08-06 14:43 [BUG] hugetlb: sleeping function called from invalid context Gerald Schaefer
2008-08-06 19:03 ` [PATCH 1/1] allocate structures for reservation tracking in hugetlbfs outside of spinlocks Andy Whitcroft
2008-08-07 11:35 ` Gerald Schaefer
2008-08-07 20:28 ` Andy Whitcroft
2008-08-07 21:38 ` Andrew Morton [this message]
2008-08-08 8:33 ` Mel Gorman
2008-08-08 10:16 ` Andy Whitcroft
2008-08-08 11:10 ` [PATCH 1/1] allocate structures for reservation tracking in hugetlbfs outside of spinlocks v2 Andy Whitcroft
2008-08-08 12:57 ` Gerald Schaefer
2008-08-11 17:58 ` Andy Whitcroft
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20080807143824.8e0803da.akpm@linux-foundation.org \
--to=akpm@linux-foundation.org \
--cc=apw@shadowen.org \
--cc=gerald.schaefer@de.ibm.com \
--cc=heiko.carstens@de.ibm.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mel@csn.ul.ie \
--cc=schwidefsky@de.ibm.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox