From: Mark Hairgrove <mhairgrove@nvidia.com>
To: "Jérôme Glisse" <jglisse@redhat.com>
Cc: akpm@linux-foundation.org, linux-kernel@vger.kernel.org,
linux-mm@kvack.org,
Linus Torvalds <torvalds@linux-foundation.org>,
joro@8bytes.org, Mel Gorman <mgorman@suse.de>,
"H. Peter Anvin" <hpa@zytor.com>,
Peter Zijlstra <peterz@infradead.org>,
Andrea Arcangeli <aarcange@redhat.com>,
Johannes Weiner <jweiner@redhat.com>,
Larry Woodman <lwoodman@redhat.com>,
Rik van Riel <riel@redhat.com>, Dave Airlie <airlied@redhat.com>,
Brendan Conoboy <blc@redhat.com>,
Joe Donohue <jdonohue@redhat.com>,
Christophe Harle <charle@nvidia.com>,
Duncan Poole <dpoole@nvidia.com>,
Sherry Cheung <SCheung@nvidia.com>,
Subhash Gutti <sgutti@nvidia.com>,
John Hubbard <jhubbard@nvidia.com>,
Lucien Dunning <ldunning@nvidia.com>,
Cameron Buschardt <cabuschardt@nvidia.com>,
Arvind Gopalakrishnan <arvindg@nvidia.com>,
Haggai Eran <haggaie@mellanox.com>,
Shachar Raindel <raindel@mellanox.com>,
Liran Liss <liranl@mellanox.com>,
Roland Dreier <roland@purestorage.com>,
Ben Sander <ben.sander@amd.com>,
Greg Stoner <Greg.Stoner@amd.com>,
John Bridgman <John.Bridgman@amd.com>,
Michael Mantor <Michael.Mantor@amd.com>,
Paul Blinzer <Paul.Blinzer@amd.com>,
Leonid Shamis <Leonid.Shamis@amd.com>,
Laurent Morichetti <Laurent.Morichetti@amd.com>,
Alexander Deucher <Alexander.Deucher@amd.com>
Subject: Re: [PATCH 02/15] mmu_notifier: keep track of active invalidation ranges v4
Date: Mon, 31 Aug 2015 20:27:17 -0700
Message-ID: <alpine.DEB.2.00.1508312003400.18393@mdh-linux64-2.nvidia.com>
In-Reply-To: <1439493328-1028-3-git-send-email-jglisse@redhat.com>
On Thu, 13 Aug 2015, Jérôme Glisse wrote:
> The invalidate_range_start() and invalidate_range_end() can be
> considered as forming an "atomic" section for the cpu page table
> update point of view. Between this two function the cpu page
> table content is unreliable for the address range being
> invalidated.
>
> This patch use a structure define at all place doing range
> invalidation. This structure is added to a list for the duration
> of the update ie added with invalid_range_start() and removed
> with invalidate_range_end().
>
> Helpers allow querying if a range is valid and wait for it if
> necessary.
>
> For proper synchronization, user must block any new range
> invalidation from inside there invalidate_range_start() callback.
s/there/their/
> Otherwise there is no garanty that a new range invalidation will
s/garanty/guarantee/
> not be added after the call to the helper function to query for
> existing range.
>
> [...]
>
> +/* mmu_notifier_range_is_valid_locked() - test if range overlap with active
s/overlap/overlaps/
> + * invalidation.
> + *
> + * @mm: The mm struct.
> + * @start: Start address of the range (inclusive).
> + * @end: End address of the range (exclusive).
> + * Returns: false if overlap with an active invalidation, true otherwise.
> + *
> + * This function test whether any active invalidated range conflict with a
s/test/tests/
s/invalidated/invalidation/
s/conflict/conflicts/
> + * given range ([start, end[), active invalidation are added to a list inside
end[ -> end) (end is exclusive)
s/invalidation/invalidations/
> + * __mmu_notifier_invalidate_range_start() and removed from that list inside
> + * __mmu_notifier_invalidate_range_end().
> + */
> +static bool mmu_notifier_range_is_valid_locked(struct mm_struct *mm,
> + unsigned long start,
> + unsigned long end)
> +{
> + struct mmu_notifier_range *range;
> +
> + list_for_each_entry(range, &mm->mmu_notifier_mm->ranges, list) {
> + if (range->end > start && range->start < end)
> + return false;
> + }
> + return true;
> +}
> +
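For what it's worth, the condition in the loop above is the usual overlap test for two half-open intervals, so two ranges that merely touch at a boundary don't conflict. A minimal userspace sketch of that check (ranges_overlap is a name I made up for illustration, it's not in the patch):

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper, not part of the patch: returns true when the
 * half-open intervals [a_start, a_end) and [b_start, b_end) share at
 * least one address. Same condition as the one used in
 * mmu_notifier_range_is_valid_locked().
 */
static bool ranges_overlap(unsigned long a_start, unsigned long a_end,
			   unsigned long b_start, unsigned long b_end)
{
	/* Assumption: callers never pass empty or inverted ranges. */
	assert(a_start < a_end && b_start < b_end);
	return a_end > b_start && a_start < b_end;
}
```

With this, [0x1000, 0x2000) and [0x2000, 0x3000) are reported as not overlapping, which is the behavior the "exclusive" wording in the kerneldoc implies.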
> +/* mmu_notifier_range_is_valid() - test if range overlap with active
s/overlap/overlaps/
> + * invalidation.
> + *
> + * @mm: The mm struct.
> + * @start: Start address of the range (inclusive).
> + * @end: End address of the range (exclusive).
> + *
> + * This function wait for any active range invalidation that conflict with the
> + * given range, to end. See mmu_notifier_range_wait_valid() on how to use this
> + * function properly.
Bad copy/paste from mmu_notifier_range_wait_valid()?
mmu_notifier_range_is_valid() only queries the state; it doesn't wait.
> + */
> +bool mmu_notifier_range_is_valid(struct mm_struct *mm,
> + unsigned long start,
> + unsigned long end)
> +{
> + bool valid;
> +
> + spin_lock(&mm->mmu_notifier_mm->lock);
> + valid = mmu_notifier_range_is_valid_locked(mm, start, end);
> + spin_unlock(&mm->mmu_notifier_mm->lock);
> + return valid;
> +}
> +EXPORT_SYMBOL_GPL(mmu_notifier_range_is_valid);
> +
> +/* mmu_notifier_range_wait_valid() - wait for a range to have no conflict with
> + * active invalidation.
> + *
> + * @mm: The mm struct.
> + * @start: Start address of the range (inclusive).
> + * @end: End address of the range (exclusive).
> + *
> + * This function wait for any active range invalidation that conflict with the
> + * given range, to end.
> + *
> + * Note by the time this function return a new range invalidation that conflict
> + * might have started. So you need to atomically block new range and query
> + * again if range is still valid with mmu_notifier_range_is_valid(). So call
> + * sequence should be :
> + *
> + * again:
> + * mmu_notifier_range_wait_valid()
> + * // block new invalidation using that lock inside your range_start callback
> + * lock_block_new_invalidation()
> + * if (!mmu_notifier_range_is_valid())
> + * goto again;
> + * unlock()
I think this example sequence can deadlock so I wouldn't want to encourage
its use. New invalidation regions are added to the list before the
range_start callback is invoked.

Thread A                                Thread B
-----------------                       -----------------
mmu_notifier_range_wait_valid
// returns
                                        __mmu_notifier_invalidate_range_start
                                          list_add_tail
lock_block_new_invalidation
                                          ->invalidate_range_start
                                            // invalidation blocked in callback
mmu_notifier_range_is_valid // fails
goto again
mmu_notifier_range_wait_valid // deadlock

mmu_notifier_range_wait_valid can't finish until thread B's callback
returns, but thread B's callback can't return because it's blocked.
I see that HMM in later patches takes the approach of not holding the lock
when mmu_notifier_range_is_valid returns false. Instead of stalling new
invalidations it returns -EAGAIN to the caller. While that resolves the
deadlock, it won't prevent the faulting thread from being starved in the
pathological case.
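
To illustrate the pattern I mean, here is a rough userspace model using pthreads. This is my reading of the approach, not HMM's code: every name (model_mm, model_try_fault, driver_lock and so on) is hypothetical, and the fixed-size array stands in for the list of active ranges.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <string.h>

#define MODEL_MAX_RANGES 8	/* arbitrary size for the model */

/* Hypothetical userspace model; none of these names exist in the kernel. */
struct model_range {
	unsigned long start, end;	/* half-open: [start, end) */
	bool active;			/* invalidation in flight */
};

struct model_mm {
	pthread_mutex_t list_lock;	/* stands in for mmu_notifier_mm->lock */
	pthread_mutex_t driver_lock;	/* lock a range_start callback would take */
	struct model_range ranges[MODEL_MAX_RANGES];
};

static void model_mm_init(struct model_mm *mm)
{
	memset(mm->ranges, 0, sizeof(mm->ranges));
	pthread_mutex_init(&mm->list_lock, NULL);
	pthread_mutex_init(&mm->driver_lock, NULL);
}

/* True if no active invalidation overlaps [start, end). */
static bool model_range_is_valid(struct model_mm *mm,
				 unsigned long start, unsigned long end)
{
	bool valid = true;
	int i;

	pthread_mutex_lock(&mm->list_lock);
	for (i = 0; i < MODEL_MAX_RANGES; i++) {
		struct model_range *r = &mm->ranges[i];

		if (r->active && r->end > start && r->start < end)
			valid = false;
	}
	pthread_mutex_unlock(&mm->list_lock);
	return valid;
}

/*
 * Returns 0 if the fault could be serviced, -EAGAIN if a conflicting
 * invalidation is in flight. driver_lock is always released before
 * returning, so a callback waiting on it can make progress.
 */
static int model_try_fault(struct model_mm *mm,
			   unsigned long start, unsigned long end)
{
	assert(start < end);
	pthread_mutex_lock(&mm->driver_lock);
	if (!model_range_is_valid(mm, start, end)) {
		pthread_mutex_unlock(&mm->driver_lock);
		return -EAGAIN;
	}
	/* ... populate the device page table for [start, end) here ... */
	pthread_mutex_unlock(&mm->driver_lock);
	return 0;
}
```

The caller is expected to retry on -EAGAIN, and that's where the starvation concern comes from: nothing in this scheme bounds the number of retries if invalidations keep arriving for the same range.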
Is it out of the question to build a lock into the mmu notifier API
directly? It's a little worrisome to me that the complexity for this
locking is pushed into the callbacks rather than handled in the core.
Something like this:
    mmu_notifier_range_lock(start, end)
    mmu_notifier_range_unlock(start, end)
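
To make the idea a bit more concrete, here is a userspace model of such a range lock built on a pthread mutex and condition variable. It's only a sketch under my own assumptions: the sketch_* names, the fixed-size table, and the struct layout are all invented for illustration, and it says nothing about fairness or about how this would actually be wired into the notifier core.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

#define SKETCH_MAX_HELD 16	/* arbitrary table size for the model */

/* Hypothetical userspace model; nothing here is an existing kernel API. */
struct sketch_range_locks {
	pthread_mutex_t mutex;		/* protects held[] */
	pthread_cond_t released;	/* broadcast on every unlock */
	struct {
		unsigned long start, end;	/* half-open: [start, end) */
		bool in_use;
	} held[SKETCH_MAX_HELD];
};

static void sketch_range_locks_init(struct sketch_range_locks *rl)
{
	int i;

	pthread_mutex_init(&rl->mutex, NULL);
	pthread_cond_init(&rl->released, NULL);
	for (i = 0; i < SKETCH_MAX_HELD; i++)
		rl->held[i].in_use = false;
}

/* True if [start, end) overlaps a held range. Caller holds rl->mutex. */
static bool sketch_conflicts(struct sketch_range_locks *rl,
			     unsigned long start, unsigned long end)
{
	int i;

	for (i = 0; i < SKETCH_MAX_HELD; i++) {
		if (rl->held[i].in_use &&
		    rl->held[i].end > start && rl->held[i].start < end)
			return true;
	}
	return false;
}

/* Index of an unused slot, or -1 if the table is full. Caller holds rl->mutex. */
static int sketch_free_slot(struct sketch_range_locks *rl)
{
	int i;

	for (i = 0; i < SKETCH_MAX_HELD; i++) {
		if (!rl->held[i].in_use)
			return i;
	}
	return -1;
}

/* Block until no held range overlaps [start, end), then record it as held. */
static void sketch_range_lock(struct sketch_range_locks *rl,
			      unsigned long start, unsigned long end)
{
	int slot;

	assert(start < end);
	pthread_mutex_lock(&rl->mutex);
	while ((slot = sketch_free_slot(rl)) < 0 ||
	       sketch_conflicts(rl, start, end))
		pthread_cond_wait(&rl->released, &rl->mutex);
	rl->held[slot].start = start;
	rl->held[slot].end = end;
	rl->held[slot].in_use = true;
	pthread_mutex_unlock(&rl->mutex);
}

/* Release a range previously taken with the same start and end. */
static void sketch_range_unlock(struct sketch_range_locks *rl,
				unsigned long start, unsigned long end)
{
	int i;

	pthread_mutex_lock(&rl->mutex);
	for (i = 0; i < SKETCH_MAX_HELD; i++) {
		if (rl->held[i].in_use && rl->held[i].start == start &&
		    rl->held[i].end == end) {
			rl->held[i].in_use = false;
			break;
		}
	}
	pthread_cond_broadcast(&rl->released);
	pthread_mutex_unlock(&rl->mutex);
}
```

In this model both the faulting path and the invalidation path would take the range for the duration of their work, so a driver wouldn't need its own lock_block_new_invalidation() equivalent in the callback.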
If that's not feasible and we have to stick with the current approach,
then I suggest changing the "valid" name. "valid" doesn't have a clear
meaning at first glance because the reader doesn't know what would make a
range "valid." How about "active" instead? Then the names would look
something like this, assuming the polarity matches their current versions:
    mmu_notifier_range_inactive_locked
    mmu_notifier_range_inactive
    mmu_notifier_range_wait_active
> + */
> +void mmu_notifier_range_wait_valid(struct mm_struct *mm,
> + unsigned long start,
> + unsigned long end)
> +{
> + spin_lock(&mm->mmu_notifier_mm->lock);
> + while (!mmu_notifier_range_is_valid_locked(mm, start, end)) {
> + int nranges = mm->mmu_notifier_mm->nranges;
> +
> + spin_unlock(&mm->mmu_notifier_mm->lock);
> + wait_event(mm->mmu_notifier_mm->wait_queue,
> + nranges != mm->mmu_notifier_mm->nranges);
> + spin_lock(&mm->mmu_notifier_mm->lock);
> + }
> + spin_unlock(&mm->mmu_notifier_mm->lock);
> +}
> +EXPORT_SYMBOL_GPL(mmu_notifier_range_wait_valid);
> +