From: Peter Zijlstra <peterz@infradead.org>
To: Mel Gorman <mgorman@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Oleg Nesterov <oleg@redhat.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Vlastimil Babka <vbabka@suse.cz>, Jan Kara <jack@suse.cz>,
	Michal Hocko <mhocko@suse.cz>, Hugh Dickins <hughd@google.com>,
	Dave Hansen <dave.hansen@intel.com>,
	Linux Kernel <linux-kernel@vger.kernel.org>,
	Linux-MM <linux-mm@kvack.org>,
	Linux-FSDevel <linux-fsdevel@vger.kernel.org>,
	Paul McKenney <paulmck@linux.vnet.ibm.com>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	David Howells <dhowells@redhat.com>
Subject: Re: [PATCH] mm: filemap: Avoid unnecessary barries and waitqueue lookups in unlock_page fastpath v4
Date: Thu, 15 May 2014 15:20:58 +0200	[thread overview]
Message-ID: <20140515132058.GL30445@twins.programming.kicks-ass.net> (raw)
In-Reply-To: <20140515104808.GF23991@suse.de>

On Thu, May 15, 2014 at 11:48:09AM +0100, Mel Gorman wrote:

> +static inline wait_queue_head_t *clear_page_waiters(struct page *page)
>  {
> +	wait_queue_head_t *wqh = NULL;
> +
> +	if (!PageWaiters(page))
> +		return NULL;
> +
> +	/*
> +	 * Prepare to clear PG_waiters if the waitqueue is no longer
> +	 * active. Note that there is no guarantee that a page with no
> +	 * waiters will get cleared as there may be unrelated pages
> +	 * sleeping on the same page wait queue. Accurate detection
> +	 * would require a counter. In the event of a collision, the
> +	 * waiter bit will dangle and lookups will be required until
> +	 * the page is unlocked without collisions. The bit will need to
> +	 * be cleared before freeing to avoid triggering debug checks.
> +	 *
> +	 * Furthermore, this can race with processes about to sleep on
> +	 * the same page if it adds itself to the waitqueue just after
> +	 * this check. The timeout in sleep_on_page prevents the race
> +	 * being a terminal one. In effect, the uncontended and non-race
> +	 * cases are faster in exchange for occasional worst case of the
> +	 * timeout saving us.
> +	 */
> +	wqh = page_waitqueue(page);
> +	if (!waitqueue_active(wqh))
> +		ClearPageWaiters(page);
> +
> +	return wqh;
> +}

So clear_page_waiters() is, I think, a bad name for this function; for
one thing, nothing in the name hints that it returns a wait_queue_head.

Secondly, I think the clear condition is wrong. If I understand the rest
of the code correctly, we keep PG_waiters set until the condition above
holds, i.e. until there is not a single waiter left on the (hashed)
waitqueue, which is not the same as there being no waiters left for this
page.

Would it not make much more sense to clear the bit when there are no
more waiters for this page?

For the case where there are no waiters at all this is the same
condition, but when there is a hash collision and other pages are still
waiting, we iterate the whole queue anyway, so we might as well clear it
there (see the __unlock_page() sketch further down).

> +/* Returns true if the page is locked */
> +static inline bool prepare_wait_bit(struct page *page, wait_queue_head_t *wqh,
> +			wait_queue_t *wq, int state, int bit_nr, bool exclusive)
> +{
> +
> +	/* Set PG_waiters so a racing unlock_page will check the waitqueue */
> +	if (!PageWaiters(page))
> +		SetPageWaiters(page);
> +
> +	if (exclusive)
> +		prepare_to_wait_exclusive(wqh, wq, state);
> +	else
> +		prepare_to_wait(wqh, wq, state);
> +	return test_bit(bit_nr, &page->flags);
>  }
>  
>  void wait_on_page_bit(struct page *page, int bit_nr)
>  {
> +	wait_queue_head_t *wqh;
>  	DEFINE_WAIT_BIT(wait, &page->flags, bit_nr);
>  
> +	if (!test_bit(bit_nr, &page->flags))
> +		return;
> +	wqh = page_waitqueue(page);
> +
> +	do {
> +		if (prepare_wait_bit(page, wqh, &wait.wait, TASK_KILLABLE, bit_nr, false))
> +			sleep_on_page_killable(page);
> +	} while (test_bit(bit_nr, &page->flags));
> +	finish_wait(wqh, &wait.wait);
>  }
>  EXPORT_SYMBOL(wait_on_page_bit);

Afaict, after this patch wait_on_page_bit() is only used by
wait_on_page_writeback(); might I ask why that path needs PG_waiters
set at all?

>  int wait_on_page_bit_killable(struct page *page, int bit_nr)
>  {
> +	wait_queue_head_t *wqh;
>  	DEFINE_WAIT_BIT(wait, &page->flags, bit_nr);
> +	int ret = 0;
>  
>  	if (!test_bit(bit_nr, &page->flags))
>  		return 0;
> +	wqh = page_waitqueue(page);
> +
> +	do {
> +		if (prepare_wait_bit(page, wqh, &wait.wait, TASK_KILLABLE, bit_nr, false))
> +			ret = sleep_on_page_killable(page);
> +	} while (!ret && test_bit(bit_nr, &page->flags));
> +	finish_wait(wqh, &wait.wait);
>  
> +	return ret;
>  }

The only user of wait_on_page_bit_killable() _was_
wait_on_page_locked_killable(), but you've just converted that to use
__wait_on_page_bit_killable().

So we can scrap this function.

>  /**
> @@ -721,6 +785,8 @@ void add_page_wait_queue(struct page *page, wait_queue_t *waiter)
>  	unsigned long flags;
>  
>  	spin_lock_irqsave(&q->lock, flags);
> +	if (!PageWaiters(page))
> +		SetPageWaiters(page);
>  	__add_wait_queue(q, waiter);
>  	spin_unlock_irqrestore(&q->lock, flags);
>  }

What does add_page_wait_queue() do and why does it need PageWaiters?

> @@ -740,10 +806,29 @@ EXPORT_SYMBOL_GPL(add_page_wait_queue);
>   */
>  void unlock_page(struct page *page)
>  {
> +	wait_queue_head_t *wqh = clear_page_waiters(page);
> +
>  	VM_BUG_ON_PAGE(!PageLocked(page), page);
> +
> +	/*
> +	 * clear_bit_unlock is not necessary in this case as there is no
> +	 * need to strongly order the clearing of PG_waiters and PG_locked.
> +	 * The smp_mb__after_atomic() barrier is still required for RELEASE
> +	 * semantics as there is no guarantee that a wakeup will take place
> +	 */
> +	clear_bit(PG_locked, &page->flags);
>  	smp_mb__after_atomic();

If you need RELEASE semantics, use clear_bit_unlock(); that is exactly
what it provides.
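
Something like the below, I suppose; a sketch only, keeping everything
else in the quoted hunk (the earlier wqh lookup, the barrier and the
conditional wakeup) exactly as it is and swapping only the bit-clear:

	/* clear_bit_unlock() gives us the RELEASE ordering */
	clear_bit_unlock(PG_locked, &page->flags);
	smp_mb__after_atomic();
	if (wqh)
		__wake_up_bit(wqh, &page->flags, PG_locked);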

> +
> +	/*
> +	 * Wake the queue if waiters were detected. Ordinarily this wakeup
> +	 * would be unconditional to catch races between the lock bit being
> +	 * set and a new process joining the queue. However, that would
> +	 * require the waitqueue to be looked up every time. Instead we
> +	 * optimise for the uncontended and non-race case and recover using
> +	 * a timeout in sleep_on_page.
> +	 */
> +	if (wqh)
> +		__wake_up_bit(wqh, &page->flags, PG_locked);

And the only reason we're not clearing PageWaiters under q->lock is to
skimp on the last contended unlock_page()?

>  }
>  EXPORT_SYMBOL(unlock_page);
>  
> @@ -795,22 +884,69 @@ EXPORT_SYMBOL_GPL(page_endio);
>   */
>  void __lock_page(struct page *page)
>  {
> +	wait_queue_head_t *wqh = page_waitqueue(page);
>  	DEFINE_WAIT_BIT(wait, &page->flags, PG_locked);
>  
> +	do {
> +		if (prepare_wait_bit(page, wqh, &wait.wait, TASK_UNINTERRUPTIBLE, PG_locked, true))
> +			sleep_on_page(page);
> +	} while (!trylock_page(page));
> +
> +	finish_wait(wqh, &wait.wait);
>  }



So I suppose I'm failing to see the problem with something like:

extern void __lock_page(struct page *);
extern void __unlock_page(struct page *);

static inline void lock_page(struct page *page)
{
	if (!trylock_page(page))
		__lock_page(page);
}

static inline void unlock_page(struct page *page)
{
	clear_bit_unlock(PG_locked, &page->flags);
	if (PageWaiters(page))
		__unlock_page(page);
}

void __lock_page(struct page *page)
{
	wait_queue_head_t *wqh = page_waitqueue(page);
	DEFINE_WAIT_BIT(wait, &page->flags, PG_locked);

	spin_lock_irq(&wqh->lock);
	/* mark the page so unlock_page() knows to look at the waitqueue */
	if (!PageWaiters(page))
		SetPageWaiters(page);

	/* exclusive: an unlock wakes a single lock waiter at a time */
	wait.wait.flags |= WQ_FLAG_EXCLUSIVE;
	preempt_disable();
	do {
		if (list_empty(&wait.wait.task_list))
			__add_wait_queue_tail(wqh, &wait.wait);

		set_current_state(TASK_UNINTERRUPTIBLE);

		if (test_bit(wait.key.bit_nr, wait.key.flags)) {
			/* still locked; sleep until the unlock side wakes us */
			spin_unlock_irq(&wqh->lock);
			schedule_preempt_disabled();
			spin_lock_irq(&wqh->lock);
		}
	} while (!trylock_page(page));

	__remove_wait_queue(wqh, &wait.wait);
	__set_current_state(TASK_RUNNING);
	preempt_enable();
	spin_unlock_irq(&wqh->lock);
}

void __unlock_page(struct page *page)
{
	struct wait_bit_key key = __WAIT_BIT_KEY_INITIALIZER(&page->flags, PG_locked);
	wait_queue_head_t *wqh = page_waitqueue(page);
	wait_queue_t *curr;

	spin_lock_irq(&wqh->lock);
	list_for_each_entry(curr, &wqh->task_list, task_list) {
		/* woke a waiter for this page's PG_locked; done */
		if (curr->func(curr, TASK_NORMAL, 0, &key))
			goto unlock;
	}
	/* nobody left waiting on this page, drop the bit under the lock */
	ClearPageWaiters(page);
unlock:
	spin_unlock_irq(&wqh->lock);
}

Yes, __unlock_page() takes the wqh->lock unconditionally, but
unlock_page() should also end up calling __unlock_page() a lot less
often, and the scheme doesn't need that horrid timeout.

Now, the above is clearly sub-optimal when !extended_page_flags, but I
suppose we could have two versions of __unlock_page() for that.
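
(For that case something like the below would do, I suppose; a rough
sketch only, assuming we simply keep today's unconditional wakeup of the
hashed waitqueue when there is no room for the PG_waiters bit, with
unlock_page() then calling it unconditionally:)

void __unlock_page(struct page *page)
{
	/* no PG_waiters bit to consult; always kick the hashed waitqueue */
	__wake_up_bit(page_waitqueue(page), &page->flags, PG_locked);
}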
