linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Waiman Long <longman@redhat.com>
To: Matthew Wilcox <willy@infradead.org>
Cc: Mike Kravetz <mike.kravetz@oracle.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH] hugetlbfs: Disable IRQ when taking hugetlb_lock in set_max_huge_pages()
Date: Mon, 9 Dec 2019 19:46:46 -0500	[thread overview]
Message-ID: <a7ea9e1a-be9e-e6ee-5b30-602166041509@redhat.com> (raw)
In-Reply-To: <20191209164907.GD32169@bombadil.infradead.org>

On 12/9/19 11:49 AM, Matthew Wilcox wrote:
> On Mon, Dec 09, 2019 at 11:01:50AM -0500, Waiman Long wrote:
>> [  613.245273] Call Trace:
>> [  613.256273]  <IRQ>
>> [  613.265273]  dump_stack+0x9a/0xf0
>> [  613.281273]  mark_lock+0xd0c/0x12f0
>> [  613.341273]  __lock_acquire+0x146b/0x48c0
>> [  613.401273]  lock_acquire+0x14f/0x3b0
>> [  613.440273]  _raw_spin_lock+0x30/0x70
>> [  613.477273]  free_huge_page+0x36f/0xaa0
>> [  613.495273]  bio_check_pages_dirty+0x2fc/0x5c0
> Oh, this is fun.  So we kicked off IO to a hugepage, then truncated or
> otherwise caused the page to come free.  Then the IO finished and did the
> final put on the page ... from interrupt context.  Splat.  Not something
> that's going to happen often, but can happen if a process dies during
> IO or due to a userspace bug.
>
> Maybe we should schedule work to do the actual freeing of the page
> instead of this rather large patch.  It doesn't seem like a case we need
> to optimise for.
I think that may be a good idea to try it out.
>
> Further, this path is called from softirq context, not hardirq context.
> That means the whole mess could be a spin_lock_bh(), not spin_lock_irq()
> which is rather cheaper.  Anyway, I think the schedule-freeing-of-a-page-
> if-in-irq-context approach is likely to be better.

Yes, using spin_lock_bh() may be better.


>> +/*
>> + * Check to make sure that IRQ is enabled before calling spin_lock_irq()
>> + * so that after a matching spin_unlock_irq() the system won't be in an
>> + * incorrect state.
>> + */
>> +static __always_inline void spin_lock_irq_check(spinlock_t *lock)
>> +{
>> +	lockdep_assert_irqs_enabled();
>> +	spin_lock_irq(lock);
>> +}
>> +#ifdef spin_lock_irq
>> +#undef spin_lock_irq
>> +#endif
>> +#define spin_lock_irq(lock)	spin_lock_irq_check(lock)
> Don't leave your debugging code in the patch you submit for merging.

As I am not 100% sure that free_huge_page() is the only lock taking
function that will be called from the interrupt context, I purposely add
this to catch other possible code path when lockdep is enabled in a
debug kernel. It has no side effect when lockdep isn't enabled. In the
future, if other lock taking functions is being accessed from the
interrupt context, we can spot that more easily. I don't see this
preventive debug code as a problem.

This change, however, should be in a separate patch.

>> @@ -1775,7 +1793,11 @@ static void return_unused_surplus_pages(struct hstate *h,
>>  		unused_resv_pages--;
>>  		if (!free_pool_huge_page(h, &node_states[N_MEMORY], 1))
>>  			goto out;
>> -		cond_resched_lock(&hugetlb_lock);
>> +		if (need_resched()) {
>> +			spin_unlock_irq(&hugetlb_lock);
>> +			cond_resched();
>> +			spin_lock_irq(&hugetlb_lock);
>> +		}
> This doesn't work.  need_resched() is only going to be set due to things
> happening in interrupt context.  But you've disabled interrupts, so
> need_resched() is never going to be set.

You are probably right. Thanks for pointing this out.

Cheers,
Longman



  reply	other threads:[~2019-12-10  0:46 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-12-09 16:01 Waiman Long
2019-12-09 16:05 ` Waiman Long
2019-12-09 16:49 ` Matthew Wilcox
2019-12-10  0:46   ` Waiman Long [this message]
2019-12-10  2:37     ` Waiman Long

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=a7ea9e1a-be9e-e6ee-5b30-602166041509@redhat.com \
    --to=longman@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mike.kravetz@oracle.com \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox