From: Eric Miao <eric.y.miao@gmail.com>
To: Catalin Marinas <catalin.marinas@arm.com>
Cc: Laura Abbott <lauraa@codeaurora.org>,
"linux-arm-kernel@lists.infradead.org"
<linux-arm-kernel@lists.infradead.org>,
Linux Memory Management List <linux-mm@kvack.org>,
Will Deacon <Will.Deacon@arm.com>,
Russell King <linux@arm.linux.org.uk>
Subject: Re: arm64 flushing 255GB of vmalloc space takes too long
Date: Wed, 9 Jul 2014 11:04:39 -0700
Message-ID: <CAMPhdO_XqAL4oXcuJkp2PTQ-J07sGG4Nm5HjHO=yGqS+KuWQzg@mail.gmail.com>
In-Reply-To: <20140709174055.GC2814@arm.com>
On Wed, Jul 9, 2014 at 10:40 AM, Catalin Marinas
<catalin.marinas@arm.com> wrote:
> On Wed, Jul 09, 2014 at 05:53:26PM +0100, Eric Miao wrote:
>> On Tue, Jul 8, 2014 at 6:43 PM, Laura Abbott <lauraa@codeaurora.org> wrote:
>> > I have an arm64 target which has been observed hanging in __purge_vmap_area_lazy
>> > in vmalloc.c The root cause of this 'hang' is that flush_tlb_kernel_range is
>> > attempting to flush 255GB of virtual address space. This takes ~2 seconds and
>> > preemption is disabled at this time thanks to the purge lock. Disabling
>> > preemption for that time is long enough to trigger a watchdog we have setup.
>
> That's definitely not good.
>
>> > A couple of options I thought of:
>> > 1) Increase the timeout of our watchdog to allow the flush to occur. Nobody
>> > I suggested this to likes the idea as the watchdog firing generally catches
>> > behavior that results in poor system performance and disabling preemption
>> > for that long does seem like a problem.
>> > 2) Change __purge_vmap_area_lazy to do less work under a spinlock. This would
>> > certainly have a performance impact and I don't even know if it is plausible.
>> > 3) Allow module unloading to trigger a vmalloc purge beforehand to help avoid
>> > this case. This would still be racy if another vfree came in during the time
>> > between the purge and the vfree but it might be good enough.
>> > 4) Add 'if size > threshold flush entire tlb' (I haven't profiled this yet)
>>
>> We have the same problem. I'd agree with point 2 and point 4, point 1/3 do not
>> actually fix this issue. purge_vmap_area_lazy() could be called in other
>> cases.
>
> I would also discard point 2 as it still takes ~2 seconds, only that not
> under a spinlock.
>
The point is that we could still spend a good amount of time in that
function. Given the default lazy_vfree_pages value of 32MB * log(ncpu),
in the worst case where every vmap area is a single page, flushing the
TLB page by page, traversing the list, and calling __free_vmap_area()
that many times is unlikely to bring the execution time down to the
microsecond level. If that work is inevitable, we should at least do it
in a cleaner way.
>> w.r.t the threshold to flush entire tlb instead of doing that page-by-page, that
>> could be different from platform to platform. And considering the cost of tlb
>> flush on x86, I wonder why this isn't an issue on x86.
>
> The current __purge_vmap_area_lazy() was done as an optimisation (commit
> db64fe02258f1) to avoid IPIs. So flush_tlb_kernel_range() would only be
> IPI'ed once.
>
> IIUC, the problem is how start/end are computed in
> __purge_vmap_area_lazy(), so even if you have only two vmap areas, if
> they are 255GB apart you've got this problem.
Indeed.
>
> One temporary option is to limit the vmalloc space on arm64 to something
> like 2 x RAM-size (haven't looked at this yet). But if you get a
> platform with lots of RAM, you hit this problem again.
>
> Which leaves us with point (4) but finding the threshold is indeed
> platform dependent. Another way could be a check for latency - so if it
> took certain usecs, we break the loop and flush the whole TLB.
Or we could end up with platform-specific TLB flush implementations,
just as we did for cache ops. I would expect only a few platforms to
need their own thresholds. Would a simple heuristic for the threshold,
based on the number of TLB entries, be good enough?
>
> --
> Catalin