linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Baoquan He <bhe@redhat.com>
To: Omar Sandoval <osandov@osandov.com>
Cc: linux-mm@kvack.org, kexec@lists.infradead.org,
	Andrew Morton <akpm@linux-foundation.org>,
	Uladzislau Rezki <urezki@gmail.com>,
	Christoph Hellwig <hch@lst.de>,
	x86@kernel.org, kernel-team@fb.com
Subject: Re: [PATCH v3] mm/vmalloc: fix spinning drain_vmap_work after reading from /proc/vmcore
Date: Thu, 14 Apr 2022 18:33:43 +0800	[thread overview]
Message-ID: <Ylf4h2/vfelRoc3a@MiWiFi-R3L-srv> (raw)
In-Reply-To: <53f25ddc953211d50bed06427d695f51f5ea37c7.1649867251.git.osandov@fb.com>

On 04/13/22 at 09:28am, Omar Sandoval wrote:
> From: Omar Sandoval <osandov@fb.com>
> 
> Commit 3ee48b6af49c ("mm, x86: Saving vmcore with non-lazy freeing of
> vmas") introduced set_iounmap_nonlazy(), which sets vmap_lazy_nr to
> lazy_max_pages() + 1, ensuring that any future vunmaps() immediately
> purge the vmap areas instead of doing it lazily.
> 
> Commit 690467c81b1a ("mm/vmalloc: Move draining areas out of caller
> context") moved the purging from the vunmap() caller to a worker thread.
> Unfortunately, set_iounmap_nonlazy() can cause the worker thread to spin
> (possibly forever). For example, consider the following scenario:
> 
> 1. Thread reads from /proc/vmcore. This eventually calls
>    __copy_oldmem_page() -> set_iounmap_nonlazy(), which sets
>    vmap_lazy_nr to lazy_max_pages() + 1.
> 2. Then it calls free_vmap_area_noflush() (via iounmap()), which adds 2
>    pages (one page plus the guard page) to the purge list and
>    vmap_lazy_nr. vmap_lazy_nr is now lazy_max_pages() + 3, so the
>    drain_vmap_work is scheduled.
> 3. Thread returns from the kernel and is scheduled out.
> 4. Worker thread is scheduled in and calls drain_vmap_area_work(). It
>    frees the 2 pages on the purge list. vmap_lazy_nr is now
>    lazy_max_pages() + 1.
> 5. This is still over the threshold, so it tries to purge areas again,
>    but doesn't find anything.
> 6. Repeat 5.
> 
> If the system is running with only one CPU (which is typicial for kdump)
> and preemption is disabled, then this will never make forward progress:
> there aren't any more pages to purge, so it hangs. If there is more than
> one CPU or preemption is enabled, then the worker thread will spin
> forever in the background. (Note that if there were already pages to be
> purged at the time that set_iounmap_nonlazy() was called, this bug is
> avoided.)
> 
> This can be reproduced with anything that reads from /proc/vmcore
> multiple times. E.g., vmcore-dmesg /proc/vmcore.
> 
> It turns out that improvements to vmap() over the years have obsoleted
> the need for this "optimization". I benchmarked
> `dd if=/proc/vmcore of=/dev/null` with 4k and 1M read sizes on a system
> with a 32GB vmcore. The test was run on 5.17, 5.18-rc1 with a fix that
> avoided the hang, and 5.18-rc1 with set_iounmap_nonlazy() removed
> entirely:
> 
>   |5.17  |5.18+fix|5.18+removal
> 4k|40.86s|  40.09s|      26.73s
> 1M|24.47s|  23.98s|      21.84s
> 
> The removal was the fastest (by a wide margin with 4k reads). This patch
> removes set_iounmap_nonlazy().
> 
> Fixes: 690467c81b1a ("mm/vmalloc: Move draining areas out of caller context")
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
> Acked-by: Chris Down <chris@chrisdown.name>
> Signed-off-by: Omar Sandoval <osandov@fb.com>

This is a great fix and with detailed explanation, thanks.

Acked-by: Baoquan He <bhe@redhat.com>

> ---
> Changes from v2 -> v3:
> 
> - Add Fixes and Reviewed-by tags (no code changes)
> 
> Changes from v1 -> v2:
> 
> - Remove set_iounmap_nonlazy() entirely instead of fixing it.
> 
>  arch/x86/include/asm/io.h       |  2 --
>  arch/x86/kernel/crash_dump_64.c |  1 -
>  mm/vmalloc.c                    | 11 -----------
>  3 files changed, 14 deletions(-)
> 
> diff --git a/arch/x86/include/asm/io.h b/arch/x86/include/asm/io.h
> index f6d91ecb8026..e9736af126b2 100644
> --- a/arch/x86/include/asm/io.h
> +++ b/arch/x86/include/asm/io.h
> @@ -210,8 +210,6 @@ void __iomem *ioremap(resource_size_t offset, unsigned long size);
>  extern void iounmap(volatile void __iomem *addr);
>  #define iounmap iounmap
>  
> -extern void set_iounmap_nonlazy(void);
> -
>  #ifdef __KERNEL__
>  
>  void memcpy_fromio(void *, const volatile void __iomem *, size_t);
> diff --git a/arch/x86/kernel/crash_dump_64.c b/arch/x86/kernel/crash_dump_64.c
> index a7f617a3981d..97529552dd24 100644
> --- a/arch/x86/kernel/crash_dump_64.c
> +++ b/arch/x86/kernel/crash_dump_64.c
> @@ -37,7 +37,6 @@ static ssize_t __copy_oldmem_page(unsigned long pfn, char *buf, size_t csize,
>  	} else
>  		memcpy(buf, vaddr + offset, csize);
>  
> -	set_iounmap_nonlazy();
>  	iounmap((void __iomem *)vaddr);
>  	return csize;
>  }
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index e163372d3967..0b17498a34f1 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -1671,17 +1671,6 @@ static DEFINE_MUTEX(vmap_purge_lock);
>  /* for per-CPU blocks */
>  static void purge_fragmented_blocks_allcpus(void);
>  
> -#ifdef CONFIG_X86_64
> -/*
> - * called before a call to iounmap() if the caller wants vm_area_struct's
> - * immediately freed.
> - */
> -void set_iounmap_nonlazy(void)
> -{
> -	atomic_long_set(&vmap_lazy_nr, lazy_max_pages()+1);
> -}
> -#endif /* CONFIG_X86_64 */
> -
>  /*
>   * Purges all lazily-freed vmap areas.
>   */
> -- 
> 2.35.2
> 
> 
> 



      reply	other threads:[~2022-04-14 10:33 UTC|newest]

Thread overview: 2+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-04-13 16:28 Omar Sandoval
2022-04-14 10:33 ` Baoquan He [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Ylf4h2/vfelRoc3a@MiWiFi-R3L-srv \
    --to=bhe@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=hch@lst.de \
    --cc=kernel-team@fb.com \
    --cc=kexec@lists.infradead.org \
    --cc=linux-mm@kvack.org \
    --cc=osandov@osandov.com \
    --cc=urezki@gmail.com \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox