From: Stillinux <stillinux@gmail.com>
To: Alexey Avramov <hakavlad@inbox.lv>
Cc: Andrew Morton <akpm@linux-foundation.org>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org,
liuzhengyuan@kylinos.cn, liuyun01@kylinos.cn
Subject: Re: [RFC PATCH] mm/swap: fix system stuck due to infinite loop
Date: Wed, 7 Apr 2021 06:49:17 +0800 [thread overview]
Message-ID: <CAKN5gCgpY=D5RG73ZQ9ELqHayTocsdcma0MDYpMAKEcz4HYzgQ@mail.gmail.com> (raw)
In-Reply-To: <20210406065944.08d8aa76@mail.inbox.lv>
[-- Attachment #1: Type: text/plain, Size: 14429 bytes --]
Hi Alexey, Thank you for the patch! looks cool, we will try this patch
for cutdown io operations during high memory pressure test.
and after check our vmcore, we can see our system io pressure under the
swap_writepage and swap_readpage to under the shrink list operations.
On Tue, Apr 6, 2021 at 5:59 AM Alexey Avramov <hakavlad@inbox.lv> wrote:
> > In the case of high system memory and load pressure, we ran ltp test
> > and found that the system was stuck, the direct memory reclaim was
> > all stuck in io_schedule
>
> > For the first time involving the swap part, there is no good way to fix
> > the problem
>
> The solution is protecting the clean file pages.
>
> Look at this:
>
> > On ChromiumOS, we do not use swap. When memory is low, the only
> > way to free memory is to reclaim pages from the file list. This
> > results in a lot of thrashing under low memory conditions. We see
> > the system become unresponsive for minutes before it eventually OOMs.
> > We also see very slow browser tab switching under low memory. Instead
> > of an unresponsive system, we'd really like the kernel to OOM as soon
> > as it starts to thrash. If it can't keep the working set in memory,
> > then OOM. Losing one of many tabs is a better behaviour for the user
> > than an unresponsive system.
>
> > This patch create a new sysctl, min_filelist_kbytes, which disables
> > reclaim of file-backed pages when when there are less than
> min_filelist_bytes
> > worth of such pages in the cache. This tunable is handy for low memory
> > systems using solid-state storage where interactive response is more
> important
> > than not OOMing.
>
> > With this patch and min_filelist_kbytes set to 50000, I see very little
> block
> > layer activity during low memory. The system stays responsive under low
> > memory and browser tab switching is fast. Eventually, a process a gets
> killed
> > by OOM. Without this patch, the system gets wedged for minutes before it
> > eventually OOMs.
>
> — https://lore.kernel.org/patchwork/patch/222042/
>
> This patch can almost completely eliminate thrashing under memory pressure.
>
> Effects
> - Improving system responsiveness under low-memory conditions;
> - Improving performans in I/O bound tasks under memory pressure;
> - OOM killer comes faster (with hard protection);
> - Fast system reclaiming after OOM.
>
> Read more: https://github.com/hakavlad/le9-patch
>
> The patch:
>
> From 371e3e5290652e97d5279d8cd215cd356c1fb47b Mon Sep 17 00:00:00 2001
> From: Alexey Avramov <hakavlad@inbox.lv>
> Date: Mon, 5 Apr 2021 01:53:26 +0900
> Subject: [PATCH] mm/vmscan: add sysctl knobs for protecting the specified
> amount of clean file cache
>
> The kernel does not have a mechanism for targeted protection of clean
> file pages (CFP). A certain amount of the CFP is required by the userspace
> for normal operation. First of all, you need a cache of shared libraries
> and executable files. If the volume of the CFP cache falls below a certain
> level, thrashing and even livelock occurs.
>
> Protection of CFP may be used to prevent thrashing and reducing I/O under
> memory pressure. Hard protection of CFP may be used to avoid high latency
> and prevent livelock in near-OOM conditions. The patch provides sysctl
> knobs for protecting the specified amount of clean file cache under memory
> pressure.
>
> The vm.clean_low_kbytes sysctl knob provides *best-effort* protection of
> CFP. The CFP on the current node won't be reclaimed uder memory pressure
> when their volume is below vm.clean_low_kbytes *unless* we threaten to OOM
> or have no swap space or vm.swappiness=0. Setting it to a high value may
> result in a early eviction of anonymous pages into the swap space by
> attempting to hold the protected amount of clean file pages in memory. The
> default value is defined by CONFIG_CLEAN_LOW_KBYTES (suggested 0 in
> Kconfig).
>
> The vm.clean_min_kbytes sysctl knob provides *hard* protection of CFP. The
> CFP on the current node won't be reclaimed under memory pressure when their
> volume is below vm.clean_min_kbytes. Setting it to a high value may result
> in a early out-of-memory condition due to the inability to reclaim the
> protected amount of CFP when other types of pages cannot be reclaimed. The
> default value is defined by CONFIG_CLEAN_MIN_KBYTES (suggested 0 in
> Kconfig).
>
> Reported-by: Artem S. Tashkinov <aros@gmx.com>
> Signed-off-by: Alexey Avramov <hakavlad@inbox.lv>
> ---
> Documentation/admin-guide/sysctl/vm.rst | 37 +++++++++++++++++++++
> include/linux/mm.h | 3 ++
> kernel/sysctl.c | 14 ++++++++
> mm/Kconfig | 35 +++++++++++++++++++
> mm/vmscan.c | 59
> +++++++++++++++++++++++++++++++++
> 5 files changed, 148 insertions(+)
>
> diff --git a/Documentation/admin-guide/sysctl/vm.rst
> b/Documentation/admin-guide/sysctl/vm.rst
> index f455fa00c..5d5ddfc85 100644
> --- a/Documentation/admin-guide/sysctl/vm.rst
> +++ b/Documentation/admin-guide/sysctl/vm.rst
> @@ -26,6 +26,8 @@ Currently, these files are in /proc/sys/vm:
>
> - admin_reserve_kbytes
> - block_dump
> +- clean_low_kbytes
> +- clean_min_kbytes
> - compact_memory
> - compaction_proactiveness
> - compact_unevictable_allowed
> @@ -113,6 +115,41 @@ block_dump enables block I/O debugging when set to a
> nonzero value. More
> information on block I/O debugging is in
> Documentation/admin-guide/laptops/laptop-mode.rst.
>
>
> +clean_low_kbytes
> +=====================
> +
> +This knob provides *best-effort* protection of clean file pages. The
> clean file
> +pages on the current node won't be reclaimed uder memory pressure when
> their
> +volume is below vm.clean_low_kbytes *unless* we threaten to OOM or have no
> +swap space or vm.swappiness=0.
> +
> +Protection of clean file pages may be used to prevent thrashing and
> +reducing I/O under low-memory conditions.
> +
> +Setting it to a high value may result in a early eviction of anonymous
> pages
> +into the swap space by attempting to hold the protected amount of clean
> file
> +pages in memory.
> +
> +The default value is defined by CONFIG_CLEAN_LOW_KBYTES.
> +
> +
> +clean_min_kbytes
> +=====================
> +
> +This knob provides *hard* protection of clean file pages. The clean file
> pages
> +on the current node won't be reclaimed under memory pressure when their
> volume
> +is below vm.clean_min_kbytes.
> +
> +Hard protection of clean file pages may be used to avoid high latency and
> +prevent livelock in near-OOM conditions.
> +
> +Setting it to a high value may result in a early out-of-memory condition
> due to
> +the inability to reclaim the protected amount of clean file pages when
> other
> +types of pages cannot be reclaimed.
> +
> +The default value is defined by CONFIG_CLEAN_MIN_KBYTES.
> +
> +
> compact_memory
> ==============
>
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index db6ae4d3f..7799f1555 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -202,6 +202,9 @@ static inline void __mm_zero_struct_page(struct page
> *page)
>
> extern int sysctl_max_map_count;
>
> +extern unsigned long sysctl_clean_low_kbytes;
> +extern unsigned long sysctl_clean_min_kbytes;
> +
> extern unsigned long sysctl_user_reserve_kbytes;
> extern unsigned long sysctl_admin_reserve_kbytes;
>
> diff --git a/kernel/sysctl.c b/kernel/sysctl.c
> index afad08596..854b311cd 100644
> --- a/kernel/sysctl.c
> +++ b/kernel/sysctl.c
> @@ -3083,6 +3083,20 @@ static struct ctl_table vm_table[] = {
> },
> #endif
> {
> + .procname = "clean_low_kbytes",
> + .data = &sysctl_clean_low_kbytes,
> + .maxlen = sizeof(sysctl_clean_low_kbytes),
> + .mode = 0644,
> + .proc_handler = proc_doulongvec_minmax,
> + },
> + {
> + .procname = "clean_min_kbytes",
> + .data = &sysctl_clean_min_kbytes,
> + .maxlen = sizeof(sysctl_clean_min_kbytes),
> + .mode = 0644,
> + .proc_handler = proc_doulongvec_minmax,
> + },
> + {
> .procname = "user_reserve_kbytes",
> .data = &sysctl_user_reserve_kbytes,
> .maxlen = sizeof(sysctl_user_reserve_kbytes),
> diff --git a/mm/Kconfig b/mm/Kconfig
> index 390165ffb..3915c71e1 100644
> --- a/mm/Kconfig
> +++ b/mm/Kconfig
> @@ -122,6 +122,41 @@ config SPARSEMEM_VMEMMAP
> pfn_to_page and page_to_pfn operations. This is the most
> efficient option when sufficient kernel resources are available.
>
> +config CLEAN_LOW_KBYTES
> + int "Default value for vm.clean_low_kbytes"
> + depends on SYSCTL
> + default "0"
> + help
> + The vm.clean_file_low_kbytes sysctl knob provides *best-effort*
> + protection of clean file pages. The clean file pages on the
> current
> + node won't be reclaimed uder memory pressure when their volume is
> + below vm.clean_low_kbytes *unless* we threaten to OOM or have
> + no swap space or vm.swappiness=0.
> +
> + Protection of clean file pages may be used to prevent thrashing
> and
> + reducing I/O under low-memory conditions.
> +
> + Setting it to a high value may result in a early eviction of
> anonymous
> + pages into the swap space by attempting to hold the protected
> amount of
> + clean file pages in memory.
> +
> +config CLEAN_MIN_KBYTES
> + int "Default value for vm.clean_min_kbytes"
> + depends on SYSCTL
> + default "0"
> + help
> + The vm.clean_file_min_kbytes sysctl knob provides *hard*
> protection
> + of clean file pages. The clean file pages on the current node
> won't be
> + reclaimed under memory pressure when their volume is below
> + vm.clean_min_kbytes.
> +
> + Hard protection of clean file pages may be used to avoid high
> latency and
> + prevent livelock in near-OOM conditions.
> +
> + Setting it to a high value may result in a early out-of-memory
> condition
> + due to the inability to reclaim the protected amount of clean
> file pages
> + when other types of pages cannot be reclaimed.
> +
> config HAVE_MEMBLOCK_PHYS_MAP
> bool
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 7b4e31eac..77e98c43e 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -120,6 +120,19 @@ struct scan_control {
> /* The file pages on the current node are dangerously low */
> unsigned int file_is_tiny:1;
>
> + /*
> + * The clean file pages on the current node won't be reclaimed when
> + * their volume is below vm.clean_low_kbytes *unless* we threaten
> + * to OOM or have no swap space or vm.swappiness=0.
> + */
> + unsigned int clean_below_low:1;
> +
> + /*
> + * The clean file pages on the current node won't be reclaimed when
> + * their volume is below vm.clean_min_kbytes.
> + */
> + unsigned int clean_below_min:1;
> +
> /* Allocation order */
> s8 order;
>
> @@ -166,6 +179,17 @@ struct scan_control {
> #define prefetchw_prev_lru_page(_page, _base, _field) do { } while (0)
> #endif
>
> +#if CONFIG_CLEAN_LOW_KBYTES < 0
> +#error "CONFIG_CLEAN_LOW_KBYTES must be >= 0"
> +#endif
> +
> +#if CONFIG_CLEAN_MIN_KBYTES < 0
> +#error "CONFIG_CLEAN_MIN_KBYTES must be >= 0"
> +#endif
> +
> +unsigned long sysctl_clean_low_kbytes __read_mostly =
> CONFIG_CLEAN_LOW_KBYTES;
> +unsigned long sysctl_clean_min_kbytes __read_mostly =
> CONFIG_CLEAN_MIN_KBYTES;
> +
> /*
> * From 0 .. 200. Higher means more swappy.
> */
> @@ -2283,6 +2307,16 @@ static void get_scan_count(struct lruvec *lruvec,
> struct scan_control *sc,
> }
>
> /*
> + * Force-scan anon if clean file pages is under vm.clean_min_kbytes
> + * or vm.clean_low_kbytes (unless the swappiness setting
> + * disagrees with swapping).
> + */
> + if ((sc->clean_below_low || sc->clean_below_min) && swappiness) {
> + scan_balance = SCAN_ANON;
> + goto out;
> + }
> +
> + /*
> * If there is enough inactive page cache, we do not reclaim
> * anything from the anonymous working right now.
> */
> @@ -2418,6 +2452,13 @@ static void get_scan_count(struct lruvec *lruvec,
> struct scan_control *sc,
> BUG();
> }
>
> + /*
> + * Don't reclaim clean file pages when their volume is
> below
> + * vm.clean_min_kbytes.
> + */
> + if (file && sc->clean_below_min)
> + scan = 0;
> +
> nr[lru] = scan;
> }
> }
> @@ -2768,6 +2809,24 @@ static void shrink_node(pg_data_t *pgdat, struct
> scan_control *sc)
> anon >> sc->priority;
> }
>
> + if (sysctl_clean_low_kbytes || sysctl_clean_min_kbytes) {
> + unsigned long reclaimable_file, dirty, clean;
> +
> + reclaimable_file =
> + node_page_state(pgdat, NR_ACTIVE_FILE) +
> + node_page_state(pgdat, NR_INACTIVE_FILE) +
> + node_page_state(pgdat, NR_ISOLATED_FILE);
> + dirty = node_page_state(pgdat, NR_FILE_DIRTY);
> + if (reclaimable_file > dirty)
> + clean = (reclaimable_file - dirty) << (PAGE_SHIFT
> - 10);
> +
> + sc->clean_below_low = clean < sysctl_clean_low_kbytes;
> + sc->clean_below_min = clean < sysctl_clean_min_kbytes;
> + } else {
> + sc->clean_below_low = false;
> + sc->clean_below_min = false;
> + }
> +
> shrink_node_memcgs(pgdat, sc);
>
> if (reclaim_state) {
> --
> 2.11.0
>
>
[-- Attachment #2: Type: text/html, Size: 16760 bytes --]
prev parent reply other threads:[~2021-04-06 22:49 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-04-02 7:03 Stillinux
2021-04-03 0:44 ` Andrew Morton
2021-04-04 9:26 ` Stillinux
[not found] ` <20210406065944.08d8aa76@mail.inbox.lv>
2021-04-06 0:15 ` [PATCH] mm/vmscan: add sysctl knobs for protecting the specified kernel test robot
2021-04-06 1:16 ` kernel test robot
2021-04-06 22:49 ` Stillinux [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to='CAKN5gCgpY=D5RG73ZQ9ELqHayTocsdcma0MDYpMAKEcz4HYzgQ@mail.gmail.com' \
--to=stillinux@gmail.com \
--cc=akpm@linux-foundation.org \
--cc=hakavlad@inbox.lv \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=liuyun01@kylinos.cn \
--cc=liuzhengyuan@kylinos.cn \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox