linux-mm.kvack.org archive mirror
* [PATCH] mm: fix low-high watermark distance on small systems
@ 2018-03-15 10:34 Vinayak Menon
  2018-03-15 14:34 ` Minchan Kim
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Vinayak Menon @ 2018-03-15 10:34 UTC (permalink / raw)
  To: hannes, mgorman, linux-mm
  Cc: akpm, mhocko, vbabka, sfr, pasha.tatashin, penguin-kernel, Vinayak Menon

It is observed that watermark_scale_factor, when used to reduce
thundering herds in direct reclaim, does reduce the direct reclaims,
but results in unnecessary reclaim because kswapd keeps running long
after being woken up. The tests were done on a 4.9 kernel with 4 GB
of RAM; one test is a multibuild and the other opens a set of apps
sequentially on Android, repeating the sequence N times.

The issue is caused by watermark_scale_factor creating a larger than
required gap between the low and high watermarks. The following results
are with a watermark_scale_factor of 120.

                       wsf-120-default  wsf-120-reduced-low-high-gap
workingset_activate    15120206         8319182
pgpgin                 269795482        147928581
allocstall             1406             1498
pgsteal_kswapd         68676960         38105142
slabs_scanned          94181738         49085755

Even though kswapd does 80% more steal in the default case, it doesn't
make any significant improvement to the direct reclaims. The excessive
reclaims cause more pgpgin and increase app launch latencies.

The min-low gap is untouched by the patch. The low-high gap is made
a percentage of the min-low gap. The fraction was derived considering
these:

1) The existing watermark_scale_factor logic was designed to fix
issues on high RAM machines and I assume that the current low-high
gap works well on those.
2) The gap should be reduced on low RAM targets where thrashing is
observed.

The multiplier 4 was chosen empirically; it was seen to fix the
thrashing on <8GB devices.

With watermark_scale_factor at its default of 10, the low-high gap for
different memory sizes is:
           default-low_high_gap-pages  this-patch-low_high_gap-pages
16M        4                           4
512M       131                         131
1024M      262                         256
2048M      524                         362
4096M      1048                        512
8192M      2097                        724
16384M     4194                        1717
32768M     8388                        4858
65536M     16777                       13743
102400M    26214                       26214
143360M    36700                       36700
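
The table can be reproduced with a small userspace sketch of both
formulas (illustrative only: it assumes 4K pages, wsf of 10 and the
min_free_kbytes heuristic of 4 * sqrt(lowmem_kbytes), and it ignores
the kernel's clamping of min_free_kbytes). For example, for 4096M the
raw value is mult_frac(1048, int_sqrt(4 * 1048576), 10000) = 214, which
the clamp then raises to min_wmark >> 2 = 512.

    /* sketch: build with gcc -O2 wmark.c -lm */
    #include <stdio.h>
    #include <math.h>

    int main(void)
    {
            unsigned long mb[] = { 16, 512, 1024, 2048, 4096, 8192, 16384,
                                   32768, 65536, 102400, 143360 };

            for (unsigned int i = 0; i < sizeof(mb) / sizeof(mb[0]); i++) {
                    unsigned long pages = mb[i] << 8;       /* 4K pages */
                    /* min_free_kbytes = 4 * sqrt(kB) => min wmark of sqrt(kB) pages */
                    unsigned long min_wmark = sqrt(mb[i] << 10);
                    unsigned long def = pages * 10 / 10000; /* min-low gap, wsf = 10 */
                    unsigned long patched = def * (unsigned long)sqrt(4.0 * pages) / 10000;

                    /* clamp_t() semantics: min(max(val, lo), hi) */
                    if (patched < min_wmark >> 2)
                            patched = min_wmark >> 2;
                    if (patched > def)
                            patched = def;
                    printf("%7luM %12lu %12lu\n", mb[i], def, patched);
            }
            return 0;
    }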

Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org>
---
 mm/page_alloc.c | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index c0f0b1b..ac75985 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -7223,7 +7223,20 @@ static void __setup_per_zone_wmarks(void)
 				      watermark_scale_factor, 10000));
 
 		zone->watermark[WMARK_LOW]  = min_wmark_pages(zone) + tmp;
-		zone->watermark[WMARK_HIGH] = min_wmark_pages(zone) + tmp * 2;
+
+		/*
+		 * Set the kswapd low-high distance as a percentage of
+		 * min-low distance in such a way that the distance
+		 * increases non-linearly with available memory. This
+		 * ensures enough free memory without causing thrashing
+		 * on small machines due to excessive reclaim by kswapd.
+		 * Ensure a minimum distance on very small machines.
+		 */
+		tmp = clamp_t(u64, mult_frac(tmp,
+				int_sqrt(4 * zone->managed_pages), 10000),
+					min_wmark_pages(zone) >> 2, tmp);
+
+		zone->watermark[WMARK_HIGH] = low_wmark_pages(zone) + tmp;
 
 		spin_unlock_irqrestore(&zone->lock, flags);
 	}
-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a
member of the Code Aurora Forum, hosted by The Linux Foundation


* Re: [PATCH] mm: fix low-high watermark distance on small systems
  2018-03-15 10:34 [PATCH] mm: fix low-high watermark distance on small systems Vinayak Menon
@ 2018-03-15 14:34 ` Minchan Kim
  2018-03-16 11:03   ` Vinayak Menon
  2018-03-17  0:42 ` kbuild test robot
  2018-03-17  1:40 ` kbuild test robot
  2 siblings, 1 reply; 9+ messages in thread
From: Minchan Kim @ 2018-03-15 14:34 UTC (permalink / raw)
  To: Vinayak Menon
  Cc: hannes, mgorman, linux-mm, akpm, mhocko, vbabka, sfr,
	pasha.tatashin, penguin-kernel

Hi Vinayak,

On Thu, Mar 15, 2018 at 04:04:39PM +0530, Vinayak Menon wrote:
> It is observed that watermark_scale_factor, when used to reduce
> thundering herds in direct reclaim, does reduce the direct reclaims,
> but results in unnecessary reclaim because kswapd keeps running long
> after being woken up. The tests were done on a 4.9 kernel with 4 GB
> of RAM; one test is a multibuild and the other opens a set of apps
> sequentially on Android, repeating the sequence N times.
> 
> The issue is caused by watermark_scale_factor creating a larger than
> required gap between the low and high watermarks. The following results
> are with a watermark_scale_factor of 120.
> 
>                        wsf-120-default  wsf-120-reduced-low-high-gap
> workingset_activate    15120206         8319182
> pgpgin                 269795482        147928581
> allocstall             1406             1498
> pgsteal_kswapd         68676960         38105142
> slabs_scanned          94181738         49085755

The "required gap" you mentioned is very dependent on your workload.
You had an experiment with wsf 120, which means the user wanted kswapd
to be more aggressive, while your load was not enough to make memory
consumption spike. Couldn't you decrease wsf?

Don't get me wrong. I don't want you to test every wsf with various
workloads to prove your logic is better. What I want to say here is
that it's a heuristic, so it can't be perfect for every workload;
if you change it to non-linear, you could be better off but others might not be.

In that context, the current linear scale factor is simple enough
to understand. IMO, if we really want to enhance the watermarks, the
low/high wmark should adapt to memory spikes. One rough idea is
to change the low/high wmark based on kswapd_[high|low]_wmark_hit_quickly.

> 
> Even though kswapd does 80% more steal in the default case, it doesn't
> make any significant improvement to the direct reclaims. The excessive
> reclaims cause more pgpgin and increase app launch latencies.
> 
> The min-low gap is untouched by the patch. The low-high gap is made
> a percentage of the min-low gap. The fraction was derived considering
> these:
> 
> 1) The existing watermark_scale_factor logic was designed to fix
> issues on high RAM machines and I assume that the current low-high
> gap works well on those.
> 2) The gap should be reduced on low RAM targets where thrashing is
> observed.
> 
> The multiplier 4 was chosen empirically; it was seen to fix the
> thrashing on <8GB devices.
> 
> With watermark_scale_factor at its default of 10, the low-high gap for
> different memory sizes is:
>            default-low_high_gap-pages  this-patch-low_high_gap-pages
> 16M        4                           4
> 512M       131                         131
> 1024M      262                         256
> 2048M      524                         362
> 4096M      1048                        512
> 8192M      2097                        724
> 16384M     4194                        1717
> 32768M     8388                        4858
> 65536M     16777                       13743
> 102400M    26214                       26214
> 143360M    36700                       36700
> 
> Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org>
> ---
>  mm/page_alloc.c | 15 ++++++++++++++-
>  1 file changed, 14 insertions(+), 1 deletion(-)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index c0f0b1b..ac75985 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -7223,7 +7223,20 @@ static void __setup_per_zone_wmarks(void)
>  				      watermark_scale_factor, 10000));
>  
>  		zone->watermark[WMARK_LOW]  = min_wmark_pages(zone) + tmp;
> -		zone->watermark[WMARK_HIGH] = min_wmark_pages(zone) + tmp * 2;
> +
> +		/*
> +		 * Set the kswapd low-high distance as a percentage of
> +		 * min-low distance in such a way that the distance
> +		 * increases non-linearly with available memory. This
> +		 * ensures enough free memory without causing thrashing
> +		 * on small machines due to excessive reclaim by kswapd.
> +		 * Ensure a minimum distance on very small machines.
> +		 */
> +		tmp = clamp_t(u64, mult_frac(tmp,
> +				int_sqrt(4 * zone->managed_pages), 10000),
> +					min_wmark_pages(zone) >> 2, tmp);
> +
> +		zone->watermark[WMARK_HIGH] = low_wmark_pages(zone) + tmp;
>  
>  		spin_unlock_irqrestore(&zone->lock, flags);
>  	}
> -- 
> QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a
> member of the Code Aurora Forum, hosted by The Linux Foundation
> 


* Re: [PATCH] mm: fix low-high watermark distance on small systems
  2018-03-15 14:34 ` Minchan Kim
@ 2018-03-16 11:03   ` Vinayak Menon
  2018-03-20 10:16     ` Minchan Kim
  0 siblings, 1 reply; 9+ messages in thread
From: Vinayak Menon @ 2018-03-16 11:03 UTC (permalink / raw)
  To: Minchan Kim
  Cc: hannes, mgorman, linux-mm, akpm, mhocko, vbabka, sfr,
	pasha.tatashin, penguin-kernel


On 3/15/2018 8:04 PM, Minchan Kim wrote:
> Hi Vinayak,

Thanks for your comments Minchan.
>
> On Thu, Mar 15, 2018 at 04:04:39PM +0530, Vinayak Menon wrote:
>> It is observed that watermark_scale_factor, when used to reduce
>> thundering herds in direct reclaim, does reduce the direct reclaims,
>> but results in unnecessary reclaim because kswapd keeps running long
>> after being woken up. The tests were done on a 4.9 kernel with 4 GB
>> of RAM; one test is a multibuild and the other opens a set of apps
>> sequentially on Android, repeating the sequence N times.
>>
>> The issue is caused by watermark_scale_factor creating a larger than
>> required gap between the low and high watermarks. The following results
>> are with a watermark_scale_factor of 120.
>>
>>                        wsf-120-default  wsf-120-reduced-low-high-gap
>> workingset_activate    15120206         8319182
>> pgpgin                 269795482        147928581
>> allocstall             1406             1498
>> pgsteal_kswapd         68676960         38105142
>> slabs_scanned          94181738         49085755
> The "required gap" you mentioned is very dependent on your workload.
> You had an experiment with wsf 120, which means the user wanted kswapd
> to be more aggressive, while your load was not enough to make memory
> consumption spike. Couldn't you decrease wsf?

I did try reducing the wsf for both multibuild and Android workloads. But that results in kswapd
waking up late and thus latency issues due to higher direct reclaims. As I understand the problem, the
wsf in its current form helps in tuning the kswapd wakeups (and note that I have not touched the
wsf logic to calculate min-low gap), but the issue arises due to the depth to which kswapd scans the LRUs in a
single run, causing thrashing, due to the higher low-high gap. From experiments, it looks like kswapd waking
up a few more times and doing shorter steals is better than kswapd stealing more in a single run. The latter
does not improve direct reclaims and causes thrashing too.

>
> Don't get me wrong. I don't want you to test every wsf with various
> workloads to prove your logic is better. What I want to say here is
> that it's a heuristic, so it can't be perfect for every workload;
> if you change it to non-linear, you could be better off but others might not be.

Yes I understand your point. But since mmtests and Android tests showed similar results, I thought the
heuristic may just work across workloads. I assume from Johannes's tests on a 140GB machine (from the
commit msg of the patch which introduced wsf) that the current low-high gap works well without thrashing
on bigger machines. This made me assume that the behavior is non-linear. So the non-linear behavior will
not make any difference to higher RAM machines as the low-high gap remains almost the same as shown in the table
below. But I understand your point, for a different workload on smaller machines, I am not sure the benefit I
see would be observed, though that's the same problem with current wsf too.

>
> In that context, the current linear scale factor is simple enough
> to understand. IMO, if we really want to enhance the watermarks, the
> low/high wmark should adapt to memory spikes. One rough idea is
> to change the low/high wmark based on kswapd_[high|low]_wmark_hit_quickly.

That seems like a nice idea to me.  But considering the current case with and without this patch,
the kswapd_low_wmark_hit_quickly count is actually lower without the patch. But that comes at the cost of thrashing.
Which would then mean we need to detect thrashing to adjust the watermark. We may get the thrashing data
from workingset refaults, but I am not sure if we can take it as an input to adjust the watermark, since thrashing can
be due to other reasons too. Maybe there are ways to make this adaptive, like using Johannes's memdelay feature
to detect more time spent in direct reclaims and then raise the low watermark, and then use the time spent in refaults
and a stabilized direct reclaim time to bring down or stop raising the low watermark.
But do we need that adaptive logic now if this (or another similar) patch just works across workloads for small machines?
What is your suggestion?
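
A purely illustrative sketch of that feedback loop (the helpers and
thresholds below are hypothetical, not an existing kernel API, and
memdelay itself was only a proposal at this point):

    /*
     * Hypothetical: raise the low wmark when allocations keep stalling
     * in direct reclaim, back off when refaults show we are thrashing.
     */
    static void adapt_low_wmark(struct zone *zone)
    {
            unsigned long reclaim_ms = direct_reclaim_stall_ms(); /* hypothetical */
            unsigned long refault_ms = refault_stall_ms();        /* hypothetical */

            if (refault_ms > REFAULT_THRESH_MS)
                    lower_low_wmark(zone);          /* kswapd reclaims too much */
            else if (reclaim_ms > RECLAIM_THRESH_MS)
                    raise_low_wmark(zone);          /* wake kswapd earlier */
    }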

>> Even though kswapd does 80% more steal in the default case, it doesn't
>> make any significant improvement to the direct reclaims. The excessive
>> reclaims cause more pgpgin and increase app launch latencies.
>>
>> The min-low gap is untouched by the patch. The low-high gap is made
>> a percentage of the min-low gap. The fraction was derived considering
>> these:
>>
>> 1) The existing watermark_scale_factor logic was designed to fix
>> issues on high RAM machines and I assume that the current low-high
>> gap works well on those.
>> 2) The gap should be reduced on low RAM targets where thrashing is
>> observed.
>>
>> The multiplier 4 was chosen empirically; it was seen to fix the
>> thrashing on <8GB devices.
>>
>> With watermark_scale_factor at its default of 10, the low-high gap for
>> different memory sizes is:
>>            default-low_high_gap-pages  this-patch-low_high_gap-pages
>> 16M        4                           4
>> 512M       131                         131
>> 1024M      262                         256
>> 2048M      524                         362
>> 4096M      1048                        512
>> 8192M      2097                        724
>> 16384M     4194                        1717
>> 32768M     8388                        4858
>> 65536M     16777                       13743
>> 102400M    26214                       26214
>> 143360M    36700                       36700
>>
>> Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org>
>> ---
>>  mm/page_alloc.c | 15 ++++++++++++++-
>>  1 file changed, 14 insertions(+), 1 deletion(-)
>>
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index c0f0b1b..ac75985 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -7223,7 +7223,20 @@ static void __setup_per_zone_wmarks(void)
>>  				      watermark_scale_factor, 10000));
>>  
>>  		zone->watermark[WMARK_LOW]  = min_wmark_pages(zone) + tmp;
>> -		zone->watermark[WMARK_HIGH] = min_wmark_pages(zone) + tmp * 2;
>> +
>> +		/*
>> +		 * Set the kswapd low-high distance as a percentage of
>> +		 * min-low distance in such a way that the distance
>> +		 * increases non-linearly with available memory. This
>> +		 * ensures enough free memory without causing thrashing
>> +		 * on small machines due to excessive reclaim by kswapd.
>> +		 * Ensure a minimum distance on very small machines.
>> +		 */
>> +		tmp = clamp_t(u64, mult_frac(tmp,
>> +				int_sqrt(4 * zone->managed_pages), 10000),
>> +					min_wmark_pages(zone) >> 2, tmp);
>> +
>> +		zone->watermark[WMARK_HIGH] = low_wmark_pages(zone) + tmp;
>>  
>>  		spin_unlock_irqrestore(&zone->lock, flags);
>>  	}
>> -- 
>> QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a
>> member of the Code Aurora Forum, hosted by The Linux Foundation
>>


* Re: [PATCH] mm: fix low-high watermark distance on small systems
  2018-03-15 10:34 [PATCH] mm: fix low-high watermark distance on small systems Vinayak Menon
  2018-03-15 14:34 ` Minchan Kim
@ 2018-03-17  0:42 ` kbuild test robot
  2018-03-17  1:40 ` kbuild test robot
  2 siblings, 0 replies; 9+ messages in thread
From: kbuild test robot @ 2018-03-17  0:42 UTC (permalink / raw)
  To: Vinayak Menon
  Cc: kbuild-all, hannes, mgorman, linux-mm, akpm, mhocko, vbabka, sfr,
	pasha.tatashin, penguin-kernel

[-- Attachment #1: Type: text/plain, Size: 1008 bytes --]

Hi Vinayak,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on v4.16-rc4]
[also build test ERROR on next-20180316]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Vinayak-Menon/mm-fix-low-high-watermark-distance-on-small-systems/20180317-073051
config: i386-tinyconfig (attached as .config)
compiler: gcc-7 (Debian 7.3.0-1) 7.3.0
reproduce:
        # save the attached .config to linux build tree
        make ARCH=i386 

All errors (new ones prefixed by >>):

   mm/page_alloc.o: In function `__setup_per_zone_wmarks':
>> page_alloc.c:(.text+0xa10): undefined reference to `__udivdi3'
>> page_alloc.c:(.text+0xa41): undefined reference to `__umoddi3'
   page_alloc.c:(.text+0xa58): undefined reference to `__udivdi3'

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 6733 bytes --]


* Re: [PATCH] mm: fix low-high watermark distance on small systems
  2018-03-15 10:34 [PATCH] mm: fix low-high watermark distance on small systems Vinayak Menon
  2018-03-15 14:34 ` Minchan Kim
  2018-03-17  0:42 ` kbuild test robot
@ 2018-03-17  1:40 ` kbuild test robot
  2 siblings, 0 replies; 9+ messages in thread
From: kbuild test robot @ 2018-03-17  1:40 UTC (permalink / raw)
  To: Vinayak Menon
  Cc: kbuild-all, hannes, mgorman, linux-mm, akpm, mhocko, vbabka, sfr,
	pasha.tatashin, penguin-kernel

[-- Attachment #1: Type: text/plain, Size: 3690 bytes --]

Hi Vinayak,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on v4.16-rc4]
[also build test ERROR on next-20180316]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Vinayak-Menon/mm-fix-low-high-watermark-distance-on-small-systems/20180317-073051
config: i386-randconfig-a0-201810 (attached as .config)
compiler: gcc-4.9 (Debian 4.9.4-2) 4.9.4
reproduce:
        # save the attached .config to linux build tree
        make ARCH=i386 

All errors (new ones prefixed by >>):

   mm/page_alloc.o: In function `__setup_per_zone_wmarks':
>> mm/page_alloc.c:7061: undefined reference to `__udivdi3'
>> mm/page_alloc.c:7061: undefined reference to `__umoddi3'
>> mm/page_alloc.c:7061: undefined reference to `__udivdi3'

vim +7061 mm/page_alloc.c

  6999	
  7000	static void __setup_per_zone_wmarks(void)
  7001	{
  7002		unsigned long pages_min = min_free_kbytes >> (PAGE_SHIFT - 10);
  7003		unsigned long lowmem_pages = 0;
  7004		struct zone *zone;
  7005		unsigned long flags;
  7006	
  7007		/* Calculate total number of !ZONE_HIGHMEM pages */
  7008		for_each_zone(zone) {
  7009			if (!is_highmem(zone))
  7010				lowmem_pages += zone->managed_pages;
  7011		}
  7012	
  7013		for_each_zone(zone) {
  7014			u64 tmp;
  7015	
  7016			spin_lock_irqsave(&zone->lock, flags);
  7017			tmp = (u64)pages_min * zone->managed_pages;
  7018			do_div(tmp, lowmem_pages);
  7019			if (is_highmem(zone)) {
  7020				/*
  7021				 * __GFP_HIGH and PF_MEMALLOC allocations usually don't
  7022				 * need highmem pages, so cap pages_min to a small
  7023				 * value here.
  7024				 *
  7025				 * The WMARK_HIGH-WMARK_LOW and (WMARK_LOW-WMARK_MIN)
  7026				 * deltas control asynch page reclaim, and so should
  7027				 * not be capped for highmem.
  7028				 */
  7029				unsigned long min_pages;
  7030	
  7031				min_pages = zone->managed_pages / 1024;
  7032				min_pages = clamp(min_pages, SWAP_CLUSTER_MAX, 128UL);
  7033				zone->watermark[WMARK_MIN] = min_pages;
  7034			} else {
  7035				/*
  7036				 * If it's a lowmem zone, reserve a number of pages
  7037				 * proportionate to the zone's size.
  7038				 */
  7039				zone->watermark[WMARK_MIN] = tmp;
  7040			}
  7041	
  7042			/*
  7043			 * Set the kswapd watermarks distance according to the
  7044			 * scale factor in proportion to available memory, but
  7045			 * ensure a minimum size on small systems.
  7046			 */
  7047			tmp = max_t(u64, tmp >> 2,
  7048				    mult_frac(zone->managed_pages,
  7049					      watermark_scale_factor, 10000));
  7050	
  7051			zone->watermark[WMARK_LOW]  = min_wmark_pages(zone) + tmp;
  7052	
  7053			/*
  7054			 * Set the kswapd low-high distance as a percentage of
  7055			 * min-low distance in such a way that the distance
  7056			 * increases non-linearly with available memory. This
  7057			 * ensures enough free memory without causing thrashing
  7058			 * on small machines due to excessive reclaim by kswapd.
  7059			 * Ensure a minimum distance on very small machines.
  7060			 */
> 7061			tmp = clamp_t(u64, mult_frac(tmp,
  7062					int_sqrt(4 * zone->managed_pages), 10000),
  7063						min_wmark_pages(zone) >> 2, tmp);
  7064	
  7065			zone->watermark[WMARK_HIGH] = low_wmark_pages(zone) + tmp;
  7066	
  7067			spin_unlock_irqrestore(&zone->lock, flags);
  7068		}
  7069	
  7070		/* update totalreserve_pages */
  7071		calculate_totalreserve_pages();
  7072	}
  7073	
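
The undefined references come from mult_frac() doing 64-bit division and
modulo on the u64 tmp, which on 32-bit targets is emitted as calls to
libgcc helpers the kernel does not provide. A sketch of a rework using
div_u64() instead (untested, illustrative only; the product cannot
overflow 64 bits for any realistic zone size, and mult_frac(x, n, d)
equals (x * n) / d exactly whenever x * n fits in the type):

		u64 gap = tmp * int_sqrt(4 * zone->managed_pages);

		/* div_u64() avoids __udivdi3/__umoddi3 on 32-bit */
		gap = div_u64(gap, 10000);
		gap = clamp_t(u64, gap, min_wmark_pages(zone) >> 2, tmp);

		zone->watermark[WMARK_HIGH] = low_wmark_pages(zone) + gap;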

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 26851 bytes --]


* Re: [PATCH] mm: fix low-high watermark distance on small systems
  2018-03-16 11:03   ` Vinayak Menon
@ 2018-03-20 10:16     ` Minchan Kim
  2018-03-20 10:52       ` Vinayak Menon
  0 siblings, 1 reply; 9+ messages in thread
From: Minchan Kim @ 2018-03-20 10:16 UTC (permalink / raw)
  To: Vinayak Menon
  Cc: hannes, mgorman, linux-mm, akpm, mhocko, vbabka, sfr,
	pasha.tatashin, penguin-kernel

On Fri, Mar 16, 2018 at 04:33:54PM +0530, Vinayak Menon wrote:
> 
> On 3/15/2018 8:04 PM, Minchan Kim wrote:
> > Hi Vinayak,
> 
> Thanks for your comments Minchan.
> >
> > On Thu, Mar 15, 2018 at 04:04:39PM +0530, Vinayak Menon wrote:
> >> It is observed that watermark_scale_factor, when used to reduce
> >> thundering herds in direct reclaim, does reduce the direct reclaims,
> >> but results in unnecessary reclaim because kswapd keeps running long
> >> after being woken up. The tests were done on a 4.9 kernel with 4 GB
> >> of RAM; one test is a multibuild and the other opens a set of apps
> >> sequentially on Android, repeating the sequence N times.
> >>
> >> The issue is caused by watermark_scale_factor creating a larger than
> >> required gap between the low and high watermarks. The following results
> >> are with a watermark_scale_factor of 120.
> >>
> >>                        wsf-120-default  wsf-120-reduced-low-high-gap
> >> workingset_activate    15120206         8319182
> >> pgpgin                 269795482        147928581
> >> allocstall             1406             1498
> >> pgsteal_kswapd         68676960         38105142
> >> slabs_scanned          94181738         49085755
> > The "required gap" you mentioned is very dependent on your workload.
> > You had an experiment with wsf 120, which means the user wanted kswapd
> > to be more aggressive, while your load was not enough to make memory
> > consumption spike. Couldn't you decrease wsf?
> 
> I did try reducing the wsf for both multibuild and Android workloads. But that results in kswapd
> waking up late and thus latency issues due to higher direct reclaims. As I understand the problem, the
> wsf in its current form helps in tuning the kswapd wakeups (and note that I have not touched the
> wsf logic to calculate min-low gap), but the issue arises due to the depth to which kswapd scans the LRUs in a
> single run, causing thrashing, due to the higher low-high gap. From experiments, it looks like kswapd waking

wsf controls kswapd's sleep time as well as its wakeup time.

"This factor controls the aggressiveness of kswapd. It defines the
amount of memory left in a node/system before kswapd is woken up and
how much memory needs to be free before kswapd goes back to sleep."

> up a few more times and doing shorter steals is better than kswapd stealing more in a single run. The latter
> does not improve direct reclaims and causes thrashing too.


That's the tradeoff of kswapd aggressiveness to avoid high rate
direct reclaim.

> 
> >
> > Don't get me wrong. I don't want you to test every wsf with various
> > workloads to prove your logic is better. What I want to say here is
> > that it's a heuristic, so it can't be perfect for every workload;
> > if you change it to non-linear, you could be better off but others might not be.
> 
> Yes I understand your point. But since mmtests and Android tests showed similar results, I thought the
> heuristic may just work across workloads. I assume from Johannes's tests on a 140GB machine (from the
> commit msg of the patch which introduced wsf) that the current low-high gap works well without thrashing
> on bigger machines. This made me assume that the behavior is non-linear. So the non-linear behavior will
> not make any difference to higher RAM machines as the low-high gap remains almost the same as shown in the table
> below. But I understand your point, for a different workload on smaller machines, I am not sure the benefit I
> see would be observed, though that's the same problem with current wsf too.

True. That's why I don't want to make it complicated. Later, if someone complains
"linear is better for my testing", are you happy to roll back to it?

You might argue it's the same problem now, but at least the as-is code is simple
to understand.

> 
> >
> > In that context, the current linear scale factor is simple enough
> > to understand. IMO, if we really want to enhance the watermarks, the
> > low/high wmark should adapt to memory spikes. One rough idea is
> > to change the low/high wmark based on kswapd_[high|low]_wmark_hit_quickly.
> 
> That seems like a nice idea to me.  But considering the current case with and without this patch,
> the kswapd_low_wmark_hit_quickly count is actually lower without the patch. But that comes at the cost of thrashing.
> Which would then mean we need to detect thrashing to adjust the watermark. We may get the thrashing data
> from workingset refaults, but I am not sure if we can take it as an input to adjust the watermark, since thrashing can
> be due to other reasons too. Maybe there are ways to make this adaptive, like using Johannes's memdelay feature
> to detect more time spent in direct reclaims and then raise the low watermark, and then use the time spent in refaults
> and a stabilized direct reclaim time to bring down or stop raising the low watermark.
> But do we need that adaptive logic now if this (or another similar) patch just works across workloads for small machines?
> What is your suggestion?

Sorry for being grumpy.

Let's leave it as-is. Such a simple heuristic can never be optimal
for everyone. If we really need to fix it, I want to see better logic
to convince others.


* Re: [PATCH] mm: fix low-high watermark distance on small systems
  2018-03-20 10:16     ` Minchan Kim
@ 2018-03-20 10:52       ` Vinayak Menon
  2018-03-20 11:29         ` Minchan Kim
  0 siblings, 1 reply; 9+ messages in thread
From: Vinayak Menon @ 2018-03-20 10:52 UTC (permalink / raw)
  To: Minchan Kim
  Cc: hannes, mgorman, linux-mm, akpm, mhocko, vbabka, sfr,
	pasha.tatashin, penguin-kernel, peter.enderborg


On 3/20/2018 3:46 PM, Minchan Kim wrote:
> On Fri, Mar 16, 2018 at 04:33:54PM +0530, Vinayak Menon wrote:
>> On 3/15/2018 8:04 PM, Minchan Kim wrote:
>>> Hi Vinayak,
>> Thanks for your comments Minchan.
>>> On Thu, Mar 15, 2018 at 04:04:39PM +0530, Vinayak Menon wrote:
>>>> It is observed that watermark_scale_factor, when used to reduce
>>>> thundering herds in direct reclaim, does reduce the direct reclaims,
>>>> but results in unnecessary reclaim because kswapd keeps running long
>>>> after being woken up. The tests were done on a 4.9 kernel with 4 GB
>>>> of RAM; one test is a multibuild and the other opens a set of apps
>>>> sequentially on Android, repeating the sequence N times.
>>>>
>>>> The issue is caused by watermark_scale_factor creating a larger than
>>>> required gap between the low and high watermarks. The following results
>>>> are with a watermark_scale_factor of 120.
>>>>
>>>>                        wsf-120-default  wsf-120-reduced-low-high-gap
>>>> workingset_activate    15120206         8319182
>>>> pgpgin                 269795482        147928581
>>>> allocstall             1406             1498
>>>> pgsteal_kswapd         68676960         38105142
>>>> slabs_scanned          94181738         49085755
>>> The "required gap" you mentioned is very dependent on your workload.
>>> You had an experiment with wsf 120, which means the user wanted kswapd
>>> to be more aggressive, while your load was not enough to make memory
>>> consumption spike. Couldn't you decrease wsf?
>> I did try reducing the wsf for both multibuild and Android workloads. But that results in kswapd
>> waking up late and thus latency issues due to higher direct reclaims. As I understand the problem, the
>> wsf in its current form helps in tuning the kswapd wakeups (and note that I have not touched the
>> wsf logic to calculate min-low gap), but the issue arises due to the depth to which kswapd scans the LRUs in a
>> single run, causing thrashing, due to the higher low-high gap. From experiments, it looks like kswapd waking
> wsf controls kswapd's sleep time as well as its wakeup time.
>
> "This factor controls the aggressiveness of kswapd. It defines the
> amount of memory left in a node/system before kswapd is woken up and
> how much memory needs to be free before kswapd goes back to sleep."

Yes, I understand that. What I meant was that the current wsf helps in tuning the wake-up time properly, but causes
kswapd to sleep late and thus causes thrashing.

>
>> up a few more times and doing shorter steals is better than kswapd stealing more in a single run. The latter
>> does not improve direct reclaims and causes thrashing too.
>
> That's the tradeoff of kswapd aggressiveness to avoid high rate
> direct reclaim.

We can call it a tradeoff only if increasing the aggressiveness of kswapd actually reduces the direct reclaims.
But as shown by the data I shared, the aggressiveness does not improve direct reclaims; it just causes
unnecessary reclaim, i.e. a much lower low-high gap gives the same benefit on direct reclaims with far less
reclaim.

>>> Don't get me wrong. I don't want you to test every wsf with various
>>> workloads to prove your logic is better. What I want to say here is
>>> that it's a heuristic, so it can't be perfect for every workload;
>>> if you change it to non-linear, you could be better off but others might not be.
>> Yes I understand your point. But since mmtests and Android tests showed similar results, I thought the
>> heuristic may just work across workloads. I assume from Johannes's tests on a 140GB machine (from the
>> commit msg of the patch which introduced wsf) that the current low-high gap works well without thrashing
>> on bigger machines. This made me assume that the behavior is non-linear. So the non-linear behavior will
>> not make any difference to higher RAM machines as the low-high gap remains almost the same as shown in the table
>> below. But I understand your point, for a different workload on smaller machines, I am not sure the benefit I
>> see would be observed, though that's the same problem with current wsf too.
> True. That's why I don't want to make it complicated. Later, if someone complains
> "linear is better for my testing", are you happy to roll back to it?
>
> You might argue it's the same problem now, but at least the as-is code is simple
> to understand.

Yes, I agree that there can be workloads on low RAM that may see a side effect.  But since popular use cases like those on Android
and also mmtests show the problem, which is fixed by the patch, can we try to pick it and see if someone complains? I see that
there were other reports of this: https://lkml.org/lkml/2017/11/24/167 . Do you suggest the tunable approach taken by the patch
in that link, so that varying use cases can be accommodated? I wanted to avoid a new tunable if some heuristic like this patch
just works.

Thanks,
Vinayak


* Re: [PATCH] mm: fix low-high watermark distance on small systems
  2018-03-20 10:52       ` Vinayak Menon
@ 2018-03-20 11:29         ` Minchan Kim
  2018-03-22 13:43           ` Vinayak Menon
  0 siblings, 1 reply; 9+ messages in thread
From: Minchan Kim @ 2018-03-20 11:29 UTC (permalink / raw)
  To: Vinayak Menon
  Cc: hannes, mgorman, linux-mm, akpm, mhocko, vbabka, sfr,
	pasha.tatashin, penguin-kernel, peter.enderborg

On Tue, Mar 20, 2018 at 04:22:36PM +0530, Vinayak Menon wrote:
> >
> >> up a few more times and doing shorter steals is better than kswapd stealing more in a single run. The latter
> >> does not improve direct reclaims and causes thrashing too.
> >
> > That's the tradeoff of kswapd aggressiveness to avoid high rate
> > direct reclaim.
> 
> We can call it a tradeoff only if increasing the aggressiveness of kswapd actually reduces the direct reclaims.
> But as shown by the data I shared, the aggressiveness does not improve direct reclaims; it just causes
> unnecessary reclaim, i.e. a much lower low-high gap gives the same benefit on direct reclaims with far less
> reclaim.

As I said, it depends on the workload. I can make a simple test that breaks it easily.

> 
> >>> Don't get me wrong. I don't want you to test every wsf with various
> >>> workloads to prove your logic is better. What I want to say here is
> >>> that it's a heuristic, so it can't be perfect for every workload;
> >>> if you change it to non-linear, you could be better off but others might not be.
> >> Yes I understand your point. But since mmtests and Android tests showed similar results, I thought the
> >> heuristic may just work across workloads. I assume from Johannes's tests on a 140GB machine (from the
> >> commit msg of the patch which introduced wsf) that the current low-high gap works well without thrashing
> >> on bigger machines. This made me assume that the behavior is non-linear. So the non-linear behavior will
> >> not make any difference to higher RAM machines as the low-high gap remains almost the same as shown in the table
> >> below. But I understand your point, for a different workload on smaller machines, I am not sure the benefit I
> >> see would be observed, though that's the same problem with current wsf too.
> > True. That's why I don't want to make it complicated. Later, if someone complains
> > "linear is better for my testing", are you happy to roll back to it?
> >
> > You might argue it's the same problem now, but at least the as-is code is simple
> > to understand.
> 
> Yes, I agree that there can be workloads on low RAM that may see a side effect.  But since popular use cases like those on Android

My concern is not the side effect but adding more heuristics without
proving they are generally better.

I don't think repeated app launching on Android reflects a real user
scenario. No one does that in real life, except some guys who want to show
benchmark results on YouTube.
About mmtests, what kinds of tests did you perform, and what were the results?
If you reduced thrashing, how much did the test results improve?
Did every test improve? Not vmstat numbers, but results from the benchmark itself.
Such wide testing would be more convincing.

> and also mmtests show the problem, which is fixed by the patch, can we try to pick it and see if someone complains? I see that
> there were other reports of this: https://lkml.org/lkml/2017/11/24/167 . Do you suggest the tunable approach taken by the patch
> in that link, so that varying use cases can be accommodated? I wanted to avoid a new tunable if some heuristic like this patch
> just works.

Actually, I don't want to touch it unless we have a nicer feedback
algorithm.

Anyway, it's just my opinion. I did my best to explain. I will
defer to the maintainer.

Thanks.


* Re: [PATCH] mm: fix low-high watermark distance on small systems
  2018-03-20 11:29         ` Minchan Kim
@ 2018-03-22 13:43           ` Vinayak Menon
  0 siblings, 0 replies; 9+ messages in thread
From: Vinayak Menon @ 2018-03-22 13:43 UTC (permalink / raw)
  To: Minchan Kim
  Cc: hannes, mgorman, linux-mm, akpm, mhocko, vbabka, sfr,
	pasha.tatashin, penguin-kernel, peter.enderborg

On 3/20/2018 4:59 PM, Minchan Kim wrote:
> On Tue, Mar 20, 2018 at 04:22:36PM +0530, Vinayak Menon wrote:
>>>> up a few more times and doing shorter steals is better than kswapd stealing more in a single run. The latter
>>>> does not improve direct reclaims and causes thrashing too.
>>> That's the tradeoff of kswapd aggressiveness to avoid high rate
>>> direct reclaim.
>> We can call it a tradeoff only if increasing the aggressiveness of kswapd actually reduces the direct reclaims.
>> But as shown by the data I shared, the aggressiveness does not improve direct reclaims; it just causes
>> unnecessary reclaim, i.e. a much lower low-high gap gives the same benefit on direct reclaims with far less
>> reclaim.
> As I said, it depends on the workload. I can make a simple test that breaks it easily.
>
>>

> I don't think repeated app launching on Android reflects a real user
> scenario. No one does that in real life, except some guys who want to show
> benchmark results on YouTube.

Agreed that a user won't open apps continuously in sequence like the test does. But I believe it is very similar to what
a user would do with the device, i.e. open an app, do something, switch to another app, then come
back to the previous app. The test tries to emulate the same.

> About mmtests, what kinds of tests did you perform, and what were the results?
> If you reduced thrashing, how much did the test results improve?
> Did every test improve? Not vmstat numbers, but results from the benchmark itself.
> Such wide testing would be more convincing.

I had performed the multibuild test of mmtests (3G RAM). Results below.

                            wsf-default     wsf-this-patch
multibuild time (secs)      7338            7186
workingset_refault          1228216         974074  (-20%)
workingset_activate         292110          181789  (-37%)
pgpgin                      11307694        8678200 (-23%)
allocstall                  98              103


>> and also mmtests show the problem, which is fixed by the patch, can we try to pick it and see if someone complains? I see that
>> there were other reports of this: https://lkml.org/lkml/2017/11/24/167 . Do you suggest the tunable approach taken by the patch
>> in that link, so that varying use cases can be accommodated? I wanted to avoid a new tunable if some heuristic like this patch
>> just works.
> Actually, I don't want to touch it unless we have a nicer feedback
> algorithm.
>
> Anyway, it's just my opinion. I did my best to explain. I will
> defer to the maintainer.
>
> Thanks.

