From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail-ob0-f173.google.com (mail-ob0-f173.google.com [209.85.214.173])
	by kanga.kvack.org (Postfix) with ESMTP id 58C896B0253
	for ; Thu, 18 Feb 2016 02:55:15 -0500 (EST)
Received: by mail-ob0-f173.google.com with SMTP id gc3so53920563obb.3
	for ; Wed, 17 Feb 2016 23:55:15 -0800 (PST)
Received: from mail-ob0-x22a.google.com (mail-ob0-x22a.google.com. [2607:f8b0:4003:c01::22a])
	by mx.google.com with ESMTPS id d5si7530886oic.38.2016.02.17.23.55.14
	for
	(version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128);
	Wed, 17 Feb 2016 23:55:14 -0800 (PST)
Received: by mail-ob0-x22a.google.com with SMTP id wb13so55089392obb.1
	for ; Wed, 17 Feb 2016 23:55:14 -0800 (PST)
MIME-Version: 1.0
In-Reply-To:
References: <56C2EDC1.2090509@huawei.com> <20160216173849.GA10487@kroah.com>
Date: Thu, 18 Feb 2016 15:55:14 +0800
Message-ID:
Subject: Re: [PATCH] mm: add MM_SWAPENTS and page table when calculate tasksize in lowmem_scan()
From: "Figo.zhang"
Content-Type: multipart/alternative; boundary=001a11c2a5f6cf41c4052c06af1d
Sender: owner-linux-mm@kvack.org
List-ID:
To: David Rientjes
Cc: Greg Kroah-Hartman, Xishi Qiu, arve@android.com, riandrews@android.com,
	devel@driverdev.osuosl.org, zhong jiang, LKML, Linux MM

--001a11c2a5f6cf41c4052c06af1d
Content-Type: text/plain; charset=UTF-8

2016-02-17 8:35 GMT+08:00 David Rientjes <rientjes@google.com>:

> On Tue, 16 Feb 2016, Greg Kroah-Hartman wrote:
>
> > On Tue, Feb 16, 2016 at 05:37:05PM +0800, Xishi Qiu wrote:
> > > Currently tasksize in lowmem_scan() only calculate rss, and not include swap.
> > > But usually smart phones enable zram, so swap space actually use ram.
> >
> > Yes, but does that matter for this type of calculation?  I need an ack
> > from the android team before I could ever take such a core change to
> > this code...
> >
>
> The calculation proposed in this patch is the same as the generic oom
> killer, it's an estimate of the amount of memory that will be freed if it
> is killed and can exit.  This is better than simply get_mm_rss().
>
> However, I think we seriously need to re-consider the implementation of
> the lowmem killer entirely.  It currently abuses the use of TIF_MEMDIE,
> which should ideally only be set for one thread on the system since it
> allows unbounded access to global memory reserves.
>

I don't understand why it needs to wait 1 second:

	if (test_tsk_thread_flag(p, TIF_MEMDIE) &&
	    time_before_eq(jiffies, lowmem_deathpending_timeout)) {
		task_unlock(p);
		rcu_read_unlock();
		return 0;	<= why return rather than continue?
	}

It will also retry and spend a lot of CPU time while one task is holding
TIF_MEMDIE:

	shrink_slab_node()
	    while()
	        shrinker->scan_objects();
	            lowmem_scan()
	                if (test_tsk_thread_flag(p, TIF_MEMDIE) &&
	                    time_before_eq(jiffies, lowmem_deathpending_timeout))

>
> It also abuses the user-visible /proc/self/oom_score_adj tunable: this
> tunable is used by the generic oom killer to bias or discount a proportion
> of memory from a process's usage.  This is the only supported semantic of
> the tunable.  The lowmem killer uses it as a strict prioritization, so any
> process with oom_score_adj higher than another process is preferred for
> kill, REGARDLESS of memory usage.  This leads to priority inversion, the
> user is unable to always define the same process to be killed by the
> generic oom killer and the lowmem killer.  This is what happens when a
> tunable with a very clear and defined purpose is used for other reasons.
>
> I'd seriously consider not accepting any additional hacks on top of this
> code until the implementation is rewritten.
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: email@kvack.org
>

--001a11c2a5f6cf41c4052c06af1d--
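
The "return rather than continue" question above can be illustrated with a small userspace model. This is only a sketch under stated assumptions, not the driver code: `struct task`, `task_size()`, `scan()` and `enum memdie_policy` are hypothetical names introduced here, the `memdie` field stands in for TIF_MEMDIE, and `timeout_pending` stands in for the `time_before_eq(jiffies, lowmem_deathpending_timeout)` check. The size estimate follows the patch subject (rss plus swap entries plus page tables), and the selection rule follows the strict oom_score_adj ordering described in the quoted text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-task state discussed in the thread. */
struct task {
	const char *name;
	int oom_score_adj;
	long rss_pages;		/* what get_mm_rss() would report */
	long swap_pages;	/* models the MM_SWAPENTS counter */
	long pgtable_pages;	/* models page-table pages */
	bool memdie;		/* models TIF_MEMDIE */
};

/* What to do on meeting a dying task while the timeout is still pending. */
enum memdie_policy {
	BAIL_OUT,	/* like "return 0": abort the whole scan */
	SKIP_TASK	/* like "continue": skip only this task */
};

/* Size estimate in pages: rss plus swap entries plus page tables. */
static long task_size(const struct task *t)
{
	return t->rss_pages + t->swap_pages + t->pgtable_pages;
}

/*
 * Pick a victim. The highest oom_score_adj wins; size only breaks ties.
 * Returns NULL when no task is selected.
 */
static const struct task *scan(const struct task *tasks, size_t n,
			       bool timeout_pending,
			       enum memdie_policy policy)
{
	const struct task *selected = NULL;

	assert(tasks != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		const struct task *p = &tasks[i];

		if (p->memdie && timeout_pending) {
			if (policy == BAIL_OUT)
				return NULL;
			continue;
		}
		if (task_size(p) <= 0)
			continue;
		if (selected) {
			if (p->oom_score_adj < selected->oom_score_adj)
				continue;
			if (p->oom_score_adj == selected->oom_score_adj &&
			    task_size(p) <= task_size(selected))
				continue;
		}
		selected = p;
	}
	return selected;
}
```

In this model, if the first task in the list is dying and the timeout is still pending, `BAIL_OUT` selects nothing on every call, so a caller that keeps retrying makes no progress until the timeout expires. `SKIP_TASK` instead moves on and may select a second victim while the first is still exiting. The tie-break also shows the point made in the quoted text: a small task with a higher oom_score_adj is chosen ahead of a much larger task with a lower one.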