From: Bharath Vedartham <linux.bhar@gmail.com>
To: Michal Hocko <mhocko@kernel.org>
Cc: Khalid Aziz <khalid.aziz@oracle.com>,
akpm@linux-foundation.org, vbabka@suse.cz,
mgorman@techsingularity.net, dan.j.williams@intel.com,
osalvador@suse.de, richard.weiyang@gmail.com, hannes@cmpxchg.org,
arunks@codeaurora.org, rppt@linux.vnet.ibm.com, jgg@ziepe.ca,
amir73il@gmail.com, alexander.h.duyck@linux.intel.com,
linux-mm@kvack.org,
linux-kernel-mentees@lists.linuxfoundation.org,
linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH 0/2] Add predictive memory reclamation and compaction
Date: Wed, 28 Aug 2019 18:39:22 +0530
Message-ID: <20190828130922.GA10127@bharath12345-Inspiron-5559>
In-Reply-To: <20190827061606.GN7538@dhcp22.suse.cz>
Hi Michal,

Thank you for taking the time to look at this.
On Tue, Aug 27, 2019 at 08:16:06AM +0200, Michal Hocko wrote:
> On Tue 27-08-19 02:14:20, Bharath Vedartham wrote:
> > Hi Michal,
> >
> > Here are some of my thoughts,
> > On Wed, Aug 21, 2019 at 04:06:32PM +0200, Michal Hocko wrote:
> > > On Thu 15-08-19 14:51:04, Khalid Aziz wrote:
> > > > Hi Michal,
> > > >
> > > > The smarts for tuning these knobs can be implemented in userspace and
> > > > more knobs added to allow for what is missing today, but we get back to
> > > > the same issue as before. That does nothing to make kernel self-tuning
> > > > and adds possibly even more knobs to userspace. Something so fundamental
> > > > to kernel memory management as making free pages available when they are
> > > > needed really should be taken care of in the kernel itself. Moving it to
> > > > userspace just means the kernel is hobbled unless one installs and tunes
> > > > a userspace package correctly.
> > >
> > > From my past experience the existing autotuning works mostly OK for a
> > > vast variety of workloads. More clever tuning is possible and people
> > > are doing that already, especially for cases when the machine is heavily
> > > overcommitted. There are different ways to achieve that. Your new
> > > in-kernel auto tuning would have to be tested on a large variety of
> > > workloads to be proven and riskless. So I am quite skeptical, to be
> > > honest.
> > Could you give some references to such work on tuning the kernel?
>
> Talk to the Facebook guys about their usage of PSI to control the memory
> distribution and OOM situations.
Yup. Thanks for the pointer.
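For my own understanding, a monitor built on PSI would just read (or arm a
trigger on) /proc/pressure/memory. A minimal, untested sketch of the simple
polling variant, assuming the usual "some avg10=... avg60=... avg300=...
total=..." line format:

#include <stdio.h>

/*
 * Untested sketch: parse the "some" avg10 value from /proc/pressure/memory.
 * A real monitor would rather register a PSI trigger and poll() the fd,
 * but this shows the shape of the data the kernel already exports.
 */
static double psi_mem_some_avg10(void)
{
	FILE *f = fopen("/proc/pressure/memory", "r");
	char line[256];
	double avg10 = -1.0;

	if (!f)
		return -1.0;
	/* "some avg10=0.00 avg60=0.00 avg300=0.00 total=0" */
	if (fgets(line, sizeof(line), f))
		sscanf(line, "some avg10=%lf", &avg10);
	fclose(f);
	return avg10;
}

int main(void)
{
	printf("memory pressure (some, avg10): %.2f\n", psi_mem_some_avg10());
	return 0;
}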
> > Essentially, our idea here is to foresee potential memory exhaustion.
> > This foreseeing is done by observing the workload and its memory
> > usage. Based on these observations, we make a prediction about whether
> > or not memory exhaustion could occur.
>
> I understand that and I am not disputing this can be useful. All I
> argue here is that there is unlikely to be a good "crystal ball" for
> most/all workloads that would justify its inclusion into the kernel, and
> that this is something better done in userspace where you can experiment
> and tune the behavior for a particular workload of your interest.
>
> Therefore I would like to shift the discussion towards existing APIs and
> whether they are suitable for such advanced auto-tuning. I haven't
> heard any arguments about missing pieces.
I understand your concern here. Just to confirm: by APIs, you are
referring to sysctls, sysfs files, and the like, right?
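If so, then just to make the comparison concrete for myself: the trend part
of the RFC could presumably live entirely in userspace as a poller on
/proc/meminfo. A rough, untested sketch (a simple least-squares slope on
MemAvailable; the sampling interval, window size and the choice of
MemAvailable are all arbitrary here):

#include <stdio.h>
#include <unistd.h>

#define NSAMPLES 8

/* Untested sketch: read MemAvailable (in kB) from /proc/meminfo. */
static long mem_available_kb(void)
{
	FILE *f = fopen("/proc/meminfo", "r");
	char line[128];
	long kb = -1;

	if (!f)
		return -1;
	while (fgets(line, sizeof(line), f))
		if (sscanf(line, "MemAvailable: %ld kB", &kb) == 1)
			break;
	fclose(f);
	return kb;
}

int main(void)
{
	long samples[NSAMPLES];
	double sx = 0, sy = 0, sxy = 0, sxx = 0, slope;
	int i;

	for (i = 0; i < NSAMPLES; i++) {
		samples[i] = mem_available_kb();
		sleep(1);
	}

	/* Least-squares slope of MemAvailable against the sample index. */
	for (i = 0; i < NSAMPLES; i++) {
		sx += i;
		sy += samples[i];
		sxy += (double)i * samples[i];
		sxx += (double)i * i;
	}
	slope = (NSAMPLES * sxy - sx * sy) / (NSAMPLES * sxx - sx * sx);

	if (slope < 0)
		printf("predicted exhaustion in ~%.0f samples\n",
		       -(double)samples[NSAMPLES - 1] / slope);
	else
		printf("no downward trend in MemAvailable\n");
	return 0;
}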
> > If memory exhaustion is predicted, we reclaim some more memory. kswapd
> > stops reclaiming when the high watermark is reached. The high watermark
> > is usually set to a fairly low percentage of total memory; on my system
> > it is 13% of total pages for zone Normal. So there is scope for
> > reclaiming more pages to make sure the system does not suffer from a
> > lack of pages.
>
> Yes and we have ways to control those watermarks that your monitoring
> tool can use to alter the reclaim behavior.
Just to confirm: the one way I am aware of is to alter the min_free_kbytes
value (sketched below). What other ways are there to alter the watermarks
from user space?
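To be explicit about what I meant, an untested sketch of what a monitoring
tool would do (the 128 MB value below is arbitrary):

#include <stdio.h>

/*
 * Untested sketch: raising vm.min_free_kbytes makes the kernel recompute
 * the zone watermarks, so kswapd starts reclaiming earlier and keeps more
 * memory free. Needs to run as root.
 */
static int set_min_free_kbytes(long kbytes)
{
	FILE *f = fopen("/proc/sys/vm/min_free_kbytes", "w");

	if (!f)
		return -1;
	fprintf(f, "%ld\n", kbytes);
	return fclose(f);
}

int main(void)
{
	return set_min_free_kbytes(131072) ? 1 : 0;	/* 128 MB, arbitrary */
}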
> [...]
> > > Therefore I would really focus on discussing whether we have sufficient
> > > APIs to tune the kernel to do the right thing when needed. That requires
> > > to identify gaps in that area.
> > One thing that comes to my mind is the issue Khalid mentioned earlier,
> > where his desktop took more than 30 seconds to boot up because the
> > caches were using up a lot of memory.
> > Rather than allowing all unused memory to be used as page cache, would
> > it be a good idea to fix a size for the caches and change that size
> > elastically based on the workload?
>
> I do not think so. Limiting the pagecache is unlikely to help as it is
> really cheap to reclaim most of the time. In the cases where it is not
> (e.g. the underlying FS needs to flush data and/or metadata), the same
> would happen with a restricted page cache, and you could easily end up
> stalled waiting for pagecache (e.g. any executable/library) while there
> is a lot of memory.
That makes sense to me.
> I cannot comment on Khalid's example because there were no details
> there, but I would be really surprised if the primary source of the
> stall was the pagecache.
Should have done more research before talking :) Sorry about that.
> --
> Michal Hocko
> SUSE Labs