From: Mikulas Patocka <mpatocka@redhat.com>
To: Mel Gorman <mgorman@techsingularity.net>
Cc: Andrew Morton <akpm@linux-foundation.org>,
    Helge Deller <deller@gmx.de>,
    "James E.J. Bottomley" <James.Bottomley@hansenpartnership.com>,
    John David Anglin <dave.anglin@bell.net>,
    linux-parisc@vger.kernel.org, linux-mm@kvack.org,
    Vlastimil Babka <vbabka@suse.cz>,
    Andrea Arcangeli <aarcange@redhat.com>,
    Zi Yan <zi.yan@cs.rutgers.edu>
Subject: Re: Memory management broken by "mm: reclaim small amounts of memory when an external fragmentation event occurs"
Date: Mon, 8 Apr 2019 07:10:11 -0400 (EDT)
Message-ID: <alpine.LRH.2.02.1904080639570.4674@file01.intranet.prod.int.rdu2.redhat.com>
In-Reply-To: <20190408095224.GA18914@techsingularity.net>

On Mon, 8 Apr 2019, Mel Gorman wrote:
> On Sat, Apr 06, 2019 at 11:20:35AM -0400, Mikulas Patocka wrote:
> > Hi
> >
> > The patch 1c30844d2dfe272d58c8fc000960b835d13aa2ac ("mm: reclaim small
> > amounts of memory when an external fragmentation event occurs") breaks
> > memory management on parisc.
> >
> > I have a parisc machine with 7GiB RAM, the chipset maps the physical
> > memory to three zones:
> > 0) Start 0x0000000000000000 End 0x000000003fffffff Size 1024 MB
> > 1) Start 0x0000000100000000 End 0x00000001bfdfffff Size 3070 MB
> > 2) Start 0x0000004040000000 End 0x00000040ffffffff Size 3072 MB
> > (but it is not NUMA)
> >
> > With the patch 1c30844d2, the kernel will incorrectly reclaim the first
> > zone when it fills up, ignoring the fact that there are two completely
> > free zones. Basically, it limits cache size to 1GiB.
> >
> > For example, if I run:
> > # dd if=/dev/sda of=/dev/null bs=1M count=2048
> >
> > - with the proper kernel, there should be "Buffers - 2GiB" when this
> > command finishes. With the patch 1c30844d2, buffers will consume just 1GiB
> > or slightly more, because the kernel was incorrectly reclaiming them.
> >
>
> I could argue that the feature is behaving as expected for separate
> pgdats but that's neither here nor there. The bug is real but I have a
> few questions.
>
> First, if pa-risc is !NUMA then why are separate local ranges
> represented as separate nodes? Is it because of DISCONTIGMEM or something
> else? DISCONTIGMEM is before my time so I'm not familiar with it and

I'm not an expert in this area, I don't know.

> I consider it "essentially dead" but the arch init code seems to set up
> pgdats for each physical contiguous range so it's a possibility. The most
> likely explanation is pa-risc does not have hardware with addressing
> limitations smaller than the CPU's physical address limits and it's
> possible to have more ranges than available zones but clarification would
> be nice. By rights, SPARSEMEM would be supported on pa-risc but that
> would be a time-consuming and somewhat futile exercise. Regardless of the
> explanation, as pa-risc does not appear to support transparent hugepages,
> an option is to special case watermark_boost_factor to be 0 on DISCONTIGMEM
> as that commit was primarily about THP with secondary concerns around
> SLUB. This is probably the most straight-forward solution but it'd need
> a comment obviously. I do not know what the distro configurations for
> pa-risc set as I'm not a user of gentoo or debian.

I use Debian Sid, but I compile my own kernel. I uploaded the kernel
.config here:
http://people.redhat.com/~mpatocka/testcases/parisc-config.txt

> Second, if you set the sysctl vm.watermark_boost_factor=0, does the
> problem go away? If so, an option would be to set this sysctl to 0 by
> default on distros that support pa-risc. Would that be suitable?

I have tried it and the problem almost goes away. With
vm.watermark_boost_factor=0, if I read 2GiB data from the disk, the buffer
cache will contain about 1.8GiB. So, there's still some superfluous page
reclaim, but it is smaller.
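
A minimal sketch of how the buffer-cache figure mentioned above could be read back after the test, assuming a Linux /proc/meminfo layout in which the "Buffers:" line reports kB. The helper name buffers_mib is hypothetical and is not taken from this thread; the sysctl and dd invocations are the ones quoted in the message and are left as comments because they need root and a real disk.

```shell
#!/bin/sh
# buffers_mib (hypothetical helper): print the "Buffers" value of a
# meminfo-style file in whole MiB. The file defaults to /proc/meminfo.
buffers_mib() {
    awk '$1 == "Buffers:" { printf "%d\n", $2 / 1024 }' "${1:-/proc/meminfo}"
}

# Reproduction steps as described in the message (run as root):
#   sysctl -w vm.watermark_boost_factor=0
#   dd if=/dev/sda of=/dev/null bs=1M count=2048
#   buffers_mib    # compare against the 2048 MiB that were read
```

Passing a file name as the first argument lets the same helper be pointed at a saved copy of /proc/meminfo, which makes it easy to compare runs with and without the sysctl change.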
BTW. I'm interested - on real NUMA machines - is reclaiming the file cache
really a better option than allocating the file cache from a non-local node?

> Finally, I'm sure this has been asked before but why is pa-risc alive?
> It appears a new CPU has not been manufactured since 2005. Even Alpha
> I can understand being semi-alive since it's an interesting case for
> weakly-ordered memory models. pa-risc appears to be supported and active
> for debian at least so someone cares. It's not the only feature like this
> that is bizarrely alive but it is curious -- 32 bit NUMA support on x86,
> I'm looking at you, your machines are all dead since the early 2000's
> AFAIK and anyone else using NUMA on 32-bit x86 needs their head examined.

I use it to test programs for portability to RISC.

If one could choose between buying an expensive power system or a cheap
pa-risc system, pa-risc may be a better choice. The last pa-risc model has
four cores at 1.1GHz, so it is not completely unusable.

Mikulas

> --
> Mel Gorman
> SUSE Labs
>