On Fri, May 15, 2009 at 01:32:38AM -0400, starlight@binnacle.cx wrote:
> Whacked at this, attempting to build a testcase from a
> combination of the original daemon strace in the bug report
> and knowledge of what the daemon is doing.
>
> What emerged is something that will destroy RHEL5
> 2.6.18-128.1.6.el5 100% every time.  Completely fills the kernel
> message log with "bad pmd" errors and wrecks hugepages.
>

Ok, I can confirm that more or less. I reproduced the problem on
2.6.18-92.el5 on x86-64 running RHEL 5.2. I didn't have access to a
machine with enough memory, so I reduced the requirements slightly. It
still triggered a failure.

However, when I ran 2.6.18, 2.6.19 and 2.6.29.1 on the same machine, I
could not reproduce the problem, nor could I cause hugepages to leak, so
I'm leaning towards believing this is a distribution bug at the moment.

On the plus side, thanks to your good work, there should be enough
information available for them to bisect this problem.

> Unfortunately it only occasionally breaks 2.6.29.1.  Haven't
> been able to produce "bad pmd" messages, but did get the
> kernel to think it's out of large page memory when in
> theory it was not.  Saw a lot of really strange accounting
> in the hugepage section of /proc/meminfo.
>

What sort of strange accounting? The accounting has changed since 2.6.18,
so I want to be sure you're really seeing something weird. When I was
testing, I didn't see anything out of the ordinary, but maybe I'm looking
in a different place.

> For what it's worth, the testcase code is attached.
>

I cleaned the test up a bit and wrote a wrapper script that runs it
multiple times while checking for hugepage leaks. I have it running in a
loop while the machine runs sysbench as a stress test, to see whether I
can cause anything out of the ordinary to happen. Nothing so far, though.

> Note that hugepages=2048 is assumed--the bug seems to require
> use of more than 50% of large page memory.
>
> Definitely will be posted under the RHEL5 bug report, which is
> the more pressing issue here than far-future kernel support.
>

If you've filed a Red Hat bug, this modified testcase and wrapper script
might help them. The program exits and cleans up after itself, and its
memory requirements are lower. The script sets the machine up in a way
that breaks for me, where the breakage is "bad pmd" messages and
hugepages leaking.

-- 
Mel Gorman
Part-time PhD Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab