On Wed, May 04, 2011 at 10:32:01AM +0800, Dave Young wrote:
> On Wed, May 4, 2011 at 9:56 AM, Dave Young <hidave.darkstar@gmail.com> wrote:
> > On Thu, Apr 28, 2011 at 9:36 PM, Wu Fengguang <fengguang.wu@intel.com> wrote:
> >> Concurrent page allocations are suffering from high failure rates.
> >>
> >> On an 8p, 3GB RAM test box, when reading 1000 sparse files of size 1GB,
> >> the page allocation failures are
> >>
> >> nr_alloc_fail 733       # interleaved reads by 1 single task
> >> nr_alloc_fail 11799     # concurrent reads by 1000 tasks
> >>
> >> The concurrent read test script is:
> >>
> >>         for i in `seq 1000`
> >>         do
> >>                 truncate -s 1G /fs/sparse-$i
> >>                 dd if=/fs/sparse-$i of=/dev/null &
> >>         done
> >>
> >
> > With a Core2 Duo, 3G RAM, and no swap partition, I cannot reproduce the
> > alloc failures.
> 
> Unsetting CONFIG_SCHED_AUTOGROUP and CONFIG_CGROUP_SCHED seems to affect
> the test results. Now I see several nr_alloc_fail (dd has not finished
> yet):
> 
> dave@darkstar-32:$ grep fail /proc/vmstat
> nr_alloc_fail 4
> compact_pagemigrate_failed 0
> compact_fail 3
> htlb_buddy_alloc_fail 0
> thp_collapse_alloc_fail 4
> 
> So the result is related to the CPU scheduler.

Good catch! My kernel also has CONFIG_CGROUP_SCHED and
CONFIG_SCHED_AUTOGROUP disabled.
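
For anyone comparing results across boxes, here is a minimal sketch of how
the two scheduler options and the failure counter could be checked before
and after a run. The helper names config_state and alloc_fail are
hypothetical, made up for illustration. The sketch assumes a plain-text
kernel config file is available (its location varies by distro) and that
the running kernel exports nr_alloc_fail in /proc/vmstat, which is only the
case with the patch under discussion applied.

```shell
#!/bin/sh
# Hypothetical helper: report whether a kernel config option is enabled
# (=y or =m) in a plain-text config file such as a copy of .config.
# $1 = option name, $2 = path to the config file (assumed to exist).
config_state() {
    if grep -q "^$1=[ym]" "$2"; then
        echo "$1 enabled"
    else
        echo "$1 disabled"
    fi
}

# Hypothetical helper: print the nr_alloc_fail value from a vmstat-style
# file. Defaults to /proc/vmstat; prints nothing if the counter is absent.
alloc_fail() {
    awk '$1 == "nr_alloc_fail" { print $2 }' "${1:-/proc/vmstat}"
}
```

A run would then call config_state once for CONFIG_SCHED_AUTOGROUP and once
for CONFIG_CGROUP_SCHED against the local config file, and call alloc_fail
before and after the dd loop to see how much the counter moved.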

Thanks,
Fengguang