linux-mm.kvack.org archive mirror
* [PATCH -V2 0/2] fix oom happening when changing cpuset's mems (was: [regression] cpuset,mm: update tasks' mems_allowed in time (58568d2))
@ 2010-05-04 10:53 Miao Xie
From: Miao Xie @ 2010-05-04 10:53 UTC
  To: David Rientjes, Nick Piggin, Paul Menage, Lee Schermerhorn
  Cc: Andrew Morton, Linux-Kernel, Linux-MM

Nick Piggin reported that the allocator may see an empty nodemask while a
cpuset's mems is being changed[1]. This happens only on kernels that cannot
store a nodemask_t atomically (MAX_NUMNODES > BITS_PER_LONG).
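
To illustrate the non-atomic case, here is a minimal user-space sketch, not
kernel code. It assumes a two-word mask standing in for a nodemask_t with
MAX_NUMNODES > BITS_PER_LONG, and it checks the mask between word stores to
play the role of a concurrent reader. All sketch_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_NWORDS 2	/* assumption: the mask needs more than one word */

struct sketch_nodemask {
	uint64_t bits[SKETCH_NWORDS];
};

/* True when no node bit is set in any word. */
static bool sketch_nodes_empty(const struct sketch_nodemask *m)
{
	for (int i = 0; i < SKETCH_NWORDS; i++)
		if (m->bits[i])
			return false;
	return true;
}

/*
 * Copy src into dst one word at a time, the way a plain struct assignment
 * may be carried out.  Returns true if dst was empty after any store other
 * than the last one, i.e. if a reader could have observed an empty mask.
 */
static bool sketch_copy_exposes_empty(struct sketch_nodemask *dst,
				      const struct sketch_nodemask *src)
{
	bool saw_empty = false;

	assert(dst && src);
	for (int i = 0; i < SKETCH_NWORDS; i++) {
		dst->bits[i] = src->bits[i];
		if (i + 1 < SKETCH_NWORDS && sketch_nodes_empty(dst))
			saw_empty = true;
	}
	return saw_empty;
}
```

Changing the mask from {node 0} to {node 64} first stores a zero into word 0,
so the mask is briefly empty before word 1 is written. With a single-word mask
the store is one write and no such window exists.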

But I found that there is also a problem on kernels that can store a
nodemask_t atomically: while a cpuset's mems is being changed, the allocator
may fail to find a node to allocate a page from even though there is a lot of
free memory. The sequence is as follows:
(mpol: mempolicy)
	task1			task1's mpol	task2
	alloc page		1
	  alloc on node0? NO	1
				1		change mems from 1 to 0
				1		rebind task1's mpol
				0-1		  set new bits
				0	  	  clear disallowed bits
	  alloc on node1? NO	0
	  ...
	can't alloc page
	  goto oom
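
The interleaving above can be replayed deterministically in user space. This
is only a sketch under stated assumptions: a single unsigned long stands in for
task1's mempolicy nodemask, two nodes exist, and task2's rebind is injected
at a chosen point in task1's node walk. sketch_walk_nodes() is a hypothetical
name, not a kernel function.

```c
#include <assert.h>

#define SKETCH_NR_NODES 2

/*
 * Walk nodes 0..SKETCH_NR_NODES-1 and return the first node allowed by the
 * policy mask, or -1 if none is found.  If rebind_after >= 0, the mask is
 * rebound to new_nodes right after that node has been checked and rejected:
 * new bits are set first, then disallowed bits are cleared immediately.
 */
static int sketch_walk_nodes(unsigned long mpol_nodes, unsigned long new_nodes,
			     int rebind_after)
{
	assert(rebind_after < SKETCH_NR_NODES);

	for (int node = 0; node < SKETCH_NR_NODES; node++) {
		if (mpol_nodes & (1UL << node))
			return node;		/* alloc on this node */

		if (node == rebind_after) {
			mpol_nodes |= new_nodes;	/* set new bits */
			mpol_nodes &= new_nodes;	/* clear disallowed bits */
		}
	}
	return -1;				/* can't alloc page, goto oom */
}
```

With the mask changing from {1} to {0} right after node 0 was rejected,
sketch_walk_nodes(1UL << 1, 1UL << 0, 0) returns -1, although both the old and
the new mask contain a usable node. Without the rebind (rebind_after = -1) the
same walk returns node 1.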

I can reproduce it with the attached program by following these steps:
# mkdir /dev/cpuset
# mount -t cpuset cpuset /dev/cpuset
# mkdir /dev/cpuset/1
# echo `cat /dev/cpuset/cpus` > /dev/cpuset/1/cpus
# echo `cat /dev/cpuset/mems` > /dev/cpuset/1/mems
# echo $$ > /dev/cpuset/1/tasks
# numactl --membind=`cat /dev/cpuset/mems` ./cpuset_mem_hog <nr_tasks> &
   <nr_tasks> = max(nr_cpus - 1, 1)
# killall -s SIGUSR1 cpuset_mem_hog
# ./change_mems.sh

Several hours later, OOM happens even though there is a lot of free memory.

This patchset fixes the problem by expanding the node range first (setting the
newly allowed bits) and shrinking it lazily (clearing the newly disallowed
bits). A variable tells the write-side task that the read-side task is reading
the nodemask, and the write-side task clears the newly disallowed nodes only
after the read-side task finishes its current memory allocation.
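
The idea can be sketched in user space as follows. This is not the code from
the patches: the struct, the flag, and every sketch_* name are hypothetical,
the simulation is single-threaded, and it omits the memory barriers and
waiting that a real implementation would need.

```c
#include <assert.h>
#include <stdbool.h>

struct sketch_task {
	unsigned long nodes;		/* nodes the allocator may use */
	bool alloc_in_progress;		/* set while the reader walks nodes */
};

/* Step 1 (writer): only add the newly allowed nodes. */
static void sketch_expand(struct sketch_task *t, unsigned long new_nodes)
{
	t->nodes |= new_nodes;
}

/* Step 2 (writer): drop disallowed nodes, but only if no reader is active. */
static bool sketch_try_shrink(struct sketch_task *t, unsigned long new_nodes)
{
	if (t->alloc_in_progress)
		return false;
	t->nodes &= new_nodes;
	return true;
}

/*
 * Replay the failing interleaving: the policy changes from {1} to {0} right
 * after the reader rejected node 0.  Returns the node allocated from and
 * stores the final mask in *final_nodes.
 */
static int sketch_demo(unsigned long *final_nodes)
{
	struct sketch_task t = { 1UL << 1, false };
	const unsigned long new_nodes = 1UL << 0;
	int got = -1;

	assert(final_nodes);
	t.alloc_in_progress = true;		/* reader starts allocating */
	for (int node = 0; node < 2; node++) {
		if (t.nodes & (1UL << node)) {
			got = node;
			break;
		}
		if (node == 0) {
			sketch_expand(&t, new_nodes);
			/* the shrink is refused while the reader is active */
			(void)sketch_try_shrink(&t, new_nodes);
		}
	}
	t.alloc_in_progress = false;		/* reader is done */

	(void)sketch_try_shrink(&t, new_nodes);	/* writer retries, succeeds */
	*final_nodes = t.nodes;
	return got;
}
```

In this replay the reader still finds node 1 in the temporarily widened mask
{0,1}, and the mask ends up as {0} once the allocation has finished.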

Changelog since V1:
- restructure the mempolicy's rebind functions and split the rebind work into
  two steps, because the rebind functions may break the first step (expanding
  the nodes range).

Thanks
Miao

[1] http://lkml.org/lkml/2010/2/18/111

[PATCH 1/2] mempolicy: restructure rebinding-mempolicy functions
[PATCH 2/2] cpuset,mm: fix no node to alloc memory when changing cpuset's mems

[-- Attachment #2: reproduce_prog.tar.gz --]
[-- Type: application/gzip, Size: 1190 bytes --]
