linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* 2.3 Pagedir allocation/free and update races
@ 1999-12-14  9:46 Kanoj Sarcar
  1999-12-14  9:55 ` David S. Miller
  1999-12-14 10:13 ` Jakub Jelinek
  0 siblings, 2 replies; 6+ messages in thread
From: Kanoj Sarcar @ 1999-12-14  9:46 UTC (permalink / raw)
  To: linux-kernel, linux-mm; +Cc: Kanoj Sarcar

This note describes races in the procedures that allocate and free
user level page directories. Platform code maintainers, please
read.

Prior to 2.3, a lot of architectures used to cache unused page
directories to minimize copying the kernel level page tables
from the swapper_pg_dir. 2.2 used to rely on the kernel lock
for manipulating the cache list. In 2.3.31, page directories
are allocated with the kernel lock, but not freed with this lock.
So, all architectures that cache page directories have racy code.
The i386 does not cache page directories in 2.3.31.

In 2.2, set_pgdir() could rely on kernel_lock as it traversed the
cached page directories while updating them. 2.2 set_pgdir() was
racy in the sense that it could miss updating page directories
that belonged to newly created processes that were still not on
the active task list. This race exists in 2.3.31. To compound this
race, there might be page directories that do not have active
processes anymore due to optimized mm context switching.

Partly to solve this problem, in 2.3.32, an active mm list has
been added, that links up all the active mm's on the system.
Page directory allocation and freeing happen at precisely the
same times as elements are added and deleted from the active mm
list. Also, to make sure that no page directory misses seeing
updates by set_pgdir(), the pgd_alloc() call has been replaced
by 3 different calls:

a. get_pgd_fast(), which can not go to sleep since it is called
with the mmlist spin lock held. Ideal for searching cached page
directories for a free one.
b. get_pgd_slow(), that is called without mmlist lock, can go to
sleep, user level pgde's can be 0'ed out.
c. get_pgd_uptopdate(), called on newly allocated pgdirs, with the
mmlist lock held, to update kernel level pgde's. Cache lists may
be updated, if in-use pgdirs are being cached.

(Yes, alternate solutions are possible, this is the one that makes
most sense globally).

Also, pgd_free() is now called with mmlist lock held.

Now, the mmlist lock can be used to implement a global (not percpu)
cached list of page directories.

For example, the mmlist lock has been used to restore pgd caching
for i386. The i386 do_check_pgt_cache() also grabs the mmlist lock
while freeing extra pgdirs.

I also updated the code for these hooks for alpha, mips, ppc, sh
and sparc64. Most of these had page caching, but did not have a
lock to protect the cache. Now, they can use the mmlist lock for
this. I have _NOT_ fixed the do_check_pgt_cache() to use
mmlist_lock, which is needed for complete MP-safety.

The arm, m68k and sparc code is a little more intricate, I would
like the platform maintainers to take a look at these issues.

I would like to hear back from at least one platform maintainer
for each architecture (except i386), confirming that he will look
into and fix this issue. Lets say by end of this week, by private
mail.  I am willing to review any changes. If you would like me to
fix the code for you, let me know too. Else, I will have to grope
through unfamiliar code, and send in changes to Linus without having
any way to test the changes.

Thanks a lot for taking the time to read this!

Kanoj

PS: Look for mmlist_access_lock/mmlist_modify_lock/mmlist_set_pgdir
for examples on how to use the lists and lock.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://humbolt.geo.uu.nl/Linux-MM/

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: 2.3 Pagedir allocation/free and update races
  1999-12-14  9:46 2.3 Pagedir allocation/free and update races Kanoj Sarcar
@ 1999-12-14  9:55 ` David S. Miller
  1999-12-14 10:13 ` Jakub Jelinek
  1 sibling, 0 replies; 6+ messages in thread
From: David S. Miller @ 1999-12-14  9:55 UTC (permalink / raw)
  To: kanoj; +Cc: linux-kernel, linux-mm

   So, all architectures that cache page directories have racy code.

They are cpu local, how can they be racy?

If you are mentioning the kernel pgdir update cases, well even then
many architectures do not even need to update any of the pgtable
caches during such events (sparc64 for example).

Later,
David S. Miller
davem@redhat.com
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://humbolt.geo.uu.nl/Linux-MM/

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: 2.3 Pagedir allocation/free and update races
  1999-12-14  9:46 2.3 Pagedir allocation/free and update races Kanoj Sarcar
  1999-12-14  9:55 ` David S. Miller
@ 1999-12-14 10:13 ` Jakub Jelinek
  1999-12-14 23:00   ` Kanoj Sarcar
  1 sibling, 1 reply; 6+ messages in thread
From: Jakub Jelinek @ 1999-12-14 10:13 UTC (permalink / raw)
  To: Kanoj Sarcar; +Cc: linux-kernel, linux-mm

On Tue, Dec 14, 1999 at 01:46:00AM -0800, Kanoj Sarcar wrote:
> This note describes races in the procedures that allocate and free
> user level page directories. Platform code maintainers, please
> read.

pgd_alloc/free are never invoked from interrupts and the kernel code is not
preemptive, so per-CPU caches are safe, aren't they?

Cheers,
    Jakub
___________________________________________________________________
Jakub Jelinek | jakub@redhat.com | http://sunsite.mff.cuni.cz/~jj
Linux version 2.3.26 on a sparc64 machine (1343.49 BogoMips)
___________________________________________________________________
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://humbolt.geo.uu.nl/Linux-MM/

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: 2.3 Pagedir allocation/free and update races
  1999-12-14 10:13 ` Jakub Jelinek
@ 1999-12-14 23:00   ` Kanoj Sarcar
  1999-12-14 23:08     ` David S. Miller
  0 siblings, 1 reply; 6+ messages in thread
From: Kanoj Sarcar @ 1999-12-14 23:00 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: linux-kernel, linux-mm, davem

> 
> On Tue, Dec 14, 1999 at 01:46:00AM -0800, Kanoj Sarcar wrote:
> > This note describes races in the procedures that allocate and free
> > user level page directories. Platform code maintainers, please
> > read.
> 
> pgd_alloc/free are never invoked from interrupts and the kernel code is not
> preemptive, so per-CPU caches are safe, aren't they?
> 
> Cheers,
>     Jakub

Yes, I am sorry for the misleading logic in my note. Per-cpu caches are 
safe (I wonder why it was taken out for i386). For architectures that 
have to do set_pgdir() though, the pgdir update code might be racy, 
unless the arch code has locks to protect the page directory scanning.

Btw, Linus indicated to me he ran into problems with the patch, and 
will be pulling it out in the next pre-release. I will take a closer look 
at the code.

Kanoj
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://humbolt.geo.uu.nl/Linux-MM/

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: 2.3 Pagedir allocation/free and update races
  1999-12-14 23:00   ` Kanoj Sarcar
@ 1999-12-14 23:08     ` David S. Miller
  1999-12-14 23:51       ` Kanoj Sarcar
  0 siblings, 1 reply; 6+ messages in thread
From: David S. Miller @ 1999-12-14 23:08 UTC (permalink / raw)
  To: kanoj; +Cc: jakub, linux-kernel, linux-mm

   Yes, I am sorry for the misleading logic in my note. Per-cpu caches are 
   safe (I wonder why it was taken out for i386). For architectures that 
   have to do set_pgdir() though, the pgdir update code might be racy, 
   unless the arch code has locks to protect the page directory scanning.

   Btw, Linus indicated to me he ran into problems with the patch, and 
   will be pulling it out in the next pre-release. I will take a closer look 
   at the code.

Just handle the set_pgdir() stuff like this:

        pgcache_update_flag = 0;
	smp_call_func(ALL_CPUS, update_pgcaches_and_wait_on_flag);
	update_local_pgcache();
	pgcache_update_flag = 1;
	for_each_task(tsk)
		update_pgdir(tsk);

That should give the correct synchronization with zero cost
for the fast normal paths which can rely solely on the cpu
localness of the data structure.

Later,
David S. Miller
davem@redhat.com
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://humbolt.geo.uu.nl/Linux-MM/

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: 2.3 Pagedir allocation/free and update races
  1999-12-14 23:08     ` David S. Miller
@ 1999-12-14 23:51       ` Kanoj Sarcar
  0 siblings, 0 replies; 6+ messages in thread
From: Kanoj Sarcar @ 1999-12-14 23:51 UTC (permalink / raw)
  To: David S. Miller; +Cc: jakub, linux-kernel, linux-mm

> 
>    From: kanoj@google.engr.sgi.com (Kanoj Sarcar)
>    Date: Tue, 14 Dec 1999 15:00:19 -0800 (PST)
> 
>    Yes, I am sorry for the misleading logic in my note. Per-cpu caches are 
>    safe (I wonder why it was taken out for i386). For architectures that 
>    have to do set_pgdir() though, the pgdir update code might be racy, 
>    unless the arch code has locks to protect the page directory scanning.
> 
>    Btw, Linus indicated to me he ran into problems with the patch, and 
>    will be pulling it out in the next pre-release. I will take a closer look 
>    at the code.
> 
> Just handle the set_pgdir() stuff like this:
> 
>         pgcache_update_flag = 0;
> 	smp_call_func(ALL_CPUS, update_pgcaches_and_wait_on_flag);
> 	update_local_pgcache();
> 	pgcache_update_flag = 1;
> 	for_each_task(tsk)
> 		update_pgdir(tsk);

As I mentioned, there are possibly multiple ways in which this can
be fixed. Note that mmlists are not needed solely for set_pgdir().

David, unless I am mistaken (which is happening fairly frequently),
in your solution, set_pgdir() is going to miss a page directory that
a parent has allocated for a child, but the child is not yet on the 
tasklist. Yes, the arch code can keep a list of all allocated page
directories ... I am just trying to come up with a solution that
will work for most architectures, where the common case is that
the pgdir cache does not have a lock because it is percpu.

Kanoj

> 
> That should give the correct synchronization with zero cost
> for the fast normal paths which can rely solely on the cpu
> localness of the data structure.
> 
> Later,
> David S. Miller
> davem@redhat.com
> 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://humbolt.geo.uu.nl/Linux-MM/

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~1999-12-14 23:51 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
1999-12-14  9:46 2.3 Pagedir allocation/free and update races Kanoj Sarcar
1999-12-14  9:55 ` David S. Miller
1999-12-14 10:13 ` Jakub Jelinek
1999-12-14 23:00   ` Kanoj Sarcar
1999-12-14 23:08     ` David S. Miller
1999-12-14 23:51       ` Kanoj Sarcar

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox