Lazy page reclamation on SMP machines: memory barriers


* Lazy page reclamation on SMP machines: memory barriers
@ 1998-03-23 22:49 Stephen C. Tweedie
  1998-03-23 23:20 ` Linus Torvalds
  1998-03-23 23:37 ` Linus Torvalds
  0 siblings, 2 replies; 7+ messages in thread
From: Stephen C. Tweedie @ 1998-03-23 22:49 UTC
  To: Linus Torvalds, linux-mm, linux-smp

Hi,

I am currently finalising some work on lazy page reclamation for the 2.1
kernels.  The basic mechanism involves spinlocking the page cache
linkages, and allowing pages in the cache to be removed and placed on
the free list even from within interrupt context.  

We keep a queue of pages which have been scavenged by the page stealers
(vmscan and shrink_mmap).  Pages on this lazy reclamation queue have a
PG_lazy bit set in the page flags, so they can be safely avoided by
shrink_mmap().  By making a design decision to clear this lazy bit as
the very last step in freeing a lazy page, and by ensuring that the bit
is otherwise only ever set or tested under the page cache spinlock, we
can safely make a test of the lazy bit without taking both that spinlock
and the global kernel lock (which we hold for most VM operations
anyway).  If the lazy bit is clear, then we know, for sure, that no
other CPU can be in the process of freeing up that cached page.

The problem with this scheme is that although it avoids unnecessary page
cache spinlocking, it does rely on memory ordering.  In particular,
there are problems with interactions between one CPU testing the lazy
bit with the kernel spinlock held, and another CPU in interrupt context
freeing the page and then clearing the lazy bit with the page cache
spinlock held.  If the first CPU observes any of the second CPU's memory
operations out of order, either because writes have been reordered on the
freeing CPU or because reads have been reordered on the scanning CPU,
then the protection has failed.

In other words, safety requires that I can guarantee:

In interrupt context:

	spin_lock(&page_cache_lock);
	free_page_from_page_cache(page);

	write_barrier();

	clear_bit(PG_lazy, &page->flags);
	spin_unlock(&page_cache_lock);

and with the kernel lock held in process context:

	if (!test_bit(PG_lazy, &page->flags)) {

		read_barrier();

		if (test_page()) {
			spin_lock_irqsave(&page_cache_lock, flags);
			do_something();
			spin_unlock_irqrestore(&page_cache_lock, flags);
		}
	}		

The cost of taking the spinlock for every page scanned in the second
section would be prohibitive.  With the barriers in place, the kernel
spinlock protects the second section from other CPUs trying to set the
lazy bit unexpectedly, but only the ordering guarantee on the lazy bit
protects it from another CPU freeing the page.  If the clearing of the
lazy bit is visible early on the testing CPU, then the test_page() or
do_something() calls may not be safe.
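
As a rough illustration only, the two code paths above can be sketched
in portable C11, with atomic_thread_fence() standing in for the
write_barrier() and read_barrier() placeholders.  The struct layout, the
PG_LAZY value and the names lazy_free() and scan_page() are assumptions
made for this sketch, not kernel code, and the spinlocks are left out so
that only the ordering on the lazy bit is shown.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define PG_LAZY 0x1u	/* hypothetical bit value for the lazy flag */

struct page {
	atomic_uint flags;	/* holds PG_LAZY */
	int cached;		/* stand-in for the page cache linkage */
};

/* Freeing side: unlink the page first, clear the lazy bit last. */
static void lazy_free(struct page *page)
{
	page->cached = 0;				/* free_page_from_page_cache() */
	atomic_thread_fence(memory_order_release);	/* write_barrier() */
	atomic_fetch_and_explicit(&page->flags, ~PG_LAZY,
				  memory_order_relaxed);
}

/* Scanning side: true only if the page is not lazy and still cached. */
static bool scan_page(struct page *page)
{
	if (atomic_load_explicit(&page->flags, memory_order_relaxed) & PG_LAZY)
		return false;				/* being reclaimed: skip it */
	atomic_thread_fence(memory_order_acquire);	/* read_barrier() */
	return page->cached != 0;			/* test_page() */
}
```

The release fence keeps the unlink from becoming visible after the bit
is cleared, and the acquire fence keeps the scanner's later reads from
being satisfied before its test of the bit, which are the two
reorderings described above.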

Are there barrier constructs available to do this?  I believe the answer
to be no, based on the recent thread concerning the use of inline asm
cpuid instructions as a barrier on Intel machines.  Alternatively, does
Intel provide any ordering guarantees which may help?

Finally, I looked quickly at the kernel's spinlock primitives, and they
also seem unprotected by memory barriers on Intel.  Is this really safe?

--Stephen


* Re: Lazy page reclamation on SMP machines: memory barriers
  1998-03-23 22:49 Lazy page reclamation on SMP machines: memory barriers Stephen C. Tweedie
@ 1998-03-23 23:20 ` Linus Torvalds
  1998-03-24 22:54   ` Stephen C. Tweedie
  1998-03-23 23:37 ` Linus Torvalds
  1 sibling, 1 reply; 7+ messages in thread
From: Linus Torvalds @ 1998-03-23 23:20 UTC
  To: Stephen C. Tweedie; +Cc: linux-mm, linux-smp

On Mon, 23 Mar 1998, Stephen C. Tweedie wrote:
>
> Are there barrier constructs available to do this?  I believe the answer
> to be no, based on the recent thread concerning the use of inline asm
> cpuid instructions as a barrier on Intel machines.  Alternatively, does
> Intel provide any ordering guarantees which may help?

Intel only gives you total ordering across certain instructions (cpuid
being one of them, and the only one that is easily usable under all
circumstances). 

> Finally, I looked quickly at the kernel's spinlock primitives, and they
> also seem unprotected by memory barriers on Intel.  Is this really safe?

Yes. Intel guarantees total ordering around any locked instruction, so the
spinlocks themselves act as the barriers. This is why "unlock" is a slow

	lock ; btrl $0,(mem) 

instead of the much faster

	movl $0,(mem) 

because the latter doesn't imply any ordering, and there are no faster
ways to do it (cpuid is fairly slow, so trying to do a "movl + cpuid" 
doesn't help either). 
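
For comparison, here is a minimal sketch of the same idea in portable
C11, where the ordering is attached to the lock operations themselves:
acquire on lock, release on unlock.  The sketch_ names are hypothetical
and this is not the kernel's spinlock; which instruction a compiler
emits for the release is target-dependent and is not something this
thread settles.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

typedef struct {
	atomic_flag locked;
} sketch_spinlock_t;

#define SKETCH_SPINLOCK_INIT { ATOMIC_FLAG_INIT }

/* Try once; returns true if the lock was taken. */
static bool sketch_spin_trylock(sketch_spinlock_t *l)
{
	return !atomic_flag_test_and_set_explicit(&l->locked,
						  memory_order_acquire);
}

/* Spin until the lock is taken; later accesses cannot move above it. */
static void sketch_spin_lock(sketch_spinlock_t *l)
{
	while (!sketch_spin_trylock(l))
		;	/* busy-wait */
}

/* Release the lock; earlier accesses cannot move below the release. */
static void sketch_spin_unlock(sketch_spinlock_t *l)
{
	atomic_flag_clear_explicit(&l->locked, memory_order_release);
}
```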

The intel ordering is really nasty, because there is no good fast
synchronization. "cpuid" trashes half the register set, and all the other
synchronizing instructions have other even nastier side effects. And there
is nothing like the alpha (and others) "write memory barrier" instruction
that does only a one-way barrier.

(To be fair, the alpha for example has very nice primitives for SMP, but
sometimes the implementation of them is horribly slow. For example, the
"load-and-protect" thing always seems to go to the bus even when the CPU
has exclusive ownership, which makes atomic sequences much more expensive
than they should be. I think DEC fixed this in their later Alphas, but
the point is that even when you have the right concepts you can mess up
with a bad implementation ;) 

		Linus


* Re: Lazy page reclamation on SMP machines: memory barriers
  1998-03-23 22:49 Lazy page reclamation on SMP machines: memory barriers Stephen C. Tweedie
  1998-03-23 23:20 ` Linus Torvalds
@ 1998-03-23 23:37 ` Linus Torvalds
  1 sibling, 0 replies; 7+ messages in thread
From: Linus Torvalds @ 1998-03-23 23:37 UTC
  To: Stephen C. Tweedie; +Cc: linux-mm, linux-smp

On Mon, 23 Mar 1998, Stephen C. Tweedie wrote:
> 
> Are there barrier constructs available to do this?  I believe the answer
> to be no, based on the recent thread concerning the use of inline asm
> cpuid instructions as a barrier on Intel machines.  Alternatively, does
> Intel provide any ordering guarantees which may help?

Just a quick follow-up with more intel-specific information in case people
care. The serializing instructions (intel-speak for "read and write memory
barrier") are:

Privileged (and all of these are too slow to really consider):
 - mov to control register
 - mov to debug register
 - wrmsr, invd, invlpg, wbinvd, lgdt, lldt, lidt, ltr

Non-privileged:
 - CPUID, IRET, RSM (and only CPUID is really usable for serialization)

In addition, any locked instruction (or xchg, which is implicitly locked) 
will "wait for all previous instructions to complete, and for the store
buffer to drain to memory". That, together with the rule that reads cannot
pass locked instructions, essentially makes all locked instructions
serialized (they _are_ serialized as far as memory ordering goes, but
intel seems to use the term "serialized" for both memory ordering and for
"internal CPU behaviour": in intel-speak a "real" serializing instruction
will apparently also wait for the CPU pipeline to drain). 

The cheapest way (considering register usage etc) to get a serializing
instruction _seems_ to be to use something like

	lock ; add $0,0(%esp)

which will act as a read and write barrier, but won't actually drain the
pipe completely (and won't trash any registers - and the stack is likely
to be dirty and cached, so it won't generate any extra memory traffic
except on a Pentium where the "lock" thing cannot work on the cache).
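
A hedged sketch of how that instruction might be wrapped as a full
barrier with GCC-style inline asm.  The names full_barrier() and
ordered_store_load() are made up for this example, the x86-64 variant
(using %rsp) and the C11 fence fallback for other targets are
assumptions added here, and mixing inline asm with C11 relaxed atomics
relies on the compiler honouring the "memory" clobber.

```c
#include <assert.h>
#include <stdatomic.h>

/* Full read and write barrier. */
static inline void full_barrier(void)
{
#if defined(__GNUC__) && defined(__i386__)
	__asm__ __volatile__("lock; addl $0,0(%%esp)" ::: "memory", "cc");
#elif defined(__GNUC__) && defined(__x86_64__)
	__asm__ __volatile__("lock; addl $0,0(%%rsp)" ::: "memory", "cc");
#else
	atomic_thread_fence(memory_order_seq_cst);	/* portable fallback */
#endif
}

/*
 * Store to one location, then load from another, with the load kept
 * from passing the store.  This is the case where a one-way barrier is
 * not enough and a full barrier is needed.
 */
static int ordered_store_load(atomic_int *to_store, atomic_int *to_load)
{
	atomic_store_explicit(to_store, 1, memory_order_relaxed);
	full_barrier();
	return atomic_load_explicit(to_load, memory_order_relaxed);
}
```

Adding zero leaves the stack word unchanged, so the only effect is the
ordering implied by the locked instruction.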

		Linus


* Re: Lazy page reclamation on SMP machines: memory barriers
  1998-03-23 23:20 ` Linus Torvalds
@ 1998-03-24 22:54   ` Stephen C. Tweedie
  1998-03-24 23:45     ` David S. Miller
                       ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Stephen C. Tweedie @ 1998-03-24 22:54 UTC
  To: Linus Torvalds; +Cc: Stephen C. Tweedie, linux-mm, linux-smp

Hi,

On Mon, 23 Mar 1998 15:20:11 -0800 (PST), Linus Torvalds
<torvalds@transmeta.com> said:

> Intel guarantees total ordering around any locked instruction, so the
> spinlocks themselves act as the barriers. 

Fine.  Can we assume that spinlocks and atomic set/clear_bit
instructions have the same semantics on other CPUs?

I'm in London until the weekend, but I hope to have the lazy page
stealing in a fit state to release shortly after getting back thanks to
this.

--Stephen


* Re: Lazy page reclamation on SMP machines: memory barriers
  1998-03-24 22:54   ` Stephen C. Tweedie
@ 1998-03-24 23:45     ` David S. Miller
  1998-03-25  0:11     ` Linus Torvalds
  1998-03-25  9:08     ` Rik van Riel
  2 siblings, 0 replies; 7+ messages in thread
From: David S. Miller @ 1998-03-24 23:45 UTC
  To: sct; +Cc: torvalds, linux-mm, linux-smp

   Date: 	Tue, 24 Mar 1998 22:54:18 GMT
   From: "Stephen C. Tweedie" <sct@dcs.ed.ac.uk>

   > Intel guarantees total ordering around any locked instruction, so
   > the spinlocks themselves act as the barriers. 

   Fine.  Can we assume that spinlocks and atomic set/clear_bit
   instructions have the same semantics on other CPUs?

Yes, you certainly can for spinlocks.

Later,
David S. Miller
davem@dm.cobaltmicro.com


* Re: Lazy page reclamation on SMP machines: memory barriers
  1998-03-24 22:54   ` Stephen C. Tweedie
  1998-03-24 23:45     ` David S. Miller
@ 1998-03-25  0:11     ` Linus Torvalds
  1998-03-25  9:08     ` Rik van Riel
  2 siblings, 0 replies; 7+ messages in thread
From: Linus Torvalds @ 1998-03-25  0:11 UTC
  To: Stephen C. Tweedie; +Cc: linux-mm, linux-smp



On Tue, 24 Mar 1998, Stephen C. Tweedie wrote:
> > Intel guarantees total ordering around any locked instruction, so the
> > spinlocks themselves act as the barriers. 
> 
> Fine.  Can we assume that spinlocks and atomic set/clear_bit
> instructions have the same semantics on other CPUs?

We can certainly guarantee that a spinlock has the necessary locking
semantics - anything else would make spinlocks useless. 

The other atomic instructions I'd be inclined to claim to be weakly
ordered.
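
If the atomic bit operations are treated as weakly ordered, any ordering
a caller needs has to be requested explicitly.  One way to picture that,
as a sketch in C11 rather than anything from the kernel, is a set of bit
helpers that name their ordering.  All sketch_ names here are
hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Atomic, but with no ordering against surrounding accesses. */
static void sketch_set_bit(unsigned nr, atomic_ulong *addr)
{
	atomic_fetch_or_explicit(addr, 1UL << nr, memory_order_relaxed);
}

/* Earlier accesses are ordered before the bit is seen to clear. */
static void sketch_clear_bit_release(unsigned nr, atomic_ulong *addr)
{
	atomic_fetch_and_explicit(addr, ~(1UL << nr), memory_order_release);
}

/* Later accesses are ordered after the bit has been read. */
static bool sketch_test_bit_acquire(unsigned nr, atomic_ulong *addr)
{
	return (atomic_load_explicit(addr, memory_order_acquire) >> nr) & 1UL;
}
```

With helpers like these, the lazy-bit clear would use the release form
and the unlocked test would use the acquire form, while bit updates made
under a spinlock could stay relaxed because the lock supplies the
ordering.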

		Linus


* Re: Lazy page reclamation on SMP machines: memory barriers
  1998-03-24 22:54   ` Stephen C. Tweedie
  1998-03-24 23:45     ` David S. Miller
  1998-03-25  0:11     ` Linus Torvalds
@ 1998-03-25  9:08     ` Rik van Riel
  2 siblings, 0 replies; 7+ messages in thread
From: Rik van Riel @ 1998-03-25  9:08 UTC
  To: Stephen C. Tweedie; +Cc: Linus Torvalds, linux-mm, linux-smp

On Tue, 24 Mar 1998, Stephen C. Tweedie wrote:

> I'm in London until the weekend, but I hope to have the lazy page
> stealing in a fit state to release shortly after getting back thanks to
> this.

Then that would be the end of memory fragmentation. Since
marking something as stealable has no real performance penalty,
we could just mark so much memory stealable that we've got
3 128k areas stealable...

Rik.
+-------------------------------------------+--------------------------+
| Linux: - LinuxHQ MM-patches page          | Scouting       webmaster |
|        - kswapd ask-him & complain-to guy | Vries    cubscout leader |
|     http://www.fys.ruu.nl/~riel/          | <H.H.vanRiel@fys.ruu.nl> |
+-------------------------------------------+--------------------------+
