x86 ptep_get_and_clear question

linux-mm.kvack.org archive mirror

* x86 ptep_get_and_clear question
@ 2001-02-15  1:50 Kanoj Sarcar
  2001-02-15  2:13 ` Ben LaHaise
  0 siblings, 1 reply; 21+ messages in thread
From: Kanoj Sarcar @ 2001-02-15  1:50 UTC
  To: linux-mm; +Cc: bcrl, mingo, alan

I would like to understand how ptep_get_and_clear() works for x86 on
2.4.1.

I am assuming that on x86 we do not implement a software dirty bit, as is
implemented on MIPS processors. Rather, the kernel relies on the
x86 hardware to update the dirty bit automatically (from looking at 
the implementation of pte_mkwrite()).

Say I have processors 1 and 2, and both processors have pulled the
mapping into their TLBs.

Processor 1 is doing change_pte_range(), for example. It does the
ptep_get_and_clear(pte), which atomically reads the hardware-managed
dirty bit, then clears the pte in memory. Now say processor 2 dirties
the page; I am not sure what will happen. One possibility is that
processor 2 will see in its TLB that the page hasn't been dirtied on
that processor yet, so it will go look at the in-memory copy,
see that the pte is not marked dirty, and hence mark the pte
dirty. When processor 1 then does the set_pte(), this dirty bit update
is lost. Hence, ptep_get_and_clear() isn't doing what I assume it was
designed to do (judging from the comments in mm/mprotect.c). (There are
alternative fixes possible.)

The other possibility, of course, is that processor 2 will somehow be
interlocked out (via hardware), processor 1 will do the flush_tlb_range()
from change_protection(), and then processor 1 will continue. If this is
the assumption, I would like to know whether it is documented in any Intel
x86 spec.

Am I missing something?

I am assuming Ben LaHaise wrote this code. I remember an earlier
conversation about this with Alan (we did not know which scenario
could happen), who suggested I ask Ingo. I do not remember what happened
after that.

Thanks.

Kanoj
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux.eu.org/Linux-MM/

* Re: x86 ptep_get_and_clear question
  2001-02-15  1:50 x86 ptep_get_and_clear question Kanoj Sarcar
@ 2001-02-15  2:13 ` Ben LaHaise
  2001-02-15  2:37   ` Kanoj Sarcar
  2001-02-15 10:55   ` Jamie Lokier
  0 siblings, 2 replies; 21+ messages in thread
From: Ben LaHaise @ 2001-02-15  2:13 UTC
  To: Kanoj Sarcar; +Cc: linux-mm, mingo, alan

On Wed, 14 Feb 2001, Kanoj Sarcar wrote:

> I would like to understand how ptep_get_and_clear() works for x86 on
> 2.4.1.
>
> I am assuming on x86, we do not implement software dirty bit, as is
> implemented in the mips processors. Rather, the kernel relies on the
> x86 hardware to update the dirty bit automatically (from looking at
> the implementation of pte_mkwrite()).

However, we do set the dirty bit early.

> The other possibility of course is that somehow processor 2 will interlock
> out (via hardware), processor 1 will do the flush_tlb_range() out of
> change_protection(), and then processor 1 will continue. If this is
> the assumption, I would like to know if this is in some Intel x86 specs.
>
> Am I missing something?

If processor 2 attempts to access the pte while it is cleared, it will
take a page fault.  This page fault will properly serialize by means of
the page table spinlock.

> I am assuming Ben Lahaise wrote this code. I remember having an earlier
> conversation with Alan about this too (we did not know which scenario
> could happen), who suggested I ask Ingo. I do not remember what happened
> after that.

x86 hardware goes back to the page tables whenever there is an attempt to
change the access it has to the pte.  I.e., if it originally accessed the
page table for reading, it will go back to the page tables on write.  I
believe most hardware that performs accessed/dirty bit updates in hardware
behaves the same way.

		-ben

* Re: x86 ptep_get_and_clear question
  2001-02-15  2:13 ` Ben LaHaise
@ 2001-02-15  2:37   ` Kanoj Sarcar
  2001-02-15 10:55   ` Jamie Lokier
  1 sibling, 0 replies; 21+ messages in thread
From: Kanoj Sarcar @ 2001-02-15  2:37 UTC
  To: Ben LaHaise; +Cc: linux-mm, mingo, alan

> 
> On Wed, 14 Feb 2001, Kanoj Sarcar wrote:
> 
> > I would like to understand how ptep_get_and_clear() works for x86 on
> > 2.4.1.
> >
> > I am assuming on x86, we do not implement software dirty bit, as is
> > implemented in the mips processors. Rather, the kernel relies on the
> > x86 hardware to update the dirty bit automatically (from looking at
> > the implementation of pte_mkwrite()).
> 
> However, we do set the dirty bit early.
> 

In some cases we do. But if the first access to a RW MAP_SHARED file
page is a read, for example, we will not set the dirty bit early. No?


> > The other possibility of course is that somehow processor 2 will interlock
> > out (via hardware), processor 1 will do the flush_tlb_range() out of
> > change_protection(), and then processor 1 will continue. If this is
> > the assumption, I would like to know if this is in some Intel x86 specs.
> >
> > Am I missing something?
> 
> If processor 2 attempts to access the pte while it is cleared, it will
> take a page fault.  This page fault will properly serialize by means of
> the page table spinlock.
> 

You edited out parts of my original email. In it, I described the
scenario where processor 2 already has the old pte contents (which grant
read/write permission, but do not have the pte dirty) in its own TLB.
Why would it take a page fault in this case?

> > I am assuming Ben Lahaise wrote this code. I remember having an earlier
> > conversation with Alan about this too (we did not know which scenario
> > could happen), who suggested I ask Ingo. I do not remember what happened
> > after that.
> 
> x86 hardware goes back to the page tables whenever there is an attempt to
> change the access it has to the pte.  Ie, if it originally accessed the
> page table for reading, it will go back to the page tables on write.  I
> believe most hardware that performs accessed/dirty bit updates in hardware
> behaves the same way.
>

Okay, what do you think x86 will do on processor 2 on a write if it goes
to the in-core pte and sees that the dirty bit is cleared? Do you have any
specs to support your statement "x86 hardware goes back to the page tables
whenever there is an attempt to change the access it has to the pte"?

Kanoj
 
> 		-ben

* Re: x86 ptep_get_and_clear question
  2001-02-15  2:13 ` Ben LaHaise
  2001-02-15  2:37   ` Kanoj Sarcar
@ 2001-02-15 10:55   ` Jamie Lokier
  2001-02-15 16:06     ` Ben LaHaise
  1 sibling, 1 reply; 21+ messages in thread
From: Jamie Lokier @ 2001-02-15 10:55 UTC
  To: Ben LaHaise; +Cc: Kanoj Sarcar, linux-mm, mingo, alan

Ben LaHaise wrote:
> x86 hardware goes back to the page tables whenever there is an attempt to
> change the access it has to the pte.  Ie, if it originally accessed the
> page table for reading, it will go back to the page tables on write.  I
> believe most hardware that performs accessed/dirty bit updates in hardware
> behaves the same way.

I think the scenario in question is this:

Processor 2 has recently done some writes, so the dirty bit is set in
processor 2's TLB.

Processor 1 clears the dirty bit atomically.

Processor 2 does some more writes, and does not check the page table
because the page is already dirty in its TLB.

Result: The later writes on processor 2 do not mark the page dirty.

-- Jamie

* Re: x86 ptep_get_and_clear question
  2001-02-15 10:55   ` Jamie Lokier
@ 2001-02-15 16:06     ` Ben LaHaise
  2001-02-15 16:35       ` Jamie Lokier
  0 siblings, 1 reply; 21+ messages in thread
From: Ben LaHaise @ 2001-02-15 16:06 UTC
  To: Jamie Lokier; +Cc: Kanoj Sarcar, linux-mm, mingo, alan

On Thu, 15 Feb 2001, Jamie Lokier wrote:

> Ben LaHaise wrote:
> > x86 hardware goes back to the page tables whenever there is an attempt to
> > change the access it has to the pte.  Ie, if it originally accessed the
> > page table for reading, it will go back to the page tables on write.  I
> > believe most hardware that performs accessed/dirty bit updates in hardware
> > behaves the same way.
>
> I think the scenario in question is this:
>
> Processor 2 has recently done some writes, so the dirty bit is set in
> processor 2's TLB.
>
> Processor 1 clears the dirty bit atomically.
>
> Processor 2 does some more writes, and does not check the page table
> because the page is already dirty in its TLB.
>
> Result: The later writes on processor 2 do not mark the page dirty.

Yeah, but the TLB is flushed in those cases (look for flush_tlb_page() in
try_to_swap_out()).

		-ben

* Re: x86 ptep_get_and_clear question
  2001-02-15 16:06     ` Ben LaHaise
@ 2001-02-15 16:35       ` Jamie Lokier
  2001-02-15 17:23         ` Kanoj Sarcar
  0 siblings, 1 reply; 21+ messages in thread
From: Jamie Lokier @ 2001-02-15 16:35 UTC
  To: Ben LaHaise; +Cc: Kanoj Sarcar, linux-mm, mingo, alan

Ben LaHaise wrote:
> > Processor 2 has recently done some writes, so the dirty bit is set in
> > processor 2's TLB.
> >
> > Processor 1 clears the dirty bit atomically.
> >
> > Processor 2 does some more writes, and does not check the page table
> > because the page is already dirty in its TLB.
> >
> > Result: The later writes on processor 2 do not mark the page dirty.
> 
> Yeah, but the tlb is flushed in those cases (look for flush_tlb_page in
> try_to_swap_out).

As long as processor 1 waits for the flush on processor 2 to complete
before marking the struct page dirty, that looks fine to me.

-- Jamie

* Re: x86 ptep_get_and_clear question
  2001-02-15 16:35       ` Jamie Lokier
@ 2001-02-15 17:23         ` Kanoj Sarcar
  2001-02-15 17:27           ` Ben LaHaise
  2001-02-15 17:47           ` Jamie Lokier
  0 siblings, 2 replies; 21+ messages in thread
From: Kanoj Sarcar @ 2001-02-15 17:23 UTC
  To: Jamie Lokier; +Cc: Ben LaHaise, linux-mm, mingo, alan

> 
> Ben LaHaise wrote:
> > > Processor 2 has recently done some writes, so the dirty bit is set in
> > > processor 2's TLB.
> > >
> > > Processor 1 clears the dirty bit atomically.
> > >
> > > Processor 2 does some more writes, and does not check the page table
> > > because the page is already dirty in its TLB.
> > >
> > > Result: The later writes on processor 2 do not mark the page dirty.
> > 
> > Yeah, but the tlb is flushed in those cases (look for flush_tlb_page in
> > try_to_swap_out).
> 
> As long as processor 1 waits for the flush on processor 2 to complete
> before marking the struct page dirty, that looks fine to me.
> 
> -- Jamie
> 

Since this seems to be so hard to understand, let's keep things simple and
continue with my previous example instead of pulling in new ones.

Look in mm/mprotect.c. Look at the call sequence change_protection() -> ...
change_pte_range(). Specifically at the sequence:

	entry = ptep_get_and_clear(pte);
	set_pte(pte, pte_modify(entry, newprot));

Go ahead and pull your x86 specs, and prove to me that between the
ptep_get_and_clear(), which zeroes out the pte (specifically, when the
dirty bit is not set), and the set_pte(), processor 2 cannot come in and
set the dirty bit on the in-memory pte, only to have it immediately
overwritten by the set_pte(). For an example of how this can happen, look
at my previous postings.

Jamie's example misses the point in the sense that at the very beginning,
when he says "Processor 2 has recently done some writes", processor 2 has
already made sure that the dirty bit is set in the in-memory pte. So, although
processor 1 clears the entire pte, the set_pte() will set the dirty bit,
and no information is lost, even if processor 2 tries writing between
the ptep_get_and_clear() and set_pte(). Whether Jamie was trying to
illustrate a different problem, I am not sure. All I am trying to say
is that the "dirty bit lost on SMP x86" problem still exists, and
ptep_get_and_clear() does nothing to fix it.

Kanoj

* Re: x86 ptep_get_and_clear question
  2001-02-15 17:23         ` Kanoj Sarcar
@ 2001-02-15 17:27           ` Ben LaHaise
  2001-02-15 17:38             ` Kanoj Sarcar
  2001-02-15 17:47           ` Jamie Lokier
  1 sibling, 1 reply; 21+ messages in thread
From: Ben LaHaise @ 2001-02-15 17:27 UTC
  To: Kanoj Sarcar; +Cc: Jamie Lokier, linux-mm, mingo, alan

On Thu, 15 Feb 2001, Kanoj Sarcar wrote:

> continue with my previous example, instead of pulling new examples.
>
> Look in mm/mprotect.c. Look at the call sequence change_protection() -> ...
> change_pte_range(). Specifically at the sequence:
>
> 	entry = ptep_get_and_clear(pte);
> 	set_pte(pte, pte_modify(entry, newprot));
>
> Go ahead and pull your x86 specs, and prove to me that between the
> ptep_get_and_clear(), which zeroes out the pte (specifically, when the
> dirty bit is not set), processor 2 can not come in and set the dirty
> bit on the in-memory pte. Which immediately gets overwritten by the
> set_pte(). For an example of how this can happen, look at my previous
> postings.

Look at the specs.  The processor uses read-modify-write cycles to update
the accessed and dirty bits.  If the in memory pte is either not present
or writable, the processor will take a page fault.

> Jamie's example misses the point in the sense that at the very beginning,
> when he says "Processor 2 has recently done some writes", processor 2 has
> made sure that the dirty bit is set in the in-memory pte. So, although
> processor 1 clears the entire pte, the set_pte() will set the dirty bit,
> and no information is lost. Even if processor 2 tries writing between
> the ptep_get_and_clear() and set_pte(). Whether Jamie was trying to
> illustrate a different problem, I am not sure. All I am trying to say
> is that the "dirty bit lost on smp x86" still exists, ptep_get_and_clear
> does not do anything to fix it.

Yes it does.  Write a test program like I did.  The processor does take a
page fault.

		-ben

* Re: x86 ptep_get_and_clear question
  2001-02-15 17:27           ` Ben LaHaise
@ 2001-02-15 17:38             ` Kanoj Sarcar
  2001-02-15 17:46               ` Ben LaHaise
  0 siblings, 1 reply; 21+ messages in thread
From: Kanoj Sarcar @ 2001-02-15 17:38 UTC
  To: Ben LaHaise; +Cc: Jamie Lokier, linux-mm, mingo, alan

> 
> On Thu, 15 Feb 2001, Kanoj Sarcar wrote:
> 
> > continue with my previous example, instead of pulling new examples.
> >
> > Look in mm/mprotect.c. Look at the call sequence change_protection() -> ...
> > change_pte_range(). Specifically at the sequence:
> >
> > 	entry = ptep_get_and_clear(pte);
> > 	set_pte(pte, pte_modify(entry, newprot));
> >
> > Go ahead and pull your x86 specs, and prove to me that between the
> > ptep_get_and_clear(), which zeroes out the pte (specifically, when the
> > dirty bit is not set), processor 2 can not come in and set the dirty
> > bit on the in-memory pte. Which immediately gets overwritten by the
> > set_pte(). For an example of how this can happen, look at my previous
> > postings.
> 
> Look at the specs.  The processor uses read-modify-write cycles to update
> the accessed and dirty bits.  If the in memory pte is either not present
> or writable, the processor will take a page fault.

What specs are you looking at? Please be specific with revision/volume/
section/page number if you are quoting from hardcopy. If you are looking 
at online manuals, please provide a pointer. I am specifically interested
in your claim "If the in memory pte is either not present or writable,
the processor will take a page fault".

This was what I asked for in the first place. We could have saved a lot of
email if you had just posted this information.

> 
> > Jamie's example misses the point in the sense that at the very beginning,
> > when he says "Processor 2 has recently done some writes", processor 2 has
> > made sure that the dirty bit is set in the in-memory pte. So, although
> > processor 1 clears the entire pte, the set_pte() will set the dirty bit,
> > and no information is lost. Even if processor 2 tries writing between
> > the ptep_get_and_clear() and set_pte(). Whether Jamie was trying to
> > illustrate a different problem, I am not sure. All I am trying to say
> > is that the "dirty bit lost on smp x86" still exists, ptep_get_and_clear
> > does not do anything to fix it.
> 
> Yes it does.  Write a test program like I did.  The processor does take a
> page fault.

Do you have the program saved (or can you explain how it worked)? I would
very much like to understand exactly how you were deterministically tickling
the race condition from a user program (without hacking the kernel).

Another thing: "The processor does take a page fault" might mean that
current Intel processors do it, but future ones might not, unless it is
part of the x86 specs. That's why I am so interested in seeing it.

Kanoj

> 
> 		-ben
> 
> 

* Re: x86 ptep_get_and_clear question
  2001-02-15 17:38             ` Kanoj Sarcar
@ 2001-02-15 17:46               ` Ben LaHaise
  0 siblings, 0 replies; 21+ messages in thread
From: Ben LaHaise @ 2001-02-15 17:46 UTC
  To: Kanoj Sarcar; +Cc: Jamie Lokier, linux-mm, mingo, alan

On Thu, 15 Feb 2001, Kanoj Sarcar wrote:

> >
> > On Thu, 15 Feb 2001, Kanoj Sarcar wrote:
> >
> > > continue with my previous example, instead of pulling new examples.
> > >
> > > Look in mm/mprotect.c. Look at the call sequence change_protection() -> ...
> > > change_pte_range(). Specifically at the sequence:
> > >
> > > 	entry = ptep_get_and_clear(pte);
> > > 	set_pte(pte, pte_modify(entry, newprot));
> > >
> > > Go ahead and pull your x86 specs, and prove to me that between the
> > > ptep_get_and_clear(), which zeroes out the pte (specifically, when the
> > > dirty bit is not set), processor 2 can not come in and set the dirty
> > > bit on the in-memory pte. Which immediately gets overwritten by the
> > > set_pte(). For an example of how this can happen, look at my previous
> > > postings.
> >
> > Look at the specs.  The processor uses read-modify-write cycles to update
> > the accessed and dirty bits.  If the in memory pte is either not present
> > or writable, the processor will take a page fault.
>
> What specs are you looking at? Please be specific with revision/volume/
> section/page number if you are quoting from hardcopy. If you are looking
> at online manuals, please provide a pointer. I am specifically interested
> in your claim "If the in memory pte is either not present or writable,
> the processor will take a page fault".

> This was what I asked for in the first place. We could have saved so much
> email exchange if you would just have posted this information.

I'm not quoting from any particular specs, but from memory.  IIRC, the
manuals claim that using atomic operations on ptes will produce the
correct results.  This is the only model of operation that can be
consistent with that claim.

> > > Jamie's example misses the point in the sense that at the very beginning,
> > > when he says "Processor 2 has recently done some writes", processor 2 has
> > > made sure that the dirty bit is set in the in-memory pte. So, although
> > > processor 1 clears the entire pte, the set_pte() will set the dirty bit,
> > > and no information is lost. Even if processor 2 tries writing between
> > > the ptep_get_and_clear() and set_pte(). Whether Jamie was trying to
> > > illustrate a different problem, I am not sure. All I am trying to say
> > > is that the "dirty bit lost on smp x86" still exists, ptep_get_and_clear
> > > does not do anything to fix it.
> >
> > Yes it does.  Write a test program like I did.  The processor does take a
> > page fault.
>
> Do you have the program saved (or can explain how it worked)? I would very
> much like to understand exactly how you were tickling the race condition
> by a user program (without hacking the kernel) deterministically.

It was a loadable kernel module that primed the TLB with various ptes and
then monitored the resulting page faults.  I can't find the source right
now, but it's about 20 lines to reproduce.

		-ben

* Re: x86 ptep_get_and_clear question
  2001-02-15 17:23         ` Kanoj Sarcar
  2001-02-15 17:27           ` Ben LaHaise
@ 2001-02-15 17:47           ` Jamie Lokier
  2001-02-15 18:05             ` Kanoj Sarcar
  2001-02-15 18:23             ` Kanoj Sarcar
  1 sibling, 2 replies; 21+ messages in thread
From: Jamie Lokier @ 2001-02-15 17:47 UTC
  To: Kanoj Sarcar; +Cc: Ben LaHaise, linux-mm, mingo, alan, linux-kernel

[Added Linus and linux-kernel as I think it's of general interest]

Kanoj Sarcar wrote:
> Whether Jamie was trying to illustrate a different problem, I am not
> sure.

Yes, I was talking about pte_test_and_clear_dirty in the earlier post.

> Look in mm/mprotect.c. Look at the call sequence change_protection() -> ...
> change_pte_range(). Specifically at the sequence:
> 
> 	entry = ptep_get_and_clear(pte);
> 	set_pte(pte, pte_modify(entry, newprot));
>
> Go ahead and pull your x86 specs, and prove to me that between the 
> ptep_get_and_clear(), which zeroes out the pte (specifically, when the 
> dirty bit is not set), processor 2 can not come in and set the dirty 
> bit on the in-memory pte. Which immediately gets overwritten by the 
> set_pte(). For an example of how this can happen, look at my previous 
> postings.

Let's see.  We'll assume processor 2 does a write between the
ptep_get_and_clear and the set_pte, which are done on processor 1.

Now, ptep_get_and_clear is atomic, so we can talk about "before" and
"after".  Before it, either processor 2 has a TLB entry with the dirty
bit set, or it does not (it has either a clean TLB entry or no TLB entry
at all).

After ptep_get_and_clear, processor 2 does a write.  If it already has a
dirty TLB entry, then `entry' will also be dirty so the dirty bit is
preserved.  If processor 2 does not have a dirty TLB entry, then it will
look up the pte.  Processor 2 finds the pte is clear, so raises a page fault.
Spinlocks etc. sort everything out in the page fault.

Here's the important part: when processor 2 wants to set the pte's dirty
bit, it *rereads* the pte and *rechecks* the permission bits again.
Even though it has a non-dirty TLB entry for that pte.

That is how I read Ben LaHaise's description, and his test program tests
exactly this.

If the processor worked by atomically setting the dirty bit in the pte
without rechecking the permissions when it reads that pte bit, then this
scheme would fail and you'd be right about the lost dirty bits.  I would
have thought it would be simpler to implement a CPU this way, but
clearly it is not as efficient for SMP OS design, so perhaps CPU
designers thought about this.

The only remaining question is: is the observed behaviour defined for
x86 CPUs in general, or are we depending on the results of testing a few
particular CPUs?

-- Jamie

* Re: x86 ptep_get_and_clear question
  2001-02-15 17:47           ` Jamie Lokier
@ 2001-02-15 18:05             ` Kanoj Sarcar
  2001-02-15 18:23             ` Kanoj Sarcar
  1 sibling, 0 replies; 21+ messages in thread
From: Kanoj Sarcar @ 2001-02-15 18:05 UTC
  To: Jamie Lokier; +Cc: Ben LaHaise, linux-mm, mingo, alan, linux-kernel

> 
> [Added Linus and linux-kernel as I think it's of general interest]
> 
> Kanoj Sarcar wrote:
> > Whether Jamie was trying to illustrate a different problem, I am not
> > sure.
> 
> Yes, I was talking about pte_test_and_clear_dirty in the earlier post.
> 
> > Look in mm/mprotect.c. Look at the call sequence change_protection() -> ...
> > change_pte_range(). Specifically at the sequence:
> > 
> > 	entry = ptep_get_and_clear(pte);
> > 	set_pte(pte, pte_modify(entry, newprot));
> >
> > Go ahead and pull your x86 specs, and prove to me that between the 
> > ptep_get_and_clear(), which zeroes out the pte (specifically, when the 
> > dirty bit is not set), processor 2 can not come in and set the dirty 
> > bit on the in-memory pte. Which immediately gets overwritten by the 
> > set_pte(). For an example of how this can happen, look at my previous 
> > postings.
> 

Now you are talking my language!

> Let's see.  We'll assume processor 2 does a write between the
> ptep_get_and_clear and the set_pte, which are done on processor 1.
> 
> Now, ptep_get_and_clear is atomic, so we can talk about "before" and
> "after".  Before it, either processor 2 has a TLB entry with the dirty
> bit set, or it does not (it has either a clean TLB entry or no TLB entry
> at all).
> 
> After ptep_get_and_clear, processor 2 does a write.  If it already has a
> dirty TLB entry, then `entry' will also be dirty so the dirty bit is
> preserved.  If processor 2 does not have a dirty TLB entry, then it will
> look up the pte.  Processor 2 finds the pte is clear, so raises a page fault.
> Spinlocks etc. sort everything out in the page fault.
> 
> Here's the important part: when processor 2 wants to set the pte's dirty
> bit, it *rereads* the pte and *rechecks* the permission bits again.
> Even though it has a non-dirty TLB entry for that pte.
> 
> That is how I read Ben LaHaise's description, and his test program tests
> exactly this.
> 

Okay, I asked Ben, but he couldn't point me at any specs that would shut me up.

> If the processor worked by atomically setting the dirty bit in the pte
> without rechecking the permissions when it reads that pte bit, then this
> scheme would fail and you'd be right about the lost dirty bits.  I would

Exactly. This is why I did not implement this scheme earlier when Alan
and I talked about this scenario, almost a couple of years back.

> have thought it would be simpler to implement a CPU this way, but
> clearly it is not as efficient for SMP OS design so perhaps CPU
> designers thought about this.
> 
> The only remaining question is: is the observed behaviour defined for
> x86 CPUs in general, or are we depending on the results of testing a few
> particular CPUs?

Exactly!

So my claim still stands: ptep_get_and_clear() doesn't do what it claims
to do. I would be more than happy if someone could give me reasoning that
breaks this claim, which would mean one longstanding data integrity
problem on Linux has been fixed satisfactorily.

Kanoj

> 
> -- Jamie
> 

* Re: x86 ptep_get_and_clear question
  2001-02-15 17:47           ` Jamie Lokier
  2001-02-15 18:05             ` Kanoj Sarcar
@ 2001-02-15 18:23             ` Kanoj Sarcar
  2001-02-15 18:42               ` Jamie Lokier
  2001-02-15 18:51               ` Manfred Spraul
  1 sibling, 2 replies; 21+ messages in thread
From: Kanoj Sarcar @ 2001-02-15 18:23 UTC
  To: Jamie Lokier; +Cc: Ben LaHaise, linux-mm, mingo, alan, linux-kernel

> 
> [Added Linus and linux-kernel as I think it's of general interest]
> 
> Kanoj Sarcar wrote:
> > Whether Jamie was trying to illustrate a different problem, I am not
> > sure.
> 
> Yes, I was talking about pte_test_and_clear_dirty in the earlier post.
> 
> > Look in mm/mprotect.c. Look at the call sequence change_protection() -> ...
> > change_pte_range(). Specifically at the sequence:
> > 
> > 	entry = ptep_get_and_clear(pte);
> > 	set_pte(pte, pte_modify(entry, newprot));
> >
> > Go ahead and pull your x86 specs, and prove to me that between the 
> > ptep_get_and_clear(), which zeroes out the pte (specifically, when the 
> > dirty bit is not set), processor 2 can not come in and set the dirty 
> > bit on the in-memory pte. Which immediately gets overwritten by the 
> > set_pte(). For an example of how this can happen, look at my previous 
> > postings.
> 
> Let's see.  We'll assume processor 2 does a write between the
> ptep_get_and_clear and the set_pte, which are done on processor 1.
> 
> Now, ptep_get_and_clear is atomic, so we can talk about "before" and
> "after".  Before it, either processor 2 has a TLB entry with the dirty
> bit set, or it does not (it has either a clean TLB entry or no TLB entry
> at all).
> 
> After ptep_get_and_clear, processor 2 does a write.  If it already has a
> dirty TLB entry, then `entry' will also be dirty so the dirty bit is
> preserved.  If processor 2 does not have a dirty TLB entry, then it will
> look up the pte.  Processor 2 finds the pte is clear, so raises a page fault.
> Spinlocks etc. sort everything out in the page fault.
> 
> Here's the important part: when processor 2 wants to set the pte's dirty
> bit, it *rereads* the pte and *rechecks* the permission bits again.
> Even though it has a non-dirty TLB entry for that pte.
> 
> That is how I read Ben LaHaise's description, and his test program tests
> exactly this.

Okay, I will quote from Intel Architecture Software Developer's Manual
Volume 3: System Programming Guide (1997 print), section 3.7, page 3-27:

"Bus cycles to the page directory and page tables in memory are performed
only when the TLBs do not contain the translation information for a 
requested page."

And on the same page:

"Whenever a page directory or page table entry is changed (including when 
the present flag is set to zero), the operating system must immediately
invalidate the corresponding entry in the TLB so that it can be updated
the next time the entry is referenced."

So, it looks highly unlikely to me that the basic assumption about how
x86 works wrt tlb/ptes in the ptep_get_and_clear() solution is correct.

Kanoj

> 
> If the processor worked by atomically setting the dirty bit in the pte
> without rechecking the permissions when it reads that pte bit, then this
> scheme would fail and you'd be right about the lost dirty bits.  I would
> have thought it would be simpler to implement a CPU this way, but
> clearly it is not as efficient for SMP OS design so perhaps CPU
> designers thought about this.
> 
> The only remaining question is: is the observed behaviour defined for
> x86 CPUs in general, or are we depending on the results of testing a few
> particular CPUs?
> 
> -- Jamie
> 
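The disputed interleaving can be replayed deterministically. The sketch below is a hypothetical user-space model, not kernel code. A C11 atomic word stands in for the in-memory pte, `atomic_exchange()` stands in for ptep_get_and_clear(), and a plain store stands in for set_pte(). The bit values are illustrative (modelled on the i386 layout), and `replay_window()` and the two CPU 2 models are names invented for this example. BLIND_OR is the behaviour Kanoj fears. RECHECK is the one Jamie and Ben describe.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit values, modelled on the i386 pte layout. */
#define PTE_PRESENT 0x001u
#define PTE_RW      0x002u
#define PTE_DIRTY   0x040u
#define PTE_PROT    (PTE_PRESENT | PTE_RW)

/* Two hypotheses about CPU 2's first write through a clean TLB entry. */
enum cpu2_model {
    BLIND_OR,   /* locked OR of the dirty bit, no recheck of the pte */
    RECHECK     /* reread the pte; fault unless still present and writable */
};

struct replay {
    uint32_t final_pte;   /* in-memory pte after CPU 1's set_pte() */
    bool cpu2_faulted;    /* CPU 2 took a page fault instead of writing */
};

/* Replays one fixed interleaving of the mprotect sequence:
 *   CPU 1: entry = ptep_get_and_clear(pte);
 *   CPU 2: first write to the page (its TLB entry is clean);
 *   CPU 1: set_pte(pte, pte_modify(entry, newprot));
 * newprot must contain only protection bits. */
static struct replay replay_window(uint32_t initial, uint32_t newprot,
                                   enum cpu2_model model)
{
    _Atomic uint32_t pte = initial;
    struct replay r = { 0, false };

    /* CPU 1: atomically read the old pte and zero it. */
    uint32_t entry = atomic_exchange(&pte, 0u);

    /* CPU 2: its write lands inside the window. */
    if (model == BLIND_OR) {
        atomic_fetch_or(&pte, PTE_DIRTY);
    } else {
        uint32_t cur = atomic_load(&pte);
        while ((cur & PTE_PROT) == PTE_PROT &&
               !atomic_compare_exchange_weak(&pte, &cur, cur | PTE_DIRTY)) {
            /* CAS failed: `cur` now holds the fresh value; check again. */
        }
        r.cpu2_faulted = (cur & PTE_PROT) != PTE_PROT;
    }

    /* CPU 1: plain store. Modelled on pte_modify(): keep the
     * non-protection bits of `entry` (including its dirty bit) and
     * replace the protection bits with newprot. */
    atomic_store(&pte, (entry & ~PTE_PROT) | newprot);
    r.final_pte = atomic_load(&pte);
    assert((r.final_pte & PTE_PROT) == newprot);
    return r;
}
```

Replaying a clean, writable pte being made read-only with `replay_window(PTE_PRESENT | PTE_RW, PTE_PRESENT, BLIND_OR)` ends with a pte of just PTE_PRESENT. CPU 2's write happened, but its dirty bit was overwritten. Under RECHECK the same call reports a fault instead, so the write is retried through the page-fault path, which serialises on the kernel's locks. If the pte was already dirty beforehand, `entry` carries the bit and nothing is lost under either model.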

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux.eu.org/Linux-MM/

* Re: x86 ptep_get_and_clear question
  2001-02-15 18:23             ` Kanoj Sarcar
@ 2001-02-15 18:42               ` Jamie Lokier
  2001-02-15 18:57                 ` Kanoj Sarcar
  2001-02-15 18:51               ` Manfred Spraul
  1 sibling, 1 reply; 21+ messages in thread
From: Jamie Lokier @ 2001-02-15 18:42 UTC (permalink / raw)
  To: Kanoj Sarcar; +Cc: Ben LaHaise, linux-mm, mingo, alan, linux-kernel

Kanoj Sarcar wrote:
> > Here's the important part: when processor 2 wants to set the pte's dirty
> > bit, it *rereads* the pte and *rechecks* the permission bits again.
> > Even though it has a non-dirty TLB entry for that pte.
> > 
> > That is how I read Ben LaHaise's description, and his test program tests
> > exactly this.
> 
> Okay, I will quote from Intel Architecture Software Developer's Manual
> Volume 3: System Programming Guide (1997 print), section 3.7, page 3-27:
> 
> "Bus cycles to the page directory and page tables in memory are performed
> only when the TLBs do not contain the translation information for a 
> requested page."
> 
> And on the same page:
> 
> "Whenever a page directory or page table entry is changed (including when 
> the present flag is set to zero), the operating system must immediately
> invalidate the corresponding entry in the TLB so that it can be updated
> the next time the entry is referenced."
> 
> So, it looks highly unlikely to me that the basic assumption about how
> x86 works wrt tlb/ptes in the ptep_get_and_clear() solution is correct.

To me those quotes don't address the question we're asking.  We know
that bus cycles _do_ occur when a TLB entry is switched from clean to
dirty, and furthermore they are locked cycles.  (Don't ask me how I know
this though).

Does that mean, in jargon, the TLB does not "contain
the translation information" for a write?

The second quote: sure, if we want the TLB updated we have to flush it.
And eventually in mm/mprotect.c we do.  But what if, before then, it keeps
on using the old TLB entry?  That's OK.  If the entry was already dirty
then we don't mind if processor 2 continues with the old TLB entry for a
while, until we do the big TLB range flush.

In other words I don't think those two quotes address our question at
all.

What worries me more is that this is quite a subtle requirement, and the
code in mm/mprotect.c is not specific to one architecture.  Do all SMP
CPUs supported by Linux do the same thing on converting TLB entries from
clean to dirty, or do they have a subtle, easily missed data integrity
problem?

-- Jamie

* Re: x86 ptep_get_and_clear question
  2001-02-15 18:23             ` Kanoj Sarcar
  2001-02-15 18:42               ` Jamie Lokier
@ 2001-02-15 18:51               ` Manfred Spraul
  2001-02-15 19:05                 ` Kanoj Sarcar
  2001-02-15 19:07                 ` Jamie Lokier
  1 sibling, 2 replies; 21+ messages in thread
From: Manfred Spraul @ 2001-02-15 18:51 UTC (permalink / raw)
  To: Kanoj Sarcar
  Cc: Jamie Lokier, Ben LaHaise, linux-mm, mingo, alan, linux-kernel

Kanoj Sarcar wrote:
> 
> Okay, I will quote from Intel Architecture Software Developer's Manual
> Volume 3: System Programming Guide (1997 print), section 3.7, page 3-27:
> 
> "Bus cycles to the page directory and page tables in memory are performed
> only when the TLBs do not contain the translation information for a
> requested page."
> 
> And on the same page:
> 
> "Whenever a page directory or page table entry is changed (including when
> the present flag is set to zero), the operating system must immediately
> invalidate the corresponding entry in the TLB so that it can be updated
> the next time the entry is referenced."
>

But there is another paragraph that mentions that an OS may use lazy tlb
shootdowns.
[search for shootdown]

You checked the far-too-obvious chapters; remember that Intel wrote the
documentation ;-)
I searched for 'dirty' through Vol 3 and found

Chapter 7.1.2.1 Automatic locking.

.. the processor uses locked cycles to set the accessed and dirty flag
in the page-directory and page-table entries.

But that obviously doesn't answer your question.

Is the sequence
<< lock;
read pte
pte |= dirty
write pte
>> end lock;
or
<< lock;
read pte
if (!present(pte))
	do_page_fault();
pte |= dirty
write pte.
>> end lock;
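Manfred's two candidate sequences can be written as C11 atomics. This assumes (as an illustration, not as a claim about Intel hardware) that a locked cycle behaves like an atomic read-modify-write on the pte word. The function names and bit values are illustrative. In the second sequence the compare-and-swap loop stands in for the bus lock, so the present test and the update are applied to the same value.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit values, modelled on the i386 pte layout. */
#define PTE_PRESENT 0x001u
#define PTE_DIRTY   0x040u

/* Sequence 1: lock; read pte; pte |= dirty; write pte; unlock.
 * A single atomic OR is exactly such a locked read-modify-write. */
static void set_dirty_unchecked(_Atomic uint32_t *ptep)
{
    atomic_fetch_or(ptep, PTE_DIRTY);
}

/* Sequence 2: as above, but the present bit is tested on the value read
 * inside the locked cycle. Returns false where the CPU would call
 * do_page_fault() instead of writing. */
static bool set_dirty_checked(_Atomic uint32_t *ptep)
{
    uint32_t old = atomic_load(ptep);
    do {
        if (!(old & PTE_PRESENT))
            return false;   /* fault: the pte is left untouched */
    } while (!atomic_compare_exchange_weak(ptep, &old, old | PTE_DIRTY));
    assert(old & PTE_PRESENT);   /* only a present pte was ever dirtied */
    return true;
}
```

Applied to a pte that ptep_get_and_clear() has just zeroed, the first sequence leaves a stray PTE_DIRTY behind, which a later set_pte() would overwrite. The second leaves the pte at zero and reports a fault.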

--
	Manfred

* Re: x86 ptep_get_and_clear question
  2001-02-15 18:42               ` Jamie Lokier
@ 2001-02-15 18:57                 ` Kanoj Sarcar
  2001-02-15 19:06                   ` Ben LaHaise
  0 siblings, 1 reply; 21+ messages in thread
From: Kanoj Sarcar @ 2001-02-15 18:57 UTC (permalink / raw)
  To: Jamie Lokier; +Cc: Ben LaHaise, linux-mm, mingo, alan, linux-kernel

> 
> Kanoj Sarcar wrote:
> > > Here's the important part: when processor 2 wants to set the pte's dirty
> > > bit, it *rereads* the pte and *rechecks* the permission bits again.
> > > Even though it has a non-dirty TLB entry for that pte.
> > > 
> > > That is how I read Ben LaHaise's description, and his test program tests
> > > exactly this.
> > 
> > Okay, I will quote from Intel Architecture Software Developer's Manual
> > Volume 3: System Programming Guide (1997 print), section 3.7, page 3-27:
> > 
> > "Bus cycles to the page directory and page tables in memory are performed
> > only when the TLBs do not contain the translation information for a 
> > requested page."
> > 
> > And on the same page:
> > 
> > "Whenever a page directory or page table entry is changed (including when 
> > the present flag is set to zero), the operating system must immediately
> > invalidate the corresponding entry in the TLB so that it can be updated
> > the next time the entry is referenced."
> > 
> > So, it looks highly unlikely to me that the basic assumption about how
> > x86 works wrt tlb/ptes in the ptep_get_and_clear() solution is correct.
> 
> To me those quotes don't address the question we're asking.  We know
> that bus cycles _do_ occur when a TLB entry is switched from clean to
> dirty, and furthermore they are locked cycles.  (Don't ask me how I know
> this though).
> 
> Does that mean, in jargon, the TLB does not "contain
> the translation information" for a write?
> 
> The second quote: sure, if we want the TLB updated we have to flush it.
> And eventually in mm/mprotect.c we do.  But what if, before then, it keeps
> on using the old TLB entry?  That's OK.  If the entry was already dirty
> then we don't mind if processor 2 continues with the old TLB entry for a
> while, until we do the big TLB range flush.
> 
> In other words I don't think those two quotes address our question at
> all.

Agreed. But these are the only relevant quotes I could come up with. And
to me, these quotes make the ptep_get_and_clear() assumption look risky
at best ... even though they do not give clear answers either way.

> 
> What worries me more is that this is quite a subtle requirement, and the
> code in mm/mprotect.c is not specific to one architecture.  Do all SMP
> CPUs supported by Linux do the same thing on converting TLB entries from
> clean to dirty, or do they have a subtle, easily missed data integrity
> problem?

No. Not all architectures have this problem. For example, if the
Linux "dirty" bit (as opposed to the pte dirty bit) is managed by
software, a fault will actually be taken when processor 2 tries to do
the write. The fault is solely there to make sure that the Linux "dirty"
bit can be tracked. As long as the fault handler grabs the right locks
before updating the Linux "dirty" bit, things should be okay. This is
the case with mips, for example.

The problem with x86 is that we depend on the automatic x86 dirty bit
update to manage the Linux "dirty" bit (they are the same!), so the
appropriate locks are not grabbed.
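For contrast, here is a rough sketch of the software-managed scheme described above for mips, reduced to a single pte. Everything here is hypothetical: the bit names, `struct mm_model`, and the spinlock standing in for the page table lock. The key assumption is that the hardware can only write through a page once HW_WRITE is set, and the kernel sets it only after recording SW_DIRTY under the lock. As a result, the first write to a clean page always traps into `write_fault()`. Real mips code differs in detail.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical software-managed pte bits. */
#define SW_VALID 0x1u   /* a mapping exists */
#define SW_WRITE 0x2u   /* Linux permits writes */
#define SW_DIRTY 0x4u   /* Linux "dirty" bit, maintained only by software */
#define HW_WRITE 0x8u   /* bit the hardware checks on a write */

struct mm_model {
    atomic_flag lock;   /* stands in for the page table lock */
    uint32_t pte;       /* only ever touched with `lock` held */
};

static void mm_lock(struct mm_model *mm)
{
    while (atomic_flag_test_and_set_explicit(&mm->lock, memory_order_acquire))
        ;   /* spin */
}

static void mm_unlock(struct mm_model *mm)
{
    atomic_flag_clear_explicit(&mm->lock, memory_order_release);
}

static void mm_init(struct mm_model *mm, uint32_t pte)
{
    atomic_flag_clear(&mm->lock);
    mm->pte = pte;
}

/* Fault taken on the first write to a clean page. Returns false for a
 * genuine protection violation. */
static bool write_fault(struct mm_model *mm)
{
    mm_lock(mm);
    bool ok = (mm->pte & (SW_VALID | SW_WRITE)) == (SW_VALID | SW_WRITE);
    if (ok)
        mm->pte |= SW_DIRTY | HW_WRITE;
    mm_unlock(mm);
    return ok;
}

/* mprotect-style change under the same lock, so it is ordered entirely
 * before or after any write_fault(); SW_DIRTY is carried over. */
static void change_prot(struct mm_model *mm, bool writable)
{
    mm_lock(mm);
    uint32_t pte = mm->pte & ~(SW_WRITE | HW_WRITE);
    if (writable) {
        pte |= SW_WRITE;
        if (pte & SW_DIRTY)
            pte |= HW_WRITE;   /* already dirty: no need to trap again */
    }
    /* Hardware write access is never granted without a recorded dirty bit. */
    assert(!(pte & HW_WRITE) || (pte & SW_DIRTY));
    mm->pte = pte;
    mm_unlock(mm);
}
```

Because both paths take the same lock, a write fault and a protection change cannot interleave, and the dirty bit recorded by the fault survives the change. The TLB flush that must follow `change_prot()` is omitted here.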

Kanoj


> 
> -- Jamie
> 

* Re: x86 ptep_get_and_clear question
  2001-02-15 18:51               ` Manfred Spraul
@ 2001-02-15 19:05                 ` Kanoj Sarcar
  2001-02-15 19:19                   ` Jamie Lokier
  2001-02-15 19:07                 ` Jamie Lokier
  1 sibling, 1 reply; 21+ messages in thread
From: Kanoj Sarcar @ 2001-02-15 19:05 UTC (permalink / raw)
  To: Manfred Spraul
  Cc: Jamie Lokier, Ben LaHaise, linux-mm, mingo, alan, linux-kernel

> 
> Kanoj Sarcar wrote:
> > 
> > Okay, I will quote from Intel Architecture Software Developer's Manual
> > Volume 3: System Programming Guide (1997 print), section 3.7, page 3-27:
> > 
> > "Bus cycles to the page directory and page tables in memory are performed
> > only when the TLBs do not contain the translation information for a
> > requested page."
> > 
> > And on the same page:
> > 
> > "Whenever a page directory or page table entry is changed (including when
> > the present flag is set to zero), the operating system must immediately
> > invalidate the corresponding entry in the TLB so that it can be updated
> > the next time the entry is referenced."
> >
> 
> But there is another paragraph that mentions that an OS may use lazy tlb
> shootdowns.
> [search for shootdown]
> 
> You checked the far-too-obvious chapters; remember that Intel wrote the
> documentation ;-)

:-) :-)

The good part is, there are a lot of Intel folks now active on Linux,
I can go off and ask one of them, if we are sufficiently confused. I
am trying to see whether we are.

> I searched for 'dirty' through Vol 3 and found
> 
> Chapter 7.1.2.1 Automatic locking.
> 
> .. the processor uses locked cycles to set the accessed and dirty flag
> in the page-directory and page-table entries.
> 
> But that obviously doesn't answer your question.
> 
> Is the sequence
> << lock;
> read pte
> pte |= dirty
> write pte
> >> end lock;
> or
> << lock;
> read pte
> if (!present(pte))
> 	do_page_fault();
> pte |= dirty
> write pte.
> >> end lock;

No, it is a little more complicated. You also have to include the
TLB state in this algorithm, since that is what we are talking about.
Specifically, what does the processor do when it has a tlb entry allowing
RW, the processor has only done reads using the translation, and the 
in-memory pte is clear?

Kanoj

> 
> --
> 	Manfred
> 

* Re: x86 ptep_get_and_clear question
  2001-02-15 18:57                 ` Kanoj Sarcar
@ 2001-02-15 19:06                   ` Ben LaHaise
  2001-02-15 19:19                     ` Kanoj Sarcar
  0 siblings, 1 reply; 21+ messages in thread
From: Ben LaHaise @ 2001-02-15 19:06 UTC (permalink / raw)
  To: Kanoj Sarcar; +Cc: Jamie Lokier, linux-mm, mingo, alan, linux-kernel

On Thu, 15 Feb 2001, Kanoj Sarcar wrote:

> No. Not all architectures have this problem. For example, if the
> Linux "dirty" bit (as opposed to the pte dirty bit) is managed by
> software, a fault will actually be taken when processor 2 tries to do
> the write. The fault is solely there to make sure that the Linux "dirty"
> bit can be tracked. As long as the fault handler grabs the right locks
> before updating the Linux "dirty" bit, things should be okay. This is
> the case with mips, for example.
>
> The problem with x86 is that we depend on the automatic x86 dirty bit
> update to manage the Linux "dirty" bit (they are the same!), so the
> appropriate locks are not grabbed.

Will you please go off and prove that this "problem" exists on some x86
processor before continuing this rant?  None of the PII, PIII, Athlon,
K6-2 or 486s I checked exhibited the worrisome behaviour you're
speculating about, plus it is logically consistent with the statements the
manual does make about updating ptes; otherwise how could an smp os
perform a reliable shootdown by doing an atomic bit clear on the present
bit of a pte?

		-ben

* Re: x86 ptep_get_and_clear question
  2001-02-15 18:51               ` Manfred Spraul
  2001-02-15 19:05                 ` Kanoj Sarcar
@ 2001-02-15 19:07                 ` Jamie Lokier
  1 sibling, 0 replies; 21+ messages in thread
From: Jamie Lokier @ 2001-02-15 19:07 UTC (permalink / raw)
  To: Manfred Spraul
  Cc: Kanoj Sarcar, Ben LaHaise, linux-mm, mingo, alan, linux-kernel

Manfred Spraul wrote:
> Is the sequence
> << lock;
> read pte
> pte |= dirty
> write pte
> >> end lock;
> or
> << lock;
> read pte
> if (!present(pte))
> 	do_page_fault();
> pte |= dirty
> write pte.
> >> end lock;

or more generally

<< lock;
read pte
if (!present(pte) || !writable(pte))
	do_page_fault();
pte |= dirty
write pte.
>> end lock;

Not to mention: is it guaranteed to use the newly read physical
address?  Does it check the supervisor permission again?  Does it use
the new PAT/CD/WT attributes?

I can vaguely imagine some COW optimisation where the pte is updated to
be writable with the new page's address, and there is no need to flush
other processor TLBs because they will do so when they first write to
the page.  (But of course you have to be careful synchronising with
other uses of the shared page prior to the eventual TLB flush).
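The generalised sequence can be sketched the same way. This is again a hypothetical model rather than a description of any real CPU. Every bit the write needs (`required`) is tested on the value read inside the atomic update, and the caller gets that same value back. The frame address and attributes used for the write therefore come from the fresh pte rather than from the TLB. `checked_dirty_update()`, the bit values, and the 4 KiB frame mask are illustrative assumptions.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit values, modelled on the i386 pte layout. */
#define PTE_PRESENT 0x001u
#define PTE_RW      0x002u
#define PTE_USER    0x004u
#define PTE_DIRTY   0x040u
#define PTE_ADDR    0xfffff000u   /* page frame bits, assuming 4 KiB pages */

/* Atomically set the dirty bit only if every bit in `required` is set in
 * the value being updated. On success, *used receives the pte the write
 * should go through: the freshly read one, not a cached copy. */
static bool checked_dirty_update(_Atomic uint32_t *ptep, uint32_t required,
                                 uint32_t *used)
{
    uint32_t old = atomic_load(ptep);
    do {
        if ((old & required) != required)
            return false;   /* page fault */
    } while (!atomic_compare_exchange_weak(ptep, &old, old | PTE_DIRTY));
    *used = old | PTE_DIRTY;
    assert((*used & required) == required);
    return true;
}
```

A pte that was writable when the TLB was filled but has since been made read-only fails the check. A pte that has been repointed at a new frame, in the style of the COW optimisation above, hands back the new frame address for the write.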

-- Jamie

* Re: x86 ptep_get_and_clear question
  2001-02-15 19:05                 ` Kanoj Sarcar
@ 2001-02-15 19:19                   ` Jamie Lokier
  0 siblings, 0 replies; 21+ messages in thread
From: Jamie Lokier @ 2001-02-15 19:19 UTC (permalink / raw)
  To: Kanoj Sarcar
  Cc: Manfred Spraul, Ben LaHaise, linux-mm, mingo, alan, linux-kernel

Kanoj Sarcar wrote:
> > Is the sequence
> > << lock;
> > read pte
> > pte |= dirty
> > write pte
> > >> end lock;
> > or
> > << lock;
> > read pte
> > if (!present(pte))
> > 	do_page_fault();
> > pte |= dirty
> > write pte.
> > >> end lock;
> 
> No, it is a little more complicated. You also have to include the
> TLB state in this algorithm, since that is what we are talking about.
> Specifically, what does the processor do when it has a tlb entry allowing
> RW, the processor has only done reads using the translation, and the 
> in-memory pte is clear?

Yes (no to the no): Manfred's pseudo-code is exactly the question you're
asking.  Because when the TLB entry is non-dirty and you do a write, we
_know_ the processor will do a locked memory cycle to update the dirty
bit.  A locked memory cycle implies read-modify-write, not "write TLB
entry + dirty" (which would be a plain write) or anything like that.

Given you know it's a locked cycle, the only sensible design from Intel
is going to be one of Manfred's scenarios.
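The difference between a locked read-modify-write and a plain write can be made concrete with a hypothetical model. `plain_write_back()` is a strawman design in which the CPU stores its cached TLB copy plus the dirty bit. `locked_rmw()` merges the dirty bit into whatever memory holds at that moment. The bit values and the frame address are illustrative. Against a pte that ptep_get_and_clear() has just zeroed, the strawman would resurrect the old, writable mapping. A read-modify-write can at worst add a stray dirty bit, and Manfred's checked scenario avoids even that.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Illustrative bit values, modelled on the i386 pte layout. */
#define PTE_PRESENT 0x001u
#define PTE_RW      0x002u
#define PTE_DIRTY   0x040u

/* Strawman: store the CPU's cached TLB copy with the dirty bit set,
 * ignoring what the in-memory pte holds now. */
static void plain_write_back(_Atomic uint32_t *ptep, uint32_t tlb_copy)
{
    atomic_store(ptep, tlb_copy | PTE_DIRTY);
}

/* Locked read-modify-write: the dirty bit is merged into the pte's
 * current contents. Returns the value left in memory. */
static uint32_t locked_rmw(_Atomic uint32_t *ptep)
{
    return atomic_fetch_or(ptep, PTE_DIRTY) | PTE_DIRTY;
}
```

With a cached copy of `0x5000 | PTE_PRESENT | PTE_RW` and a zeroed pte, the strawman leaves a present, writable mapping in memory. `locked_rmw()` leaves only `PTE_DIRTY`.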

An interesting thought experiment though is this:

<< lock;
read pte
pte |= dirty
write pte
>> end lock;
if (!present(pte))
	do_page_fault();

It would have a mighty odd effect, wouldn't it?
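Under the same kind of hypothetical model, the odd ordering looks like this. The locked OR completes first, and the present bit of the value it read is checked only afterwards. The function name and bit values are illustrative. The result is a write that faults yet still leaves PTE_DIRTY behind. This happens both in a pte the OS has just cleared and in a non-present pte the OS may be using to hold other information, such as a swap entry.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit values, modelled on the i386 pte layout. */
#define PTE_PRESENT 0x001u
#define PTE_DIRTY   0x040u

/* The odd ordering: lock; read; |= dirty; write; unlock; and only then
 * test the present bit of the value that was read. Returns true if the
 * write faults; the pte has been modified either way. */
static bool dirty_then_check(_Atomic uint32_t *ptep)
{
    uint32_t old = atomic_fetch_or(ptep, PTE_DIRTY);
    return !(old & PTE_PRESENT);   /* do_page_fault() */
}
```

For example, a non-present pte holding `0x1200` faults and is left as `0x1240`, with a bit set that the OS never put there.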

-- Jamie

* Re: x86 ptep_get_and_clear question
  2001-02-15 19:06                   ` Ben LaHaise
@ 2001-02-15 19:19                     ` Kanoj Sarcar
  0 siblings, 0 replies; 21+ messages in thread
From: Kanoj Sarcar @ 2001-02-15 19:19 UTC (permalink / raw)
  To: Ben LaHaise; +Cc: Jamie Lokier, linux-mm, mingo, alan, linux-kernel

> 
> On Thu, 15 Feb 2001, Kanoj Sarcar wrote:
> 
> > No. Not all architectures have this problem. For example, if the
> > Linux "dirty" bit (as opposed to the pte dirty bit) is managed by
> > software, a fault will actually be taken when processor 2 tries to do
> > the write. The fault is solely there to make sure that the Linux "dirty"
> > bit can be tracked. As long as the fault handler grabs the right locks
> > before updating the Linux "dirty" bit, things should be okay. This is
> > the case with mips, for example.
> >
> > The problem with x86 is that we depend on the automatic x86 dirty bit
> > update to manage the Linux "dirty" bit (they are the same!), so the
> > appropriate locks are not grabbed.
> 
> Will you please go off and prove that this "problem" exists on some x86
> processor before continuing this rant?  None of the PII, PIII, Athlon,

And will you please stop behaving like this is not an issue? 

> K6-2 or 486s I checked exhibited the worrisome behaviour you're

And I maintain that this kind of race condition cannot be tickled
deterministically. There might be some piece of logic (or absence of it),
that can show that your finding of a thousand runs is not relevant.

> speculating about, plus it is logically consistent with the statements the
> manual does make about updating ptes; otherwise how could an smp os

Don't say this anymore, especially if you cannot point me to the specs.

> perform a reliable shootdown by doing an atomic bit clear on the present
> bit of a pte?

The OS clears the present bit, and processors can keep using their TLBs
and accessing the page, with no problems at all. That is why, after
clearing the present bit, the processor must flush all TLBs before it
can assume no one is using the page. The hardware-updated accessed bit
could also be a problem, but an error there does not destroy data; it
just leads the OS to choose the wrong page to evict under memory pressure.
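Clearing the present bit alone does not stop other processors, and a toy TLB shows why. This is a hypothetical model: each CPU caches at most one translation, a hit never consults memory, and only a miss walks the in-memory pte. CPUs are numbered from 0 here. The arrays, `cpu_can_access()`, and the helpers are invented for the example, and the bit values are illustrative.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define NCPUS       2
#define PTE_PRESENT 0x001u   /* illustrative, modelled on i386 */

/* Toy TLB: each CPU caches at most one translation. */
struct tlb_entry {
    bool valid;
    uint32_t pte;
};

static _Atomic uint32_t pte_in_memory;
static struct tlb_entry tlb[NCPUS];

/* An access by `cpu`: a TLB hit succeeds without looking at memory;
 * a miss walks the page table and faults if the pte is not present. */
static bool cpu_can_access(int cpu)
{
    if (!tlb[cpu].valid) {
        uint32_t pte = atomic_load(&pte_in_memory);
        if (!(pte & PTE_PRESENT))
            return false;   /* page fault */
        tlb[cpu].pte = pte;
        tlb[cpu].valid = true;
    }
    return true;
}

/* Step 1 of a shootdown: atomically clear the present bit. */
static void clear_present(void)
{
    atomic_fetch_and(&pte_in_memory, ~PTE_PRESENT);
}

/* Step 2: flush every CPU's cached translation. */
static void flush_all_tlbs(void)
{
    for (int i = 0; i < NCPUS; i++)
        tlb[i].valid = false;
}
```

After `clear_present()`, a CPU that already cached the translation keeps access until `flush_all_tlbs()`, while a CPU without one faults immediately. Only after the flush can the OS assume no one is using the page.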

Kanoj

> 
> 		-ben
> 

end of thread

Thread overview: 21+ messages
2001-02-15  1:50 x86 ptep_get_and_clear question Kanoj Sarcar
2001-02-15  2:13 ` Ben LaHaise
2001-02-15  2:37   ` Kanoj Sarcar
2001-02-15 10:55   ` Jamie Lokier
2001-02-15 16:06     ` Ben LaHaise
2001-02-15 16:35       ` Jamie Lokier
2001-02-15 17:23         ` Kanoj Sarcar
2001-02-15 17:27           ` Ben LaHaise
2001-02-15 17:38             ` Kanoj Sarcar
2001-02-15 17:46               ` Ben LaHaise
2001-02-15 17:47           ` Jamie Lokier
2001-02-15 18:05             ` Kanoj Sarcar
2001-02-15 18:23             ` Kanoj Sarcar
2001-02-15 18:42               ` Jamie Lokier
2001-02-15 18:57                 ` Kanoj Sarcar
2001-02-15 19:06                   ` Ben LaHaise
2001-02-15 19:19                     ` Kanoj Sarcar
2001-02-15 18:51               ` Manfred Spraul
2001-02-15 19:05                 ` Kanoj Sarcar
2001-02-15 19:19                   ` Jamie Lokier
2001-02-15 19:07                 ` Jamie Lokier
