linux-mm.kvack.org archive mirror
* Meaning of the dirty bit
@ 2002-10-10  7:46 Martin Maletinsky
  2002-10-10  8:49 ` Dharmender Rai
  2002-10-10 11:40 ` Hugh Dickins
  0 siblings, 2 replies; 9+ messages in thread
From: Martin Maletinsky @ 2002-10-10  7:46 UTC (permalink / raw)
  To: kernelnewbies, linux-mm

Hi,

While studying the follow_page() function (the version in place since 2.4.4, i.e. with the write argument), I noticed that for an address that should be written to
(i.e. write != 0), the function checks not only the writable flag (with pte_write()) but also the dirty flag (with pte_dirty()) of the page containing this address.
From what I understand of general paging theory, the dirty flag of a page is set when its content in physical memory differs from its backing on the permanent
storage system (file or swap space). Based on this understanding, I do not see why it is necessary to check the dirty flag in order to ensure that a page is writable.
What am I missing here?

Thanks in advance for any answers
with best regards
Martin Maletinsky

P.S. Pls. put me on cc: in your reply, since I am not on the mailing list.

--
Supercomputing System AG          email: maletinsky@scs.ch
Martin Maletinsky                 phone: +41 (0)1 445 16 05
Technoparkstrasse 1               fax:   +41 (0)1 445 16 10
CH-8005 Zurich


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/


* Re: Meaning of the dirty bit
  2002-10-10  7:46 Meaning of the dirty bit Martin Maletinsky
@ 2002-10-10  8:49 ` Dharmender Rai
  2002-10-10  8:57   ` Martin Maletinsky
  2002-10-10 11:40 ` Hugh Dickins
  1 sibling, 1 reply; 9+ messages in thread
From: Dharmender Rai @ 2002-10-10  8:49 UTC (permalink / raw)
  To: Martin Maletinsky, kernelnewbies, linux-mm

Hi,
The purpose is to achieve need-based disk I/O.
Dirty-flag-set means you have to write the contents of
that page to the disk before paging out or
invalidating that page. If the dirty flag is not set
then there is no need for the I/O part.

Regards
Dharmender Rai
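
As a sketch of the need-based I/O described above, the following user-space C model shows the eviction decision: only a dirty page is written back before its frame is reused, while a clean page is simply dropped because its backing copy is identical. The names `struct model_page`, `write_back()` and `evict_page()` are hypothetical and stand in for far more involved kernel reclaim code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a page frame: only the fields needed here. */
struct model_page {
	bool dirty;        /* contents differ from the on-disk copy */
	int  writebacks;   /* how many times the page was written back */
};

/* Stand-in for real disk I/O: just records the write-back. */
static void write_back(struct model_page *page)
{
	page->writebacks++;
	page->dirty = false;   /* memory and disk now agree */
}

/*
 * Evict a page: a dirty page needs I/O first; a clean page can be
 * dropped immediately. Returns true if I/O was performed.
 */
static bool evict_page(struct model_page *page)
{
	assert(page != 0);
	if (page->dirty) {
		write_back(page);
		return true;
	}
	return false;
}
```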


* Re: Meaning of the dirty bit
  2002-10-10  8:49 ` Dharmender Rai
@ 2002-10-10  8:57   ` Martin Maletinsky
  2002-10-10  9:46     ` Dharmender Rai
  0 siblings, 1 reply; 9+ messages in thread
From: Martin Maletinsky @ 2002-10-10  8:57 UTC (permalink / raw)
  To: dharmenderr; +Cc: kernelnewbies, linux-mm

Hello,

Thanks for your reply. What is the reason to check the dirty bit in follow_page(), which (presumably) should just parse the page tables, verify write access (if the write
argument is set) and return the page descriptor describing the page the address is in (from what I understood, there is no I/O involved).
Is there any reason to deny write access when the dirty flag is not set?

Thanks again,
regards
Martin


--
Supercomputing System AG          email: maletinsky@scs.ch
Martin Maletinsky                 phone: +41 (0)1 445 16 05
Technoparkstrasse 1               fax:   +41 (0)1 445 16 10
CH-8005 Zurich


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Meaning of the dirty bit
  2002-10-10  8:57   ` Martin Maletinsky
@ 2002-10-10  9:46     ` Dharmender Rai
  0 siblings, 0 replies; 9+ messages in thread
From: Dharmender Rai @ 2002-10-10  9:46 UTC (permalink / raw)
  To: Martin Maletinsky; +Cc: kernelnewbies, linux-mm

Hi ,

Read the //// commented part in the following code mentioned by you:

/*
 * Do a quick page-table lookup for a single page.
 */
static struct page * follow_page(unsigned long address, int write)
{
	pgd_t *pgd;
	pmd_t *pmd;
	pte_t *ptep, pte;

	pgd = pgd_offset(current->mm, address);
/// no page directory entry (pgd_none) or an invalid one (pgd_bad)
	if (pgd_none(*pgd) || pgd_bad(*pgd))
		goto out;

	pmd = pmd_offset(pgd, address);
/// no page middle directory entry (pmd_none) or an invalid one (pmd_bad)
	if (pmd_none(*pmd) || pmd_bad(*pmd))
		goto out;

	ptep = pte_offset(pmd, address);
	if (!ptep)
		goto out;

	pte = *ptep;
//// if the page table entry is valid (page present in memory)
	if (pte_present(pte)) {
		if (!write ||
//// page is writable and dirty
		    (pte_write(pte) && pte_dirty(pte)))
			return pte_page(pte);
	}

out:
	return 0;
}

The logic here is very simple. This function is used to look up one page.
A writable and dirty page is the most suitable one, as this page's
content has to be written out to disk. If you went for a read-only
page instead, you would be interrupting the processes that might be reading from
that page.
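
To make the final test in follow_page() concrete, here is a small user-space C model of just that condition. The flag bits, `struct model_pte` and `model_follow_page()` are hypothetical stand-ins for pte_present(), pte_write() and pte_dirty(), not the real PTE layout. What it shows is that a write lookup on a present, writable but clean entry returns NULL, which is exactly the case Martin asked about.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical PTE flag bits for this model only (not a real hardware layout). */
#define MODEL_PTE_PRESENT 0x1u
#define MODEL_PTE_RW      0x2u
#define MODEL_PTE_DIRTY   0x4u

struct model_page { int id; };

struct model_pte {
	unsigned int flags;
	struct model_page *page;
};

/*
 * Mirrors the final test in follow_page(): a read lookup only needs a
 * present entry, while a write lookup also requires the entry to be
 * writable *and* already dirty. Otherwise NULL is returned.
 */
static struct model_page *model_follow_page(const struct model_pte *pte, int write)
{
	assert(pte != NULL);
	if (!(pte->flags & MODEL_PTE_PRESENT))
		return NULL;
	if (!write)
		return pte->page;
	if ((pte->flags & MODEL_PTE_RW) && (pte->flags & MODEL_PTE_DIRTY))
		return pte->page;
	return NULL;
}
```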


Regards,
Dharmender Rai
================================
Dharmender Rai,
Cybage Software Pvt. Ltd,
Kalyani Nagar,
Pune -411006

Phone : 6686359
Extn    :    261

* Re: Meaning of the dirty bit
  2002-10-10  7:46 Meaning of the dirty bit Martin Maletinsky
  2002-10-10  8:49 ` Dharmender Rai
@ 2002-10-10 11:40 ` Hugh Dickins
  2002-10-10 11:55   ` William Lee Irwin III
                     ` (2 more replies)
  1 sibling, 3 replies; 9+ messages in thread
From: Hugh Dickins @ 2002-10-10 11:40 UTC (permalink / raw)
  To: Martin Maletinsky; +Cc: Stephen Tweedie, kernelnewbies, linux-mm

On Thu, 10 Oct 2002, Martin Maletinsky wrote:
> 
> While studying the follow_page() function (the version of the function
> that is in place since 2.4.4, i.e. with the write argument), I noticed,
> that for an address that should be written to (i.e. write != 0), the
> function checks not only the writeable flag (with pte_write()), but also
> the dirty flag (with pte_dirty()) of the page containing this address.
> From what I thought to understand from general paging theory, the dirty
> flag of a page is set, when its content in physical memory differs from
> its backing on the permanent storage system (file or swap space). Based
> on this understanding I do not understand why it is necessary to check
> the dirty flag, in order to ensure that a page is writable
> - what am I missing here?

Good question (and I don't see the answer in Dharmender's replies).
I expect Stephen can give the definitive answer, but here's my guess.

follow_page() was introduced for kiobufs, so despite its general name,
it's doing what map_user_kiobuf() needed (or thought it needed).

Originally (pre-2.4.4), as you've noticed, there was no write argument
to follow_page, and map_user_kiobuf made one call to handle_mm_fault
per page.  Experience with races under memory pressure will have shown
that to be inadequate, it needed to loop until it could hold down the
page, with the writable bit in the pte guaranteeing it good to write to.

But why dirty too, you ask?  I think, because writing to page via kiobuf
happens directly, not via pte, so the pte dirty bit would not be set
that way; but if it's not set, then the modification to the page may
be lost later.  Hence map_user_kiobuf used handle_mm_fault to set
that dirty bit too, and used follow_page to check that it is set.
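
This explanation can be illustrated with a toy C model, assuming a single page with a RAM copy, a disk copy and a hardware-maintained dirty bit. `cpu_write()`, `device_write()` and `reclaim_and_reload()` are hypothetical: a CPU store sets the dirty bit, a device write into the page does not, and reclaim discards clean pages without writing them back. The device's data is lost unless the dirty bit was already set beforehand, which is what pre-dirtying via handle_mm_fault was meant to guarantee.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical model: one page of memory and its on-disk backing copy. */
struct model_page {
	char mem[16];    /* contents in RAM */
	char disk[16];   /* backing store (file or swap) */
	bool pte_dirty;  /* set by the MMU on a CPU store through the PTE */
};

static void copy_in(char *dst, size_t size, const char *src)
{
	strncpy(dst, src, size - 1);
	dst[size - 1] = '\0';
}

/* A CPU store goes through the page tables, so the hardware marks it dirty. */
static void cpu_write(struct model_page *p, const char *data)
{
	assert(p != NULL);
	copy_in(p->mem, sizeof(p->mem), data);
	p->pte_dirty = true;
}

/* A device write (e.g. into a kiobuf page) bypasses the PTE entirely. */
static void device_write(struct model_page *p, const char *data)
{
	assert(p != NULL);
	copy_in(p->mem, sizeof(p->mem), data);
	/* pte_dirty is NOT touched: the MMU never saw this store */
}

/* Reclaim, then a later fault re-reads the page from its backing store. */
static void reclaim_and_reload(struct model_page *p)
{
	if (p->pte_dirty)
		memcpy(p->disk, p->mem, sizeof(p->disk));   /* write back */
	memcpy(p->mem, p->disk, sizeof(p->mem));        /* re-read */
	p->pte_dirty = false;
}
```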

Except that's racy too, and so mark_dirty_kiobuf() was added to
SetPageDirty on the pages after kio done, before unmapping the kiobuf.
mark_dirty_kiobuf appeared in the main kernel tree at the same time
as the pte_dirty test in follow_page, but I'm guessing the pte_dirty
test was an earlier failed attempt to solve the problems fixed by
mark_dirty_kiobuf, which got left in place (and also helped a bit
if kiobuf users weren't updated to call mark_dirty_kiobuf).
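
The idea behind mark_dirty_kiobuf(), marking the pages dirty in software once the transfer has completed, can be sketched as follows. This is a simplified model, not the kernel function: `struct model_kiobuf`, its fields and `model_mark_dirty_kiobuf()` are hypothetical, and a 4096-byte page size is assumed. Only the pages touched by the first `bytes` bytes of the transfer are marked.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_PAGE_SIZE 4096   /* assumed page size for this model */

struct model_page { bool dirty; };   /* software "PageDirty" flag */

/* Hypothetical, simplified kiobuf: an array of pages plus a start offset. */
struct model_kiobuf {
	int offset;                   /* start of the data within maplist[0] */
	int length;                   /* total bytes mapped */
	int nr_pages;                 /* entries in maplist */
	struct model_page **maplist;
};

/*
 * After the device has written `bytes` bytes into the buffer, mark every
 * page those bytes touched as dirty in software, so reclaim will write it
 * back regardless of what the PTE dirty bit says.
 */
static void model_mark_dirty_kiobuf(struct model_kiobuf *iobuf, int bytes)
{
	int index = 0;
	int offset = iobuf->offset;
	int remaining = bytes < iobuf->length ? bytes : iobuf->length;

	assert(offset >= 0 && offset < MODEL_PAGE_SIZE);
	while (remaining > 0 && index < iobuf->nr_pages) {
		iobuf->maplist[index]->dirty = true;
		remaining -= MODEL_PAGE_SIZE - offset;   /* rest of this page */
		offset = 0;                              /* later pages start at 0 */
		index++;
	}
}
```

For example, with the data starting 100 bytes into the first page, a 4000-byte transfer fills the remaining 3996 bytes of page 0 and 4 bytes of page 1, so exactly those two pages are marked.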

Apologies in advance if my guesses are wild.

Hugh


* Re: Meaning of the dirty bit
  2002-10-10 11:40 ` Hugh Dickins
@ 2002-10-10 11:55   ` William Lee Irwin III
  2002-10-10 13:40     ` Hugh Dickins
  2002-10-10 12:11   ` Martin Maletinsky
  2002-10-10 13:11   ` Dharmender Rai
  2 siblings, 1 reply; 9+ messages in thread
From: William Lee Irwin III @ 2002-10-10 11:55 UTC (permalink / raw)
  To: Hugh Dickins; +Cc: Martin Maletinsky, Stephen Tweedie, kernelnewbies, linux-mm

On Thu, Oct 10, 2002 at 12:40:08PM +0100, Hugh Dickins wrote:
> Originally (pre-2.4.4), as you've noticed, there was no write argument
> to follow_page, and map_user_kiobuf made one call to handle_mm_fault
> per page.  Experience with races under memory pressure will have shown
> that to be inadequate, it needed to loop until it could hold down the
> page, with the writable bit in the pte guaranteeing it good to write to.

Could you explain what race occurred?


On Thu, Oct 10, 2002 at 12:40:08PM +0100, Hugh Dickins wrote:
> But why dirty too, you ask?  I think, because writing to page via kiobuf
> happens directly, not via pte, so the pte dirty bit would not be set
> that way; but if it's not set, then the modification to the page may
> be lost later.  Hence map_user_kiobuf used handle_mm_fault to set
> that dirty bit too, and used follow_page to check that it is set.

Some of the mechanics of how the PTE dirty bit relates to the software
notion of a page being dirty are escaping me here. How does follow_page()
enter the equation? The PTEs of other processes cannot be resolved this
way, so it does not seem clear to me at all that follow_page() taking an
extra argument can actually get something useful done here.


On Thu, Oct 10, 2002 at 12:40:08PM +0100, Hugh Dickins wrote:
> Except that's racy too, and so mark_dirty_kiobuf() was added to
> SetPageDirty on the pages after kio done, before unmapping the kiobuf.
> mark_dirty_kiobuf appeared in the main kernel tree at the same time
> as the pte_dirty test in follow_page, but I'm guessing the pte_dirty
> test was an earlier failed attempt to solve the problems fixed by
> mark_dirty_kiobuf, which got left in place (and also helped a bit
> if kiobuf users weren't updated to call mark_dirty_kiobuf).
> Apologies in advance if my guesses are wild.

Hrm, I'm going to have to dig up a tree with kiobuf stuff in it; I've
largely ignored that path for various reasons.


Bill

* Re: Meaning of the dirty bit
  2002-10-10 11:40 ` Hugh Dickins
  2002-10-10 11:55   ` William Lee Irwin III
@ 2002-10-10 12:11   ` Martin Maletinsky
  2002-10-10 13:11   ` Dharmender Rai
  2 siblings, 0 replies; 9+ messages in thread
From: Martin Maletinsky @ 2002-10-10 12:11 UTC (permalink / raw)
  Cc: Stephen Tweedie, kernelnewbies, linux-mm

Hi Hugh,

Thanks a lot for your answer.

Hugh Dickins wrote:

> But why dirty too, you ask?  I think, because writing to page via kiobuf
> happens directly, not via pte, so the pte dirty bit would not be set
> that way; but if it's not set, then the modification to the page may
> be lost later.  Hence map_user_kiobuf used handle_mm_fault to set
> that dirty bit too, and used follow_page to check that it is set.
>
> Apologies in advance if my guesses are wild.

Although you call it a 'wild guess', it sounds quite plausible to me. However, if the check of the dirty flag is basically there to ensure that handle_mm_fault() did its
job (marking the pte dirty), wouldn't it make (more?) sense to have a pte_mkdirty() call in follow_page() that sets the dirty bit (possibly/probably once again)?
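
The alternative suggested here could look roughly like this in a user-space model: a write lookup on a present, writable entry sets the dirty bit itself instead of failing. The flag bits, `struct model_pte` and `model_follow_page_mkdirty()` are hypothetical, and the thread does not settle whether this would be safe in the kernel; a real version would at least have to modify the PTE under page_table_lock.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical PTE flag bits for this model only. */
#define MODEL_PTE_PRESENT 0x1u
#define MODEL_PTE_RW      0x2u
#define MODEL_PTE_DIRTY   0x4u

struct model_page { int id; };
struct model_pte { unsigned int flags; struct model_page *page; };

/*
 * Variant of the lookup along the lines suggested above: instead of
 * refusing a writable-but-clean entry, set the dirty bit (the model's
 * equivalent of pte_mkdirty()) and succeed. Writability is still required.
 */
static struct model_page *model_follow_page_mkdirty(struct model_pte *pte, int write)
{
	assert(pte != NULL);
	if (!(pte->flags & MODEL_PTE_PRESENT))
		return NULL;
	if (write) {
		if (!(pte->flags & MODEL_PTE_RW))
			return NULL;   /* still needs a write fault, e.g. copy-on-write */
		pte->flags |= MODEL_PTE_DIRTY;
	}
	return pte->page;
}
```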

thanks again
best regards
Martin Maletinsky

--
Supercomputing System AG          email: maletinsky@scs.ch
Martin Maletinsky                 phone: +41 (0)1 445 16 05
Technoparkstrasse 1               fax:   +41 (0)1 445 16 10
CH-8005 Zurich
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Meaning of the dirty bit
  2002-10-10 11:40 ` Hugh Dickins
  2002-10-10 11:55   ` William Lee Irwin III
  2002-10-10 12:11   ` Martin Maletinsky
@ 2002-10-10 13:11   ` Dharmender Rai
  2 siblings, 0 replies; 9+ messages in thread
From: Dharmender Rai @ 2002-10-10 13:11 UTC (permalink / raw)
  To: Hugh Dickins, Martin Maletinsky; +Cc: Stephen Tweedie, kernelnewbies, linux-mm



--- Hugh Dickins <hugh@veritas.com> wrote:
> On Thu, 10 Oct 2002, Martin Maletinsky wrote:
> > While studying the follow_page() function (the version of the function
Hugh,
Here is a link with more about follow_page().
I replied after reading it.

http://lwn.net/Articles/11483/

Regards
Dharmender Rai


* Re: Meaning of the dirty bit
  2002-10-10 11:55   ` William Lee Irwin III
@ 2002-10-10 13:40     ` Hugh Dickins
  0 siblings, 0 replies; 9+ messages in thread
From: Hugh Dickins @ 2002-10-10 13:40 UTC (permalink / raw)
  To: William Lee Irwin III
  Cc: Martin Maletinsky, Stephen Tweedie, kernelnewbies, linux-mm

On Thu, 10 Oct 2002, William Lee Irwin III wrote:
> On Thu, Oct 10, 2002 at 12:40:08PM +0100, Hugh Dickins wrote:
> > Originally (pre-2.4.4), as you've noticed, there was no write argument
> > to follow_page, and map_user_kiobuf made one call to handle_mm_fault
> > per page.  Experience with races under memory pressure will have shown
> > that to be inadequate, it needed to loop until it could hold down the
> > page, with the writable bit in the pte guaranteeing it good to write to.
> 
> Could you explain what race occurred?

In the 2.4.3 version, handle_mm_fault would fault the page in, writable
and dirty, if not already; but try_to_swap_out might intervene just
before map_user_kiobuf (immediately afterwards) takes the page_table_lock
and does follow_page, clearing the page table entry just verified.
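
The interleaving described here can be replayed deterministically in a toy model. Everything in it is hypothetical: one PTE, `model_fault_in()` standing in for handle_mm_fault(), `model_swap_out()` standing in for try_to_swap_out(), and `map_one_shot()`, which, like the 2.4.3 code, faults once and then looks up once. With the swap-out injected between the two steps, the single lookup finds no usable entry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical single-PTE model; names are illustrative only. */
struct model_pte { bool present, writable, dirty; };

/* Stand-in for handle_mm_fault() on a write: leaves a writable, dirty PTE. */
static void model_fault_in(struct model_pte *pte)
{
	pte->present = pte->writable = pte->dirty = true;
}

/* Stand-in for try_to_swap_out(): clears the PTE it has just unmapped. */
static void model_swap_out(struct model_pte *pte)
{
	pte->present = pte->writable = pte->dirty = false;
}

static bool model_follow_page(const struct model_pte *pte, int write)
{
	return pte->present && (!write || (pte->writable && pte->dirty));
}

/*
 * 2.4.3-style mapping: one fault, then one lookup. `swap_out_between`
 * injects reclaim running after the fault returns but before the lookup
 * takes page_table_lock.
 */
static bool map_one_shot(struct model_pte *pte, bool swap_out_between)
{
	assert(pte != NULL);
	model_fault_in(pte);
	if (swap_out_between)
		model_swap_out(pte);
	return model_follow_page(pte, 1);
}
```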

And there might even be a read fault coming in too (from another thread),
bringing back the page table entry but without its dirty bit.  Er, no,
scrub that: we have down_write on mmap_sem, keeping out such a fault.

(But I wasn't involved, just noticed when the looping was added and
was unsurprised since it had looked unsafe to me before.  Perhaps the
race which actually occurred was something else I've not thought of.)

> On Thu, Oct 10, 2002 at 12:40:08PM +0100, Hugh Dickins wrote:
> > But why dirty too, you ask?  I think, because writing to page via kiobuf
> > happens directly, not via pte, so the pte dirty bit would not be set
> > that way; but if it's not set, then the modification to the page may
> > be lost later.  Hence map_user_kiobuf used handle_mm_fault to set
> > that dirty bit too, and used follow_page to check that it is set.
> 
> Some of the mechanics of how the PTE dirty bit relate to the software
> notion of a page being dirty are escaping me here. How does follow_page()
> enter the equation? The PTE's of other processes cannot be resolved this
> way so it does not seem clear to me at all that follow_page() taking an
> extra argument can actually get something useful done here.

I don't entirely understand you here.  follow_page verifies the pte,
while holding page_table_lock, prior to bumping page reference count:
page_table_lock necessary to keep try_to_swap_out away, and of course
it cannot be held over call to handle_mm_fault.

The extra arg to follow_page does get something useful done, in the
2.4.4 tree where it's introduced along with the loop, since in that
loop the follow_page is done before the handle_mm_fault - so if the
writable dirty(?) pte already exists, no need to call handle_mm_fault
at all.  get_user_pages still works this way.
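
The lookup-first loop described here (2.4.4 and get_user_pages) can be sketched in the same toy style. Assumptions, all hypothetical: `struct model_mm`, a boolean standing in for page_table_lock, a `swapouts_pending` counter that makes reclaim win the race a fixed number of times, and a reference count standing in for get_page(). The lock is held for the lookup and the reference bump, and dropped around the fault, since it cannot be held across handle_mm_fault.

```c
#include <assert.h>
#include <stdbool.h>

struct model_pte { bool present, writable, dirty; };

struct model_mm {
	bool page_table_lock;      /* models holding mm->page_table_lock */
	struct model_pte pte;
	int page_refcount;
	int swapouts_pending;      /* how many times reclaim will race us */
};

static bool model_follow_page(const struct model_mm *mm, int write)
{
	const struct model_pte *pte = &mm->pte;

	assert(mm->page_table_lock);   /* caller must hold the lock */
	return pte->present && (!write || (pte->writable && pte->dirty));
}

/* Stand-in for handle_mm_fault(); must be called without the lock. */
static void model_handle_mm_fault(struct model_mm *mm)
{
	assert(!mm->page_table_lock);
	mm->pte.present = mm->pte.writable = mm->pte.dirty = true;
	if (mm->swapouts_pending > 0) {   /* reclaim sneaks in after the fault */
		mm->swapouts_pending--;
		mm->pte.present = mm->pte.writable = mm->pte.dirty = false;
	}
}

/*
 * Look up first, fault only if needed, and retry until the lookup
 * succeeds under the lock; then take a reference before dropping the
 * lock. Returns the number of faults taken.
 */
static int model_get_page_for_write(struct model_mm *mm)
{
	int faults = 0;

	mm->page_table_lock = true;
	while (!model_follow_page(mm, 1)) {
		mm->page_table_lock = false;   /* cannot hold it across a fault */
		model_handle_mm_fault(mm);
		faults++;
		mm->page_table_lock = true;
	}
	mm->page_refcount++;               /* models get_page() */
	mm->page_table_lock = false;
	return faults;
}
```

With a PTE that is already writable and dirty, no fault is taken at all; with reclaim winning the race twice, the loop takes three faults before the lookup sticks.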

> On Thu, Oct 10, 2002 at 12:40:08PM +0100, Hugh Dickins wrote:
> > Except that's racy too, and so mark_dirty_kiobuf() was added to
> > SetPageDirty on the pages after kio done, before unmapping the kiobuf.
> > mark_dirty_kiobuf appeared in the main kernel tree at the same time
> > as the pte_dirty test in follow_page, but I'm guessing the pte_dirty
> > test was an earlier failed attempt to solve the problems fixed by
> > mark_dirty_kiobuf, which got left in place (and also helped a bit
> > if kiobuf users weren't updated to call mark_dirty_kiobuf).
> > Apologies in advance if my guesses are wild.
> 
> Hrm, I'm going to have to dig up a tree with kiobuf stuff in it, I've
> largely ignored that path for various reasons.

I believe akpm hopes to do away with kiobufs shortly; but I assume
the get_user_pages inheritor of this code will remain, and it is a
different kind of path which can easily catch us out.

Hugh


Thread overview: 9+ messages
2002-10-10  7:46 Meaning of the dirty bit Martin Maletinsky
2002-10-10  8:49 ` Dharmender Rai
2002-10-10  8:57   ` Martin Maletinsky
2002-10-10  9:46     ` Dharmender Rai
2002-10-10 11:40 ` Hugh Dickins
2002-10-10 11:55   ` William Lee Irwin III
2002-10-10 13:40     ` Hugh Dickins
2002-10-10 12:11   ` Martin Maletinsky
2002-10-10 13:11   ` Dharmender Rai
