From mboxrd@z Thu Jan 1 00:00:00 1970 Date: Tue, 4 Sep 2007 19:37:48 +0200 From: Nick Piggin Subject: Re: [MAILER-DAEMON@watson.ibm.com: Returned mail: see transcript for details] Message-ID: <20070904173748.GC1191@wotan.suse.de> References: <20070903201645.GA11502@wotan.suse.de> <46DCFBB2.3060200@linux.vnet.ibm.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <46DCFBB2.3060200@linux.vnet.ibm.com> Sender: owner-linux-mm@kvack.org Return-Path: To: Balbir Singh Cc: Balbir Singh , Shailabh Nagar , Linux Memory Management List , Hugh Dickins List-ID: On Tue, Sep 04, 2007 at 07:31:14AM +0100, Balbir Singh wrote: > Nick Piggin wrote: > > > But the most obvious delay, where we actually lock the page waiting for > > the swap IO to finish, does not seem to be accounted at all! > > > > Hmm.. Does lock_page() eventually call io_schedule() or io_schedule_timeout()? > I think it does -- via sync_page(). The way our accounting works is that we > account for block I/O in io_schedule*(). If we see the SWAPIN flag set, we > then account for that I/O as swap I/O. Yes, lock_page will io_schedule() down the path you say. > > My proposed fix is to just move the swaping delay accounting to the > > point where the VM does actually wait, for the swapin. > > > > I have no idea what uses swapin delay accounting, but it would be good to > > see if this makes a positive (or at least not negative) impact on those > > users... > > > > Thanks, > > Nick > > > > -- > > Index: linux-2.6/mm/memory.c > > =================================================================== > > --- linux-2.6.orig/mm/memory.c > > +++ linux-2.6/mm/memory.c > > @@ -2158,7 +2158,6 @@ static int do_swap_page(struct mm_struct > > migration_entry_wait(mm, pmd, address); > > goto out; > > } > > - delayacct_set_flag(DELAYACCT_PF_SWAPIN); > > Let's start the accounting here. If you start accounting here, then you can potentially also count other stuff as swapin delay which I don't think you want to. I'm not sure if there is any part of reclaim where we directly wait on page IO, but we do wait on blkdev congestion which calls io_schedule. It seems funny to account this as swapin delay. > > page = lookup_swap_cache(entry); > > if (!page) { > > grab_swap_token(); /* Contend for token _before_ read-in */ > > @@ -2172,7 +2171,6 @@ static int do_swap_page(struct mm_struct > > page_table = pte_offset_map_lock(mm, pmd, address, &ptl); > > if (likely(pte_same(*page_table, orig_pte))) > > ret = VM_FAULT_OOM; > > - delayacct_clear_flag(DELAYACCT_PF_SWAPIN); > > goto unlock; > > } > > > > @@ -2181,9 +2179,10 @@ static int do_swap_page(struct mm_struct > > count_vm_event(PGMAJFAULT); > > } > > > > - delayacct_clear_flag(DELAYACCT_PF_SWAPIN); > > mark_page_accessed(page); > > + delayacct_set_flag(DELAYACCT_PF_SWAPIN); > > lock_page(page); > > + delayacct_clear_flag(DELAYACCT_PF_SWAPIN); > > > > I agree that we should end it after lock_page(). OK. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org