From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail190.messagelabs.com (mail190.messagelabs.com [216.82.249.51])
	by kanga.kvack.org (Postfix) with ESMTP id 0280A6B00A5
	for ; Wed, 27 Oct 2010 16:35:08 -0400 (EDT)
Received: from wpaz13.hot.corp.google.com (wpaz13.hot.corp.google.com [172.24.198.77])
	by smtp-out.google.com with ESMTP id o9RKZ4Zo008068
	for ; Wed, 27 Oct 2010 13:35:05 -0700
Received: from qwc9 (qwc9.prod.google.com [10.241.193.137])
	by wpaz13.hot.corp.google.com with ESMTP id o9RKYfxg006786
	for ; Wed, 27 Oct 2010 13:35:03 -0700
Received: by qwc9 with SMTP id 9so255013qwc.39
	for ; Wed, 27 Oct 2010 13:35:03 -0700 (PDT)
MIME-Version: 1.0
In-Reply-To:
References: <1288200090-23554-1-git-send-email-yinghan@google.com> <4CC869F5.2070405@redhat.com>
Date: Wed, 27 Oct 2010 13:35:02 -0700
Message-ID:
Subject: Re: [PATCH] mm: don't flush TLB when propagate PTE access bit to struct page.
From: Ying Han
Content-Type: multipart/alternative; boundary=0016363b7e3ef35e1504939f28c4
Sender: owner-linux-mm@kvack.org
To: Hugh Dickins
Cc: Nick Piggin, Rik van Riel, Ken Chen, linux-mm@kvack.org, Minchan Kim, KAMEZAWA Hiroyuki, Andrew Morton
List-ID:

--0016363b7e3ef35e1504939f28c4
Content-Type: text/plain; charset=ISO-8859-1

On Wed, Oct 27, 2010 at 12:13 PM, Hugh Dickins <hughd@google.com> wrote:

> On Wed, 27 Oct 2010, Nick Piggin wrote:
> > On Wed, Oct 27, 2010 at 12:22 PM, Nick Piggin <npiggin@gmail.com> wrote:
> > > On Wed, Oct 27, 2010 at 12:05 PM, Rik van Riel <riel@redhat.com> wrote:
> > >> On 10/27/2010 01:21 PM, Ying Han wrote:
> > >>>
> > >>> kswapd's use case of hardware PTE accessed bit is to approximate page LRU. The
> > >>> ActiveLRU demotion to InactiveLRU are not base on accessed bit, while it is only
> > >>> used to promote when a page is on inactive LRU list. All of the state transitions
> > >>> are triggered by memory pressure and thus has weak relationship with respect to
> > >>> time.
> > >>> In addition, hardware already transparently flush tlb whenever CPU context
> > >>> switch processes and given limited hardware TLB resource, the time period in
> > >>> which a page is accessed but not yet propagated to struct page is very small
> > >>> in practice. With the nature of approximation, kernel really don't need to
> > >>> flush TLB for changing PTE's access bit. This commit removes the flush
> > >>> operation from it.
>
> It should at least add a comment there in page_referenced_one(), that
> a TLB flush ought to be done, but is now judged not worth the effort.

I will make the change here.

> (I'd expect architectures to differ on whether it's worth the effort.)

Right :) I would like to hear from upstream whether the problem is general
enough to solve, so that we can plan to put further effort into it.

> > >>>
> > >>> Signed-off-by: Ying Han <yinghan@google.com>
> > >>> Singed-off-by: Ken Chen <kenchen@google.com>
>
> Hey, Ken, switch off those curling tongs :)
>
> > However, it's a scary change -- higher chance of reclaiming a TLB covered page.
>
> Yes, I was often tempted to make such a change in the past;
> but ran away when it appeared to be in danger of losing the pte
> referenced bit of precisely the most intensively referenced pages.
>
> Ying's point (about what the pte referenced bit is being used for in our
> current implementation) is interesting, and might have tipped the balance;
> but that's not clear to me - and the flush is only done when mm is on CPU.

The initial patch is from Ken, and I am helping out here to get feedback
from upstream and to improve it further. :)

> > I had a vague memory of this problem biting someone when this flush wasn't
> > actually done properly... maybe powerpc.
> >
> > But anyway, same solution could be possible, by flushing every N pages scanned.
>
> Yes, batching seems safer.

I might be able to take a look at it.
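As a rough illustration of the "flush every N pages scanned" idea above, here
is a minimal userspace sketch in C. It is not the posted patch and not kernel
code: struct scan_state, model_flush_tlb(), model_page_referenced() and
model_scan() are hypothetical names, the PTE accessed bits are modelled as a
plain bool array, and the "flush" only bumps a counter. It only shows how
clearing accessed bits could be decoupled from flushing by batching.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
struct scan_state {
	unsigned int pending;	/* accessed bits cleared since the last flush */
	unsigned int batch;	/* flush once this many bits were cleared (N) */
	unsigned int flushes;	/* number of simulated TLB flushes issued */
};

/* Stand-in for a real TLB flush: it only counts and resets the batch. */
static void model_flush_tlb(struct scan_state *s)
{
	s->flushes++;
	s->pending = 0;
}

/*
 * Test and clear one simulated PTE accessed bit. Instead of flushing on
 * every cleared bit, flush only when a full batch has accumulated.
 * Returns true if the page was referenced.
 */
static bool model_page_referenced(struct scan_state *s, bool *pte_young)
{
	bool referenced = *pte_young;

	if (referenced) {
		*pte_young = false;
		if (++s->pending >= s->batch)
			model_flush_tlb(s);
	}
	return referenced;
}

/* Scan n simulated PTEs and flush any partial batch left at the end. */
static unsigned int model_scan(struct scan_state *s, bool *ptes, size_t n)
{
	unsigned int referenced = 0;

	assert(s->batch > 0);
	for (size_t i = 0; i < n; i++)
		if (model_page_referenced(s, &ptes[i]))
			referenced++;
	if (s->pending)
		model_flush_tlb(s);
	return referenced;
}
```

In this model, scanning ten referenced pages with batch = 4 issues three
flushes (after the 4th and 8th cleared bits, plus one for the remaining two)
instead of ten, and batch = 1 gives one flush per cleared bit. Whether a real
implementation could safely defer flushes like this is exactly the open
question in this thread.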
--Ying

>
> Hugh

--0016363b7e3ef35e1504939f28c4--

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: email@kvack.org