On Wed, Oct 27, 2010 at 12:13 PM, Hugh Dickins wrote:

> On Wed, 27 Oct 2010, Nick Piggin wrote:
> > On Wed, Oct 27, 2010 at 12:22 PM, Nick Piggin wrote:
> > > On Wed, Oct 27, 2010 at 12:05 PM, Rik van Riel wrote:
> > >> On 10/27/2010 01:21 PM, Ying Han wrote:
> > >>>
> > >>> kswapd's use case of hardware PTE accessed bit is to approximate
> > >>> page LRU. The ActiveLRU demotion to InactiveLRU are not base on
> > >>> accessed bit, while it is only used to promote when a page is on
> > >>> inactive LRU list. All of the state transitions are triggered by
> > >>> memory pressure and thus has weak relationship with respect to
> > >>> time. In addition, hardware already transparently flush tlb
> > >>> whenever CPU context switch processes and given limited hardware
> > >>> TLB resource, the time period in which a page is accessed but not
> > >>> yet propagated to struct page is very small in practice. With the
> > >>> nature of approximation, kernel really don't need to flush TLB
> > >>> for changing PTE's access bit. This commit removes the flush
> > >>> operation from it.
>
> It should at least add a comment there in page_referenced_one(), that
> a TLB flush ought to be done, but is now judged not worth the effort.
>

I will make the change here.

>
> (I'd expect architectures to differ on whether it's worth the effort.)
>

Right :) I would like to hear from upstream whether the problem is general
enough to solve, so that we can plan to put further effort into it.

> > >>>
> > >>> Signed-off-by: Ying Han
> > >>> Singed-off-by: Ken Chen
>
> Hey, Ken, switch off those curling tongs :)
>
> > > However, it's a scary change -- higher chance of reclaiming a TLB
> > > covered page.
>
> Yes, I was often tempted to make such a change in the past;
> but ran away when it appeared to be in danger of losing the pte
> referenced bit of precisely the most intensively referenced pages.
>
> Ying's point (about what the pte referenced bit is being used for in our
> current implementation) is interesting, and might have tipped the balance;
> but that's not clear to me - and the flush is only done when mm is on CPU.
>

The initial patch is from Ken, and I am helping out here to get feedback
from upstream and further improvement. :)

>
> > I had a vague memory of this problem biting someone when this flush
> > wasn't actually done properly... maybe powerpc.
> >
> > But anyway, same solution could be possible, by flushing every N pages
> > scanned.
>
> Yes, batching seems safer.
>

I might be able to take a look at it.

--Ying

>
> Hugh
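
For illustration only, the "flush every N pages scanned" idea discussed above might look roughly like the user-space toy model below. It is not kernel code and not the patch under discussion: struct toy_pte, toy_flush_tlb(), toy_scan_batched() and the FLUSH_BATCH value are all hypothetical names chosen for this sketch, and the "flush" merely increments a counter. The point is only to show that accessed bits can be cleared as pages are scanned while the flush cost is paid once per batch instead of once per page.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical batch size: at most one flush per this many scanned pages. */
#define FLUSH_BATCH 32

/* Toy stand-in for a PTE; only the accessed ("young") bit is modeled. */
struct toy_pte {
    bool young;
};

/* Number of simulated TLB flushes issued so far. */
static unsigned long toy_flush_count;

/* Stand-in for a real TLB flush; here it only bumps a counter. */
static void toy_flush_tlb(void)
{
    toy_flush_count++;
}

/*
 * Scan npages toy PTEs and clear the young bit wherever it is set.
 * Instead of flushing after every cleared bit, flush once at the end of
 * each batch of FLUSH_BATCH scanned pages, and only if that batch
 * actually cleared something. Returns the number of pages that were
 * found referenced.
 */
static size_t toy_scan_batched(struct toy_pte *ptes, size_t npages)
{
    size_t referenced = 0;
    bool need_flush = false;  /* a bit was cleared since the last flush */

    for (size_t i = 0; i < npages; i++) {
        if (ptes[i].young) {
            ptes[i].young = false;
            referenced++;
            need_flush = true;
        }
        /* End of a batch of FLUSH_BATCH scanned pages. */
        if ((i + 1) % FLUSH_BATCH == 0 && need_flush) {
            toy_flush_tlb();
            need_flush = false;
        }
    }
    /* Cover bits cleared in a final, partial batch. */
    if (need_flush)
        toy_flush_tlb();
    return referenced;
}
```

With 100 pages that all have the young bit set, this model issues four flushes (after pages 32, 64 and 96, plus one for the remaining four pages) where a flush-per-page scheme would issue 100. What a real implementation would have to batch, and whether that is worthwhile per architecture, is exactly the open question in the thread.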