Date: Wed, 1 Aug 2007 01:23:06 +0200
From: Andrea Arcangeli
To: Andrew Morton
Cc: linux-mm@kvack.org, Nick Piggin, Martin Bligh
Subject: Re: make swappiness safer to use
Message-ID: <20070731232306.GY6910@v2.random>
References: <20070731215228.GU6910@v2.random> <20070731160943.30e9c13a.akpm@linux-foundation.org>
In-Reply-To: <20070731160943.30e9c13a.akpm@linux-foundation.org>

On Tue, Jul 31, 2007 at 04:09:43PM -0700, Andrew Morton wrote:
> On Tue, 31 Jul 2007 23:52:28 +0200
> Andrea Arcangeli wrote:
>
> > I think prev_priority can also be nuked, since it wastes 4 bytes per
> > zone (that would be an incremental patch, but I'll wait for
> > nr_scan_[in]active to be nuked first, for similar reasons). Clearly
> > somebody at some point noticed how broken that thing was and had to
> > add min(priority, prev_priority) to give it some reliability, but
> > they didn't go the last mile and nuke prev_priority too. Calculating
> > distress only as a function of the not-racy priority is correct, and
> > surely more than enough, without having to add randomness into the
> > equation.
>
> I don't recall seeing any such patch and I suspect it'd cause problems
> anyway.
>
> If we were to base swap_tendency purely on sc->priority then the VM
> would incorrectly fail to deactivate mapped pages until the scanning
> had reached a sufficiently high (ie: low) scanning priority.
>
> The net effect would be that each time some process runs
> shrink_active_list(), some pages would be incorrectly retained on the
> active list, and only after a while would the code start moving mapped
> pages down to the inactive list.
>
> In fact, I think that was (effectively) the behaviour which we had in
> there, and it caused problems with some workload which Martin was
> looking at, and things got better when we fixed it.
>
> Anyway, we can say more if we see the patch (or, more accurately, the
> analysis which comes with that patch).

My reasoning for prev_priority not being such a great feature is that,
between the two, sc->priority is critically more important because it's
being set for the current run, while prev_priority is only set later
(originally only prev_priority was used as a failsafe for the
swappiness logic; these days sc->priority is mixed in too, because
clearly prev_priority alone was not enough).

But my whole dislike for those prev_* things is that they're all
SMP-racy. Your beloved prev_priority will go back to 12 if a new
try_to_free_pages runs with a different gfp_mask and/or a different
allocation order, screwing the task on the other CPU that is having
such a hard time finding unmapped pages to free because it has a
stricter gfp_mask (perhaps not allowed to eat into dcache/icache) or a
bigger order (perhaps even looping nearly forever thanks to the
order <= PAGE_ALLOC_COSTLY_ORDER check). So I have a hard time
appreciating the prev_priority thing because, like nr_scan_[in]active,
it's imperfect.

Comments like this one also show the whole imperfection:

	/*
	 * Now that we've scanned all the zones at this priority level,
	 * note that level within the zone so that the next thread [...]
	 */

That's a lie: there's no such thing as a "next thread". All threads may
be running in parallel on multiple CPUs, or they may be context
switching. The comment would be remotely correct only if there were a
big global semaphore around the VM, which would never happen.
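To spell out the race I'm complaining about, here is a condensed sketch
of one possible interleaving (not verbatim kernel code, just the shape
of it, with the names from vmscan.c):

	/*
	 * CPU0 is deep into a hard reclaim (strict gfp_mask, big order)
	 * and is about to compute distress in shrink_active_list():
	 */
	distress = 100 >> min(zone->prev_priority, priority);

	/*
	 * Meanwhile CPU1 finishes an easy GFP_KERNEL order-0 reclaim at
	 * the very first priority level, and on its way out writes:
	 */
	zone->prev_priority = DEF_PRIORITY;	/* back to 12 */

	/*
	 * If CPU1's store lands before CPU0's load, CPU0 computes
	 * distress = 100 >> 12 = 0 and keeps refusing to reclaim mapped
	 * pages, even though it's the one in trouble. Nothing serializes
	 * the two: zone->prev_priority is plain shared state.
	 */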
It's really in the same category as nr_scan_[in]active, and my dislike
for those things is exactly the same, motivated by mostly the same
reasons.
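For reference, since swap_tendency keeps coming up, the heuristic we're
arguing about is roughly the following (condensed from
shrink_active_list(), quoted from memory rather than verbatim, so the
exact statistics used for mapped_ratio may differ):

	distress = 100 >> min(zone->prev_priority, priority);

	mapped_ratio = ((global_page_state(NR_FILE_MAPPED) +
			 global_page_state(NR_ANON_PAGES)) * 100) /
				vm_total_pages;

	swap_tendency = mapped_ratio / 2 + distress + sc->swappiness;

	if (swap_tendency >= 100)
		reclaim_mapped = 1;	/* start deactivating mapped pages too */

The only input there that can change under our feet is
zone->prev_priority, and that's exactly the one this thread is about.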