Hugh Dickins wrote: > On Sun, 30 Oct 2005, Nick Piggin wrote: >>I wonder if we should go with Robin's fix (+/- my variation) >>as a temporary measure for 2.6.15? > > > You're right, I was too dismissive. I've now spent a day looking into > the larger rework, and it's a bigger job than I'd thought - partly the > architecture variations, partly the fast/slow paths and other "tlb" cruft, > partly the truncation case's i_mmap_lock (and danger of making no progress > whenever we drop it). I'll have to set all that aside for now. > Yep, I took a quick look before adding my bugs to Robin's patch, and basically came to the same conclusion. > I've taken another look at the two patches. The main reason I preferred > yours was that I misread Robin's! But yes, yours takes it a bit further, > and I think that is worthwhile. > Sure - the nice method to solve it reasonably neatly is Robin's of course. I just preferred to change accounting slightly so we wouldn't, for example, retry the call stack after each empty pgd. > But a built and tested version would be better. Aren't you trying to > return addr from each level (that's what I liked, and what I'll want to > do in the end)? But some levels are returning nothing, Hmph, seem to have got half a patch there :P > and unmap_vmas > does start +=, and the huge case leaves start unchanged, and Dang, thanks. > zap_work > should be a long so it doesn't need casting almost everywhere, and... > given all that, I bet there's more! > Yep, good idea. New tested patch attached. Robin, if you agree with the changes I made, would you sign this one and send it on to Linus and/or Andrew, please? > As to whether p??_none should count for 1 where !pte_none counts for > PAGE_SIZE, well, they say a picture is worth a thousand words, and I'm > sure that's entered your calculation ;-) I'd probably make the p??_none That didn't, but it confirms my theory. I was working on the basis that a *page* is worth a thousand words - but I think we can assume any picture worth looking at will be given its own page. I don't know if we should worry about increasing p??_none case - they are going to be largely operating on contiguous memory, free of atomic ops. And the cost for pages must also include the TLB and possible freeing work too. Trivial to change once the patch is in though. > count for a little more. Perhaps we should get everyone involved in a > great profiling effort across the architectures to determine it. > Config option. Sys tunable. I'll shut up. > I was going to make it a tunable, but then I realised someone already has - /dev/kmem ;) -- SUSE Labs, Novell Inc.