On Sun, Oct 30, 2022 at 3:47 PM Linus Torvalds wrote: > > Anyway, this simplification patch basically means that the *next* step > could be to just move that ipage_zap_pte_rmap()' after the TLB flush, > and now it's trivial and no longer scary. So here's that next patch: it's patch 4/4 in this series. Patches 1-3 are the patches that I already sent out as one (smaller) combined patch. I'm including them here as the whole series in case somebody else wants to follow along with how I did the simplified version of page_remove_rmap(). So if you already looked at the previous patch and don't have any questions about that one, you could just apply PATCH 4/4 on top of that one. Or you can do the whole series with commit messages and (hopefully) clearer individual steps to the simplified version of page_remove_rmap(). I haven't actually tested 4/4 yet, so this is yet another of my "I think this should work" patches. The reason I haven't actually tested it is partly because I never recreated the original problem Navav reported, and partly because the meat of patch 4/4 is just the same "encode an extra flag bit in the low bit of the page pointer" that I _did_ test, just doing the "remove rmap" instead of "set dirty". In other words, I *think* this should make Nadav's test-case happy, and avoid the warning he saw. If somebody wants a git branch, I guess I can push that too, but it would be a non-stable branch only for testing. Also, it's worth noting that zap_pte_range() does this sanity test: if (unlikely(page_mapcount(page) < 0)) print_bad_pte(vma, addr, ptent, page); and that is likely worthless now (because it hasn't actually decremented the mapcount yet). I didn't remove it, because I wasn't sure which option was best: (a) just remove it entirely (b) change the "< 0" to "<= 0" (c) move it to clean_and_free_pages_and_swap_cache() that actually does the page_zap_pte_rmap() now. so I'm in no way claiming this series is any kind of final word, but I do mostly like how the code ended up looking. Now, this whole "remove rmap after flush" would probably make sense for some of the largepage cases too, and there's room for another bit in there (or more, if you align 'struct page') and that whole code could use page size hints etc too. But I suspect that the main case we really care is for individual small pages, just because that's the case where I think it would be much worse to do any fancy TLB tracking. The largepage cases presumably aren't as critical, since there by definition is fewer of those. Comments? Linus