* [Question] ksm: rmap_item pointing to some stale vmas
@ 2015-04-09 14:05 Susheel Khiani
  2015-04-10 17:56 ` Hugh Dickins
From: Susheel Khiani @ 2015-04-09 14:05 UTC
To: akpm, peterz, neilb, dhowells, hughd, paulmcquad, linux-mm, linux-kernel

Hi,

We are seeing an issue during try_to_unmap_ksm where the call to
try_to_unmap_one is failing.

try_to_unmap_ksm in this particular case is trying to go through vmas
associated with each rmap_item->anon_vma. What we see is that the
corresponding page is not mapped to any of the vmas associated with 2
rmap_item.

The associated rmap_item in this case looks like pointing to some valid vma
but the said page is not found to be mapped under it. try_to_unmap_one thus
fails to find valid ptes for these vmas.

At the same time we can see that the page actually is mapped in 2 separate
and different vmas which are not part of rmap_item associated with page.

So whether rmap_item is pointing to some stale vmas and now the mapping has
changed? Or there is something else going on here.

Any pointer would be appreciated.

-- 
Susheel Khiani

QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc.
is a member of the Code Aurora Forum, hosted by The Linux Foundation

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

* Re: [Question] ksm: rmap_item pointing to some stale vmas
  2015-04-09 14:05 [Question] ksm: rmap_item pointing to some stale vmas Susheel Khiani
@ 2015-04-10 17:56 ` Hugh Dickins
  2015-04-14  7:01   ` Susheel Khiani
From: Hugh Dickins @ 2015-04-10 17:56 UTC
To: Susheel Khiani
Cc: akpm, peterz, neilb, dhowells, hughd, paulmcquad, linux-mm, linux-kernel

On Thu, 9 Apr 2015, Susheel Khiani wrote:

> Hi,
>
> We are seeing an issue during try_to_unmap_ksm where the call to
> try_to_unmap_one is failing.
>
> try_to_unmap_ksm in this particular case is trying to go through vmas
> associated with each rmap_item->anon_vma. What we see is that the
> corresponding page is not mapped to any of the vmas associated with 2
> rmap_item.
>
> The associated rmap_item in this case looks like pointing to some valid vma
> but the said page is not found to be mapped under it. try_to_unmap_one thus
> fails to find valid ptes for these vmas.
>
> At the same time we can see that the page actually is mapped in 2 separate
> and different vmas which are not part of rmap_item associated with page.
>
> So whether rmap_item is pointing to some stale vmas and now the mapping has
> changed? Or there is something else going on here.
>
> Any pointer would be appreciated.

I expected to be able to argue this away, but no: I think you've found
a bug, and I think I get it too.  I have no idea what's wrong at this
point, will set aside some time to investigate, and report back.

Which kernel are you using?  try_to_unmap_ksm says v3.13 or earlier.
Probably doesn't affect the bug, but may affect the patch you'll need.

Hugh

* Re: [Question] ksm: rmap_item pointing to some stale vmas
  2015-04-10 17:56 ` Hugh Dickins
@ 2015-04-14  7:01   ` Susheel Khiani
  2015-04-15  6:22     ` Hugh Dickins
From: Susheel Khiani @ 2015-04-14 7:01 UTC
To: Hugh Dickins
Cc: akpm, peterz, neilb, dhowells, paulmcquad, linux-mm, linux-kernel

On 04/10/15 23:26, Hugh Dickins wrote:
> On Thu, 9 Apr 2015, Susheel Khiani wrote:
>
>> Hi,
>>
>> We are seeing an issue during try_to_unmap_ksm where the call to
>> try_to_unmap_one is failing.
>>
>> try_to_unmap_ksm in this particular case is trying to go through vmas
>> associated with each rmap_item->anon_vma. What we see is that the
>> corresponding page is not mapped to any of the vmas associated with 2
>> rmap_item.
>>
>> The associated rmap_item in this case looks like pointing to some valid vma
>> but the said page is not found to be mapped under it. try_to_unmap_one thus
>> fails to find valid ptes for these vmas.
>>
>> At the same time we can see that the page actually is mapped in 2 separate
>> and different vmas which are not part of rmap_item associated with page.
>>
>> So whether rmap_item is pointing to some stale vmas and now the mapping has
>> changed? Or there is something else going on here.
>>
>> Any pointer would be appreciated.
>
> I expected to be able to argue this away, but no: I think you've found
> a bug, and I think I get it too.  I have no idea what's wrong at this
> point, will set aside some time to investigate, and report back.
>
> Which kernel are you using?  try_to_unmap_ksm says v3.13 or earlier.
> Probably doesn't affect the bug, but may affect the patch you'll need.
>
> Hugh

We are using kernel-3.10.49 and I have gone through patches of ksm above this
kernel version but didn't find anything relevant w.r.t issue. The latest
patch which we have for KSM on our tree is

668f9abb: mm: close PageTail race

The issue otherwise is difficult to reproduce and is appearing after days of
testing on 512MB Android platform. What I am not able to figure out is which
code path in ksm could actually land us in situation where in stable_node we
still have stale rmap_items with old vmas which are now unmapped.

In the dumps we can see the new vmas mapping to the page but the new
rmap_items with these new vmas which maps the page are still not updated in
stable_node.

-- 
Susheel Khiani

QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc.
is a member of the Code Aurora Forum, hosted by The Linux Foundation

* Re: [Question] ksm: rmap_item pointing to some stale vmas
  2015-04-14  7:01   ` Susheel Khiani
@ 2015-04-15  6:22     ` Hugh Dickins
  2015-04-30  6:07       ` Susheel Khiani
From: Hugh Dickins @ 2015-04-15 6:22 UTC
To: Susheel Khiani
Cc: Hugh Dickins, akpm, peterz, neilb, dhowells, paulmcquad, linux-mm, linux-kernel

On Tue, 14 Apr 2015, Susheel Khiani wrote:
> On 04/10/15 23:26, Hugh Dickins wrote:
> > On Thu, 9 Apr 2015, Susheel Khiani wrote:
> > >
> > > We are seeing an issue during try_to_unmap_ksm where the call to
> > > try_to_unmap_one is failing.
> > >
> > > try_to_unmap_ksm in this particular case is trying to go through vmas
> > > associated with each rmap_item->anon_vma. What we see is that the
> > > corresponding page is not mapped to any of the vmas associated with 2
> > > rmap_item.
> > >
> > > The associated rmap_item in this case looks like pointing to some
> > > valid vma but the said page is not found to be mapped under it.
> > > try_to_unmap_one thus fails to find valid ptes for these vmas.
> > >
> > > At the same time we can see that the page actually is mapped in 2
> > > separate and different vmas which are not part of rmap_item
> > > associated with page.
> > >
> > > So whether rmap_item is pointing to some stale vmas and now the
> > > mapping has changed? Or there is something else going on here.
> > >
> > > Any pointer would be appreciated.
> >
> > I expected to be able to argue this away, but no: I think you've found
> > a bug, and I think I get it too.  I have no idea what's wrong at this
> > point, will set aside some time to investigate, and report back.
> >
> > Which kernel are you using?  try_to_unmap_ksm says v3.13 or earlier.
> > Probably doesn't affect the bug, but may affect the patch you'll need.
> >
>
> We are using kernel-3.10.49 and I have gone through patches of ksm above this
> kernel version but didn't find anything relevant w.r.t issue. The latest
> patch which we have for KSM on our tree is
>
> 668f9abb: mm: close PageTail race

I agree, I don't think 3.10.49 would be missing any relevant fix -
unless there's a later fix to some "random" corruption which happens
to hit you here in KSM.

I wonder how you identified that this issue of un-unmappable pages
is peculiar to KSM.  Have you established that ordinary anon pages
(we need not worry about file pages here) are always successfully
unmappable?  KSM is reliant upon anon_vmas working as intended
(but then makes use of them in its own peculiar way).

>
> The issue otherwise is difficult to reproduce and is appearing after days of
> testing on 512MB Android platform. What I am not able to figure out is which
> code path in ksm could actually land us in situation where in stable_node we
> still have stale rmap_items with old vmas which are now unmapped.

Whether that's something to worry about depends on what you mean.

It's normal for a stable_node to have some stale rmap_items attached,
now pointing to pages different from the stable page, or pointing to none.
That's in the nature of KSM, the way ksmd builds up its structures by
peeking at what's in each mm, moving on, and coming back a cycle later
to discover what's changed.

But the anon_vma which such a stale rmap_item points to should remain
valid (KSM holds an additional reference to it), even if its interval
tree is now empty, or none of the vmas that it holds now cover this
mm,address (but any vmas held should still be valid vmas).

I was concerned, not that the stable_node has stale rmap_items attached,
but that you know the page to be mapped, yet try_to_unmap_ksm is unable
to locate its mappings.

>
> In the dumps we can see the new vmas mapping to the page but the new
> rmap_items with these new vmas which maps the page are still not updated in
> stable_node.

"still not updated" after how long?

I assume you to mean that, however long you wait (but at least
one full scan), the stable_node is not updated with an rmap_item
pointing to an anon_vma whose interval tree contains one of these
new vmas which maps the page?

(When setting up a new stable node, it will take several scans to
establish, and can be delayed by various races, such as shifts in
the unstable tree, and the trylock_page in try_to_merge_one_page.
But I think that once you can see a stable ksm page mapped somewhere,
all pointers to it should be captured within a single scan.)

That's bad, but I have no idea of the cause.  I mention corruption
above, because that would be one possibility; though unlikely if
it always hits you here in KSM only.

Whereas if you mean that a new mapping of the stable page may not
be unmapped until ksmd has completed a full scan, that is also
wrong, but not so serious.  Or would even that be a serious issue
for you?  Please describe how this comes to be a problem for you.

I believe I have found two bugs that would explain the latter case;
but both of them require fork, and legend has it that Android avoids
fork (correct me if wrong); so I doubt they're responsible for your
case, and expect both to be corrected within one full scan.

The lesser of the bugs is this: KSM reclaim (dependent on anon_vmas)
was introduced in 2.6.33, but then anon_vma_chains were introduced
in 2.6.34, and I suspect that the conversion ought to have updated
try_to_merge_with_ksm_page, to take rmap_item->anon_vma from page
instead of from vma.  I believe that some fork-connected mappings
may be missed for a scan because of that.

But fixing it doesn't help much: because the greater bug (mine) is
that the search_new_forks code is not working as well as intended.
It relies on using one rmap_item's anon_vma to locate the page in
newer mappings forked from it, before ksmd reaches them to create
their own rmap_items; but we're doing nothing to prevent that
earlier rmap_item from being removed too soon.

I would much rather be sending a patch, than trying to describe
this so obscurely; but I have not succeeded and time has run out.

I got far enough, I think, to confirm that this happens for me,
and can be fixed by delaying the removal of such rmap_items.
But I did not get far enough to stop them from leaking wildly;
and although I've searched for quick and easy ways to do it,
have come to the conclusion that fixing it safely without leaks
will require more time and care than I can afford at present.

(And even with those fixed, there would still be rare cases when
a new mapping could not immediately be unmapped: for example,
replace_page increments kpage's mapcount, but a racing
try_to_unmap_ksm may hold kpage's page lock, preventing the
relevant rmap_item from being appended to the stable tree.)

I do hate to put down half-finished work, and would have liked
to send you a patch, even if only to confirm that my problem
is actually not your problem.  But I now see no alternative to
merely informing you of this, and wishing you luck in your own
investigation: I'm sorry, I just don't know.

But if I've misunderstood, and you think that what you're seeing
fits with the transient forking bugs I've (not quite) described,
and you can explain why even the transient case is important for
you to have fixed, then I really ought to redouble my efforts.

Hugh
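
The two-pass walk that Hugh describes (first the mm each rmap_item was created for, then other mms reachable through the same anon_vma, "search_new_forks") can be sketched in userspace roughly as follows. This is a toy model, not kernel code: every `toy_*` type and function is a hypothetical stand-in invented for illustration, and an anon_vma's interval tree is reduced to a flat array. Only the pass-selection test is modelled on the `(rmap_item->mm == vma->vm_mm) == search_new_forks` check in the ksm.c walkers of that era.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins (hypothetical, not the kernel's definitions). */
struct toy_vma {
	int mm_id;                  /* which address space owns this vma */
	unsigned long start, end;   /* covers [start, end) */
	bool maps_page;             /* true while a pte maps the KSM page */
};

struct toy_anon_vma {
	struct toy_vma *vmas;       /* flat array instead of an interval tree */
	size_t nr_vmas;
};

struct toy_rmap_item {
	int mm_id;                  /* mm that ksmd scanned to create the item */
	unsigned long address;
	struct toy_anon_vma *anon_vma;
};

/*
 * Walk the rmap_items hanging off one stable node and "unmap" the page
 * wherever it is found. Pass 0 looks only at the vma in the item's own
 * mm; pass 1 (search_new_forks) looks only at vmas in other mms that
 * share the item's anon_vma, i.e. forks ksmd has not visited yet.
 * Returns the number of mappings removed.
 */
size_t toy_unmap_walk(const struct toy_rmap_item *items, size_t nr_items)
{
	size_t unmapped = 0;

	for (int search_new_forks = 0; search_new_forks < 2; search_new_forks++) {
		for (size_t i = 0; i < nr_items; i++) {
			const struct toy_rmap_item *item = &items[i];

			assert(item->anon_vma != NULL);
			for (size_t j = 0; j < item->anon_vma->nr_vmas; j++) {
				struct toy_vma *vma = &item->anon_vma->vmas[j];

				if (item->address < vma->start ||
				    item->address >= vma->end)
					continue;
				/* Same-mm vmas on pass 0, other mms on pass 1. */
				if ((item->mm_id == vma->mm_id) == search_new_forks)
					continue;
				if (vma->maps_page) {
					vma->maps_page = false;
					unmapped++;
				}
			}
		}
	}
	return unmapped;
}
```

In this model a forked child's mapping is reachable only through some surviving rmap_item whose anon_vma contains the child's vma. Calling `toy_unmap_walk` with that item present removes both mappings; calling it after the item has been dropped finds nothing, which is the shape of the "removed too soon" problem described above.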

* Re: [Question] ksm: rmap_item pointing to some stale vmas
  2015-04-15  6:22     ` Hugh Dickins
@ 2015-04-30  6:07       ` Susheel Khiani
  2015-06-09 18:26         ` Susheel Khiani
From: Susheel Khiani @ 2015-04-30 6:07 UTC
To: Hugh Dickins
Cc: akpm, peterz, neilb, dhowells, paulmcquad, linux-mm, linux-kernel

On 04/15/15 11:52, Hugh Dickins wrote:
>> We are using kernel-3.10.49 and I have gone through patches of ksm above this
>> kernel version but didn't find anything relevant w.r.t issue. The latest
>> patch which we have for KSM on our tree is
>>
>> 668f9abb: mm: close PageTail race
>
> I agree, I don't think 3.10.49 would be missing any relevant fix -
> unless there's a later fix to some "random" corruption which happens
> to hit you here in KSM.
>
> I wonder how you identified that this issue of un-unmappable pages
> is peculiar to KSM.  Have you established that ordinary anon pages
> (we need not worry about file pages here) are always successfully
> unmappable?  KSM is reliant upon anon_vmas working as intended
> (but then makes use of them in its own peculiar way).
>

We identified the issue in try_to_unmap_ksm as part of debugging CMA
allocation failures. During alloc_contig_range we do migrate_pages, where
we were failing to migrate a specific page even after all the retries
which we make in the migrate_pages function. Digging deeper, we were able
to conclude that we were failing in try_to_unmap_ksm, where we failed to
find valid ptes.

>>
>> The issue otherwise is difficult to reproduce and is appearing after days of
>> testing on 512MB Android platform. What I am not able to figure out is which
>> code path in ksm could actually land us in situation where in stable_node we
>> still have stale rmap_items with old vmas which are now unmapped.
>
> Whether that's something to worry about depends on what you mean.
>
> It's normal for a stable_node to have some stale rmap_items attached,
> now pointing to pages different from the stable page, or pointing to none.
> That's in the nature of KSM, the way ksmd builds up its structures by
> peeking at what's in each mm, moving on, and coming back a cycle later
> to discover what's changed.
>
> But the anon_vma which such a stale rmap_item points to should remain
> valid (KSM holds an additional reference to it), even if its interval
> tree is now empty, or none of the vmas that it holds now cover this
> mm,address (but any vmas held should still be valid vmas).
>
> I was concerned, not that the stable_node has stale rmap_items attached,
> but that you know the page to be mapped, yet try_to_unmap_ksm is unable
> to locate its mappings.
>
>>
>> In the dumps we can see the new vmas mapping to the page but the new
>> rmap_items with these new vmas which maps the page are still not updated in
>> stable_node.
>
> "still not updated" after how long?
>
> I assume you to mean that, however long you wait (but at least
> one full scan), the stable_node is not updated with an rmap_item
> pointing to an anon_vma whose interval tree contains one of these
> new vmas which maps the page?

I have not yet concluded whether we are waiting for one full scan or not.
Since I was debugging this w.r.t. the CMA allocation failure, by saying
"still not updated" I meant that even after all the retries which we make
in the CMA allocation path to migrate pages, the stable_node was not
updated with the rmap_item. But now I understand that we need to wait for
at least one full ksm scan to see the update.

>
> (When setting up a new stable node, it will take several scans to
> establish, and can be delayed by various races, such as shifts in
> the unstable tree, and the trylock_page in try_to_merge_one_page.
> But I think that once you can see a stable ksm page mapped somewhere,
> all pointers to it should be captured within a single scan.)

I am actually thinking the reason for my issue could be that we might not
have waited long enough to ensure that the ksm scan ran once. The reason
for this is that I was able to track down the mm_slot structure which we
create in __ksm_enter, and it contained the mm_struct which had the vma
where our page is mapped. But the rmap_list of this mm_slot was still
NULL, which I guess would get populated once ksm_do_scan runs.

>
> That's bad, but I have no idea of the cause.  I mention corruption
> above, because that would be one possibility; though unlikely if
> it always hits you here in KSM only.

Yes, we too have ruled out corruption, since we have now seen multiple
instances with similar symptoms.

>
> Whereas if you mean that a new mapping of the stable page may not
> be unmapped until ksmd has completed a full scan, that is also
> wrong, but not so serious.  Or would even that be a serious issue
> for you?  Please describe how this comes to be a problem for you.

Right now I don't have enough data points to claim that a new mapping of
the stable page may not be unmapped until ksmd has completed a full scan.
But I am debugging in this direction and will get back once I have
sufficient data.

>
> I believe I have found two bugs that would explain the latter case;
> but both of them require fork, and legend has it that Android avoids
> fork (correct me if wrong); so I doubt they're responsible for your
> case, and expect both to be corrected within one full scan.
>
> The lesser of the bugs is this: KSM reclaim (dependent on anon_vmas)
> was introduced in 2.6.33, but then anon_vma_chains were introduced
> in 2.6.34, and I suspect that the conversion ought to have updated
> try_to_merge_with_ksm_page, to take rmap_item->anon_vma from page
> instead of from vma.  I believe that some fork-connected mappings
> may be missed for a scan because of that.
>
> But fixing it doesn't help much: because the greater bug (mine) is
> that the search_new_forks code is not working as well as intended.
> It relies on using one rmap_item's anon_vma to locate the page in
> newer mappings forked from it, before ksmd reaches them to create
> their own rmap_items; but we're doing nothing to prevent that
> earlier rmap_item from being removed too soon.
>
> I would much rather be sending a patch, than trying to describe
> this so obscurely; but I have not succeeded and time has run out.
>
> I got far enough, I think, to confirm that this happens for me,
> and can be fixed by delaying the removal of such rmap_items.
> But I did not get far enough to stop them from leaking wildly;
> and although I've searched for quick and easy ways to do it,
> have come to the conclusion that fixing it safely without leaks
> will require more time and care than I can afford at present.
>
> (And even with those fixed, there would still be rare cases when
> a new mapping could not immediately be unmapped: for example,
> replace_page increments kpage's mapcount, but a racing
> try_to_unmap_ksm may hold kpage's page lock, preventing the
> relevant rmap_item from being appended to the stable tree.)
>
> I do hate to put down half-finished work, and would have liked
> to send you a patch, even if only to confirm that my problem
> is actually not your problem.  But I now see no alternative to
> merely informing you of this, and wishing you luck in your own
> investigation: I'm sorry, I just don't know.
>
> But if I've misunderstood, and you think that what you're seeing
> fits with the transient forking bugs I've (not quite) described,
> and you can explain why even the transient case is important for
> you to have fixed, then I really ought to redouble my efforts.
>
> Hugh

-- 
Susheel Khiani

QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc.
is a member of the Code Aurora Forum, hosted by The Linux Foundation
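
The failure mode reported in this message (migration retried several times during a CMA allocation, every attempt failing because the unmap step cannot reach all mappings until ksmd has rescanned) can be illustrated with a deliberately tiny model. Everything here is a hypothetical simplification: `toy_ksm_page`, `toy_try_to_unmap`, `toy_ksmd_full_scan` and `toy_migrate_page` are invented names, and the retry limit is a parameter rather than a claim about the kernel's actual pass count.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model, not kernel code. */
struct toy_ksm_page {
	unsigned int mapcount;  /* ptes that really map the page */
	unsigned int tracked;   /* mappings reachable via the stable_node's rmap_items */
};

/* Unmapping succeeds only if every real mapping can be located. */
bool toy_try_to_unmap(const struct toy_ksm_page *page)
{
	return page->tracked >= page->mapcount;
}

/* Stand-in for one full ksmd cycle: rmap_items now cover all mappings. */
void toy_ksmd_full_scan(struct toy_ksm_page *page)
{
	page->tracked = page->mapcount;
}

/*
 * Retry migration up to max_passes times. Returns the index of the pass
 * that succeeded, or -1 if every pass failed. Retrying alone never
 * changes the outcome, because nothing in this loop updates the
 * rmap_items.
 */
int toy_migrate_page(const struct toy_ksm_page *page, int max_passes)
{
	assert(max_passes > 0);
	for (int pass = 0; pass < max_passes; pass++) {
		if (toy_try_to_unmap(page))
			return pass;
	}
	return -1;
}
```

With a page mapped twice but not yet tracked, `toy_migrate_page` returns -1 no matter how many passes are allowed; after `toy_ksmd_full_scan` it succeeds on the first pass. That is the sense in which a caller that cannot wait for a ksmd cycle sees a hard failure from what is otherwise a transient condition.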

* Re: [Question] ksm: rmap_item pointing to some stale vmas
  2015-04-30  6:07       ` Susheel Khiani
@ 2015-06-09 18:26         ` Susheel Khiani
  2015-06-22  5:19           ` Susheel Khiani
From: Susheel Khiani @ 2015-06-09 18:26 UTC
To: Hugh Dickins
Cc: akpm, peterz, neilb, dhowells, paulmcquad, linux-mm, linux-kernel

On 4/30/2015 11:37 AM, Susheel Khiani wrote:
>> But if I've misunderstood, and you think that what you're seeing
>> fits with the transient forking bugs I've (not quite) described,
>> and you can explain why even the transient case is important for
>> you to have fixed, then I really ought to redouble my efforts.
>>
>> Hugh

I was able to root-cause the issue, as we got a few instances of it and
it was frequently reproducible in stress tests. The reason why it is
important is that failure to unmap the ksm page was resulting in CMA
allocation failures for us.

For cases like fork, what we observed is that, for private mapped file
pages, the stable_node pointed to by the KSM page won't cover all the
mappings until ksmd completes one full scan. Only after the ksmd scan do
new rmap_items pointing to mappings in the child process come into
existence. So in cases like CMA allocations, where we can't wait for ksmd
to complete one full cycle, we can traverse the anon_vma tree from the
parent's anon_vma to find all the vmas where the CMA page is mapped.

I have tested the following patch on a 3.10 kernel, and with this change
I am able to avoid the CMA allocation failures which we were otherwise
frequently seeing because of not being able to unmap the KSM page.

Please review and let me know the feedback.



[PATCH] ksm: Traverse through parent's anon_vma while unmapping

While doing try_to_unmap_ksm, we traverse through
rmap_item list to find out all the anon_vmas from which
page needs to be unmapped.

Now as per the design of KSM, it builds up its data
structures by looking into each mm, and comes back a cycle
later to find out which data structures are now outdated and
need to be updated. So, for cases like fork, what we
observe is for private mapped file pages stable_node
pointed by KSM page won't cover all the mappings until
ksmd completes one full scan. Only after ksmd scan, new
rmap_items pointing to mappings in child process would come
into existence.

As a result unmapping of a stable page can't be done until
ksmd has completed one full scan. This becomes an issue in
case of CMA where we need to unmap and move a CMA page and
can't wait for ksmd to complete one cycle. Because of
new rmap_items for new mapping still not created we won't be
able to unmap CMA page from all the vmas where it is mapped.
This would result in frequent CMA allocation failures.

So instead of just relying on rmap_items list which we know
can contain incomplete list, we also scan anon_vma tree from
parent's anon_vma to find out all the vmas where CMA page is
mapped and thereby successfully unmap the page and move it
to new page.

Change-Id: I97cacf6a73734b10c7098362c20fb3f2d4040c76
Signed-off-by: Susheel Khiani <skhiani@codeaurora.org>
---
 mm/ksm.c | 58 +++++++++++++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 55 insertions(+), 3 deletions(-)

diff --git a/mm/ksm.c b/mm/ksm.c
index 11f6293..10d5266 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -1956,6 +1956,7 @@ int page_referenced_ksm(struct page *page, struct mem_cgroup *memcg,
 	unsigned int mapcount = page_mapcount(page);
 	int referenced = 0;
 	int search_new_forks = 0;
+	int search_from_root = 0;
 
 	VM_BUG_ON(!PageKsm(page));
 	VM_BUG_ON(!PageLocked(page));
@@ -1968,9 +1969,20 @@ again:
 		struct anon_vma *anon_vma = rmap_item->anon_vma;
 		struct anon_vma_chain *vmac;
 		struct vm_area_struct *vma;
+		struct rb_root rb_root;
+
+		if (!search_from_root) {
+			if (anon_vma)
+				rb_root = anon_vma->rb_root;
+		}
+		else {
+			if (anon_vma && anon_vma->root) {
+				rb_root = anon_vma->root->rb_root;
+			}
+		}
 
 		anon_vma_lock_read(anon_vma);
-		anon_vma_interval_tree_foreach(vmac, &anon_vma->rb_root,
+		anon_vma_interval_tree_foreach(vmac, &rb_root,
 					       0, ULONG_MAX) {
 			vma = vmac->vma;
 			if (rmap_item->address < vma->vm_start ||
@@ -1999,6 +2011,11 @@ again:
 	}
 	if (!search_new_forks++)
 		goto again;
+
+	if (!search_from_root++) {
+		search_new_forks = 0;
+		goto again;
+	}
 out:
 	return referenced;
 }
@@ -2010,6 +2027,7 @@ int try_to_unmap_ksm(struct page *page, enum ttu_flags flags,
 	struct rmap_item *rmap_item;
 	int ret = SWAP_AGAIN;
 	int search_new_forks = 0;
+	int search_from_root = 0;
 
 	VM_BUG_ON(!PageKsm(page));
 	VM_BUG_ON(!PageLocked(page));
@@ -2028,9 +2046,20 @@ again:
 		struct anon_vma *anon_vma = rmap_item->anon_vma;
 		struct anon_vma_chain *vmac;
 		struct vm_area_struct *vma;
+		struct rb_root rb_root;
+
+		if (!search_from_root) {
+			if (anon_vma)
+				rb_root = anon_vma->rb_root;
+		}
+		else {
+			if (anon_vma && anon_vma->root) {
+				rb_root = anon_vma->root->rb_root;
+			}
+		}
 
 		anon_vma_lock_read(anon_vma);
-		anon_vma_interval_tree_foreach(vmac, &anon_vma->rb_root,
+		anon_vma_interval_tree_foreach(vmac, &rb_root,
 					       0, ULONG_MAX) {
 			vma = vmac->vma;
 			if (rmap_item->address < vma->vm_start ||
@@ -2056,6 +2085,11 @@ again:
 	}
 	if (!search_new_forks++)
 		goto again;
+
+	if (!search_from_root++) {
+		search_new_forks = 0;
+		goto again;
+	}
 out:
 	return ret;
 }
@@ -2068,6 +2102,7 @@ int rmap_walk_ksm(struct page *page, int (*rmap_one)(struct page *,
 	struct rmap_item *rmap_item;
 	int ret = SWAP_AGAIN;
 	int search_new_forks = 0;
+	int search_from_root = 0;
 
 	VM_BUG_ON(!PageKsm(page));
 	VM_BUG_ON(!PageLocked(page));
@@ -2080,9 +2115,21 @@ again:
 		struct anon_vma *anon_vma = rmap_item->anon_vma;
 		struct anon_vma_chain *vmac;
 		struct vm_area_struct *vma;
+		struct rb_root rb_root;
+
+		if (!search_from_root) {
+			if (anon_vma)
+				rb_root = anon_vma->rb_root;
+		}
+		else {
+			if (anon_vma && anon_vma->root) {
+				rb_root = anon_vma->root->rb_root;
+			}
+		}
+
 
 		anon_vma_lock_read(anon_vma);
-		anon_vma_interval_tree_foreach(vmac, &anon_vma->rb_root,
+		anon_vma_interval_tree_foreach(vmac, &rb_root,
 					       0, ULONG_MAX) {
 			vma = vmac->vma;
 			if (rmap_item->address < vma->vm_start ||
@@ -2107,6 +2154,11 @@ again:
 	}
 	if (!search_new_forks++)
 		goto again;
+
+	if (!search_from_root++) {
+		search_new_forks = 0;
+		goto again;
+	}
 out:
 	return ret;
 }
-- 
1.8.2.1

-- 
Susheel Khiani

QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc.
is a member of the Code Aurora Forum, hosted by The Linux Foundation
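
The control flow the patch adds (run the existing two passes over each rmap_item's own anon_vma, then reset `search_new_forks` and run both passes again over the root anon_vma's tree) can be sketched in userspace as below. This is an illustration under stated assumptions, not the patch itself: the `toy_*` types and `toy_unmap_walk_root` are hypothetical, interval trees are flat arrays, address-range checks and locking are omitted, and the sketch selects the tree through a pointer where the patch copies a `struct rb_root` by value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins (hypothetical, not the kernel's definitions). */
struct toy_vma {
	int mm_id;
	bool maps_page;             /* true while a pte maps the KSM page */
};

struct toy_anon_vma {
	struct toy_anon_vma *root;  /* top of the fork hierarchy; root points to itself */
	struct toy_vma **vmas;      /* vmas linked into this anon_vma's tree */
	size_t nr_vmas;
};

struct toy_rmap_item {
	int mm_id;
	struct toy_anon_vma *anon_vma;
};

/*
 * Outer loop: search_from_root 0 walks each item's own anon_vma tree,
 * 1 walks the root's tree instead (only when allow_root is set).
 * Inner loop: the usual two passes, own mm first, then other mms.
 * Returns the number of mappings removed.
 */
size_t toy_unmap_walk_root(const struct toy_rmap_item *items, size_t nr_items,
			   bool allow_root)
{
	size_t unmapped = 0;
	int root_passes = allow_root ? 2 : 1;

	for (int search_from_root = 0; search_from_root < root_passes; search_from_root++) {
		for (int search_new_forks = 0; search_new_forks < 2; search_new_forks++) {
			for (size_t i = 0; i < nr_items; i++) {
				const struct toy_rmap_item *item = &items[i];
				const struct toy_anon_vma *tree;

				assert(item->anon_vma && item->anon_vma->root);
				tree = search_from_root ? item->anon_vma->root
							: item->anon_vma;

				for (size_t j = 0; j < tree->nr_vmas; j++) {
					struct toy_vma *vma = tree->vmas[j];

					if ((item->mm_id == vma->mm_id) == search_new_forks)
						continue;
					if (vma->maps_page) {
						vma->maps_page = false;
						unmapped++;
					}
				}
			}
		}
	}
	return unmapped;
}
```

In this model, an rmap_item that points at a child's anon_vma reaches only the child's own vma; switching to the root's tree also reaches the parent and sibling vmas. Whether walking the root tree is the right fix in the kernel, as opposed to the approach of retaining the earlier rmap_item that Hugh mentions, is not something this sketch tries to settle.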
* Re: [Question] ksm: rmap_item pointing to some stale vmas 2015-06-09 18:26 ` Susheel Khiani @ 2015-06-22 5:19 ` Susheel Khiani 0 siblings, 0 replies; 7+ messages in thread From: Susheel Khiani @ 2015-06-22 5:19 UTC (permalink / raw) To: Hugh Dickins Cc: akpm, peterz, neilb, dhowells, paulmcquad, linux-mm, linux-kernel On 6/9/2015 11:56 PM, Susheel Khiani wrote: > On 4/30/2015 11:37 AM, Susheel Khiani wrote: >>> But if I've misunderstood, and you think that what you're seeing >>> fits with the transient forking bugs I've (not quite) described, >>> and you can explain why even the transient case is important for >>> you to have fixed, then I really ought to redouble my efforts. >>> >>> Hugh > > I was able to root cause the issue as we got few instances of same and > was frequently getting reproducible on stress tests. The reason why it > was important was because failure to unmap ksm page was resulting into > CMA allocation failure for us. > > For cases like fork, what we observed is for private mapped file pages, > stable_node pointed by KSM page won't cover all the mappings until ksmd > completes one full scan. Only after ksmd scan, new rmap_items pointing > to mappings in child process would come into existence. So in cases like > CMA allocations where we can't wait for ksmd to complete one full cycle, > we can traverse anon_vma tree from parent's anon_vma to find out all the > pages wheres CMA is mapped. > > I have tested the following patch on 3.10 kernel and with this change I > am able to avoid CMA allocation failure which we were otherwise > frequently seeing because of not able to unmap KSM page. > > Please review and let me know the feedback. > > > > [PATCH] ksm: Traverse through parent's anon_vma while unmapping > > While doing try_to_unmap_ksm, we traverse through > rmap_item list to find out all the anon_vmas from which > page needs to be unmapped. 
> > Now as per the design of KSM, it builds up its data > structures by looking into each mm, and comes back a cycle > later to find out which data structures are now outdated and > needs to be updated. So, for cases like fork, what we > observe is for private mapped file pages stable_node > pointed by KSM page won't cover all the mappings until > ksmd completes one full scan. Only after ksmd scan, new > rmap_items pointing to mappings in child process would come > into existence. > > As a result unmapping of a stable page can't be done until > ksmd has completed one full scan. This becomes an issue in > case of CMA where we need to unmap and move a CMA page and > can't wait for ksmd to complete one cycle. Because of > new rmap_items for new mapping still not created we won't be > able to unmap CMA page from all the vmas where it is mapped. > This would result in frequent CMA allocation failures. > > So instead of just relying on rmap_items list which we know > can contain incomplete list, we also scan anon_vma tree from > parent's anon_vma to find out all the vmas where CMA page is > mapped and thereby successfully unmap the page and move it > to new page. 
>
> Change-Id: I97cacf6a73734b10c7098362c20fb3f2d4040c76
> Signed-off-by: Susheel Khiani <skhiani@codeaurora.org>
> ---
>  mm/ksm.c | 58 +++++++++++++++++++++++++++++++++++++++++++++++++++++++---
>  1 file changed, 55 insertions(+), 3 deletions(-)
>
> diff --git a/mm/ksm.c b/mm/ksm.c
> index 11f6293..10d5266 100644
> --- a/mm/ksm.c
> +++ b/mm/ksm.c
> @@ -1956,6 +1956,7 @@ int page_referenced_ksm(struct page *page, struct mem_cgroup *memcg,
>  	unsigned int mapcount = page_mapcount(page);
>  	int referenced = 0;
>  	int search_new_forks = 0;
> +	int search_from_root = 0;
> 
>  	VM_BUG_ON(!PageKsm(page));
>  	VM_BUG_ON(!PageLocked(page));
> @@ -1968,9 +1969,20 @@ again:
>  		struct anon_vma *anon_vma = rmap_item->anon_vma;
>  		struct anon_vma_chain *vmac;
>  		struct vm_area_struct *vma;
> +		struct rb_root rb_root;
> +
> +		if (!search_from_root) {
> +			if (anon_vma)
> +				rb_root = anon_vma->rb_root;
> +		}
> +		else {
> +			if (anon_vma && anon_vma->root) {
> +				rb_root = anon_vma->root->rb_root;
> +			}
> +		}
> 
>  		anon_vma_lock_read(anon_vma);
> -		anon_vma_interval_tree_foreach(vmac, &anon_vma->rb_root,
> +		anon_vma_interval_tree_foreach(vmac, &rb_root,
>  					       0, ULONG_MAX) {
>  			vma = vmac->vma;
>  			if (rmap_item->address < vma->vm_start ||
> @@ -1999,6 +2011,11 @@ again:
>  	}
>  	if (!search_new_forks++)
>  		goto again;
> +
> +	if (!search_from_root++) {
> +		search_new_forks = 0;
> +		goto again;
> +	}
>  out:
>  	return referenced;
>  }
> @@ -2010,6 +2027,7 @@ int try_to_unmap_ksm(struct page *page, enum ttu_flags flags,
>  	struct rmap_item *rmap_item;
>  	int ret = SWAP_AGAIN;
>  	int search_new_forks = 0;
> +	int search_from_root = 0;
> 
>  	VM_BUG_ON(!PageKsm(page));
>  	VM_BUG_ON(!PageLocked(page));
> @@ -2028,9 +2046,20 @@ again:
>  		struct anon_vma *anon_vma = rmap_item->anon_vma;
>  		struct anon_vma_chain *vmac;
>  		struct vm_area_struct *vma;
> +		struct rb_root rb_root;
> +
> +		if (!search_from_root) {
> +			if (anon_vma)
> +				rb_root = anon_vma->rb_root;
> +		}
> +		else {
> +			if (anon_vma && anon_vma->root) {
> +				rb_root = anon_vma->root->rb_root;
> +			}
> +		}
> 
>  		anon_vma_lock_read(anon_vma);
> -		anon_vma_interval_tree_foreach(vmac, &anon_vma->rb_root,
> +		anon_vma_interval_tree_foreach(vmac, &rb_root,
>  					       0, ULONG_MAX) {
>  			vma = vmac->vma;
>  			if (rmap_item->address < vma->vm_start ||
> @@ -2056,6 +2085,11 @@ again:
>  	}
>  	if (!search_new_forks++)
>  		goto again;
> +
> +	if(!search_from_root++) {
> +		search_new_forks = 0;
> +		goto again;
> +	}
>  out:
>  	return ret;
>  }
> @@ -2068,6 +2102,7 @@ int rmap_walk_ksm(struct page *page, int (*rmap_one)(struct page *,
>  	struct rmap_item *rmap_item;
>  	int ret = SWAP_AGAIN;
>  	int search_new_forks = 0;
> +	int search_from_root = 0;
> 
>  	VM_BUG_ON(!PageKsm(page));
>  	VM_BUG_ON(!PageLocked(page));
> @@ -2080,9 +2115,21 @@ again:
>  		struct anon_vma *anon_vma = rmap_item->anon_vma;
>  		struct anon_vma_chain *vmac;
>  		struct vm_area_struct *vma;
> +		struct rb_root rb_root;
> +
> +		if (!search_from_root) {
> +			if (anon_vma)
> +				rb_root = anon_vma->rb_root;
> +		}
> +		else {
> +			if (anon_vma && anon_vma->root) {
> +				rb_root = anon_vma->root->rb_root;
> +			}
> +		}
> +
> 
>  		anon_vma_lock_read(anon_vma);
> -		anon_vma_interval_tree_foreach(vmac, &anon_vma->rb_root,
> +		anon_vma_interval_tree_foreach(vmac, &rb_root,
>  					       0, ULONG_MAX) {
>  			vma = vmac->vma;
>  			if (rmap_item->address < vma->vm_start ||
> @@ -2107,6 +2154,11 @@ again:
>  	}
>  	if (!search_new_forks++)
>  		goto again;
> +
> +	if (!search_from_root++) {
> +		search_new_forks = 0;
> +		goto again;
> +	}
>  out:
>  	return ret;
>  }

Reminder ping: did you get a chance to look into the previous mail?

-- 
Susheel Khiani

QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc.
is a member of the Code Aurora Forum, hosted by The Linux Foundation
end of thread (newest message: 2015-06-22 5:19 UTC)

Thread overview: 7+ messages
2015-04-09 14:05 [Question] ksm: rmap_item pointing to some stale vmas Susheel Khiani
2015-04-10 17:56 ` Hugh Dickins
2015-04-14  7:01 ` Susheel Khiani
2015-04-15  6:22 ` Hugh Dickins
2015-04-30  6:07 ` Susheel Khiani
2015-06-09 18:26 ` Susheel Khiani
2015-06-22  5:19 ` Susheel Khiani