Hi Matthew,

I don't believe it is guaranteed that unmerge_and_remove_all_rmap_items()
runs after an mm has been misplaced. Consider the following interleaving:

Thread A executes __ksm_enter() with KSM_RUN_MERGE set and passes the check at
https://elixir.bootlin.com/linux/v5.18-rc5/source/mm/ksm.c#L2501

Thread B executes run_store(), sets KSM_RUN_UNMERGE, and then also executes
unmerge_and_remove_all_rmap_items() at
https://elixir.bootlin.com/linux/v5.18-rc5/source/mm/ksm.c#L2900

Thread A completes __ksm_enter() and misplaces the mm behind the scanning
cursor, since it is still on the KSM_RUN_MERGE path at
https://elixir.bootlin.com/linux/v5.18-rc5/source/mm/ksm.c#L2504

Through manual inspection I also noticed another check of the KSM_RUN_UNMERGE
flag that appears racy, at
https://elixir.bootlin.com/linux/v5.18-rc5/source/mm/ksm.c#L2563

Best,
Gabe

On Tue, Aug 2, 2022 at 11:45 AM Matthew Wilcox wrote:
> On Tue, Aug 02, 2022 at 11:15:50PM +0800, Kefeng Wang wrote:
> > The ksm_run is alread protected by ksm_thread_mutex in run_store, we
> > could add this lock in __ksm_enter() to avoid the above issue.
>
> I don't think this is a great fix. Why not protect the store with
> ksm_mmlist_lock? ie:
>
>	mutex_lock(&ksm_thread_mutex);
>	wait_while_offlining();
>	if (ksm_run != flags) {
>+		spin_lock(&ksm_mmlist_lock);
>		ksm_run = flags;
>+		spin_unlock(&ksm_mmlist_lock);
>		if (flags & KSM_RUN_UNMERGE) {
>			set_current_oom_origin();
>			err = unmerge_and_remove_all_rmap_items();
>			clear_current_oom_origin();
>			if (err) {
>+				spin_lock(&ksm_mmlist_lock);
>				ksm_run = KSM_RUN_STOP;
>+				spin_unlock(&ksm_mmlist_lock);
>	...
>
> (I also don't think this is a real bug, because the call to
> unmerge_and_remove_all_rmap_items() will "cure" the misplacement of
> items in the list, but there's value in shutting up the tools, I suppose)
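
In case it helps make the window concrete, below is a minimal userspace model
of that interleaving. It is not the kernel code: run_flags, enter_mm() and
store_unmerge() are stand-ins for ksm_run, __ksm_enter() and the
run_store()/unmerge_and_remove_all_rmap_items() path, the flag values are
arbitrary, and the barriers exist only to force the problematic schedule
deterministically rather than hoping the race hits.

/*
 * race_model.c - userspace sketch of the __ksm_enter() vs. run_store() window.
 * NOT kernel code; all names here are illustrative stand-ins.
 * Build with: gcc -pthread race_model.c
 */
#include <pthread.h>
#include <stdio.h>

enum { RUN_MERGE = 1, RUN_UNMERGE = 2 };   /* stand-ins for the ksm_run flags */

static int run_flags = RUN_MERGE;          /* models ksm_run, read unlocked   */
static int mm_registered;                  /* models the new mm on the list   */
static int mm_unmerged;                    /* models its items being removed  */
static pthread_barrier_t after_check, after_unmerge;

static void *enter_mm(void *arg)           /* models thread A in __ksm_enter() */
{
	(void)arg;
	int flags = run_flags;                  /* stale sample: sees MERGE       */

	pthread_barrier_wait(&after_check);     /* B now flips the flag, unmerges */
	pthread_barrier_wait(&after_unmerge);

	/* A finishes on the MERGE path even though the unmerge pass already ran */
	if (!(flags & RUN_UNMERGE))
		mm_registered = 1;
	return NULL;
}

static void *store_unmerge(void *arg)      /* models thread B in run_store()   */
{
	(void)arg;
	pthread_barrier_wait(&after_check);
	run_flags = RUN_UNMERGE;

	/* the unmerge pass only covers what is registered *now*: A's mm is not  */
	if (mm_registered)
		mm_unmerged = 1;

	pthread_barrier_wait(&after_unmerge);   /* only now does A finish         */
	return NULL;
}

int main(void)
{
	pthread_t a, b;

	pthread_barrier_init(&after_check, NULL, 2);
	pthread_barrier_init(&after_unmerge, NULL, 2);
	pthread_create(&a, NULL, enter_mm, NULL);
	pthread_create(&b, NULL, store_unmerge, NULL);
	pthread_join(a, NULL);
	pthread_join(b, NULL);

	printf("registered on MERGE path: %d, covered by unmerge pass: %d\n",
	       mm_registered, mm_unmerged);    /* prints 1, 0 */
	return 0;
}

As a sketch it only shows the ordering: the mm ends up registered on the merge
path after the switch to KSM_RUN_UNMERGE and its unmerge pass have both
completed, which is the misplacement described above.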