From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pl0-f71.google.com (mail-pl0-f71.google.com [209.85.160.71]) by kanga.kvack.org (Postfix) with ESMTP id 48BCE6B0253 for ; Thu, 21 Dec 2017 02:49:19 -0500 (EST) Received: by mail-pl0-f71.google.com with SMTP id i12so11093454plk.5 for ; Wed, 20 Dec 2017 23:49:19 -0800 (PST) Received: from mga09.intel.com (mga09.intel.com. [134.134.136.24]) by mx.google.com with ESMTPS id bf4si14565235plb.142.2017.12.20.23.49.17 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Wed, 20 Dec 2017 23:49:18 -0800 (PST) From: "Huang\, Ying" Subject: Re: [PATCH -V4 -mm] mm, swap: Fix race between swapoff and some swap operations References: <20171220012632.26840-1-ying.huang@intel.com> <20171221021619.GA27475@bbox> Date: Thu, 21 Dec 2017 15:48:56 +0800 In-Reply-To: <20171221021619.GA27475@bbox> (Minchan Kim's message of "Thu, 21 Dec 2017 11:16:19 +0900") Message-ID: <871sjopllj.fsf@yhuang-dev.intel.com> MIME-Version: 1.0 Content-Type: text/plain; charset=ascii Sender: owner-linux-mm@kvack.org List-ID: To: Minchan Kim Cc: Andrew Morton , linux-mm@kvack.org, linux-kernel@vger.kernel.org, Hugh Dickins , "Paul E . McKenney" , Johannes Weiner , Tim Chen , Shaohua Li , Mel Gorman , Jerome Glisse , Michal Hocko , Andrea Arcangeli , David Rientjes , Rik van Riel , Jan Kara , Dave Jiang , Aaron Lu Minchan Kim writes: > On Wed, Dec 20, 2017 at 09:26:32AM +0800, Huang, Ying wrote: >> From: Huang Ying >> >> When the swapin is performed, after getting the swap entry information >> from the page table, system will swap in the swap entry, without any >> lock held to prevent the swap device from being swapoff. This may >> cause the race like below, >> >> CPU 1 CPU 2 >> ----- ----- >> do_swap_page >> swapin_readahead >> __read_swap_cache_async >> swapoff swapcache_prepare >> p->swap_map = NULL __swap_duplicate >> p->swap_map[?] /* !!! NULL pointer access */ >> >> Because swapoff is usually done when system shutdown only, the race >> may not hit many people in practice. But it is still a race need to >> be fixed. >> >> To fix the race, get_swap_device() is added to check whether the >> specified swap entry is valid in its swap device. If so, it will keep >> the swap entry valid via preventing the swap device from being >> swapoff, until put_swap_device() is called. >> >> Because swapoff() is very race code path, to make the normal path runs >> as fast as possible, RCU instead of reference count is used to >> implement get/put_swap_device(). From get_swap_device() to >> put_swap_device(), the RCU read lock is held, so synchronize_rcu() in >> swapoff() will wait until put_swap_device() is called. >> >> In addition to swap_map, cluster_info, etc. data structure in the >> struct swap_info_struct, the swap cache radix tree will be freed after >> swapoff, so this patch fixes the race between swap cache looking up >> and swapoff too. >> >> Cc: Hugh Dickins >> Cc: Paul E. McKenney >> Cc: Minchan Kim >> Cc: Johannes Weiner >> Cc: Tim Chen >> Cc: Shaohua Li >> Cc: Mel Gorman >> Cc: "Jrme Glisse" >> Cc: Michal Hocko >> Cc: Andrea Arcangeli >> Cc: David Rientjes >> Cc: Rik van Riel >> Cc: Jan Kara >> Cc: Dave Jiang >> Cc: Aaron Lu >> Signed-off-by: "Huang, Ying" >> >> Changelog: >> >> v4: >> >> - Use synchronize_rcu() in enable_swap_info() to reduce overhead of >> normal paths further. > > Hi Huang, Hi, Minchan, > This version is much better than old. To me, it's due to not rcu, > srcu, refcount thing but it adds swap device dependency(i.e., get/put) > into every swap related functions so users who don't interested on swap > don't need to care of it. Good. > > The problem is caused by freeing by swap related-data structure > *dynamically* while old swap logic was based on static data > structure(i.e., never freed and the verify it's stale). > So, I reviewed some places where use PageSwapCache and swp_entry_t > which could make access of swap related data structures. > > A example is __isolate_lru_page > > It calls page_mapping to get a address_space. > What happens if the page is on SwapCache and raced with swapoff? > The mapping got could be disappeared by the race. Right? Yes. We should think about that. Considering the file cache pages, the address_space backing the file cache pages may be freed dynamically too. So to use page_mapping() return value for the file cache pages, some kind of locking is needed to guarantee the address_space isn't freed under us. Page may be locked, or under writeback, or some other locks need to be held, for example, page table lock, or lru_lock, etc. For __isolate_lru_page(), lru_lock will be held when it is called. And we will call synchronize_rcu() between clear PageSwapCache and free swap cache, so the usage of swap cache in __isolate_lru_page() should be safe. Do you think my analysis makes sense? Best Regards, Huang, Ying -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org