From: Vinayak Menon <vinmenon@codeaurora.org>
To: Michal Hocko <mhocko@kernel.org>
Cc: minchan@kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH] mm: fix the race between swapin_readahead and SWP_SYNCHRONOUS_IO path
Date: Tue, 3 Sep 2019 11:43:16 +0530 [thread overview]
Message-ID: <79303914-d6a6-011a-150f-74488c8e12f2@codeaurora.org> (raw)
In-Reply-To: <20190902132104.GJ14028@dhcp22.suse.cz>
Hi Michal,
Thanks for reviewing this.
On 9/2/2019 6:51 PM, Michal Hocko wrote:
> On Fri 30-08-19 18:13:31, Vinayak Menon wrote:
>> The following race is observed due to which a processes faulting
>> on a swap entry, finds the page neither in swapcache nor swap. This
>> causes zram to give a zero filled page that gets mapped to the
>> process, resulting in a user space crash later.
>>
>> Consider parent and child processes Pa and Pb sharing the same swap
>> slot with swap_count 2. Swap is on zram with SWP_SYNCHRONOUS_IO set.
>> Virtual address 'VA' of Pa and Pb points to the shared swap entry.
>>
>> Pa Pb
>>
>> fault on VA fault on VA
>> do_swap_page do_swap_page
>> lookup_swap_cache fails lookup_swap_cache fails
>> Pb scheduled out
>> swapin_readahead (deletes zram entry)
>> swap_free (makes swap_count 1)
>> Pb scheduled in
>> swap_readpage (swap_count == 1)
>> Takes SWP_SYNCHRONOUS_IO path
>> zram enrty absent
>> zram gives a zero filled page
> This sounds like a zram issue, right? Why is a generic swap path changed
> then?
I think zram entry being deleted by Pa and zram giving out a zeroed page to Pb is normal.
This is because zram avoids lazy swap slot freeing by implementing gendisk->fops->swap_slot_free_notify
and swap_slot_free_notify deletes the zram entry because the page is in swapcache.
The issue is that Pb attempted a swapcache lookup before Pa brought the page to swapcache, and failed. If
Pb had taken the swapin_readahead path, __read_swap_cache_async would have performed a second lookup
and found the page in swapcache. The issue here is that due to the lookup failure and swap_count being 1,
it takes the SWP_SYNCHRONOUS_IO path and does a direct read which is bound to fail. So it seems to me as
a problem in the way SWP_SYNCHRONOUS_IO is handled in do_swap_page, and not a problem with zram.
Any swap device that sets SWP_SYNCHRONOUS_IO and implements swap_slot_free_notify can hit this bug.
do_swap_page first brings in the page to swapcache and then decrements the swap_count, and SWP_SYNCHRONOUS_IO
code in do_swap_page performs the swapcache and swap_count checks in the same order. Due to thread preemption
described in the sequence above, it can happen that the SWP_SYNCHRONOUS_IO path fails in swapcache check, but sees
the swap_count decremented later, thus missing a valid swapcache entry.
I have not tested, but the following patch may also fix the issue.
diff --git a/include/linux/swap.h b/include/linux/swap.h
index 063c0c1..a5ca05f 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -463,6 +463,7 @@ extern sector_t map_swap_page(struct page *, struct block_device **);
extern sector_t swapdev_block(int, pgoff_t);
extern int page_swapcount(struct page *);
extern int __swap_count(swp_entry_t entry);
+extern bool __swap_has_cache(swp_entry_t entry);
extern int __swp_swapcount(swp_entry_t entry);
extern int swp_swapcount(swp_entry_t entry);
extern struct swap_info_struct *page_swap_info(struct page *);
@@ -589,6 +590,11 @@ static inline int __swap_count(swp_entry_t entry)
return 0;
}
+static bool __swap_has_cache(swp_entry_t entry)
+{
+ return 0;
+}
+
static inline int __swp_swapcount(swp_entry_t entry)
{
return 0;
diff --git a/mm/memory.c b/mm/memory.c
index e0c232f..a13511f 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2778,7 +2778,8 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
struct swap_info_struct *si = swp_swap_info(entry);
if (si->flags & SWP_SYNCHRONOUS_IO &&
- __swap_count(entry) == 1) {
+ __swap_count(entry) == 1 &&
+ !__swap_has_cache(entry)) {
/* skip swapcache */
page = alloc_page_vma(GFP_HIGHUSER_MOVABLE, vma,
vmf->address);
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 80445f4..2a1554a8 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -1459,6 +1459,20 @@ int __swap_count(swp_entry_t entry)
return count;
}
+bool __swap_has_cache(swp_entry_t entry)
+{
+ struct swap_info_struct *si;
+ pgoff_t offset = swp_offset(entry);
+ bool has_cache = false;
+
+ si = get_swap_device(entry);
+ if (si) {
+ has_cache = !!(si->swap_map[offset] & SWAP_HAS_CACHE);
+ put_swap_device(si);
+ }
+ return has_cache;
+}
+
static int swap_swapcount(struct swap_info_struct *si, swp_entry_t entry)
{
int count = 0;
>
>> Fix this by reading the swap_count before lookup_swap_cache, which conforms
>> with the order in which page is added to swap cache and swap count is
>> decremented in do_swap_page. In the race case above, this will let Pb take
>> the readahead path and thus pick the proper page from swapcache.
>>
>> Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org>
next prev parent reply other threads:[~2019-09-03 6:13 UTC|newest]
Thread overview: 16+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-08-30 12:43 Vinayak Menon
2019-09-02 13:21 ` Michal Hocko
2019-09-03 6:13 ` Vinayak Menon [this message]
2019-09-03 11:41 ` Michal Hocko
2019-09-03 12:17 ` Vinayak Menon
2019-09-09 4:05 ` Vinayak Menon
2019-09-09 11:23 ` Michal Hocko
2019-09-09 23:26 ` Minchan Kim
2019-09-10 8:22 ` Vinayak Menon
2019-09-10 17:51 ` Minchan Kim
2019-09-11 10:07 ` Vinayak Menon
2019-09-12 17:14 ` Minchan Kim
2019-09-13 9:05 ` Vinayak Menon
2019-09-16 20:05 ` Minchan Kim
2019-09-17 5:38 ` Vinayak Menon
2019-09-18 1:12 ` Minchan Kim
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=79303914-d6a6-011a-150f-74488c8e12f2@codeaurora.org \
--to=vinmenon@codeaurora.org \
--cc=linux-mm@kvack.org \
--cc=mhocko@kernel.org \
--cc=minchan@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox