Date: Tue, 11 Jan 2022 10:54:09 -0800
From: Minchan Kim
To: Yu Zhao
Cc: Mauricio Faria de Oliveira, Andrew Morton, linux-mm@kvack.org, linux-block@vger.kernel.org, Huang Ying, Miaohe Lin, Yang Shi, John Hubbard
Subject: Re: [PATCH v2] mm: fix race between MADV_FREE reclaim and blkdev direct IO read
References: <20220105233440.63361-1-mfo@canonical.com>

On Mon, Jan 10, 2022 at 11:48:13PM -0700, Yu Zhao wrote:
> On Wed, Jan 05, 2022 at 08:34:40PM -0300, Mauricio Faria de Oliveira wrote:
> > diff --git a/mm/rmap.c b/mm/rmap.c
> > index 163ac4e6bcee..8671de473c25 100644
> > --- a/mm/rmap.c
> > +++ b/mm/rmap.c
> > @@ -1570,7 +1570,20 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
> >
> >  			/* MADV_FREE page check */
> >  			if (!PageSwapBacked(page)) {
> > -				if (!PageDirty(page)) {
> > +				int ref_count = page_ref_count(page);
> > +				int map_count = page_mapcount(page);
> > +
> > +				/*
> > +				 * The only page refs must be from the isolation
> > +				 * (checked by the caller shrink_page_list() too)
> > +				 * and one or more rmap's (dropped by discard:).
> > +				 *
> > +				 * Check the reference count before dirty flag
> > +				 * with memory barrier; see __remove_mapping().
> > +				 */
> > +				smp_rmb();
> > +				if ((ref_count - 1 == map_count) &&
> > +				    !PageDirty(page)) {
> >  					/* Invalidate as we cleared the pte */
> >  					mmu_notifier_invalidate_range(mm,
> >  						address, address + PAGE_SIZE);
>
> Out of curiosity, how does it work with COW in terms of reordering?
> Specifically, it seems to me get_page() and page_dup_rmap() in
> copy_present_pte() can happen in any order, and if page_dup_rmap()
> is seen first, and direct io is holding a refcnt, this check can still
> pass?
>

Hi Yu,

I think you're correct. I don't think we want a memory barrier there in
page_dup_rmap, though. So how about making gup_fast aware of FOLL_TOUCH?
FOLL_TOUCH means the caller is going to write something, so the page
should be dirty. Currently, get_user_pages works like that. However,
the problem is get_user_pages_fast, since it looks like
lockless_pages_from_mm doesn't support FOLL_TOUCH.

So the idea is: if the flags passed to internal_get_user_pages_fast
include FOLL_TOUCH, gup_{pmd,pte}_range try to make the page dirty
under trylock_page (if the lock fails, it goes to the slow path with
__gup_longterm_unlocked and set_dirty_pages for them).

This approach would also solve other cases where userspace pages are
mapped into kernel space and then written. Since the write doesn't go
through the process's page table, we lose the dirty bit in the
process's page table, and it turns out to be the same problem. That's
why I'd like to take this approach.

If it doesn't work, the other option to fix this specific case: can't
we make the pages dirty in advance in the DIO read case? When I look at
the DIO code, it's already doing that in the async case. Couldn't we do
the same thing for the other cases? I guess the worst case we would see
is more page writeback, since the pages become dirty unnecessarily.
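
To make the proposal easier to discuss, here is a minimal userspace
sketch of the fast-path idea, not kernel code. Everything prefixed with
model_ or MODEL_ is a hypothetical stand-in invented for illustration:
struct model_page only mimics the lock bit, dirty bit and reference
count, and the model is single-threaded, so it shows the control flow
(try the lock, dirty the page, otherwise fall back to the slow path)
rather than the real memory-ordering behaviour. Taking the reference
before trying the lock is also an assumption of the sketch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the FOLL_TOUCH flag; not the kernel value. */
#define MODEL_FOLL_TOUCH 0x1u

/* Hypothetical model of the page state this thread talks about. */
struct model_page {
	bool locked;	/* models the page lock */
	bool dirty;	/* models the page dirty flag */
	int refcount;	/* models the page reference count */
};

enum model_gup_result {
	MODEL_GUP_FAST_OK,	/* handled entirely on the fast path */
	MODEL_GUP_NEED_SLOW,	/* caller must retry on the slow path */
};

/* Non-blocking lock attempt, in the spirit of trylock_page(). */
bool model_trylock_page(struct model_page *page)
{
	if (page->locked)
		return false;
	page->locked = true;
	return true;
}

void model_unlock_page(struct model_page *page)
{
	assert(page->locked);
	page->locked = false;
}

/*
 * Fast path: take a reference and, if the caller asked for
 * MODEL_FOLL_TOUCH, dirty the page under the page lock. If the lock
 * cannot be taken without blocking, drop the reference and report
 * that the slow path is needed.
 */
enum model_gup_result model_gup_fast_one(struct model_page *page,
					 unsigned int flags)
{
	page->refcount++;

	if (!(flags & MODEL_FOLL_TOUCH))
		return MODEL_GUP_FAST_OK;

	if (!model_trylock_page(page)) {
		page->refcount--;
		return MODEL_GUP_NEED_SLOW;
	}

	page->dirty = true;
	model_unlock_page(page);
	return MODEL_GUP_FAST_OK;
}

/*
 * Slow path: the real code could sleep here; the model just takes the
 * reference and dirties the page when MODEL_FOLL_TOUCH is set.
 */
void model_gup_slow_one(struct model_page *page, unsigned int flags)
{
	page->refcount++;
	if (flags & MODEL_FOLL_TOUCH)
		page->dirty = true;
}

/*
 * The reclaim-side test from the quoted patch: the page may be
 * discarded only if the sole extra reference is the isolation one and
 * the page is clean.
 */
bool model_can_discard(const struct model_page *page, int map_count)
{
	return (page->refcount - 1 == map_count) && !page->dirty;
}
```

In this model, a clean page with one mapping and the isolation
reference passes model_can_discard(). After model_gup_fast_one() runs
with MODEL_FOLL_TOUCH it no longer does, because the page is dirty
regardless of how the reference count is observed.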