From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Wed, 12 Jan 2022 09:33:32 -0800
From: Minchan Kim
To: "Huang, Ying"
Cc: Yu Zhao, Mauricio Faria de Oliveira, Andrew Morton, linux-mm@kvack.org,
 linux-block@vger.kernel.org, Miaohe Lin, Yang Shi
Subject: Re: [PATCH v2] mm: fix race between MADV_FREE reclaim and blkdev
 direct IO read
Message-ID:
References:
 <20220105233440.63361-1-mfo@canonical.com>
 <87v8ypybdc.fsf@yhuang6-desk2.ccr.corp.intel.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <87v8ypybdc.fsf@yhuang6-desk2.ccr.corp.intel.com>

On Wed, Jan 12, 2022 at 09:46:23AM +0800, Huang, Ying wrote:
> Yu Zhao writes:
> 
> > On Wed, Jan 05, 2022 at 08:34:40PM -0300, Mauricio Faria de Oliveira wrote:
> >> diff --git a/mm/rmap.c b/mm/rmap.c
> >> index 163ac4e6bcee..8671de473c25 100644
> >> --- a/mm/rmap.c
> >> +++ b/mm/rmap.c
> >> @@ -1570,7 +1570,20 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
> >>  
> >>  		/* MADV_FREE page check */
> >>  		if (!PageSwapBacked(page)) {
> >> -			if (!PageDirty(page)) {
> >> +			int ref_count = page_ref_count(page);
> >> +			int map_count = page_mapcount(page);
> >> +
> >> +			/*
> >> +			 * The only page refs must be from the isolation
> >> +			 * (checked by the caller shrink_page_list() too)
> >> +			 * and one or more rmap's (dropped by discard:).
> >> +			 *
> >> +			 * Check the reference count before dirty flag
> >> +			 * with memory barrier; see __remove_mapping().
> >> +			 */
> >> +			smp_rmb();
> >> +			if ((ref_count - 1 == map_count) &&
> >> +			    !PageDirty(page)) {
> >>  				/* Invalidate as we cleared the pte */
> >>  				mmu_notifier_invalidate_range(mm,
> >>  						address, address + PAGE_SIZE);
> > 
> > Out of curiosity, how does it work with COW in terms of reordering?
> > Specifically, it seems to me get_page() and page_dup_rmap() in
> > copy_present_pte() can happen in any order, and if page_dup_rmap()
> > is seen first, and direct io is holding a refcnt, this check can still
> > pass?
> 
> I think that you are correct.
> 
> After more thoughts, it appears very tricky to compare page count and
> map count. Even if we have added smp_rmb() between page_ref_count() and
> page_mapcount(), an interrupt may happen between them. During the
> interrupt, the page count and map count may be changed, for example,
> unmapped, or do_swap_page().

Yeah, it happens, but what specific problem are you concerned about from
the count changing under the race? The fork case Yu pointed out is
already known to break DIO, so users should take care not to fork under
DIO (please look at the O_DIRECT section in man 2 open). If you could
give a specific example, it would be great to think the issue over.

I agree it's a little tricky, but it seems to be the way other places
have used it for a long time (please look at write_protect_page in ksm.c).

So, what we are missing here is a TLB flush before the check. Something
like this:
diff --git a/mm/rmap.c b/mm/rmap.c
index b0fd9dc19eba..b4ad9faa17b2 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1599,18 +1599,8 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
 
 		/* MADV_FREE page check */
 		if (!PageSwapBacked(page)) {
-			int refcount = page_ref_count(page);
-
-			/*
-			 * The only page refs must be from the isolation
-			 * (checked by the caller shrink_page_list() too)
-			 * and the (single) rmap (dropped by discard:).
-			 *
-			 * Check the reference count before dirty flag
-			 * with memory barrier; see __remove_mapping().
-			 */
-			smp_rmb();
-			if (refcount == 2 && !PageDirty(page)) {
+			if (!PageDirty(page) &&
+			    page_mapcount(page) + 1 == page_count(page)) {
 				/* Invalidate as we cleared the pte */
 				mmu_notifier_invalidate_range(mm,
 						address, address + PAGE_SIZE);
diff --git a/mm/vmscan.c b/mm/vmscan.c
index f3162a5724de..6454ff5c576f 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1754,6 +1754,9 @@ static unsigned int shrink_page_list(struct list_head *page_list,
 			enum ttu_flags flags = TTU_BATCH_FLUSH;
 			bool was_swapbacked = PageSwapBacked(page);
 
+			if (!was_swapbacked && PageAnon(page))
+				flags &= ~TTU_BATCH_FLUSH;
+
 			if (unlikely(PageTransHuge(page)))
 				flags |= TTU_SPLIT_HUGE_PMD;
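
For reference, a minimal userspace sketch (not from the original thread) of
the sequence the patch defends against: an anonymous buffer is populated,
marked MADV_FREE, and then used as the destination of an O_DIRECT read,
which takes extra page references via GUP. Actually losing data also
requires reclaim to run concurrently under memory pressure; the file path,
buffer size, and target filesystem (which must support O_DIRECT) below are
placeholder assumptions.

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
	size_t len = 1 << 20;	/* 1 MiB; multiple of the block size */

	/* mmap gives page-aligned memory, as O_DIRECT requires. */
	void *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (buf == MAP_FAILED) {
		perror("mmap");
		return 1;
	}

	/* Populate the pages... */
	memset(buf, 0xaa, len);

	/* ...then mark them lazily freeable; they stay mapped but clean. */
	if (madvise(buf, len, MADV_FREE)) {
		perror("madvise");
		return 1;
	}

	/* Placeholder path; must be on a filesystem supporting O_DIRECT. */
	int fd = open("/tmp/datafile", O_RDONLY | O_DIRECT);
	if (fd < 0) {
		perror("open");
		return 1;
	}

	/*
	 * The O_DIRECT read GUP-pins the MADV_FREE pages as its target.
	 * Without the ref/map count check in try_to_unmap_one(), reclaim
	 * racing with this read could discard the clean pages and lose
	 * the data just read from disk.
	 */
	ssize_t ret = read(fd, buf, len);
	if (ret < 0)
		perror("read");
	else
		printf("read %zd bytes\n", ret);

	close(fd);
	munmap(buf, len);
	return 0;
}

With a check along these lines, try_to_unmap_one() notices the extra DIO
reference (page_count() exceeds page_mapcount() + 1) and skips the discard,
which is the behavior the thread is converging on.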