From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id EBDB9C433F5 for ; Tue, 4 Jan 2022 23:32:53 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 3DD996B0072; Tue, 4 Jan 2022 18:32:53 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 38E2F6B0073; Tue, 4 Jan 2022 18:32:53 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 22EFE6B0074; Tue, 4 Jan 2022 18:32:53 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0184.hostedemail.com [216.40.44.184]) by kanga.kvack.org (Postfix) with ESMTP id 12E866B0072 for ; Tue, 4 Jan 2022 18:32:53 -0500 (EST) Received: from smtpin10.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay03.hostedemail.com (Postfix) with ESMTP id CACED8249980 for ; Tue, 4 Jan 2022 23:32:52 +0000 (UTC) X-FDA: 78994206984.10.7F117E7 Received: from smtp-relay-internal-0.canonical.com (smtp-relay-internal-0.canonical.com [185.125.188.122]) by imf20.hostedemail.com (Postfix) with ESMTP id CA0E21C0006 for ; Tue, 4 Jan 2022 23:32:42 +0000 (UTC) Received: from mail-pf1-f198.google.com (mail-pf1-f198.google.com [209.85.210.198]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by smtp-relay-internal-0.canonical.com (Postfix) with ESMTPS id 50CF1405F6 for ; Tue, 4 Jan 2022 23:32:50 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=canonical.com; s=20210705; t=1641339170; bh=2Jq5YA9QElG6IgLbn0P5uha5jwOm4n0q6iN1EfoB0uU=; h=MIME-Version:References:In-Reply-To:From:Date:Message-ID:Subject: To:Cc:Content-Type; b=rtwO/iqnPILQjw2ToUPr/VMtHTiFxRDTjyMlWfZ4gU8376ieLQ7khGaPbqwIEutdE hY8SMJifHhjRk2N2OKJDcRXamssrAnI4LDyWuPBKMK6AWhPI52ipQ3geELX48T7lmV U79Rcntb8sJaPyg/pN3I1fOm2ZrzjF/w1IlCXMZ9hgyS9mja31EMEnBHaIu3jF8qDc SoYwAbFeOVwlwIgOdlfzkDkbRB5qSvfcJakWnNFicvt5nAkIOn7w2nNyjylODAigQw 3rr31CaaAOhLAbvigKQ++8rc88cY0v+nkV11EW8/ootZK7M/wDVNcHipXkkqaDin5U GhMCkgG4Ebnag== Received: by mail-pf1-f198.google.com with SMTP id e10-20020a62ee0a000000b004bbfb89d2ffso12127989pfi.7 for ; Tue, 04 Jan 2022 15:32:50 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=2Jq5YA9QElG6IgLbn0P5uha5jwOm4n0q6iN1EfoB0uU=; b=Mta6X8CbOcv+lgpnVchs3la05F5ISbLnFjsGXoJ3Tv2HdwLk1OVzKJy/07x1RzrQh0 9tvlRA8X7BKPxpYW9Fr4WNysCqf7wHCTiLNIjfa0T8ls8isu3z09M6b5Q0GHIkhPENBP hDTBg2RhO1kXFKDNiWQlMxU9vYUqH6FbE5TDuQfd96Ye01TLF2jEPQTwaTa5VTVl+QT5 RWWM326f2LK6kRuDtcg0C+opsZYPzQu+h2CiVDqlVXz8Hxv4HAABAvNvwTaBavOhGnwF Lm5kS3ihtt7brmp62VJHh07lHSPlwvC/ckce9Tjp56NlxDbHRKUGa3ZWBMuXJScyFF3r IatA== X-Gm-Message-State: AOAM532DjGLf9hxTqGxmwiphRoYWxynJ2PaJTdvOTjdPEaI1X/qAgq+D rSnTyTjalnPjCXpZQAJooTYN3uJyH2ZdDz5u59U1K0dt4gawss5BNHBV3r7VEjEPZEJFfRUrQek AeEb1BDR+BQWG8AFFuXop2o9W6hFRA4OKp5D8AFl5x6GD X-Received: by 2002:a17:90b:4a86:: with SMTP id lp6mr854998pjb.57.1641339168766; Tue, 04 Jan 2022 15:32:48 -0800 (PST) X-Google-Smtp-Source: ABdhPJzg0a00iFZhBsqRdaTn5SYJknz35E9l87qVAr0K6JBncmlx7dgKFy2VV/Hs6sKTQUegi36wH8UHBQGtisvDlOE= X-Received: by 2002:a17:90b:4a86:: with SMTP id lp6mr854970pjb.57.1641339168489; Tue, 04 Jan 2022 15:32:48 -0800 (PST) MIME-Version: 1.0 References: <20211211022115.1547617-1-mfo@canonical.com> In-Reply-To: From: Mauricio Faria de Oliveira Date: Tue, 4 Jan 2022 20:32:36 -0300 Message-ID: Subject: Re: [PATCH] mm: fix race between MADV_FREE reclaim and blkdev direct IO read To: Minchan Kim Cc: Andrew Morton , linux-mm@kvack.org, linux-block@vger.kernel.org, Huang Ying , Miaohe Lin Content-Type: text/plain; charset="UTF-8" Authentication-Results: imf20.hostedemail.com; dkim=pass header.d=canonical.com header.s=20210705 header.b="rtwO/iqn"; spf=pass (imf20.hostedemail.com: domain of mauricio.oliveira@canonical.com designates 185.125.188.122 as permitted sender) smtp.mailfrom=mauricio.oliveira@canonical.com; dmarc=pass (policy=none) header.from=canonical.com X-Rspamd-Server: rspam06 X-Rspamd-Queue-Id: CA0E21C0006 X-Stat-Signature: kqpiid16883k3fsazrj81n937fg3rjoy X-HE-Tag: 1641339162-348375 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Tue, Jan 4, 2022 at 8:06 PM Minchan Kim wrote: > > On Tue, Jan 04, 2022 at 08:46:17AM -0300, Mauricio Faria de Oliveira wrote: > > On Thu, Dec 16, 2021 at 3:17 PM Minchan Kim wrote: > > ... > > > Hi Mauricio, > > > > > > Thanks for catching the bug. There is some comment before I would > > > look the problem in more detail. Please see below. > > > > > > > Hey! Thanks for looking into this. Sorry for the delay; I've been out > > a few weeks. > > > > > > diff --git a/mm/rmap.c b/mm/rmap.c > > > > index 163ac4e6bcee..f04151aae03b 100644 > > > > --- a/mm/rmap.c > > > > +++ b/mm/rmap.c > > > > @@ -1570,7 +1570,18 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma, > > > > > > > > /* MADV_FREE page check */ > > > > if (!PageSwapBacked(page)) { > > > > - if (!PageDirty(page)) { > > > > + int refcount = page_ref_count(page); > > > > + > > > > + /* > > > > + * The only page refs must be from the isolation > > > > + * (checked by the caller shrink_page_list() too) > > > > + * and the (single) rmap (dropped by discard:). > > > > + * > > > > + * Check the reference count before dirty flag > > > > + * with memory barrier; see __remove_mapping(). > > > > + */ > > > > + smp_rmb(); > > > > + if (refcount == 2 && !PageDirty(page)) { > > > > > > A madv_free marked page could be mapped at several processes so > > > it wouldn't be refcount two all the time, I think. > > > Shouldn't we check it with page_mapcount with page_refcount? > > > > > > page_ref_count(page) - 1 > page_mapcount(page) > > > > > > > It's the other way around, isn't it? The madvise(MADV_FREE) call only clears > > the page dirty flag if page_mapcount() == 1 (ie not mapped by more processes). > > > > @ madvise_free_pte_range() > > > > /* > > * If page is shared with others, we couldn't clear > > * PG_dirty of the page. > > */ > > if (page_mapcount(page) != 1) { > > unlock_page(page); > > continue; > > } > > ... > > ClearPageDirty(page); > > unlock_page(page); > > > > > > If that's right, the refcount of 2 should be OK (one from the isolation, > > another one from the single map/one process.) > > > > Does that make sense? I might be missing something. > > A child process could be forked from parent after madvise so the page > mapcount of the hinted page could have elevated refcount. Thanks for clarifying. I was indeed missing something. :) I revisited your commit 854e9ed09ded ("mm: support madvise(MADV_FREE)"), and now realized that the restriction for map count == 1 is only for _cleaning_ the dirty flag, not for discarding the (clean) page mapping later on. """ To solve the problem, this patch clears PG_dirty if only the page is owned exclusively by current process when madvise is called because PG_dirty represents ptes's dirtiness in several processes so we could clear it only if we own it exclusively. """ I'll look at this tomorrow and submit a v2. Thanks! -- Mauricio Faria de Oliveira