From: Mauricio Faria de Oliveira
Date: Tue, 4 Jan 2022 08:57:32 -0300
Subject: Re: [PATCH] mm: fix race between MADV_FREE reclaim and blkdev direct IO read
To: Yang Shi
Cc: Andrew Morton, Minchan Kim, Linux MM, linux-block@vger.kernel.org, Huang Ying, Miaohe Lin
References: <20211211022115.1547617-1-mfo@canonical.com>

On Fri, Dec 17, 2021 at 3:51 PM Yang Shi wrote:
>
> On Fri, Dec 10, 2021 at 6:22 PM Mauricio Faria de Oliveira
> wrote:
...
> > MADV_FREE'd buffers:
> > ===================
> >
> > So, back to the "if MADV_FREE pages are used as buffers" note.
> > The case is arguable, and subject to multiple interpretations.
> >
> > The madvise(2) manual page on the MADV_FREE advice value says:
> > - 'After a successful MADV_FREE ... data will be lost when
> >   the kernel frees the pages.'
> > - 'the free operation will be canceled if the caller writes
> >   into the page' / 'subsequent writes ... will succeed and
> >   then [the] kernel cannot free those dirtied pages'
> > - 'If there is no subsequent write, the kernel can free the
> >   pages at any time.'
> >
> > Thoughts, questions, considerations...
> > - Since the kernel didn't actually free the page (page_ref_freeze()
> >   failed), should the data not have been lost? (on userspace read.)
> > - Should writes performed by the direct IO read be able to cancel
> >   the free operation?
> > - Should the direct IO read be considered as 'the caller' too,
> >   as it's been requested by 'the caller'?
> > - Should the bio technique to dirty pages on return to userspace
> >   (bio_check_pages_dirty() is called/used by __blkdev_direct_IO())
> >   be considered in another/special way here?
> > - Should an upcoming write from a previously requested direct IO
> >   read be considered as a subsequent write, so the kernel should
> >   not free the pages? (as it's known at the time of page reclaim.)
> >
> > Technically, the last point would seem a reasonable consideration
> > and balance, as the madvise(2) manual page apparently (and fairly)
> > seems to assume that 'writes' are memory accesses from the userspace
> > process (not explicitly considering writes from the kernel or its
> > corner cases; again, fairly). Plus, the kernel fix implementation
> > for the corner case of the largely 'non-atomic write' encompassed
> > by a direct IO read operation is relatively simple, and it helps.
...
> IIUC, you are expecting to get the old data after MADV_FREE? TBH, you
> should not expect so at all after MADV_FREE since those pages may get
> freed at any time.

Hey, thanks for checking this.

Correct; the discussion behind this is covered in the text above.
It's indeed arguable, but the fix makes the behavior more consistent
for the case of a direct IO read (rather than potentially returning
zero-filled pages somewhat randomly).

cheers,
--
Mauricio Faria de Oliveira
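To illustrate the "upcoming write ... known at the time of page reclaim" point, here is a minimal userspace model (not kernel code) of the kind of check a reclaim path could make before discarding a clean MADV_FREE'd page. It assumes, as the thread suggests with page_ref_freeze() failing, that an in-flight direct IO read holds an extra page reference. The struct `model_page` and the function `lazyfree_can_discard()` are hypothetical names for illustration only. The model is single-threaded, so it leaves out the memory ordering a real kernel implementation would need against concurrent reference taking.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model of the page state that matters here; this is
 * NOT the kernel's struct page.
 */
struct model_page {
	int refcount;   /* total references, including pins for direct IO */
	int mapcount;   /* number of page table mappings */
	bool dirty;     /* written to again after MADV_FREE? */
};

/*
 * Model of the decision: may reclaim discard this clean MADV_FREE'd
 * (lazyfree) page?
 *
 * Expected references are one held by reclaim's isolation plus one per
 * mapping. Any extra reference (e.g. a pin for an in-flight direct IO
 * read) means a write may still land in the page, so the page is kept,
 * as if a subsequent write had already happened.
 */
static bool lazyfree_can_discard(const struct model_page *p)
{
	/* Model invariant: isolation ref plus one ref per mapping, at least. */
	assert(p->mapcount >= 0 && p->refcount >= 1 + p->mapcount);

	if (p->dirty)
		return false;   /* userspace wrote after MADV_FREE: keep */

	/* Discard only if nobody else holds a reference. */
	return p->refcount == 1 + p->mapcount;
}
```

For example, a clean page mapped once and isolated for reclaim has `refcount == 2` and `mapcount == 1`, so it can be discarded. The same page with an extra reference for a pending direct IO read has `refcount == 3` and is kept, which avoids the case where the mapping is dropped but the page cannot actually be freed.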