From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0EF5DC433EF for ; Tue, 15 Mar 2022 07:53:23 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 36AA28D0002; Tue, 15 Mar 2022 03:53:23 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 2F3578D0001; Tue, 15 Mar 2022 03:53:23 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 193D58D0002; Tue, 15 Mar 2022 03:53:23 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (relay.hostedemail.com [64.99.140.25]) by kanga.kvack.org (Postfix) with ESMTP id 063E38D0001 for ; Tue, 15 Mar 2022 03:53:23 -0400 (EDT) Received: from smtpin09.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay02.hostedemail.com (Postfix) with ESMTP id AE52722EDC for ; Tue, 15 Mar 2022 07:53:22 +0000 (UTC) X-FDA: 79245855444.09.5AAD4B8 Received: from mail-yb1-f171.google.com (mail-yb1-f171.google.com [209.85.219.171]) by imf14.hostedemail.com (Postfix) with ESMTP id 83880100004 for ; Tue, 15 Mar 2022 07:53:21 +0000 (UTC) Received: by mail-yb1-f171.google.com with SMTP id y142so1396160ybe.11 for ; Tue, 15 Mar 2022 00:53:21 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bytedance-com.20210112.gappssmtp.com; s=20210112; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=9fx9f/dGI0jG5qMZR80BsY4AEIW3adegKwssDhY/VxU=; b=X3kXd8IgVsb6phSuLDtCPMN6ZSj5LyxjrsIZNHgQ1y74uWx+8ke/pSvk5/wrAfN+AV lwpxSb95g8tsy1IKUKROLcWljPhTXP8G8CdfQvYCYBCD4WKDaNM1W7gd26JURACTcKTK U3+872+E2nf2ZnX6YFLLKzHo1LqglmwIRHhnzsh/9xpK3F76XwIov/1a/2/lc3AjvK3y WJ3CsWyLfDSnxyYDTEkXFcjTEtAy0l249WxSCqLrNslHL2R5ohpXwQGVELyaSdGfbjGW r8S2HA/zPdzeDW0xO6HOu5NqHdJUdjPlrWCCjkMuxlduawfSy2iACKrnAHZi4WI2RqB9 NLBg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=9fx9f/dGI0jG5qMZR80BsY4AEIW3adegKwssDhY/VxU=; b=Pl+KrVCsRTuNqTC75JVXhVX+fThBuxzX7odjv9GcVBdXD6KR/Ero8/wNTLzkFEs5pM smA5TH0sDay7cao2p7Ink2z2MtLPR996uUIwQs/zzMJyMH8cc9VFG4eBgiuI/mlZxoL1 ia6koMiq6F1BEdg1ISDPU0hf0jIStjm6VV+3JUS02Vlj752T6UznLpDaFscON8awolKl 1DQNeu4QcGxm0vpTBWoVUhiKtUM4Txcs248E3ttPrLPx2WYiJNxhR2h6iZjFCTwwAL/v V4tBnkCTmheEHspVLJF8RsQ1BWM+v1e5qrZeFcBXGpdEp9fppFTMJaWMf0LfSPeVBd1g gStg== X-Gm-Message-State: AOAM531Rj8GqL2scY5sU4emOGSmpVG4eRj2YsgB4tnzIYHqJjnBT+INq NzmfLwfhLNkWesd6hi/+OgFv8h7mr2nA61VW6NeOmg== X-Google-Smtp-Source: ABdhPJxOY+Gd7fbyVQ1ZM8TF+hRcSWHjS8y0g3Rs/rRlJp/i6LSmbPejo0CMOLJoPxKwDrAWMN6zwhZl7EOhjTmKxxw= X-Received: by 2002:a25:dc4:0:b0:629:2337:f9ea with SMTP id 187-20020a250dc4000000b006292337f9eamr21531864ybn.6.1647330800437; Tue, 15 Mar 2022 00:53:20 -0700 (PDT) MIME-Version: 1.0 References: <20220302082718.32268-1-songmuchun@bytedance.com> <20220302082718.32268-6-songmuchun@bytedance.com> In-Reply-To: From: Muchun Song Date: Tue, 15 Mar 2022 15:51:31 +0800 Message-ID: Subject: Re: [PATCH v4 5/6] dax: fix missing writeprotect the pte entry To: Dan Williams Cc: Matthew Wilcox , Jan Kara , Al Viro , Andrew Morton , Alistair Popple , Yang Shi , Ralph Campbell , Hugh Dickins , Xiyu Yang , "Kirill A. Shutemov" , Ross Zwisler , Christoph Hellwig , linux-fsdevel , Linux NVDIMM , Linux Kernel Mailing List , Linux MM , Xiongchun duan , Muchun Song Content-Type: text/plain; charset="UTF-8" X-Rspamd-Queue-Id: 83880100004 X-Rspam-User: Authentication-Results: imf14.hostedemail.com; dkim=pass header.d=bytedance-com.20210112.gappssmtp.com header.s=20210112 header.b=X3kXd8Ig; spf=pass (imf14.hostedemail.com: domain of songmuchun@bytedance.com designates 209.85.219.171 as permitted sender) smtp.mailfrom=songmuchun@bytedance.com; dmarc=pass (policy=none) header.from=bytedance.com X-Stat-Signature: 4njrit8td1s67mu6fsc514gmn8duh91k X-Rspamd-Server: rspam07 X-HE-Tag: 1647330801-883691 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Tue, Mar 15, 2022 at 4:50 AM Dan Williams wrote: > > On Fri, Mar 11, 2022 at 1:06 AM Muchun Song wrote: > > > > On Thu, Mar 10, 2022 at 8:59 AM Dan Williams wrote: > > > > > > On Wed, Mar 2, 2022 at 12:30 AM Muchun Song wrote: > > > > > > > > Currently dax_mapping_entry_mkclean() fails to clean and write protect > > > > the pte entry within a DAX PMD entry during an *sync operation. This > > > > can result in data loss in the following sequence: > > > > > > > > 1) process A mmap write to DAX PMD, dirtying PMD radix tree entry and > > > > making the pmd entry dirty and writeable. > > > > 2) process B mmap with the @offset (e.g. 4K) and @length (e.g. 4K) > > > > write to the same file, dirtying PMD radix tree entry (already > > > > done in 1)) and making the pte entry dirty and writeable. > > > > 3) fsync, flushing out PMD data and cleaning the radix tree entry. We > > > > currently fail to mark the pte entry as clean and write protected > > > > since the vma of process B is not covered in dax_entry_mkclean(). > > > > 4) process B writes to the pte. These don't cause any page faults since > > > > the pte entry is dirty and writeable. The radix tree entry remains > > > > clean. > > > > 5) fsync, which fails to flush the dirty PMD data because the radix tree > > > > entry was clean. > > > > 6) crash - dirty data that should have been fsync'd as part of 5) could > > > > still have been in the processor cache, and is lost. > > > > > > Excellent description. > > > > > > > > > > > Just to use pfn_mkclean_range() to clean the pfns to fix this issue. > > > > > > So the original motivation for CONFIG_FS_DAX_LIMITED was for archs > > > that do not have spare PTE bits to indicate pmd_devmap(). So this fix > > > can only work in the CONFIG_FS_DAX_LIMITED=n case and in that case it > > > seems you can use the current page_mkclean_one(), right? > > > > I don't know the history of CONFIG_FS_DAX_LIMITED. > > page_mkclean_one() need a struct page associated with > > the pfn, do the struct pages exist when CONFIG_FS_DAX_LIMITED > > and ! FS_DAX_PMD? > > CONFIG_FS_DAX_LIMITED was created to preserve some DAX use for S390 > which does not have CONFIG_ARCH_HAS_PTE_DEVMAP. Without PTE_DEVMAP > then get_user_pages() for DAX mappings fails. > > To your question, no, there are no pages at all in the > CONFIG_FS_DAX_LIMITED=y case. So page_mkclean_one() could only be > deployed for PMD mappings, but I think it is reasonable to just > disable PMD mappings for the CONFIG_FS_DAX_LIMITED=y case. > > Going forward the hope is to remove the ARCH_HAS_PTE_DEVMAP > requirement for DAX, and use PTE_SPECIAL for the S390 case. However, > that still wants to have 'struct page' availability as an across the > board requirement. Got it. Thanks for your patient explanation. > > > If yes, I think you are right. But I don't > > see this guarantee. I am not familiar with DAX code, so what am > > I missing here? > > Perhaps I missed a 'struct page' dependency? I thought the bug you are > fixing only triggers in the presence of PMDs. The Right. > CONFIG_FS_DAX_LIMITED=y case can still use the current "page-less" > mkclean path for PTEs. But I think introducing pfn_mkclean_range() could make the code simple and easy to maintain here since it could handle both PTE and PMD mappings. And page_vma_mapped_walk() could work on PFNs since commit [1], which is the case here, we do not need extra code to handle the page-less case here. What do you think? [1] https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=b786e44a4dbfe64476e7120ec7990b89a37be37d