From: Dan Williams
Date: Mon, 14 Mar 2022 13:50:16 -0700
Subject: Re: [PATCH v4 5/6] dax: fix missing writeprotect the pte entry
To: Muchun Song
Cc: Matthew Wilcox, Jan Kara, Al Viro, Andrew Morton, Alistair Popple, Yang Shi, Ralph Campbell, Hugh Dickins, Xiyu Yang, "Kirill A. Shutemov", Ross Zwisler, Christoph Hellwig, linux-fsdevel, Linux NVDIMM, Linux Kernel Mailing List, Linux MM, Xiongchun duan, Muchun Song
References: <20220302082718.32268-1-songmuchun@bytedance.com> <20220302082718.32268-6-songmuchun@bytedance.com>

On Fri, Mar 11, 2022 at 1:06 AM Muchun Song wrote:
>
> On Thu, Mar 10, 2022 at 8:59 AM Dan Williams wrote:
> >
> > On Wed, Mar 2, 2022 at 12:30 AM Muchun Song wrote:
> > >
> > > Currently dax_mapping_entry_mkclean() fails to clean and write protect
> > > the pte entry within a DAX PMD entry during an *sync operation. This
> > > can result in data loss in the following sequence:
> > >
> > > 1) process A mmap write to DAX PMD, dirtying PMD radix tree entry and
> > >    making the pmd entry dirty and writeable.
> > > 2) process B mmap with the @offset (e.g. 4K) and @length (e.g. 4K)
> > >    write to the same file, dirtying PMD radix tree entry (already
> > >    done in 1)) and making the pte entry dirty and writeable.
> > > 3) fsync, flushing out PMD data and cleaning the radix tree entry. We
> > >    currently fail to mark the pte entry as clean and write protected
> > >    since the vma of process B is not covered in dax_entry_mkclean().
> > > 4) process B writes to the pte. These don't cause any page faults since
> > >    the pte entry is dirty and writeable. The radix tree entry remains
> > >    clean.
> > > 5) fsync, which fails to flush the dirty PMD data because the radix tree
> > >    entry was clean.
> > > 6) crash - dirty data that should have been fsync'd as part of 5) could
> > >    still have been in the processor cache, and is lost.
> >
> > Excellent description.
> >
> > >
> > > Just to use pfn_mkclean_range() to clean the pfns to fix this issue.
> >
> > So the original motivation for CONFIG_FS_DAX_LIMITED was for archs
> > that do not have spare PTE bits to indicate pmd_devmap(). So this fix
> > can only work in the CONFIG_FS_DAX_LIMITED=n case and in that case it
> > seems you can use the current page_mkclean_one(), right?
>
> I don't know the history of CONFIG_FS_DAX_LIMITED.
> page_mkclean_one() need a struct page associated with
> the pfn, do the struct pages exist when CONFIG_FS_DAX_LIMITED
> and ! FS_DAX_PMD?

CONFIG_FS_DAX_LIMITED was created to preserve some DAX use for S390
which does not have CONFIG_ARCH_HAS_PTE_DEVMAP. Without PTE_DEVMAP
then get_user_pages() for DAX mappings fails.

To your question, no, there are no pages at all in the
CONFIG_FS_DAX_LIMITED=y case. So page_mkclean_one() could only be
deployed for PMD mappings, but I think it is reasonable to just
disable PMD mappings for the CONFIG_FS_DAX_LIMITED=y case.

Going forward the hope is to remove the ARCH_HAS_PTE_DEVMAP
requirement for DAX, and use PTE_SPECIAL for the S390 case. However,
that still wants to have 'struct page' availability as an across the
board requirement.

> If yes, I think you are right. But I don't
> see this guarantee. I am not familiar with DAX code, so what am
> I missing here?

Perhaps I missed a 'struct page' dependency? I thought the bug you are
fixing only triggers in the presence of PMDs. The
CONFIG_FS_DAX_LIMITED=y case can still use the current "page-less"
mkclean path for PTEs.
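
For anyone following the six-step sequence quoted above, here is a rough
userspace toy model of it. This is not kernel code and none of these names
exist in the tree: struct mapping, struct dax_entry, store(), model_fsync()
and lost_writes() are all made up for illustration. The only assumption is
that a write-protected mapping takes a fault on store and that the fault
re-dirties the entry, which is what steps 3) and 4) rely on.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model, not kernel code: one user mapping of the range. */
struct mapping {
	bool writable;
	bool dirty;
};

/* Hypothetical stand-in for the radix tree entry covering one PMD range. */
struct dax_entry {
	bool dirty;	/* entry is tagged dirty, so fsync has to flush */
	int unflushed;	/* stores that have not been flushed yet */
};

/* A store through a mapping. A write-protected mapping faults first,
 * and the fault path marks the entry dirty again. */
static void store(struct dax_entry *e, struct mapping *m)
{
	assert(e != NULL && m != NULL);
	if (!m->writable) {	/* models the write fault */
		m->writable = true;
		e->dirty = true;
	}
	m->dirty = true;
	e->unflushed++;
}

/* fsync: flush only when the entry is dirty, then clean and
 * write-protect the first 'covered' mappings. */
static void model_fsync(struct dax_entry *e, struct mapping **maps,
			size_t covered)
{
	if (!e->dirty)
		return;		/* entry looks clean, nothing is flushed */
	e->unflushed = 0;
	e->dirty = false;
	for (size_t i = 0; i < covered; i++) {
		maps[i]->writable = false;
		maps[i]->dirty = false;
	}
}

/* Run steps 1) to 5) and return how many stores a crash at 6) would lose.
 * clean_pte_mapping == false models mkclean skipping process B's pte. */
int lost_writes(bool clean_pte_mapping)
{
	struct dax_entry e = { 0 };
	struct mapping pmd = { 0 }, pte = { 0 };
	struct mapping *maps[] = { &pmd, &pte };
	size_t covered = clean_pte_mapping ? 2 : 1;

	store(&e, &pmd);		/* 1) process A writes via the PMD */
	store(&e, &pte);		/* 2) process B writes via the pte */
	model_fsync(&e, maps, covered);	/* 3) fsync */
	store(&e, &pte);		/* 4) process B writes again */
	model_fsync(&e, maps, covered);	/* 5) fsync */
	return e.unflushed;		/* 6) what a crash would lose */
}
```

In this model lost_writes(false) returns 1, because the store in 4) never
faults and the fsync in 5) sees a clean entry, while lost_writes(true)
returns 0 once the pte mapping is also cleaned and write-protected in 3).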