Message-ID: <7ba8c1f6-b9fe-714a-cd40-2b9e17ea61e7@fujitsu.com>
Date: Fri, 24 Mar 2023 17:49:04 +0800
Subject: Re: [PATCH v10 3/3] mm, pmem, xfs: Introduce MF_MEM_REMOVE for unbind
From: Shiyang Ruan
To: Dave Chinner
References: <1676645312-13-1-git-send-email-ruansy.fnst@fujitsu.com>
 <1676645312-13-4-git-send-email-ruansy.fnst@fujitsu.com>
 <20230227000759.GZ360264@dread.disaster.area>
 <56e0a5e8-74db-95eb-d6fb-5d4a3b5cb156@fujitsu.com>

On 2023/3/21 18:59, Shiyang Ruan wrote:
>
>
> On 2023/2/27 18:06, Shiyang Ruan wrote:
>>
>>
>> On 2023/2/27 8:07, Dave Chinner wrote:
>>> On Fri, Feb 17, 2023 at 02:48:32PM +0000, Shiyang Ruan wrote:
>>>> This patch is inspired by Dan's "mm, dax, pmem: Introduce
>>>> dev_pagemap_failure()"[1].  With the help of dax_holder and
>>>> ->notify_failure() mechanism, the pmem driver is able to ask filesystem
>>>> (or mapped device) on it to unmap all files in use and notify processes
>>>> who are using those files.
>>>>
>>>> Call trace:
>>>> trigger unbind
>>>>   -> unbind_store()
>>>>    -> ... (skip)
>>>>     -> devres_release_all()   # was pmem driver ->remove() in v1
>>>>      -> kill_dax()
>>>>       -> dax_holder_notify_failure(dax_dev, 0, U64_MAX,
>>>> MF_MEM_PRE_REMOVE)
>>>>        -> xfs_dax_notify_failure()
>>>>
>>>> Introduce MF_MEM_PRE_REMOVE to let filesystem know this is a remove
>>>> event.  So do not shutdown filesystem directly if something not
>>>> supported, or if failure range includes metadata area.  Make sure all
>>>> files and processes are handled correctly.
>>>>
>>>> [1]:
>>>> https://lore.kernel.org/linux-mm/161604050314.1463742.14151665140035795571.stgit@dwillia2-desk3.amr.corp.intel.com/
>>>>
>>>> Signed-off-by: Shiyang Ruan
>>>
>>> .....
>>>
>>>> ---
>>>> @@ -225,6 +242,15 @@ xfs_dax_notify_failure(
>>>>       if (offset + len - 1 > ddev_end)
>>>>           len = ddev_end - offset + 1;
>>>> +    if (mf_flags & MF_MEM_PRE_REMOVE) {
>>>> +        xfs_info(mp, "device is about to be removed!");
>>>> +        error = freeze_super(mp->m_super);
>>>> +        if (error)
>>>> +            return error;
>>>> +        /* invalidate_inode_pages2() invalidates dax mapping */
>>>> +        super_drop_pagecache(mp->m_super, invalidate_inode_pages2);
>>>> +    }
>>>
>>> Why do you still need to drop the pagecache here? My suggestion was
>>> to replace it with freezing the filesystem at this point is to stop
>>> it being dirtied further before the device remove actually occurs.
>>> The userspace processes will be killed, their DAX mappings reclaimed
>>> and the filesystem shut down before device removal occurs, so
>>> super_drop_pagecache() is largely superfluous as it doesn't actually
>>> provide any protection against racing with new mappings or dirtying
>>> of existing/newly created mappings.
>>>
>>> Freezing doesn't stop the creation of new mappings, either, it just
>>> cleans all the dirty mappings and halts anything that is trying to
>>
>> This is the point I wasn't aware of.
>>
>>> dirty existing clean mappings. It's not until we kill the userspace
>>> processes that new mappings will be stopped, and it's not until we
>>> shut the filesystem down that the filesystem itself will stop
>>> accessing the storage.
>>>
>>> Hence I don't see why you retained super_drop_pagecache() here at
>>> all. Can you explain why it is still needed?
>>
>>
>> So I was just afraid that the rmap walk and the process killer are
>> not enough to invalidate the dax mappings.  If an error happened
>> during the rmap walk, the fs would shut down and there would be no
>> chance to invalidate the remaining mappings whose users have not
>> been killed yet.
>>
>> Now that freezing the fs is enough, I will remove the drop cache code.
>
> I removed the drop cache code, and then the kernel always crashed when
> running the test[1].  After investigating, I found that the crash is
> caused by accessing a page of the pmem (invalidating dax pages when
> unmounting the fs) after the pmem has been removed.
>
> According to the design, the dax pages should have been invalidated by
> mf_dax_kill_procs(), but they were not.  I found two reasons:
>  1. collect_procs_fsdax() only kills the current process
>  2. unmap_mapping_range() doesn't invalidate the dax pages
>     (disassociate dax entry in fs/dax.c), which causes the crash in my
>     test
>
> So, I think we should:
>  1. pass the mf_flag to collect_procs_fsdax() to let it collect all
>     processes associated with the file on the XFS.
>  2. still drop the cache, but only drop the associated files' cache
>     after mf_dax_kill_procs(), instead of dropping the cache of the
>     whole filesystem.
>
> Then the logic should look like this:
> unbind
>  `-> dax_holder_notify_failure(dax_dev, 0, U64_MAX, MF_MEM_PRE_REMOVE)
>    `-> xfs_dax_notify_failure()
>      `-> freeze_super()
>      `-> do xfs rmap
>        `-> mf_dax_kill_procs()
>          `-> collect_procs_fsdax()   // all associated
>          `-> unmap_and_kill()
>        `-> invalidate_inode_pages2() // drop file's cache
>      `-> thaw_super()
>
>
> [1] The steps of the unbind test:
>  1. create fsdax namespace on a pmem
>  2. mkfs.xfs on it
>  3. run fsx test in background
>  4. wait 1s
>  5. echo "pfn0.1" > unbind
>  6. wait 1s
>  7. umount xfs       --> crash happened
>

Hi,

Any comments?

>
> --
> Thanks,
> Ruan.
>
>>
>>
>> --
>> Thanks,
>> Ruan.
>>
>>>
>>> -Dave.
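
To make the ordering proposed in the quoted mail easier to discuss, here
is a rough userspace toy model of the per-file part (collect all users,
unmap and kill, then drop that file's cache).  It is only a sketch under
loud assumptions: every toy_* name and TOY_MF_PRE_REMOVE is hypothetical
and made up for illustration, nothing here is a kernel API, and the real
collect_procs_fsdax(), unmap_and_kill() and invalidate_inode_pages2()
behave in far more detail than this.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace toy model only: none of these names are kernel APIs. */

#define TOY_MF_PRE_REMOVE 0x1u	/* stand-in for MF_MEM_PRE_REMOVE */

struct toy_mapping {
	int pid;	/* process that mapped the file */
	bool mapped;	/* still mapped in that process */
};

struct toy_file {
	struct toy_mapping *maps;
	size_t nr_maps;
	size_t dax_entries;	/* stand-in for the file's dax entries */
};

/*
 * Stand-in for the "collect + unmap and kill" step.  Without the
 * pre-remove flag only the current process is handled (the behaviour
 * reported in the mail); with it, every process mapping the file is
 * handled.  Returns the number of mappings torn down.
 */
static size_t toy_kill_procs(struct toy_file *f, int current_pid,
			     unsigned int flags)
{
	bool all = (flags & TOY_MF_PRE_REMOVE) != 0;
	size_t killed = 0;

	assert(f != NULL);
	for (size_t i = 0; i < f->nr_maps; i++) {
		if (!f->maps[i].mapped)
			continue;
		if (all || f->maps[i].pid == current_pid) {
			f->maps[i].mapped = false;
			killed++;
		}
	}
	return killed;
}

/* Stand-in for dropping one file's cache once its users are gone. */
static void toy_invalidate_file(struct toy_file *f)
{
	f->dax_entries = 0;
}

/* Proposed per-file ordering: kill all users first, then drop the cache. */
static size_t toy_pre_remove(struct toy_file *f, int current_pid)
{
	size_t killed = toy_kill_procs(f, current_pid, TOY_MF_PRE_REMOVE);

	toy_invalidate_file(f);
	return killed;
}

/* True when no mapping and no dax entry of the file is left behind. */
static bool toy_file_is_quiesced(const struct toy_file *f)
{
	for (size_t i = 0; i < f->nr_maps; i++)
		if (f->maps[i].mapped)
			return false;
	return f->dax_entries == 0;
}
```

In this model, a file mapped by three different pids is not quiesced
after toy_kill_procs() without the flag, because two mappings and all
dax entries are left behind.  After toy_pre_remove() nothing is left,
which is the state the umount in step 7 of the test would need.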