Message-ID: <81834761-5537-41dc-813d-63c947bba5a5@bytedance.com>
Date: Mon, 10 Feb 2025 17:00:08 +0800
From: Qi Zheng
To: Qu Wenruo
Cc: Zi Yan, Matthew Wilcox, Christian Brauner, David Hildenbrand, Jann Horn, "Darrick J. Wong", Dave Chinner, linux-mm@kvack.org, linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org
Subject: Re: xfs/folio splat with v6.14-rc1
In-Reply-To: <669898c1-e998-461d-9381-9143a3cb39c2@gmx.com>
References: <20250207-anbot-bankfilialen-acce9d79a2c7@brauner> <20250207-handel-unbehagen-fce1c4c0dd2a@brauner> <2766D04E-5A04-4BF6-A2A3-5683A3054973@nvidia.com> <8c71f41e-3733-4100-ab55-1176998ced29@bytedance.com> <669898c1-e998-461d-9381-9143a3cb39c2@gmx.com>

On 2025/2/10 16:50, Qu Wenruo wrote:
>
>
> On 2025/2/10 18:48, Qi Zheng wrote:
>> Hi all,
>>
>> On 2025/2/10 12:02, Qi Zheng wrote:
>>> Hi Zi,
>>>
>>> On 2025/2/10 11:35, Zi Yan wrote:
>>>> On 7 Feb 2025, at 17:17, Matthew Wilcox wrote:
>>>>
>>>>> On Fri, Feb 07, 2025 at 04:29:36PM +0100, Christian Brauner wrote:
>>>>>> while true; do ./xfs.run.sh "generic/437"; done
>>>>>>
>>>>>> allows me to reproduce this fairly quickly.
>>>>>
>>>>> On holiday, back Monday.
>>>>
>>>> git bisect points to commit
>>>> 4817f70c25b6 ("x86: select ARCH_SUPPORTS_PT_RECLAIM if X86_64").
>>>> Qi is cc'd.
>>>>
>>>> After deselecting PT_RECLAIM on v6.14-rc1, the issue is gone.
>>>> At least, there is no splat after running for more than 300s,
>>>> whereas the splat is usually triggered after ~20s with
>>>> PT_RECLAIM set.
>>>
>>> PT_RECLAIM mainly makes the following two changes:
>>>
>>> 1) try to reclaim page table pages during madvise(MADV_DONTNEED)
>>> 2) unconditionally select MMU_GATHER_RCU_TABLE_FREE
>>>
>>> Does ./xfs.run.sh "generic/437" perform madvise(MADV_DONTNEED)?
>>>
>>> Anyway, I will try to reproduce it locally and troubleshoot it.
>>
>> I reproduced it locally, and it was indeed caused by PT_RECLAIM.
>>
>> The root cause is that the PTE lock may be released midway through
>> zap_pte_range(), after which the scan is retried. In this case, a PTE
>> entry that was originally none may be refilled with a physical page
>> while the lock is dropped.
>>
>> So we should recheck all PTE entries in this case:
>>
>> diff --git a/mm/memory.c b/mm/memory.c
>> index a8196ae72e9ae..ca1b133a288b5 100644
>> --- a/mm/memory.c
>> +++ b/mm/memory.c
>> @@ -1721,7 +1721,7 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
>>          pmd_t pmdval;
>>          unsigned long start = addr;
>>          bool can_reclaim_pt = reclaim_pt_is_enabled(start, end, details);
>> -        bool direct_reclaim = false;
>> +        bool direct_reclaim = true;
>>          int nr;
>>
>>  retry:
>> @@ -1736,8 +1736,10 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
>>          do {
>>                  bool any_skipped = false;
>>
>> -                if (need_resched())
>> +                if (need_resched()) {
>> +                        direct_reclaim = false;
>>                          break;
>> +                }
>>
>>                  nr = do_zap_pte_range(tlb, vma, pte, addr, end, details, rss,
>>                                        &force_flush, &force_break, &any_skipped);
>> @@ -1745,11 +1747,12 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
>>                          can_reclaim_pt = false;
>>                  if (unlikely(force_break)) {
>>                          addr += nr * PAGE_SIZE;
>> +                        direct_reclaim = false;
>>                          break;
>>                  }
>>          } while (pte += nr, addr += PAGE_SIZE * nr, addr != end);
>>
>> -        if (can_reclaim_pt && addr == end)
>> +        if (can_reclaim_pt && direct_reclaim && addr == end)
>>                  direct_reclaim = try_get_and_clear_pmd(mm, pmd, &pmdval);
>>
>>          add_mm_rss_vec(mm, rss);
>>
>> I tested the above code for a while and no bugs were reported. Does it
>> work for you?
>
> Tested 128 generic/437 runs with CONFIG_PT_RECLAIM on btrfs.
> No more crashes. I will do a longer run, but it looks like the bug is
> fixed.

Thank you for testing so quickly!
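
To make the reasoning behind the patch quoted above concrete, here is a toy userspace model of it. It is not kernel code: NR_PTES, model_zap(), the single lock drop and the single concurrent refault are hypothetical simplifications, and try_get_and_clear_pmd() is assumed to always succeed. It shows that the old code takes the direct free_pte() path whenever the final pass reaches end, even if an entry before the resume point was refilled while the lock was dropped, whereas the patched code only takes that path when the whole range was scanned under one lock hold and otherwise falls back to try_to_free_pte(), which (as I read mm/pt_reclaim.c) rechecks every entry first.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy page table; a real x86-64 PTE page has 512 entries. */
#define NR_PTES 8

enum zap_result {
    ZAP_FREED_LIVE = -1, /* PTE page freed while an entry was live: the bug */
    ZAP_KEPT = 0,        /* PTE page left in place */
    ZAP_FREED_OK = 1,    /* PTE page freed and every entry was none */
};

static bool all_none(const bool present[NR_PTES])
{
    for (size_t i = 0; i < NR_PTES; i++)
        if (present[i])
            return false;
    return true;
}

/*
 * Hypothetical model of zap_pte_range() with PT_RECLAIM, assuming the
 * range covers a whole PMD and no entry is skipped (can_reclaim_pt stays
 * true). present[i] stands for !pte_none(). If drop_at < NR_PTES, the PTE
 * lock is dropped just before entry drop_at is zapped (need_resched() or
 * force_break, then "goto retry"), and a concurrent fault refills entry
 * refill_idx while the lock is not held. `fixed` selects the patched logic.
 */
static enum zap_result model_zap(bool present[NR_PTES], size_t drop_at,
                                 size_t refill_idx, bool fixed)
{
    assert(refill_idx < NR_PTES);
    bool direct_reclaim = fixed; /* patch: start true, clear on any break */

    for (size_t i = 0; i < NR_PTES; i++) {
        if (i == drop_at) {
            if (fixed)
                direct_reclaim = false; /* earlier entries may have changed */
            present[refill_idx] = true; /* concurrent refault */
        }
        present[i] = false; /* zap this entry under the PTE lock */
    }

    /*
     * The last pass reached end. Old code: always call
     * try_get_and_clear_pmd(). Patched code: only if direct_reclaim is
     * still true. Either way the call is assumed to succeed here.
     */
    if (!fixed)
        direct_reclaim = true;

    if (direct_reclaim) /* free_pte(): entries are not looked at again */
        return all_none(present) ? ZAP_FREED_OK : ZAP_FREED_LIVE;

    /* try_to_free_pte(): recheck every entry, free only if all are none */
    return all_none(present) ? ZAP_FREED_OK : ZAP_KEPT;
}
```

Starting each call from a fresh array with every entry present, model_zap(p, 4, 1, false) returns ZAP_FREED_LIVE because entry 1 is refilled after it was zapped and nothing looks at it again, while model_zap(p, 4, 1, true) returns ZAP_KEPT. A refill at an index the final pass still visits (for example 6) is simply zapped again, so both versions free the page safely in that case.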
>
> Before the fix is merged, I'll deselect PT_RECLAIM as a workaround for my
> runs on the btrfs/for-next branch.

Sorry for the inconvenience. I will submit a formal fix patch after
testing it for a longer period of time.

Thanks,
Qi

>
> Thanks,
> Qu
>
>>
>> Thanks,
>> Qi
>>
>>>
>>> Thanks!
>>>
>>>>
>>>> --
>>>> Best Regards,
>>>> Yan, Zi
>
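
For anyone who wants to exercise the madvise(MADV_DONTNEED) path from item 1) in the quoted list outside of xfstests, a minimal userspace sketch follows. It assumes Linux with glibc and x86-64 with 4 KiB pages; the 4 MiB size is a choice made here so that at least one fully 2 MiB-aligned PMD range lies inside the mapping, and MADV_NOHUGEPAGE is requested so the region is less likely to be backed by a THP, in which case no PTE page would be involved. It is single-threaded, so it only walks the zap path and does not reproduce the race, which also needs a concurrent fault while the PTE lock is dropped. Whether an empty PTE page was actually freed is not visible from userspace.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/*
 * Fault in a private anonymous region, zap it with madvise(MADV_DONTNEED),
 * and check that it reads back as zeroes, which MADV_DONTNEED guarantees
 * for private anonymous memory. Returns 0 on success, -1 on any failure.
 * With CONFIG_PT_RECLAIM the kernel may additionally free PTE pages that
 * became empty; this program cannot observe that.
 */
static int dontneed_roundtrip(size_t len)
{
    assert(len > 0);
    unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    /* Best effort: keep THP off this range; fails if THP is not built in. */
    (void)madvise(p, len, MADV_NOHUGEPAGE);

    memset(p, 0xaa, len); /* populate every PTE in the range */

    int ret = 0;
    if (madvise(p, len, MADV_DONTNEED) != 0) {
        ret = -1;
    } else {
        for (size_t i = 0; i < len; i++) {
            if (p[i] != 0) { /* private anon pages must refault as zero */
                ret = -1;
                break;
            }
        }
    }

    munmap(p, len);
    return ret;
}
```

On a healthy kernel with enough free memory, dontneed_roundtrip((size_t)4 << 20) should return 0 with or without PT_RECLAIM; the sketch is only a starting point for a targeted reproducer, not a replacement for the generic/437 loop.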