From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Huang, Ying"
To: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Cc: linux-mm@kvack.org, akpm@linux-foundation.org, Zi Yan, Yang Shi,
 Baolin Wang, Oscar Salvador, Matthew Wilcox, Bharata B Rao,
 Alistair Popple, haoxin, Minchan Kim
Subject: Re: [BISECTED] first bad commit is c203c6d5b3f0597 ("migrate_pages: batch _unmap and _move")
References: <87h6w5othj.fsf@yhuang6-desk2.ccr.corp.intel.com>
 <87wn4zmzd1.fsf@yhuang6-desk2.ccr.corp.intel.com>
Date: Fri, 03 Feb 2023 23:02:46 +0800
In-Reply-To: (Hyeonggon Yoo's message of "Fri, 3 Feb 2023 14:17:56 +0000")
Message-ID: <874js2n65l.fsf@yhuang6-desk2.ccr.corp.intel.com>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.1 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain; charset=ascii

Hyeonggon Yoo <42.hyeyoo@gmail.com> writes:

> On Fri, Feb 03, 2023 at 07:17:14AM +0800, Huang, Ying wrote:
>> "Huang, Ying" writes:
>>
>> > Hi, Hyeonggon,
>> >
>> > Hyeonggon Yoo <42.hyeyoo@gmail.com> writes:
>> >
>> >> On Wed, Feb 01, 2023 at 01:09:10AM +0900, Hyeonggon Yoo wrote:
>> >>> I've observed random list_del corruption on mm-unstable,
>> >>> where HEAD is commit d69862e693c069f4
>> >>> ("mm/migrate: convert putback_movable_pages() to use folios").
>> >>>
>> >>> The issue can be easily reproduced by stressing MM multiple times:
>> >>> stress-ng --bigheap 0 --timeout 300
>> >>>
>> >>> The compiler is gcc 12.2.1 and config, dmesg are included as attachment.
>> >>> I will try to bisect but can't promise quick resolution :)
>> >>
>> >>
>> >> The first bad commits appears to be:
>> >> c203c6d5b3f0597 ("migrate_pages: batch _unmap and _move")
>> >>
>> >> the first bad commit _probably_ be earlier, but this is quite
>> >> easy to reproduce so at this point I think above is the real bad commit.
>> >
>> > Thank you very much for reporting the bug. I'm in travel now but I will
>> > try to find some time to reproduce and debug it.
>>
>> Still haven't reproduced the issue. But after reviewing the code, I
>> found a bug in the code, which may cause list corruption. Can you try
>> the debug patch below?
>
> Unfortunately my home server seems to be broken again :(
> That means I only have access to VMs and not a real machine now.
>
> FYI it was not reproduced on KVM but reproduced on real machine.
> Could you try checking on your machine with the config I attached? [1]

Thank you very much for testing!

> Sorry to bother your travel!

Never mind. Your report helps me very much!

> [1] https://marc.info/?l=linux-mm&m=167518135116956

I have reproduced the bug successfully! I can also reproduce it with
the previous debug patch applied, although the reproduction rate isn't
high. In my test, the following patch fixes the bug. It appears that
the zswap code touches dst->lru while the page is being moved.

Best Regards,
Huang, Ying

-------------------------8<----------------------------------
>From b2e3f4aab16d8af0033286fde669b46e7467c7ec Mon Sep 17 00:00:00 2001
From: Huang Ying
Date: Fri, 3 Feb 2023 22:03:24 +0800
Subject: [PATCH] dbg,migrate_pages: restore destination folio state before move

---
 mm/migrate.c | 15 ++++++++-------
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/mm/migrate.c b/mm/migrate.c
index 143d96775b4d..fa7212330cb6 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1225,13 +1225,19 @@ static int __migrate_folio_move(struct folio *src, struct folio *dst,
 	int page_was_mapped = 0;
 	struct anon_vma *anon_vma = NULL;
 	bool is_lru = !__PageMovable(&src->page);
+	struct list_head *prev;
 
 	__migrate_folio_extract(dst, &page_was_mapped, &anon_vma);
+	prev = dst->lru.prev;
+	list_del(&dst->lru);
 
 	rc = move_to_new_folio(dst, src, mode);
 
-	if (rc != -EAGAIN)
-		list_del(&dst->lru);
+	if (rc == -EAGAIN) {
+		list_add(&dst->lru, prev);
+		__migrate_folio_record(dst, page_was_mapped, anon_vma);
+		return rc;
+	}
 
 	if (unlikely(!is_lru))
 		goto out_unlock_both;
@@ -1251,11 +1257,6 @@ static int __migrate_folio_move(struct folio *src, struct folio *dst,
 		lru_add_drain();
 	}
 
-	if (rc == -EAGAIN) {
-		__migrate_folio_record(dst, page_was_mapped, anon_vma);
-		return rc;
-	}
-
 	if (page_was_mapped)
 		remove_migration_ptes(src,
 			rc == MIGRATEPAGE_SUCCESS ? dst : src, false);
-- 
2.35.1
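
For readers who want to see the ordering change outside the kernel, here is a
rough userspace sketch of what the patch does: remember the entry's
predecessor, unlink the entry before the move, and re-insert it at the same
position if the move asks for a retry. Everything in it is an assumption made
for illustration: the list helpers are a minimal re-implementation rather than
include/linux/list.h, and fake_folio, fake_move(), move_one() and demo() are
hypothetical names. fake_move() simply overwrites dst->lru to stand in for
whatever the zswap path appears to do with that field.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's struct list_head (illustration only). */
struct list_head { struct list_head *next, *prev; };

void list_init(struct list_head *head) { head->next = head->prev = head; }

/* Insert entry right after head, mirroring the kernel's list_add(). */
void list_add(struct list_head *entry, struct list_head *head)
{
	entry->next = head->next;
	entry->prev = head;
	head->next->prev = entry;
	head->next = entry;
}

/* Unlink entry from its list; the kernel version poisons the stale pointers. */
void list_del(struct list_head *entry)
{
	entry->prev->next = entry->next;
	entry->next->prev = entry->prev;
	entry->next = entry->prev = NULL;
}

int list_len(const struct list_head *head)
{
	int n = 0;

	for (const struct list_head *p = head->next; p != head; p = p->next)
		n++;
	return n;
}

/* Hypothetical stand-in for a destination folio. */
struct fake_folio { int id; struct list_head lru; };

/* Hypothetical mover: it reuses dst->lru for its own purposes, then
 * either succeeds (0) or asks the caller to retry (-EAGAIN). */
int fake_move(struct fake_folio *dst, int fail)
{
	dst->lru.next = dst->lru.prev = &dst->lru;
	return fail ? -EAGAIN : 0;
}

/* Ordering used by the patch: unlink first, restore only on -EAGAIN. */
int move_one(struct fake_folio *dst, int fail)
{
	struct list_head *prev = dst->lru.prev;
	int rc;

	list_del(&dst->lru);		/* dst is off the list during the move */
	rc = fake_move(dst, fail);
	if (rc == -EAGAIN)
		list_add(&dst->lru, prev);	/* back to its old position */
	return rc;
}

int demo(void)
{
	struct list_head dst_folios;
	struct fake_folio a = { 1, { NULL, NULL } };
	struct fake_folio b = { 2, { NULL, NULL } };
	struct fake_folio c = { 3, { NULL, NULL } };

	list_init(&dst_folios);
	list_add(&c.lru, &dst_folios);
	list_add(&b.lru, &dst_folios);
	list_add(&a.lru, &dst_folios);	/* order is now a, b, c */

	/* Retry case: b must end up between a and c again. */
	assert(move_one(&b, 1) == -EAGAIN);
	assert(list_len(&dst_folios) == 3);
	assert(a.lru.next == &b.lru && b.lru.next == &c.lru);

	/* Success case: b stays off the list, neighbours are linked up. */
	assert(move_one(&b, 0) == 0);
	assert(list_len(&dst_folios) == 2);
	assert(a.lru.next == &c.lru && c.lru.prev == &a.lru);
	return 0;
}
```

Calling demo() from a small main() exercises both paths. The point of the
sketch is only that the callee may scribble on the lru field freely once the
entry is unlinked, because the saved prev pointer is enough to put it back.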