From: Nhat Pham <nphamcs@gmail.com>
Date: Wed, 20 Dec 2023 10:00:57 -0800
Subject: Re: [PATCH] mm: memcg: fix split queue list crash when large folio migration
To: Baolin Wang
Cc: akpm@linux-foundation.org, hannes@cmpxchg.org, mhocko@kernel.org,
    roman.gushchin@linux.dev, shakeelb@google.com, muchun.song@linux.dev,
    david@redhat.com, ying.huang@intel.com, shy828301@gmail.com,
    ziy@nvidia.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
    cgroups@vger.kernel.org
References: <61273e5e9b490682388377c20f52d19de4a80460.1703054559.git.baolin.wang@linux.alibaba.com>
In-Reply-To: <61273e5e9b490682388377c20f52d19de4a80460.1703054559.git.baolin.wang@linux.alibaba.com>

On Tue, Dec 19, 2023 at 10:52 PM Baolin Wang wrote:
>
> When running autonuma with enabling multi-size THP, I encountered the following
> kernel crash issue:
>
> [ 134.290216] list_del corruption. prev->next should be fffff9ad42e1c490, but was dead000000000100. (prev=fffff9ad42399890)
> [ 134.290877] kernel BUG at lib/list_debug.c:62!
> [ 134.291052] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
> [ 134.291210] CPU: 56 PID: 8037 Comm: numa01 Kdump: loaded Tainted: G E 6.7.0-rc4+ #20
> [ 134.291649] RIP: 0010:__list_del_entry_valid_or_report+0x97/0xb0
> ......
> [ 134.294252] Call Trace:
> [ 134.294362]
> [ 134.294440] ? die+0x33/0x90
> [ 134.294561] ? do_trap+0xe0/0x110
> ......
> [ 134.295681] ? __list_del_entry_valid_or_report+0x97/0xb0
> [ 134.295842] folio_undo_large_rmappable+0x99/0x100
> [ 134.296003] destroy_large_folio+0x68/0x70
> [ 134.296172] migrate_folio_move+0x12e/0x260
> [ 134.296264] ? __pfx_remove_migration_pte+0x10/0x10
> [ 134.296389] migrate_pages_batch+0x495/0x6b0
> [ 134.296523] migrate_pages+0x1d0/0x500
> [ 134.296646] ? __pfx_alloc_misplaced_dst_folio+0x10/0x10
> [ 134.296799] migrate_misplaced_folio+0x12d/0x2b0
> [ 134.296953] do_numa_page+0x1f4/0x570
> [ 134.297121] __handle_mm_fault+0x2b0/0x6c0
> [ 134.297254] handle_mm_fault+0x107/0x270
> [ 134.300897] do_user_addr_fault+0x167/0x680
> [ 134.304561] exc_page_fault+0x65/0x140
> [ 134.307919] asm_exc_page_fault+0x22/0x30
>
> The reason for the crash is that, the commit 85ce2c517ade ("memcontrol: only
> transfer the memcg data for migration") removed the charging and uncharging
> operations of the migration folios and cleared the memcg data of the old folio.
>
> During the subsequent release process of the old large folio in destroy_large_folio(),
> if the large folio needs to be removed from the split queue, an incorrect split
> queue can be obtained (which is pgdat->deferred_split_queue) because the old
> folio's memcg is NULL now. This can lead to list operations being performed
> under the wrong split queue lock protection, resulting in a list crash as above.

Ah this is tricky. I think you're right - the old folio's memcg is
used to get the deferred split queue, and we cleared it here :)

>
> After the migration, the old folio is going to be freed, so we can remove it
> from the split queue in mem_cgroup_migrate() a bit earlier before clearing the
> memcg data to avoid getting incorrect split queue.
>
> Fixes: 85ce2c517ade ("memcontrol: only transfer the memcg data for migration")
> Signed-off-by: Baolin Wang
> ---
>  mm/huge_memory.c |  2 +-
>  mm/memcontrol.c  | 11 +++++++++++
>  2 files changed, 12 insertions(+), 1 deletion(-)
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 6be1a380a298..c50dc2e1483f 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -3124,7 +3124,7 @@ void folio_undo_large_rmappable(struct folio *folio)
>         spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
>         if (!list_empty(&folio->_deferred_list)) {
>                 ds_queue->split_queue_len--;
> -               list_del(&folio->_deferred_list);
> +               list_del_init(&folio->_deferred_list);
>         }
>         spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags);
>  }
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index ae8c62c7aa53..e66e0811cccc 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -7575,6 +7575,17 @@ void mem_cgroup_migrate(struct folio *old, struct folio *new)
>
>         /* Transfer the charge and the css ref */
>         commit_charge(new, memcg);
> +       /*
> +        * If the old folio a large folio and is in the split queue, it needs
> +        * to be removed from the split queue now, in case getting an incorrect
> +        * split queue in destroy_large_folio() after the memcg of the old folio
> +        * is cleared.
> +        *
> +        * In addition, the old folio is about to be freed after migration, so
> +        * removing from the split queue a bit earlier seems reasonable.
> +        */
> +       if (folio_test_large(old) && folio_test_large_rmappable(old))
> +               folio_undo_large_rmappable(old);

This looks reasonable to me :)

Reviewed-by: Nhat Pham <nphamcs@gmail.com>

>         old->memcg_data = 0;
>  }
>
> --
> 2.39.3
>
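
For anyone following the thread, here is a small user-space sketch of the
two points in the patch as I read them. First, once the old folio's memcg
is cleared, the queue lookup falls back to the node-level queue, which is
not the queue the folio is actually on. Second, list_del_init() leaves the
entry self-linked, so a later folio_undo_large_rmappable() call from
destroy_large_folio() sees list_empty() and does nothing; that is
presumably why the patch switches away from list_del(). Everything below
is a hypothetical model and not the kernel's code: pick_queue(), undo(),
the struct layouts and the poison stand-ins are made up for illustration,
and locking is omitted.

```c
/* User-space model only: names echo the kernel's, but this is not kernel code. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct list_head { struct list_head *next, *prev; };

/* Stand-ins for the kernel's list poison values; never dereferenced here. */
#define POISON1 ((struct list_head *)0x100)
#define POISON2 ((struct list_head *)0x122)

static inline void init_list_head(struct list_head *h) { h->next = h; h->prev = h; }
static inline bool list_empty(const struct list_head *h) { return h->next == h; }

static inline void list_add(struct list_head *n, struct list_head *head)
{
	n->next = head->next;
	n->prev = head;
	head->next->prev = n;
	head->next = n;
}

static inline void unlink_entry(struct list_head *e)
{
	e->prev->next = e->next;
	e->next->prev = e->prev;
}

/* Plain delete: entry is left pointing at poison, so list_empty() is false. */
static inline void list_del(struct list_head *e)
{
	unlink_entry(e);
	e->next = POISON1;
	e->prev = POISON2;
}

/* Delete and re-initialise: entry is self-linked, so list_empty() is true. */
static inline void list_del_init(struct list_head *e)
{
	unlink_entry(e);
	init_list_head(e);
}

struct split_queue { struct list_head head; int len; };	/* lock omitted */
struct folio { struct split_queue *memcg_queue; struct list_head deferred; };

/* Hypothetical lookup: the memcg's queue if set, else the node-level queue. */
static inline struct split_queue *pick_queue(struct folio *f,
					     struct split_queue *node_queue)
{
	return f->memcg_queue ? f->memcg_queue : node_queue;
}

/* Models the patched undo path: a second call is a harmless no-op. */
static inline void undo(struct folio *f, struct split_queue *node_queue)
{
	struct split_queue *q = pick_queue(f, node_queue);

	if (!list_empty(&f->deferred)) {
		q->len--;
		list_del_init(&f->deferred);
	}
}
```

To try it, queue a folio on a memcg queue from a small main(), call undo()
once with memcg_queue set, clear memcg_queue, and call undo() again: the
second call picks the node queue but never touches it, because the entry
is already self-linked. Swapping list_del_init() for list_del() in undo()
leaves the entry poisoned, and the second call would then try to unlink
through the poison pointers.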