Date: Sat, 24 Apr 2021 13:28:27 +0800
From: Wang Yugui
To: Yang Shi
Cc: "Kirill A. Shutemov", Linux MM
Subject: Re: kernel BUG at mm/huge_memory.c:2736(linux 5.10.29)
References: <20210423160753.6A51.409509F4@e16-tech.com>
Message-Id: <20210424132826.89B1.409509F4@e16-tech.com>

Hi,

> On Fri, Apr 23, 2021 at 1:07 AM Wang Yugui wrote:
> >
> > Hi,
> >
> > > With this patch, the problem has not happened yet after 4 tests (5.10.x).
> >
> > With this patch, another problem happened in the 6th test.
> >
> > kernel BUG at mm/huge_memory.c:2343!
> >
> > static void unmap_page(struct page *page)
> > {
> > 	enum ttu_flags ttu_flags = TTU_IGNORE_MLOCK |
> > 		TTU_RMAP_LOCKED | TTU_SPLIT_HUGE_PMD;
> > 	bool unmap_success;
> >
> > 	VM_BUG_ON_PAGE(!PageHead(page), page);
> >
> > 	if (PageAnon(page))
> > 		ttu_flags |= TTU_SPLIT_FREEZE;
> >
> > 	unmap_success = try_to_unmap(page, ttu_flags);
> > 	VM_BUG_ON_PAGE(!unmap_success, page);	/* mm/huge_memory.c:2343 */
> > }
>
> Thanks for running the test. This is what I expected from the debug
> patch. It means try_to_unmap() didn't unmap the huge page
> successfully. The huge page is PTE-mapped, try_to_unmap() is supposed
> to unmap every mapped subpage. But it seems it didn't unmap any
> subpage at all (the refcount of the huge page is 512 per the log from
> earlier email).
>
> By reading the code, I didn't figure out what went wrong yet. You
> mentioned that the 5.4.x kernel is fine, so may you try to do some
> bisect?

This may happen on some memory reclaim path.

Our application needs to process files of about 300G-400G. We have 4
servers: two with 192G of memory, one with 512G, and one with 768G.
If the memory available for buffering (total memory * 10 / 12 - 120G)
is enough to process the files, no temp file is needed. Otherwise, we
write the buffer to a temp file and continue processing the next part.

This problem happened on a server with 192G of memory running kernel
5.10.x, but it has not happened yet on servers running kernel 5.4.x or
on servers with total memory >= 512G. So this may also be a timing
problem. Maybe debug code would be more useful than a bisect?

Fedora configures newer Linux kernels with
CONFIG_TRANSPARENT_HUGEPAGE_MADVISE=y, so maybe newer kernels with
CONFIG_TRANSPARENT_HUGEPAGE_ALWAYS=y are not well tested?

Best Regards
Wang Yugui (wangyugui@e16-tech.com)
2021/04/24