Date: Tue, 5 Jan 2021 19:10:01 -0800 (PST)
From: Hugh Dickins
To: Qian Cai
Cc: Hugh Dickins, Shakeel Butt, Alex Shi, Andrew Morton, Mel Gorman,
    Tejun Heo, Konstantin Khlebnikov, Daniel Jordan, Matthew Wilcox,
    Johannes Weiner, kernel test robot, Linux MM, LKML, Cgroups,
    Joonsoo Kim, Wei Yang, "Kirill A. Shutemov",
    alexander.duyck@gmail.com, kernel test robot, Michal Hocko,
    Vladimir Davydov, Yang Shi
Subject: Re: [PATCH v21 00/19] per memcg lru lock
References: <1604566549-62481-1-git-send-email-alex.shi@linux.alibaba.com>
    <49be27f2652d4658f80c95bea171142c35513761.camel@redhat.com>

On Tue, 5 Jan 2021, Qian Cai wrote:
> On Tue, 2021-01-05 at 13:35 -0800, Hugh Dickins wrote:
> > This patchset went into mmotm 2020-11-16-16-23, so probably linux-next
> > on 2020-11-17: you'll have had three trouble-free weeks testing with it
> > in, so it's not a likely suspect. I haven't looked yet at your report,
> > to think of a more likely suspect: will do.
>
> Probably my memory was bad then. Unfortunately, I had 2 weeks holidays before
> the Thanksgiving as well. I have tried a few times so far and only been able to
> reproduce once. Looks nasty...

I have not found a likely suspect.

What it smells like is a defect in cloning anon_vma during fork,
such that mappings of the THP can get added even after all that
could be found were unmapped (tree lookup ordering should prevent
that). But I've not seen any recent change there.

It would be very easily fixed by deleting the whole BUG() block,
which is only there as a sanity check for developers: but we would
not want to delete it without understanding why it has gone wrong
(and would also have to reconsider two related VM_BUG_ON_PAGEs).

It is possible that b6769834aac1 ("mm/thp: narrow lru locking") of
this patchset has changed the timing and made a pre-existing bug
more likely in some situations: it used to hold an lru_lock before
that BUG() on total_mapcount(), and now does not; but that's not
a lock which should be relevant to the check.

When you get more info (or not), please repost the bugstack in a
new email thread: this thread is not really useful for pursuing it.

Hugh