Subject: Re: [failures] mm-vmscan-remove-unnecessary-lruvec-adding.patch removed from -mm tree
To: Hugh Dickins, Qian Cai
Cc: Matthew Wilcox, LKML, Andrew Morton, aarcange@redhat.com,
 daniel.m.jordan@oracle.com, hannes@cmpxchg.org, khlebnikov@yandex-team.ru,
 kirill@shutemov.name, kravetz@us.ibm.com, mhocko@kernel.org,
 mm-commits@vger.kernel.org, tj@kernel.org, vdavydov.dev@gmail.com,
 yang.shi@linux.alibaba.com, linux-mm@kvack.org
References: <20200306025041.rERhvnYmB%akpm@linux-foundation.org>
 <211632B1-2D6F-4BFA-A5A0-3030339D3D2A@lca.pw>
 <20200306033850.GO29971@bombadil.infradead.org>
 <97EE83E1-FEC9-48B6-98E8-07FB3FECB961@lca.pw>
From: Alex Shi <alex.shi@linux.alibaba.com>
Date: Fri, 6 Mar 2020 21:30:24 +0800

On 2020/3/6 at 12:17 PM, Hugh Dickins wrote:
>>>
>>> Subject: Re: [PATCH v9 00/21] per lruvec lru_lock for memcg
>>
>> I don't see it on lore.kernel or anywhere. Private email?
>
> You're right, sorry I didn't notice, lots of ccs but
> neither lkml nor linux-mm were on that thread from the start:

My fault, I thought people would often give comments on each patch;
I will take care of this from now on.

>
> And now the bad news.
>
> Andrew, please revert those six (or seven as they ended up in mmotm).
> 5.6-rc4-mm1 without them runs my tmpfs+loop+swapping+memcg+ksm kernel
> build loads fine (did four hours just now), but 5.6-rc4-mm1 itself
> crashed just after starting - seconds or minutes I didn't see,
> but it did not complete an iteration.
>
> I thought maybe those six would be harmless (though I've not looked
> at them at all); but knew already that the full series is not good yet:
> I gave it a try over 5.6-rc4 on Monday, and crashed very soon on simpler
> testing, in different ways from what hits mmotm.
>
> The first thing wrong with the full set was when I tried tmpfs+loop+
> swapping kernel builds in "mem=700M cgroup_disabled=memory", of course
> with CONFIG_DEBUG_LIST=y. That soon collapsed in a splurge of OOM kills
> and list_del corruption messages: __list_del_entry_valid < list_del <
> __page_cache_release < __put_page < put_page < __try_to_reclaim_swap <
> free_swap_and_cache < shmem_free_swap < shmem_undo_range.

I have been running kernel builds in a "mem=700M cgroup_disabled=memory"
qemu-kvm guest with a swapfile for 3 hours now. I hope I can catch
something while waiting for your reproduction scripts. Thanks, Hugh!

>
> When I next tried with "mem=1G" and memcg enabled (but not being used),
> that managed some iterations, no OOM kills, no list_del warnings (was
> it swapping? perhaps, perhaps not, I was trying to go easy on it just
> to see if "cgroup_disabled=memory" had been the problem); but when
> rebooting after that, again list_del corruption messages and crash
> (I didn't note them down).
>
> So I didn't take much notice of what the mmotm crash backtrace showed
> (but IIRC shmem and swap were in it).

Is there some place to get mmotm's crash backtrace?

>
> Alex, I'm afraid you're focusing too much on performance results,
> without doing the basic testing needed - I thought we had given you
> some hints on the challenging areas (swapping, move_charge_at_immigrate,
> page migration) when we attached a *correctly working* 5.3 version back
> on 23rd August:
>
> https://lore.kernel.org/linux-mm/alpine.LSU.2.11.1908231736001.16920@eggly.anvils/
>
> (Correctly working, except missing two patches I'd mistakenly dropped
> as unnecessary in earlier rebases: but our discussions with Johannes
> later showed to be very necessary, though their races rarely seen.)
>

Did you mean Johannes's question about the race on page->memcg in a
previous email?

"> I don't see what prevents the lruvec from changing under compaction,
> neither in your patches nor in Hugh's. Maybe I'm missing something?"
https://lkml.org/lkml/2019/11/22/2153

From then on, I have tried two solutions to protect page->memcg: the
first used lock_page_memcg() (which was wrong), and the second, new
solution takes the PageLRU bit as the page isolation precondition,
which may work for memcg migration, page migration in compaction, etc.
Could you give some comments on this?
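To illustrate the second idea, here is a rough, untested sketch (not the
exact patch code; it assumes the per-lruvec lru_lock field this series
introduces, and the helper name is made up for illustration). The point
is that only the path which wins the test-and-clear of PageLRU may
isolate the page, so the page cannot be moved to another lruvec under
that path while it holds this "ticket":

#include <linux/mm.h>
#include <linux/mm_inline.h>	/* update_lru_size(), page_lru() */
#include <linux/huge_mm.h>	/* hpage_nr_pages() */
#include <linux/page-flags.h>
#include <linux/swap.h>

/*
 * Sketch only: clearing PageLRU atomically acts as the isolation
 * precondition; losers back off instead of touching the lru list.
 * lruvec->lru_lock is the per-lruvec lock this series proposes.
 */
static bool sketch_isolate_lru_page(struct page *page, struct lruvec *lruvec)
{
	if (!TestClearPageLRU(page))
		return false;	/* lost the race: someone else isolates it */

	get_page(page);
	spin_lock_irq(&lruvec->lru_lock);
	list_del(&page->lru);
	update_lru_size(lruvec, page_lru(page), page_zonenum(page),
			-hpage_nr_pages(page));
	spin_unlock_irq(&lruvec->lru_lock);
	return true;
}

With PageLRU already cleared, memcg move and compaction should skip the
page, so reading its memcg to pick the right lru_lock should be stable
enough - that is the part I would most like your comments on.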
> I have not had the time (and do not expect to have the time) to review
> your series: maybe it's one or two small fixes away from being complete,
> or maybe it's still fundamentally flawed, I do not know. I had naively
> hoped that you would help with a patchset that worked, rather than
> cutting it down into something which does not.
>

Sorry, Hugh, I didn't know you had a per-memcg lru_lock patch set before
I sent out my first version.

> Submitting your series to routine testing is much easier for me than
> reviewing it: but then, yes, it's a pity that I don't find the time
> to report the results on intervening versions, which also crashed.
>
> What I have to do now, is set aside time today and tomorrow, to package
> up the old scripts I use, describe them and their environment, and send
> them to you (cc akpm in case I fall under a bus): so that you can
> reproduce the crashes for yourself, and get to work on them.
>

Thanks in advance for your coming testing scripts; I believe they will
help a lot.

BTW, I have tried my best to organize these patches so that they read
straightforwardly; a senior expert like you won't need much time to go
through the whole patch set and give some precious comments!

I am looking forward to hearing comments from you. :)

Thanks
Alex