Date: Thu, 12 May 2022 08:36:48 +1200
From: Michael Cree
To: Yu Zhao
Cc: Linux-MM, linux-kernel, Hillf Danton, Joonsoo Kim
Subject: Re: Alpha: rare random memory corruption/segfault in user space bisected
References: <20220507015646.5377-1-hdanton@sina.com>

On Sat, May 07, 2022 at 11:27:15AM -0700, Yu Zhao wrote:
> On Fri, May 6, 2022 at 6:57 PM Hillf Danton wrote:
> >
> > On Sat, 7 May 2022 09:21:25 +1200 Michael Cree wrote:
> > > Alpha kernel has been exhibiting rare and random memory
> > > corruptions/segfaults in user space since the 5.9.y kernel. First seen
> > > on the Debian Ports build daemon when running 5.10.y kernel resulting
> > > in the occasional (one or two a day) build failures with gcc ICEs either
> > > due to self detected corrupt memory structures or segfaults. Have been
> > > running 5.8.y kernel without such problems for over six months.
> > >
> > > Tried bisecting last year but went off track with incorrect good/bad
> > > determinations due to rare nature of bug. After trying a 5.16.y kernel
> > > early this year and seen the bug is still present retried the bisection
> > > and have got to:
> > >
> > > aae466b0052e1888edd1d7f473d4310d64936196 is the first bad commit
> > > commit aae466b0052e1888edd1d7f473d4310d64936196
> > > Author: Joonsoo Kim
> > > Date: Tue Aug 11 18:30:50 2020 -0700
> > >
> > >     mm/swap: implement workingset detection for anonymous LRU
>
> This commit seems innocent to me. While not ruling out anything, i.e.,
> this commit, compiler, qemu, userspace itself, etc., my wild guess is
> the problem is memory barrier related. Two lock/unlock pairs, which
> imply two full barriers, were removed. This is not a small deal on
> Alpha, since it imposes no constraints on cache coherency, AFAIK.
>
> Can you please try the attached patch on top of this commit? Thanks!

Thanks, I have that running now for a day without any problem showing
up, but that's not long enough to be sure it has fixed the problem. Will
get back to you after another day or two of testing.

Cheers,
Michael.