From: Sven Schnelle <svens@linux.ibm.com>
To: Alex Bennée
Cc: David Hildenbrand, Janosch Frank, Liam Howlett, Heiko Carstens, Claudio Imbrenda, Andrew Morton, Guenter Roeck, maple-tree@lists.infradead.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Yu Zhao, Juergen Gross, Vasily Gorbik, Alexander Gordeev, Christian Borntraeger, Andreas Krebbel, Ilya Leoshkevich, Thomas Huth, richard.henderson@linaro.org, qemu-devel@nongnu.org, qemu-s390x@nongnu.org
Subject: Re: qemu-system-s390x hang in tcg
Date: Wed, 29 Jun 2022 14:18:09 +0200
In-Reply-To: (Sven Schnelle's message of "Wed, 29 Jun 2022 12:46:01 +0200")

Sven Schnelle writes:

> Alex Bennée writes:
>
>> Sven Schnelle writes:
>>
>>> Hi,
>>>
>>> David Hildenbrand writes:
>>>
>>>> On 04.05.22 09:37, Janosch Frank wrote:
>>>>> I had a short look yesterday and the boot usually hangs in the raid6 code.
>>>>> Disabling vector instructions didn't make a difference, but a few
>>>>> interruptions via GDB solve the problem for some reason.
>>>>>
>>>>> CCing David and Thomas for TCG
>>>>>
>>>>
>>>> I somehow recall that KASAN was always disabled under TCG; I might be
>>>> wrong (I thought we'd get a message early during boot that the HW
>>>> doesn't support KASAN).
>>>>
>>>> I recall that raid code is a heavy user of vector instructions.
>>>>
>>>> How can I reproduce? Compile upstream (or -next?) with KASAN support and
>>>> run it under TCG?
>>>
>>> I spent some time looking into this. It's usually hanging in
>>> s390vx8_gen_syndrome(). My first thought was that it is a problem with
>>> the VX instructions, but it turned out that it hangs even if I remove all
>>> the code from s390vx8_gen_syndrome().
>>>
>>> Tracing the execution of TBs, I see that the generated code is always
>>> jumping between a few TBs, but never exits the TBs to check for
>>> interrupts (i.e. it never returns to cpu_tb_exec()). I only see calls to
>>> helper_lookup_tb_ptr to look up the TB pointer for the next TB.
>>>
>>> The raid6 code is waiting for some time to expire by reading jiffies,
>>> but interrupts are never processed and therefore jiffies doesn't change.
>>> So the raid6 code hangs forever.
>>>
>>> As a test, I made a quick change:
>>>
>>> diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
>>> index c997c2e8e0..35819fd5a7 100644
>>> --- a/accel/tcg/cpu-exec.c
>>> +++ b/accel/tcg/cpu-exec.c
>>> @@ -319,7 +319,8 @@ const void *HELPER(lookup_tb_ptr)(CPUArchState *env)
>>>      cpu_get_tb_cpu_state(env, &pc, &cs_base, &flags);
>>>
>>>      cflags = curr_cflags(cpu);
>>> -    if (check_for_breakpoints(cpu, pc, &cflags)) {
>>> +    if (check_for_breakpoints(cpu, pc, &cflags) ||
>>> +        unlikely(qatomic_read(&cpu->interrupt_request))) {
>>>          cpu_loop_exit(cpu);
>>>      }
>>>
>>> And that makes the problem go away.
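To make the livelock described above concrete, here is a toy C model (not QEMU code) under stated assumptions: the guest's busy-wait only makes progress when jiffies changes, jiffies only changes when the timer interrupt is delivered, and delivery only happens once execution leaves the chain of TBs. It assumes the interrupt is already pending but nothing has forced an exit, which is what the trace suggests. The names struct toy_vcpu, lookup_stays_in_chain() and run_busy_wait() are hypothetical; lookup_stays_in_chain() merely plays the role of helper_lookup_tb_ptr(), with check_irq standing in for the experimental interrupt_request check in the diff.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model, not QEMU code. */
struct toy_vcpu {
    bool interrupt_request;  /* stand-in for cpu->interrupt_request */
    unsigned long jiffies;   /* guest tick counter, bumped only on delivery */
};

/*
 * Hypothetical stand-in for helper_lookup_tb_ptr(): returns true if the
 * next TB is entered directly (chained), false if execution leaves to the
 * outer loop, as cpu_loop_exit() would.
 */
static bool lookup_stays_in_chain(const struct toy_vcpu *cpu, bool check_irq)
{
    if (check_irq && cpu->interrupt_request) {
        return false;
    }
    return true;
}

/*
 * Runs the guest's busy-wait ("spin until jiffies changes"), one TB per
 * step. Returns the step at which the guest saw jiffies change, or -1 if
 * it was still spinning after max_steps.
 */
static long run_busy_wait(bool check_irq, long max_steps)
{
    assert(max_steps > 0);
    /* Assumption: the timer IRQ is already pending, but no exit was forced. */
    struct toy_vcpu cpu = { .interrupt_request = true, .jiffies = 0 };
    const unsigned long start = cpu.jiffies;

    for (long step = 1; step <= max_steps; step++) {
        /* TB body: read jiffies and compare with the starting value. */
        if (cpu.jiffies != start) {
            return step;
        }
        /* TB end: chain to the next TB, or leave to the outer loop. */
        if (!lookup_stays_in_chain(&cpu, check_irq)) {
            /* Outer loop: deliver the timer IRQ, which advances jiffies. */
            cpu.interrupt_request = false;
            cpu.jiffies++;
        }
    }
    return -1;
}
```

In this model, run_busy_wait(false, n) never sees jiffies change, mirroring the hang, while run_busy_wait(true, n) leaves the chain at the first lookup, delivers the tick and returns 2, mirroring the effect of the quick change to lookup_tb_ptr.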
>>> But I'm not familiar with the TCG internals, so I can't say whether the
>>> generated code is incorrect or something else is wrong. I have TCG log
>>> files of a failing and a working run if someone wants to take a look.
>>> They are rather large, so I would have to upload them somewhere.
>>
>> Whatever is setting cpu->interrupt_request should be calling
>> cpu_exit(cpu), which sets the exit flag that is checked at the start of
>> every TB execution (see gen_tb_start).
>
> Thanks, that was very helpful. I added debugging and it turned out
> that the TB is left because of a pending IRQ. The code then calls
> s390_cpu_exec_interrupt:
>
> bool s390_cpu_exec_interrupt(CPUState *cs, int interrupt_request)
> {
>     if (interrupt_request & CPU_INTERRUPT_HARD) {
>         S390CPU *cpu = S390_CPU(cs);
>         CPUS390XState *env = &cpu->env;
>
>         if (env->ex_value) {
>             /* Execution of the target insn is indivisible from
>                the parent EXECUTE insn.  */
>             return false;
>         }
>         if (s390_cpu_has_int(cpu)) {
>             s390_cpu_do_interrupt(cs);
>             return true;
>         }
>         if (env->psw.mask & PSW_MASK_WAIT) {
>             /* Woken up because of a floating interrupt but it has already
>              * been delivered. Go back to sleep. */
>             cpu_interrupt(CPU(cpu), CPU_INTERRUPT_HALT);
>         }
>     }
>     return false;
> }
>
> Note the 'if (env->ex_value) { }' check. It looks like this function
> just returns false in case TCG is executing an EX instruction. After
> that, the information that the TB should be exited because of an
> interrupt is gone, so the TBs are never exited again, although the
> interrupt wasn't handled. At least that's my assumption for now; if I'm
> wrong, please tell me.

Looking at the code, I see CF_NOIRQ, which prevents TBs from being
interrupted, but I only see it used in the core TCG code. Would that be a
possibility, or is there something else/better?

Sorry for the dumb questions, I don't often work on QEMU ;-)
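A second toy model, with the same caveats, sketches the sequence suspected above: cpu_interrupt() records the IRQ and sets the exit flag, the exit flag is consumed when execution leaves the TBs, and if the exec_interrupt hook declines delivery because ex_value is set, the IRQ stays pending while nothing raises the exit flag again. This illustrates the hypothesis only, not QEMU's actual cpu_exec() loop; the names struct toy_cpu_state, toy_cpu_interrupt(), toy_exec_interrupt() and run_with_execute() are made up, and "the flag is consumed on exit" is an assumption of the model.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model of the suspected sequence, not QEMU code. */
struct toy_cpu_state {
    bool interrupt_request;  /* stand-in for cpu->interrupt_request */
    bool exit_request;       /* stand-in for the exit flag cpu_exit() sets */
    bool ex_value;           /* set while the target of an EXECUTE runs */
    unsigned long jiffies;   /* guest tick counter, bumped only on delivery */
};

/* Like cpu_interrupt() + cpu_exit(): record the IRQ and request an exit. */
static void toy_cpu_interrupt(struct toy_cpu_state *cpu)
{
    cpu->interrupt_request = true;
    cpu->exit_request = true;
}

/* Same shape as the ex_value check in s390_cpu_exec_interrupt(). */
static bool toy_exec_interrupt(struct toy_cpu_state *cpu)
{
    if (cpu->ex_value) {
        return false;            /* delivery refused, IRQ left pending */
    }
    cpu->interrupt_request = false;
    cpu->jiffies++;              /* the timer IRQ advances jiffies */
    return true;
}

/*
 * The guest spins until jiffies changes; the first ex_steps TBs run with
 * ex_value set. Returns the step at which jiffies changed, or -1 if the
 * guest is still spinning after max_steps.
 */
static long run_with_execute(long ex_steps, long max_steps)
{
    assert(max_steps > 0);
    struct toy_cpu_state cpu = { 0 };
    toy_cpu_interrupt(&cpu);     /* the timer fires before the first TB */

    for (long step = 1; step <= max_steps; step++) {
        cpu.ex_value = (step <= ex_steps);

        /* Exit check at TB start (the role of gen_tb_start()). */
        if (cpu.exit_request) {
            cpu.exit_request = false;   /* assumed: consumed on exit */
            if (cpu.interrupt_request) {
                toy_exec_interrupt(&cpu);
            }
        }
        if (cpu.jiffies != 0) {
            return step;
        }
        /* Otherwise the next TB is chained; its exit check finds the flag clear. */
    }
    /* Stuck: the IRQ is still pending, but nothing requests an exit again. */
    assert(cpu.interrupt_request && !cpu.exit_request);
    return -1;
}
```

Here run_with_execute(0, n) delivers the tick on the first TB and returns 1, whereas run_with_execute(1, n) hits the EXECUTE window exactly once and returns -1 with the interrupt still pending, which is the behavior the message assumes.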