Date: Wed, 17 Jul 2024 16:01:05 +0530
From: Bharata B Rao
To: Vlastimil Babka, linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org, nikunj@amd.com, "Upadhyay, Neeraj", Andrew Morton, David Hildenbrand, willy@infradead.org, yuzhao@google.com, kinseyho@google.com, Mel Gorman, Mateusz Guzik
Subject: Re: Hard and soft lockups with FIO and LTP runs on a large system
References: <3128c3c0-ede2-4930-a841-a1da56e797d7@suse.cz>
In-Reply-To: <3128c3c0-ede2-4930-a841-a1da56e797d7@suse.cz>

On 17-Jul-24 3:12 PM, Vlastimil Babka wrote:
> On 7/3/24 5:11 PM, Bharata B Rao wrote:
>> Many soft and hard lockups are seen with upstream kernel when running a
>> bunch of tests that include FIO and LTP filesystem test on 10 NVME
>> disks. The lockups can appear anywhere between 2 to 48 hours. Originally
>> this was reported on a large customer VM instance with passthrough NVME
>> disks on older kernels(v5.4 based). However, similar problems were
>> reproduced when running the tests on bare metal with latest upstream
>> kernel (v6.10-rc3). Other lockups with different signatures are seen but
>> in this report, only those related to MM area are being discussed.
>> Also note that the subsequent description is related to the lockups in
>> bare metal upstream (and not VM).
>>
>> The general observation is that the problem usually surfaces when the
>> system free memory goes very low and page cache/buffer consumption hits
>> the ceiling. Most of the times the two contended locks are lruvec and
>> inode->i_lock spinlocks.
>>
>> - Could this be a scalability issue in LRU list handling and/or page
>> cache invalidation typical to a large system configuration?
>
> Seems to me it could be (except that ZONE_DMA corner case) a general
> scalability issue in that you tweak some part of the kernel and the
> contention moves elsewhere. At least in MM we have per-node locks so this
> means 256 CPUs per lock? It used to be that there were not that many
> (cores/threads) per a physical CPU and its NUMA node, so many cpus would
> mean also more NUMA nodes where the locks contention would distribute among
> them. I think you could try fakenuma to create these nodes artificially and
> see if it helps for the MM part. But if the contention moves to e.g. an
> inode lock, I'm not sure what to do about that then. See below...
>
>>
>> 3) AMD has a BIOS setting called NPS (Nodes per socket), using which a
>> socket can be further partitioned into smaller NUMA nodes. With NPS=4,
>> there will be four NUMA nodes in one socket, and hence 8 NUMA nodes in
>> the system. This was done to check if having more number of kswapd
>> threads working on lesser number of folios per node would make a
>> difference. However here too, multiple soft lockups were seen (in
>> clear_shadow_entry() as seen in MGLRU case). No hard lockups were observed.

These are some of the soft lockups seen in NPS4 mode.

watchdog: BUG: soft lockup - CPU#315 stuck for 11s! [kworker/315:1H:5153]
CPU: 315 PID: 5153 Comm: kworker/315:1H Kdump: loaded Not tainted 6.10.0-rc3-enbprftw #12
Workqueue: kblockd blk_mq_run_work_fn
RIP: 0010:handle_softirqs+0x70/0x2f0
Call Trace:
__irq_exit_rcu+0x68/0x90
irq_exit_rcu+0x12/0x20
sysvec_apic_timer_interrupt+0x85/0xb0
asm_sysvec_apic_timer_interrupt+0x1f/0x30
RIP: 0010:iommu_dma_map_page+0xca/0x2c0
dma_map_page_attrs+0x20d/0x2a0
nvme_prep_rq.part.0+0x63d/0x940 [nvme]
nvme_queue_rq+0x82/0x210 [nvme]
blk_mq_dispatch_rq_list+0x289/0x6d0
__blk_mq_sched_dispatch_requests+0x142/0x5f0
blk_mq_sched_dispatch_requests+0x36/0x70
blk_mq_run_work_fn+0x73/0x90
process_one_work+0x185/0x3d0
worker_thread+0x2ce/0x3e0
kthread+0xe5/0x120
ret_from_fork+0x3d/0x60
ret_from_fork_asm+0x1a/0x30

watchdog: BUG: soft lockup - CPU#0 stuck for 11s! [fio:19820]
CPU: 0 PID: 19820 Comm: fio Kdump: loaded Tainted: G L 6.10.0-rc3-enbprftw #12
RIP: 0010:native_queued_spin_lock_slowpath+0x2b8/0x300
Call Trace:
_raw_spin_lock+0x2d/0x40
clear_shadow_entry+0x3d/0x100
mapping_try_invalidate+0x11b/0x1e0
invalidate_mapping_pages+0x14/0x20
invalidate_bdev+0x40/0x50
blkdev_common_ioctl+0x5f7/0xa90
blkdev_ioctl+0x10d/0x270
__x64_sys_ioctl+0x99/0xd0
x64_sys_call+0x1219/0x20d0
do_syscall_64+0x51/0x120
entry_SYSCALL_64_after_hwframe+0x76/0x7e
RIP: 0033:0x7fc92fc3ec6b

The second one (in clear_shadow_entry) has since been fixed by Yu Zhao, and
the fix is in the mm tree.

We have also seen a couple of scenarios with zone lock contention from the
page free and slab free code paths, as reported here:

https://lore.kernel.org/linux-mm/b68e43d4-91f2-4481-80a9-d166c0a43584@amd.com/

Would you have any insights on these?

Regards,
Bharata.
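
P.S. (editorial sketch, not part of the original report): when many soft lockup
reports like the two above accumulate over a long run, it can help to bucket
them by the function that was trying to take a spinlock. The Python below is
only a minimal sketch under stated assumptions: the log has no dmesg timestamp
prefixes, each report starts with the "watchdog: BUG: soft lockup" line, and
frames look like "symbol+0xoff/0xsize". The names summarize_lockups() and
lock_caller() are hypothetical helpers made up for this illustration.

```python
import re

# Header line of a report, e.g.
# "watchdog: BUG: soft lockup - CPU#315 stuck for 11s! [kworker/315:1H:5153]"
LOCKUP_RE = re.compile(
    r"watchdog: BUG: soft lockup - CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! "
    r"\[(?P<task>.+):(?P<pid>\d+)\]"
)

# A stack frame such as "clear_shadow_entry+0x3d/0x100" (a leading "? " is
# tolerated). "RIP: ..." lines do not match and are therefore skipped.
FRAME_RE = re.compile(
    r"^\s*(?:\? )?(?P<sym>[A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def summarize_lockups(log_text):
    """Return one dict per soft lockup report found in log_text."""
    reports = []
    current = None
    in_trace = False
    for line in log_text.splitlines():
        header = LOCKUP_RE.search(line)
        if header:
            current = {
                "cpu": int(header["cpu"]),
                "secs": int(header["secs"]),
                "task": header["task"],
                "pid": int(header["pid"]),
                "frames": [],
            }
            reports.append(current)
            in_trace = False
            continue
        if current is None:
            continue  # text before the first report
        if line.strip().startswith("Call Trace:"):
            in_trace = True
            continue
        if in_trace:
            frame = FRAME_RE.match(line)
            if frame:
                current["frames"].append(frame["sym"])
    return reports


def lock_caller(frames):
    """Return the frame that called _raw_spin_lock*(), or None if absent."""
    for i, sym in enumerate(frames[:-1]):
        if sym.startswith("_raw_spin_lock"):
            return frames[i + 1]
    return None
```

Fed the two reports above, summarize_lockups() yields two entries, and
lock_caller() names clear_shadow_entry for the fio report and returns None for
the kworker one, which has no spinlock frame in its trace. Counting
lock_caller() results across a whole log (for instance with
collections.Counter) gives a rough view of where the contention sits.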