Message-ID: <10a06a2f-0dfc-6f36-3b7b-f4fd03153f66@amd.com>
Date: Tue, 17 Jan 2023 23:15:37 +0530
Subject: Re: [RFC PATCH V1 1/1] sched/numa: Enhance vma scanning logic
From: Raghavendra K T
To: Mel Gorman
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, Ingo Molnar,
 Peter Zijlstra, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
 Steven Rostedt, Ben Segall, Daniel Bristot de Oliveira,
 Valentin Schneider, Andrew Morton, Matthew Wilcox, Vlastimil Babka,
 "Liam R. Howlett", Peter Xu, David Hildenbrand, xu xin, Yu Zhao,
 Colin Cross, Arnd Bergmann, Hugh Dickins, Bharata B Rao, Disha Talreja
References: <67bf778d592c39d02444825c416c2ed11d2ef4b2.1673610485.git.raghavendra.kt@amd.com>
 <20230117145951.s2jmva4v54lfrhds@suse.de>
In-Reply-To: <20230117145951.s2jmva4v54lfrhds@suse.de>

On 1/17/2023 8:29 PM, Mel Gorman wrote:
> Note that the cc list is excessive for the topic.
>

Thank you Mel for the review. Sorry for the long list (it was generated by
get_maintainer). I will trim the list for V2.

> On Mon, Jan 16, 2023 at 07:05:34AM +0530, Raghavendra K T wrote:
>> During the NUMA scanning make sure only relevant vmas of the
>> tasks are scanned.
>>
>> Logic:
>> 1) For the first two times allow unconditional scanning of vmas
>> 2) Store recent 4 unique tasks (last 8 bits of PIDs) that accessed the vma.
>> False negatives in case of collision should be fine here.
>> 3) If more than 4 pids exist assume task indeed accessed vma to
>> avoid false negatives
>>
>> Co-developed-by: Bharata B Rao
>> (initial patch to store pid information)
>>
>> Suggested-by: Mel Gorman
>> Signed-off-by: Bharata B Rao
>> Signed-off-by: Raghavendra K T
>> ---
>>  include/linux/mm_types.h |  2 ++
>>  kernel/sched/fair.c      | 32 ++++++++++++++++++++++++++++++++
>>  mm/memory.c              | 21 +++++++++++++++++++++
>>  3 files changed, 55 insertions(+)
>>
>> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
>> index 500e536796ca..07feae37b8e6 100644
>> --- a/include/linux/mm_types.h
>> +++ b/include/linux/mm_types.h
>> @@ -506,6 +506,8 @@ struct vm_area_struct {
>>  	struct mempolicy *vm_policy;	/* NUMA policy for the VMA */
>>  #endif
>>  	struct vm_userfaultfd_ctx vm_userfaultfd_ctx;
>> +	unsigned int accessing_pids;
>> +	int next_pid_slot;
>>  } __randomize_layout;
>>
>
> This should be behind CONFIG_NUMA_BALANCING but per-vma state should also be
> tracked in its own struct and allocated on demand iff the state is required.
>

Agree, as David also pointed out. I will take your patch below as a base to
develop the per-vma struct on its own.

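For reference, the packing behind the two fields in the hunk above can be
sketched in user-space C as follows. This is a simplified illustration under
stated assumptions, not the patch itself: the names pid_slots,
pid_slots_record and pid_slots_maybe_accessed are hypothetical, the 8-bit tag
width is taken from the commit message, the "first two scans are
unconditional" rule is left out, and the sketch writes the current slot
before advancing to the next one.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed for illustration: 8 bits per slot, as in the commit message. */
#define PID_SLOT_BITS 8u
#define PID_SLOT_MASK ((1u << PID_SLOT_BITS) - 1u)
#define PID_SLOTS     (32u / PID_SLOT_BITS)	/* 4 slots in a 32-bit word */

/* Hypothetical stand-in for the two fields added to vm_area_struct. */
struct pid_slots {
	uint32_t accessing_pids;	/* four packed 8-bit PID tags; 0 means empty */
	unsigned int next_slot;		/* slot that the next new tag overwrites */
};

/* Fault path: store the PID's tag unless the same tag is already present. */
static void pid_slots_record(struct pid_slots *s, unsigned int pid)
{
	uint32_t tag = pid & PID_SLOT_MASK;

	for (unsigned int i = 0; i < PID_SLOTS; i++)
		if (((s->accessing_pids >> (i * PID_SLOT_BITS)) & PID_SLOT_MASK) == tag)
			return;

	unsigned int shift = s->next_slot * PID_SLOT_BITS;

	s->accessing_pids &= ~(PID_SLOT_MASK << shift);
	s->accessing_pids |= tag << shift;
	s->next_slot = (s->next_slot + 1) % PID_SLOTS;
}

/*
 * Scan path: true if the tag is stored, or if every slot is occupied
 * (in which case older tags may have been overwritten, so assume a hit).
 */
static bool pid_slots_maybe_accessed(const struct pid_slots *s, unsigned int pid)
{
	uint32_t tag = pid & PID_SLOT_MASK;

	for (unsigned int i = 0; i < PID_SLOTS; i++) {
		uint32_t slot = (s->accessing_pids >> (i * PID_SLOT_BITS)) & PID_SLOT_MASK;

		if (slot == tag)
			return true;
		if (slot == 0)
			return false;	/* an empty slot ends the search */
	}
	return true;
}
```

Two PIDs that share the same low 8 bits are indistinguishable here, which is
the collision case the commit message accepts. A tag of 0 is also
indistinguishable from an empty slot.
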
>> struct kioctx_table;
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index e4a0b8bd941c..944d2e3b0b3c 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -2916,6 +2916,35 @@ static void reset_ptenuma_scan(struct task_struct *p)
>>  	p->mm->numa_scan_offset = 0;
>>  }
>>
>> +static bool vma_is_accessed(struct vm_area_struct *vma)
>> +{
>> +	int i;
>> +	bool more_pids_exist;
>> +	unsigned long pid, max_pids;
>> +	unsigned long current_pid = current->pid & LAST__PID_MASK;
>> +
>> +	max_pids = sizeof(unsigned int) * BITS_PER_BYTE / LAST__PID_SHIFT;
>> +
>> +	/* By default we assume >= max_pids exist */
>> +	more_pids_exist = true;
>> +
>> +	if (READ_ONCE(current->mm->numa_scan_seq) < 2)
>> +		return true;
>> +
>> +	for (i = 0; i < max_pids; i++) {
>> +		pid = (vma->accessing_pids >> i * LAST__PID_SHIFT) &
>> +			LAST__PID_MASK;
>> +		if (pid == current_pid)
>> +			return true;
>> +		if (pid == 0) {
>> +			more_pids_exist = false;
>> +			break;
>> +		}
>> +	}
>> +
>> +	return more_pids_exist;
>> +}
>
> I get the intent is to avoid PIDs scanning VMAs that it has never faulted
> within but it seems unnecessarily complex to search on every fault to track
> just 4 pids with no recent access information. The pid modulo BITS_PER_WORD
> could be used to set a bit on an unsigned long to track approximate recent
> accesses and skip VMAs that do not have the bit set. That would allow more
> recent PIDs to be tracked although false positives would still exist. It
> would be necessary to reset the mask periodically.

Got the idea, but I lost you on pid modulo BITS_PER_WORD (is it extracting
the last 5 or 8 bits of the PID?).

Or do you intend to say we can just do

	vma->accessing_pids |= current_pid;

so that later we can just check

	if ((vma->accessing_pids | current_pid) == vma->accessing_pids)

and treat that as a hit? This becomes simple, and we avoid iteration,
duplicate tracking etc.

>
> Even tracking 4 pids, a reset is periodically needed. Otherwise it'll
> be vulnerable to changes in phase behaviour causing all pids to scan all
> VMAs again.
>

Agree. Yes, this will be the key thing to do. On a related note, I saw a huge
increase in numa_scan_seq, because we enter the scanning path frequently
after the patch.

>> @@ -3015,6 +3044,9 @@ static void task_numa_work(struct callback_head *work)
>>  		if (!vma_is_accessible(vma))
>>  			continue;
>>
>> +		if (!vma_is_accessed(vma))
>> +			continue;
>> +
>>  		do {
>>  			start = max(start, vma->vm_start);
>>  			end = ALIGN(start + (pages << PAGE_SHIFT), HPAGE_SIZE);
>> diff --git a/mm/memory.c b/mm/memory.c
>> index 8c8420934d60..fafd78d87a51 100644
>> --- a/mm/memory.c
>> +++ b/mm/memory.c
>> @@ -4717,7 +4717,28 @@ static vm_fault_t do_numa_page(struct vm_fault *vmf)
>>  	pte_t pte, old_pte;
>>  	bool was_writable = pte_savedwrite(vmf->orig_pte);
>>  	int flags = 0;
>> +	int pid_slot = vma->next_pid_slot;
>>
>> +	int i;
>> +	unsigned long pid, max_pids;
>> +	unsigned long current_pid = current->pid & LAST__PID_MASK;
>> +
>> +	max_pids = sizeof(unsigned int) * BITS_PER_BYTE / LAST__PID_SHIFT;
>> +
>
> Won't build on defconfig
>

Oops! Sorry. This should ideally also have gone behind
CONFIG_NUMA_BALANCING.

>> +	/* Avoid duplicate PID updation */
>> +	for (i = 0; i < max_pids; i++) {
>> +		pid = (vma->accessing_pids >> i * LAST__PID_SHIFT) &
>> +			LAST__PID_MASK;
>> +		if (pid == current_pid)
>> +			goto skip_update;
>> +	}
>> +
>> +	vma->next_pid_slot = (++pid_slot) % max_pids;
>> +	vma->accessing_pids &= ~(LAST__PID_MASK << (pid_slot * LAST__PID_SHIFT));
>> +	vma->accessing_pids |= ((current_pid) <<
>> +			(pid_slot * LAST__PID_SHIFT));
>> +
>
> The PID tracking and clearing should probably be split out but that aside,

Sure, will do.

> what about do_huge_pmd_numa_page?

Will target this eventually (ASAP if it is less complicated) :)

>
> First off though, expanding VMA size by more than a word for NUMA balancing
> is probably a no-go.
>

Agree

> This is a build-tested only prototype to illustrate how VMA could track
> NUMA balancing state. It starts with applying the scan delay to every VMA
> instead of every task to avoid scanning new or very short-lived VMAs. I
> went back to my old notes on how I hoped to reduce excessive scanning in
> NUMA balancing and it happened to be second on my list and straight-forward
> to prototype in a few minutes.
>

Nice idea. Thanks again. I will take this as a base patch for expansion.

> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index f3f196e4d66d..3cebda5cc8a7 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -620,6 +620,9 @@ static inline void vma_init(struct vm_area_struct *vma, struct mm_struct *mm)
>  	vma->vm_mm = mm;
>  	vma->vm_ops = &dummy_vm_ops;
>  	INIT_LIST_HEAD(&vma->anon_vma_chain);
> +#ifdef CONFIG_NUMA_BALANCING
> +	vma->numab = NULL;
> +#endif
>  }
>
>  static inline void vma_set_anonymous(struct vm_area_struct *vma)
> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> index 3b8475007734..3c0cfdde33e0 100644
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -526,6 +526,10 @@ struct anon_vma_name {
>  	char name[];
>  };
>
> +struct vma_numab {
> +	unsigned long next_scan;
> +};
> +
>  /*
>   * This struct describes a virtual memory area. There is one of these
>   * per VM-area/task. A VM area is any part of the process virtual memory
> @@ -593,6 +597,9 @@ struct vm_area_struct {
>  #endif
>  #ifdef CONFIG_NUMA
>  	struct mempolicy *vm_policy;	/* NUMA policy for the VMA */
> +#endif
> +#ifdef CONFIG_NUMA_BALANCING
> +	struct vma_numab *numab;	/* NUMA Balancing state */
>  #endif
>  	struct vm_userfaultfd_ctx vm_userfaultfd_ctx;
>  } __randomize_layout;
> diff --git a/kernel/fork.c b/kernel/fork.c
> index 9f7fe3541897..2d34c484553d 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -481,6 +481,9 @@ struct vm_area_struct *vm_area_dup(struct vm_area_struct *orig)
>
>  void vm_area_free(struct vm_area_struct *vma)
>  {
> +#ifdef CONFIG_NUMA_BALANCING
> +	kfree(vma->numab);
> +#endif
>  	free_anon_vma_name(vma);
>  	kmem_cache_free(vm_area_cachep, vma);
>  }
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index c36aa54ae071..6a1cffdfc76b 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -3027,6 +3027,23 @@ static void task_numa_work(struct callback_head *work)
>  		if (!vma_is_accessible(vma))
>  			continue;
>
> +		/* Initialise new per-VMA NUMAB state. */
> +		if (!vma->numab) {
> +			vma->numab = kzalloc(sizeof(struct vma_numab), GFP_KERNEL);
> +			if (!vma->numab)
> +				continue;
> +
> +			vma->numab->next_scan = now +
> +				msecs_to_jiffies(sysctl_numa_balancing_scan_delay);
> +		}
> +
> +		/*
> +		 * After the first scan is complete, delay the balancing scan
> +		 * for new VMAs.
> +		 */
> +		if (mm->numa_scan_seq && time_before(jiffies, vma->numab->next_scan))
> +			continue;
> +
>  		do {
>  			start = max(start, vma->vm_start);
>  			end = ALIGN(start + (pages << PAGE_SHIFT), HPAGE_SIZE);
>

Thanks
- Raghu
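
For reference, one possible reading of the bitmask suggestion quoted earlier
in this mail is sketched below in user-space C. It assumes that "pid modulo
BITS_PER_WORD" means using pid % (number of bits in an unsigned long) as a
bit index; that interpretation is an assumption and is not confirmed in this
thread. The names vma_access_mask, mask_record, mask_maybe_accessed and
mask_reset are hypothetical, and when to call the reset is left open.

```c
#include <limits.h>
#include <stdbool.h>

/* Number of bits in an unsigned long (a stand-in for the kernel's word size). */
#define WORD_BITS (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical per-VMA state: one bit per (pid % WORD_BITS) bucket. */
struct vma_access_mask {
	unsigned long pids;
};

/* Map a PID to its single bucket bit. */
static unsigned long pid_bit(unsigned int pid)
{
	return 1UL << (pid % WORD_BITS);
}

/* Fault path: mark the faulting task's bucket. No search is needed. */
static void mask_record(struct vma_access_mask *m, unsigned int pid)
{
	m->pids |= pid_bit(pid);
}

/*
 * Scan path: skip the VMA when this returns false. PIDs that share a
 * bucket give false positives; a recorded PID is never missed until
 * the next reset.
 */
static bool mask_maybe_accessed(const struct vma_access_mask *m, unsigned int pid)
{
	return (m->pids & pid_bit(pid)) != 0;
}

/* Periodic reset so that stale accesses stop counting as hits. */
static void mask_reset(struct vma_access_mask *m)
{
	m->pids = 0;
}
```

Under this reading each task sets exactly one bit, so a check can only
collide with tasks in the same bucket. OR-ing the raw PID value into the
word instead would set several bits per task and would report a hit for any
PID whose bits are a subset of those already set.
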