From: Ralph Campbell
To: Jason Gunthorpe, Felix Kuehling
Cc: "linux-mm@kvack.org", Alistair Popple, "Yang, Philip"
Subject: Re: hmm_range_fault interaction between different drivers
Date: Fri, 22 Jul 2022 09:47:14 -0700
References: <0d9a7320-e639-ad7b-c45a-644914d73c2e@amd.com> <20220722153409.GB58735@nvidia.com>
In-Reply-To: <20220722153409.GB58735@nvidia.com>

On 7/22/22 08:34, Jason Gunthorpe wrote:
> On Thu, Jul 21, 2022 at 07:00:23PM -0400, Felix Kuehling wrote:
>> Hi all,
>>
>> We're noticing some unexpected behaviour when the amdgpu and Mellanox
>> drivers are interacting on shared memory with hmm_range_fault. If the amdgpu
>> driver migrated pages to DEVICE_PRIVATE memory, we would expect
>> hmm_range_fault called by the Mellanox driver to fault them back to system
>> memory. But that's not happening. Instead hmm_range_fault fails.
>>
>> For an experiment, Philip hacked hmm_vma_handle_pte to treat DEVICE_PRIVATE
>> pages like device_exclusive pages, which gave us the expected behaviour. It
>> would result in a dev_pagemap_ops.migrate_to_ram callback in our driver, and
>> hmm_range_fault would return system memory pages to the Mellanox driver.
>>
>> So something is clearly wrong. It could be:
>>
>>  * our expectations are wrong,
>>  * the implementation of hmm_range_fault is wrong, or
>>  * our driver is missing something when migrating to DEVICE_PRIVATE memory.
>>
>> Do you have any insights?
> I think it is a bug
>
> Jason

Yes, looks like a bug to me too. hmm_vma_handle_pte() calls
hmm_is_device_private_entry(), which correctly handles the case where
the device private entry is owned by the driver calling hmm_range_fault(),
but it then does nothing to fault in the page if the device private
entry is not owned by that driver.

I'll work with Alistair and one of us will post a fix.
Thanks for finding this!
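
For readers following the thread, the ownership check described above can be
illustrated with a small user-space model. This is only a sketch under stated
assumptions, not kernel code and not the eventual fix: the types
model_pte and model_result and the functions handle_pte_reported() and
handle_pte_expected() are hypothetical names invented here. The first
function mirrors the behaviour reported in the thread (a device private
entry owned by another driver makes the lookup fail); the second mirrors
what Felix expected (such an entry is faulted back to system memory, which
in the kernel would go through the owner's migrate_to_ram callback).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for illustration only. */
enum model_pte_kind {
    MODEL_PTE_PRESENT,        /* ordinary system memory page */
    MODEL_PTE_DEVICE_PRIVATE  /* page migrated to DEVICE_PRIVATE memory */
};

struct model_pte {
    enum model_pte_kind kind;
    const void *owner;        /* owner tag of the device private page */
};

enum model_result {
    MODEL_MAP_SYSTEM_PAGE,    /* caller gets a system memory page */
    MODEL_MAP_DEVICE_PAGE,    /* caller gets its own device private page */
    MODEL_FAULT_TO_RAM,       /* page must be migrated back to system memory */
    MODEL_FAIL                /* lookup fails */
};

/* Behaviour as reported: a foreign device private entry is not faulted in. */
enum model_result handle_pte_reported(const struct model_pte *pte,
                                      const void *caller_owner)
{
    assert(pte != NULL);
    if (pte->kind == MODEL_PTE_PRESENT)
        return MODEL_MAP_SYSTEM_PAGE;
    if (pte->owner == caller_owner)
        return MODEL_MAP_DEVICE_PAGE;   /* owned by the calling driver */
    return MODEL_FAIL;                  /* nothing faults the page in */
}

/* Behaviour the reporters expected: a foreign entry triggers a fault. */
enum model_result handle_pte_expected(const struct model_pte *pte,
                                      const void *caller_owner)
{
    assert(pte != NULL);
    if (pte->kind == MODEL_PTE_PRESENT)
        return MODEL_MAP_SYSTEM_PAGE;
    if (pte->owner == caller_owner)
        return MODEL_MAP_DEVICE_PAGE;
    return MODEL_FAULT_TO_RAM;          /* migrate back, then retry lookup */
}
```

In this model, a page migrated by one driver and looked up by another yields
MODEL_FAIL from handle_pte_reported() and MODEL_FAULT_TO_RAM from
handle_pte_expected(); the two functions agree in every other case.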