Date: Thu, 21 Dec 2023 14:08:23 -0800
From: Andrew Morton
To: Jiajun Xie
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v1] mm: fix unmap_mapping_range high bits shift bug
Message-Id: <20231221140823.2908189514c0081ae9efbda8@linux-foundation.org>
References: <20231220052839.26970-1-jiajun.xie.sh@gmail.com> <20231220095343.326584f605e8ce995ac151d0@linux-foundation.org>

On Thu, 21 Dec 2023 13:40:11 +0800 Jiajun Xie wrote:

> > (obviously bad, but it's good to spell it out) and under what
> > circumstances it occurs?
>
> Thanks for the quick reply.
>
> The issue happens in Heterogeneous computing, where the
> device(e.g. gpu) and host share the same virtual address space.
>
> A simple workflow pattern that hits the issue is:
>         /* host */
>     1. userspace first mmaps a file-backed VA range with a specified offset.
>            e.g. (offset=0x800..., mmap return: va_a)
>     2. write some data to the corresponding sys page
>            e.g. (va_a = 0xAABB)
>         /* device */
>     3. gpu workload touches the VA, triggers a gpu fault and notifies the host.
>         /* host */
>     4. the host receives the gpu fault notification, then it will:
>         4.1 unmap host pages and also take care of the cpu tlb
>             (use unmap_mapping_range with offset=0x800...)
>         4.2 migrate the sys page to the device
>         4.3 set up the device page table and resolve the device fault.
>         /* device */
>     5. gpu workload continues, it accesses va_a and gets 0xAABB.
>     6. gpu workload continues, it writes 0xBBCC to va_a.
>         /* host */
>     7. userspace accesses va_a, as expected, it will:
>         7.1 trigger a cpu vm fault.
>         7.2 driver handles the fault to migrate the gpu local page to the host.
>     8. userspace can then correctly get 0xBBCC from va_a
>     9. done
>
> But in step 4.1, if we hit the bug this patch mentions, then userspace
> would never trigger the cpu fault, and would still get the old value: 0xAABB.

Thanks.  Based on the above, I added cc:stable to the changelog so the
fix will be backported into earlier kernels (it looks like that's 20+
years' worth!).

And I pasted the above text into that changelog.
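
For readers following along, here is a rough illustration of why an offset like 0x800... can go wrong in step 4.1. It is a sketch under assumptions, not the kernel code: it assumes the "high bits shift bug" is sign extension when a signed 64-bit byte offset with the top bit set is shifted right to get a page index, and it assumes 4 KiB pages. The helper names and the concrete offset value are hypothetical; Python integers are used to emulate 64-bit C arithmetic.

```python
PAGE_SHIFT = 12            # assumption: 4 KiB pages
MASK64 = (1 << 64) - 1     # keep results within 64 bits


def as_signed64(value):
    """Reinterpret a 64-bit pattern as a signed 64-bit integer."""
    value &= MASK64
    return value - (1 << 64) if value >> 63 else value


def page_index_signed(offset):
    """Shift the offset as a signed value (arithmetic shift), then
    store the result in an unsigned 64-bit page index."""
    return (as_signed64(offset) >> PAGE_SHIFT) & MASK64


def page_index_unsigned(offset):
    """Shift the offset as an unsigned value (logical shift)."""
    return (offset & MASK64) >> PAGE_SHIFT


# Hypothetical stand-in for the elided "0x800..." offset in the report.
offset = 0x8000_0000_0000_0000

print(hex(page_index_signed(offset)))    # high bits filled with ones
print(hex(page_index_unsigned(offset)))  # the page index the mapping uses
```

With the top bit set, the two results differ, so an unmap that computes its starting page index the signed way would look at a range that does not contain the page backing va_a. That would leave the host mapping in place, which matches the stale 0xAABB read described above. For offsets without the top bit set, both helpers agree.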