References: <20200917112538.GD8409@ziepe.ca> <20200917193824.GL8409@ziepe.ca>
 <20200918164032.GA5962@xz-x1> <20200918173240.GY8409@ziepe.ca>
 <20200918204048.GC5962@xz-x1> <0af8c77e-ff60-cada-7d22-c7cfcf859b19@nvidia.com>
 <20200919000153.GZ8409@ziepe.ca> <20200921083505.GA5862@quack2.suse.cz>
 <20200921120301.GD8409@ziepe.ca>
In-Reply-To: <20200921120301.GD8409@ziepe.ca>
From: Oded Gabbay
Date: Wed, 16 Feb 2022 18:59:22 +0200
Subject: Re: [PATCH 1/4] mm: Trial do_wp_page() simplification
To: Jason Gunthorpe, Linus Torvalds
Cc: Jan Kara, John Hubbard, Leon Romanovsky, Linux-MM,
 Linux Kernel Mailing List, "Maya B. Gokhale", Yang Shi, Marty Mcfadden,
 Kirill Shutemov, Oleg Nesterov, Jann Horn, Kirill Tkhai, Andrea Arcangeli,
 Christoph Hellwig, Andrew Morton, Daniel Vetter, Greg Kroah-Hartman, Peter Xu

On Mon, Sep 21, 2020 at 3:03 PM Jason Gunthorpe wrote:
>
> On Mon, Sep 21, 2020 at 10:35:05AM +0200, Jan Kara wrote:
> > > My thinking is to hit this issue you have to already be doing
> > > FOLL_LONGTERM, and if some driver hasn't been properly marked and
> > > regresses, the fix is to mark it.
> > >
> > > Remember, this use case requires the pin to extend after a system
> > > call, past another fork() system call, and still have data-coherence.
> > >
> > > IMHO that can only happen in the FOLL_LONGTERM case as it inherently
> > > means the lifetime of the pin is being controlled by userspace, not by
> > > the kernel. Otherwise userspace could not cause new DMA touches after
> > > fork.
> >
> > I agree that the new aggressive COW behavior is probably causing issues
> > only for FOLL_LONGTERM users. That being said it would be nice if even
> > ordinary threaded FOLL_PIN users would not have to be that careful about
> > fork(2) and possible data loss due to COW - we had certainly reports of
> > O_DIRECT IO losing data due to fork(2) and COW exactly because it is very
> > subtle how it behaves...
> > But as I wrote above this is not urgent since that
> > problematic behavior exists since the beginning of O_DIRECT IO in Linux.
>
> Yes, I agree - what I was thinking is to do this FOLL_LONGTERM for the
> rc and then a small patch to make it wider for the next cycle so it
> can test in linux-next for a responsible time period.
>
> Interesting to hear you confirm block has also seen subtle user
> problems with this as well.
>
> Jason
>

Hi Jason, Linus,

Sorry for waking up this thread, but I've filed a bug against this change:
https://bugzilla.kernel.org/show_bug.cgi?id=215616

Over the past week, I bisected a problem we have in one of our new demos
running on our Gaudi accelerator, and after a very long bisection I
arrived at this commit.

All the details are in the bug, but the bottom line is that somehow this
patch causes corruption when all three of the following hold:
- the NUMA balancing feature is enabled,
- we don't use process affinity, and
- we use GUP to pin pages so our accelerator can DMA to/from system memory.

Disabling NUMA balancing, using process affinity to bind to a specific
NUMA node, or reverting this patch each makes the bug disappear.
I validated the bug and the revert on kernels 5.9, 5.11 and 5.17-rc4.

You can see our GUP code in the driver in get_user_memory() in
drivers/misc/habanalabs/common/memory.c.
It is fairly standard and I think I got that line from Daniel (cc'ed on
this email).

I would appreciate help from the mm experts here to understand how to fix
this, but it looks as if this simplification caused or exposed a race
between the NUMA migration code and GUP.

Thanks,
Oded