From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail-pg0-f71.google.com (mail-pg0-f71.google.com [74.125.83.71])
	by kanga.kvack.org (Postfix) with ESMTP id B02B26B06C3
	for ; Fri, 18 May 2018 23:24:04 -0400 (EDT)
Received: by mail-pg0-f71.google.com with SMTP id k1-v6so3294651pgq.20
	for ; Fri, 18 May 2018 20:24:04 -0700 (PDT)
Received: from mail-sor-f65.google.com (mail-sor-f65.google.com. [209.85.220.65])
	by mx.google.com with SMTPS id i4-v6sor4220176plt.52.2018.05.18.20.24.02
	for (Google Transport Security);
	Fri, 18 May 2018 20:24:02 -0700 (PDT)
Date: Fri, 18 May 2018 21:24:00 -0600
From: Jason Gunthorpe
Subject: Re: [LSFMM] RDMA data corruption potential during FS writeback
Message-ID: <20180519032400.GA12517@ziepe.ca>
References: <0100016373af827b-e6164b8d-f12e-4938-bf1f-2f85ec830bc0-000000@email.amazonses.com>
	<20180518154945.GC15611@ziepe.ca>
	<0100016374267882-16b274b1-d6f6-4c13-94bb-8e78a51e9091-000000@email.amazonses.com>
	<20180518173637.GF15611@ziepe.ca>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To:
Sender: owner-linux-mm@kvack.org
List-ID:
To: John Hubbard
Cc: Dan Williams , Christopher Lameter , linux-rdma , Linux MM , Michal Hocko

On Fri, May 18, 2018 at 07:33:41PM -0700, John Hubbard wrote:
> On 05/18/2018 01:23 PM, Dan Williams wrote:
> > On Fri, May 18, 2018 at 10:36 AM, Jason Gunthorpe wrote:
> >> On Fri, May 18, 2018 at 04:47:48PM +0000, Christopher Lameter wrote:
> >>> On Fri, 18 May 2018, Jason Gunthorpe wrote:
> >>>
> >>>
> >>> The newcomer here is RDMA. The FS side is the mainstream use case and has
> >>> been there since Unix learned to do paging.
> >>
> >> Well, it has been this way for 12 years, so it isn't that new.
> >>
> >> Honestly it sounds like get_user_pages is just a broken Linux
> >> API??
> >>
> >> Nothing can use it to write to pages because the FS could explode -
> >> RDMA makes it particularly easy to trigger this due to the longer time
> >> windows, but presumably any get_user_pages could generate a race and
> >> hit this? Is that right?
>
> +1, and I am now super-interested in this conversation, because
> after tracking down a kernel BUG to this classic mistaken pattern:
>
>     get_user_pages (on file-backed memory from ext4)
>     ...do some DMA
>     set_pages_dirty
>     put_page(s)

Ummm, RDMA has done essentially that since 2005, since when did it
become wrong? Do you have some references? Is there some alternative?

See __ib_umem_release

> ...there is (rarely!) a backtrace from ext4, that disavows ownership of
> any such pages.

Yes, I've seen that oops with RDMA, apparently isn't actually that
rare if you tweak things just right.

I thought it was an obscure ext4 bug :(

> Because the obvious "fix" in device driver land is to use a dedicated
> buffer for DMA, and copy to the filesystem buffer, and of course I will
> get *killed* if I propose such a performance-killing approach. But a
> core kernel fix really is starting to sound attractive.

Yeah, killed is right. That idea totally cripples RDMA.

What is the point of get_user_pages FOLL_WRITE if you can't write to
and dirty the pages!?!

Jason