From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-10.4 required=3.0 tests=DKIMWL_WL_HIGH,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH, MAILING_LIST_MULTI,MENTIONS_GIT_HOSTING,SPF_HELO_NONE,SPF_PASS, USER_AGENT_SANE_1 autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 19B96C2D0D2 for ; Fri, 20 Dec 2019 23:55:04 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id D2A262063A for ; Fri, 20 Dec 2019 23:55:03 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=nvidia.com header.i=@nvidia.com header.b="oRvXZWdo" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org D2A262063A Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=nvidia.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 6A05F8E01D3; Fri, 20 Dec 2019 18:55:03 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 6517D8E019D; Fri, 20 Dec 2019 18:55:03 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 5199E8E01D3; Fri, 20 Dec 2019 18:55:03 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0073.hostedemail.com [216.40.44.73]) by kanga.kvack.org (Postfix) with ESMTP id 3BD4D8E019D for ; Fri, 20 Dec 2019 18:55:03 -0500 (EST) Received: from smtpin15.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay04.hostedemail.com (Postfix) with SMTP id 8FEB36126 for ; Fri, 20 Dec 2019 23:55:02 +0000 (UTC) X-FDA: 76287178044.15.heart60_ed96a0cdf83a X-HE-Tag: heart60_ed96a0cdf83a X-Filterd-Recvd-Size: 5713 Received: from hqnvemgate24.nvidia.com (hqnvemgate24.nvidia.com [216.228.121.143]) by imf14.hostedemail.com (Postfix) with ESMTP for ; Fri, 20 Dec 2019 23:55:01 +0000 (UTC) Received: from hqpgpgate101.nvidia.com (Not Verified[216.228.121.13]) by hqnvemgate24.nvidia.com (using TLS: TLSv1.2, DES-CBC3-SHA) id ; Fri, 20 Dec 2019 15:54:29 -0800 Received: from hqmail.nvidia.com ([172.20.161.6]) by hqpgpgate101.nvidia.com (PGP Universal service); Fri, 20 Dec 2019 15:55:00 -0800 X-PGP-Universal: processed; by hqpgpgate101.nvidia.com on Fri, 20 Dec 2019 15:55:00 -0800 Received: from [10.110.48.28] (10.124.1.5) by HQMAIL107.nvidia.com (172.20.187.13) with Microsoft SMTP Server (TLS) id 15.0.1473.3; Fri, 20 Dec 2019 23:54:55 +0000 Subject: Re: [PATCH v11 00/25] mm/gup: track dma-pinned pages: FOLL_PIN To: Leon Romanovsky , Jason Gunthorpe CC: Andrew Morton , Al Viro , Alex Williamson , Benjamin Herrenschmidt , =?UTF-8?B?QmrDtnJuIFTDtnBlbA==?= , Christoph Hellwig , Dan Williams , Daniel Vetter , Dave Chinner , David Airlie , "David S . Miller" , Ira Weiny , Jan Kara , Jens Axboe , Jonathan Corbet , =?UTF-8?B?SsOpcsO0bWUgR2xpc3Nl?= , Magnus Karlsson , "Mauro Carvalho Chehab" , Michael Ellerman , Michal Hocko , Mike Kravetz , "Paul Mackerras" , Shuah Khan , Vlastimil Babka , , , , , , , , , , , , , LKML , Maor Gottlieb References: <20191216222537.491123-1-jhubbard@nvidia.com> <20191219132607.GA410823@unreal> <20191219210743.GN17227@ziepe.ca> <20191220182939.GA10944@unreal> From: John Hubbard X-Nvconfidentiality: public Message-ID: <1001a5fc-a71d-9c0f-1090-546c4913d8a2@nvidia.com> Date: Fri, 20 Dec 2019 15:54:55 -0800 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.3.0 MIME-Version: 1.0 In-Reply-To: <20191220182939.GA10944@unreal> X-Originating-IP: [10.124.1.5] X-ClientProxiedBy: HQMAIL111.nvidia.com (172.20.187.18) To HQMAIL107.nvidia.com (172.20.187.13) Content-Type: text/plain; charset="utf-8" Content-Language: en-US Content-Transfer-Encoding: 7bit DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=nvidia.com; s=n1; t=1576886069; bh=/03FljbsYciU5tHAkbKg2NU1vxam4rPGvNuyvBOvTmI=; h=X-PGP-Universal:Subject:To:CC:References:From:X-Nvconfidentiality: Message-ID:Date:User-Agent:MIME-Version:In-Reply-To: X-Originating-IP:X-ClientProxiedBy:Content-Type:Content-Language: Content-Transfer-Encoding; b=oRvXZWdofYZmMnKtHbC887r9WWAfrydUr/UibodWJ1qaNTyMPy3+W8R0NcaZhg7dS 971HGgJD6LwSffJ62pj2tIvxp3eHI8BXVOtdjk7S3sjdZSz9tGHLzm7ujCZVURuC8a irEUmm/OhCMXyR+iqtnm6nCO/xQp89VT1WotayqMgxGp4mpoDDX+NNHJX4q0B3UwU3 +RvXPnLJz1mCrbJfx3TkKb0L+jPAoTDVjeRU4xrKmbuD30tHozBBlUpnyYWoaOHmat YDkXozvEZrxOc0h7qKS+l5AK23MRif8N+YKUGoUxe7vXoCKoEk2wdk8r2SNOo6On86 EItNjAud0AHkw== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On 12/20/19 10:29 AM, Leon Romanovsky wrote: ... >> $ ./build.sh >> $ build/bin/run_tests.py >> >> If you get things that far I think Leon can get a reproduction for you > > I'm not so optimistic about that. > OK, I'm going to proceed for now on the assumption that I've got an overflow problem that happens when huge pages are pinned. If I can get more information, great, otherwise it's probably enough. One thing: for your repro, if you know the huge page size, and the system page size for that case, that would really help. Also the number of pins per page, more or less, that you'd expect. Because Jason says that only 2M huge pages are used... Because the other possibility is that the refcount really is going negative, likely due to a mismatched pin/unpin somehow. If there's not an obvious repro case available, but you do have one (is it easy to repro, though?), then *if* you have the time, I could point you to a github branch that reduces GUP_PIN_COUNTING_BIAS by, say, 4x, by applying this: diff --git a/include/linux/mm.h b/include/linux/mm.h index bb44c4d2ada7..8526fd03b978 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -1077,7 +1077,7 @@ static inline void put_page(struct page *page) * get_user_pages and page_mkclean and other calls that race to set up page * table entries. */ -#define GUP_PIN_COUNTING_BIAS (1U << 10) +#define GUP_PIN_COUNTING_BIAS (1U << 8) void unpin_user_page(struct page *page); void unpin_user_pages_dirty_lock(struct page **pages, unsigned long npages, If that fails to repro, then we would be zeroing in on the root cause. The branch is here (I just tested it and it seems healthy): git@github.com:johnhubbard/linux.git pin_user_pages_tracking_v11_with_diags thanks, -- John Hubbard NVIDIA