Date: Thu, 7 Jul 2022 01:46:57 +0300
From: "Kirill A. Shutemov"
To: Josef Bacik
Cc: linux-mm@kvack.org, akpm@linux-foundation.org, Matthew Wilcox, Rik van Riel, Chris Mason
Subject: Re: [PATCH] mm: fix page leak with multiple threads mapping the same page
Message-ID: <20220706224657.3xbhbkflernezlxy@black.fi.intel.com>
In-Reply-To: <2b798acfd95c9ab9395fe85e8d5a835e2e10a920.1657051137.git.josef@toxicpanda.com>

On Tue, Jul 05, 2022 at 04:00:36PM -0400, Josef Bacik wrote:
> We have an application with a lot of threads that use a shared mmap
> backed by tmpfs mounted with -o huge=within_size. This application
> started leaking loads of huge pages when we upgraded to a recent kernel.
>
> Using the page ref tracepoints and a BPF program written by Tejun Heo we
> were able to determine that these pages would have multiple refcounts
> from the page fault path, but when it came to unmap time we wouldn't
> drop the number of refs we had added from the faults.
>
> I wrote a reproducer that mmap'ed a file backed by tmpfs with -o
> huge=always, and then spawned 20 threads all looping faulting random
> offsets in this map, while using madvise(MADV_DONTNEED) randomly for
> huge page aligned ranges. This very quickly reproduced the problem.
>
> The problem here is that we check for the case that we have multiple
> threads faulting in a range that was previously unmapped. One thread
> maps the PMD, the other thread loses the race and then returns 0.
> However at this point we already have the page, and we are no longer
> putting this page into the processes address space, and so we leak the
> page. We actually did the correct thing prior to f9ce0be71d1f, however
> it looks like Kirill copied what we do in the anonymous page case. In
> the anonymous page case we don't yet have a page, so we don't have to
> drop a reference on anything. Previously we did the correct thing for
> file based faults by returning VM_FAULT_NOPAGE so we correctly drop the
> reference on the page we faulted in.
>
> Fix this by returning VM_FAULT_NOPAGE in the pmd_devmap_trans_unstable()
> case, this makes us drop the ref on the page properly, and now my
> reproducer no longer leaks the huge pages.
>
> Fixes: f9ce0be71d1f ("mm: Cleanup faultaround and finish_fault() codepaths")
> Cc: Kirill A. Shutemov
> Cc: Matthew Wilcox (Oracle)
> Signed-off-by: Josef Bacik
> Signed-off-by: Rik van Riel
> Signed-off-by: Chris Mason

Cc: stable@ ?

> ---
>  mm/memory.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/mm/memory.c b/mm/memory.c
> index 7a089145cad4..f10724d7dca3 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -4371,7 +4371,7 @@ vm_fault_t finish_fault(struct vm_fault *vmf)
>
>  	/* See comment in handle_pte_fault() */
>  	if (pmd_devmap_trans_unstable(vmf->pmd))
> -		return 0;
> +		return VM_FAULT_NOPAGE;

Comment update would be nice. Other instances of
pmd_devmap_trans_unstable() return 0 in the fault path. Explanation would
be helpful.

Otherwise,

Acked-by: Kirill A. Shutemov

--
 Kirill A. Shutemov
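
For readers following the refcount argument in the quoted commit message, here is a minimal user-space sketch of why returning 0 leaks a reference on the lost-race path while VM_FAULT_NOPAGE does not. It is a toy model, not kernel code: every toy_* name, the struct, and the flag value are hypothetical, and the only behaviour it assumes is what the commit message states, namely that the file fault path already holds a page reference and drops it when VM_FAULT_NOPAGE is returned.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for VM_FAULT_NOPAGE; the value is an arbitrary flag bit. */
#define TOY_VM_FAULT_NOPAGE 0x100u

/* Hypothetical stand-in for a page: only the reference count is modelled. */
struct toy_page {
	int refcount;
};

static void toy_get_page(struct toy_page *page)
{
	page->refcount++;
}

static void toy_put_page(struct toy_page *page)
{
	assert(page->refcount > 0);
	page->refcount--;
}

/*
 * Models the check patched in finish_fault(): if another thread already
 * installed the PMD (lost_race), this thread maps nothing. 'patched'
 * selects the return value before (0) and after (TOY_VM_FAULT_NOPAGE)
 * the fix. *mapped reports whether this thread installed a mapping.
 */
static unsigned int toy_finish_fault(bool lost_race, bool patched, bool *mapped)
{
	*mapped = false;
	if (lost_race)
		return patched ? TOY_VM_FAULT_NOPAGE : 0;
	*mapped = true;	/* the new mapping now owns the fault reference */
	return 0;
}

/*
 * Models a file fault caller: it takes a reference for the fault, calls
 * toy_finish_fault(), and drops that reference only when told that no
 * page was mapped. Returns the number of references nobody owns afterwards.
 */
static int toy_fault_leaked_refs(bool lost_race, bool patched)
{
	struct toy_page page = { .refcount = 1 };	/* the page cache's own ref */
	bool mapped;
	unsigned int ret;

	toy_get_page(&page);	/* reference taken by the fault path */
	ret = toy_finish_fault(lost_race, patched, &mapped);
	if (ret & TOY_VM_FAULT_NOPAGE)
		toy_put_page(&page);

	/* Accounted-for refs: one for the page cache, one more if mapped. */
	return page.refcount - (1 + (mapped ? 1 : 0));
}
```

In this model toy_fault_leaked_refs(true, false) reports one stray reference (the pre-patch lost-race case), while the patched variant and both no-race cases report zero. The anonymous fault case mentioned in the commit message is not modelled, since there is no page to drop there.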