Date: Sat, 30 Sep 2023 19:54:34 -0700
From: Andrew Morton
To: riel@surriel.com
Cc: linux-kernel@vger.kernel.org, kernel-team@meta.com, linux-mm@kvack.org, muchun.song@linux.dev, mike.kravetz@oracle.com, leit@meta.com, willy@infradead.org
Subject: Re: [PATCH v5 0/3] hugetlbfs: close race between MADV_DONTNEED and page fault
Message-Id: <20230930195434.3507483510ba7961985fbeb2@linux-foundation.org>
In-Reply-To: <20231001005659.2185316-1-riel@surriel.com>
References: <20231001005659.2185316-1-riel@surriel.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit

On Sat, 30 Sep 2023 20:55:47 -0400 riel@surriel.com wrote:

> v5: somehow a __vma_private_lock(vma) test failed to make it from my tree into the v4 series, fix that
> v4: fix unmap_vmas locking issue pointed out by Mike Kravetz, and resulting lockdep fallout
> v3: fix compile error w/ lockdep and test case errors with patch 3
> v2: fix the locking bug found with the libhugetlbfs tests.
>
> Malloc libraries, like jemalloc and tcmalloc, take decisions on when
> to call madvise independently from the code in the main application.
>
> This sometimes results in the application page faulting on an address,
> right after the malloc library has shot down the backing memory with
> MADV_DONTNEED.
>
> Usually this is harmless, because we always have some 4kB pages
> sitting around to satisfy a page fault. However, with hugetlbfs,
> systems often allocate only the exact number of huge pages that
> the application wants.
>
> Due to TLB batching, hugetlbfs MADV_DONTNEED will free pages outside of
> any lock taken on the page fault path, which can open up the following
> race condition:
>
>        CPU 1                            CPU 2
>
>        MADV_DONTNEED
>        unmap page
>        shoot down TLB entry
>                                         page fault
>                                         fail to allocate a huge page
>                                         killed with SIGBUS
>        free page
>
> Fix that race by extending the hugetlb_vma_lock locking scheme to also
> cover private hugetlb mappings (with resv_map), and pulling the locking
> from __unmap_hugepage_final_range into helper functions called from
> zap_page_range_single. This ensures page faults stay locked out of
> the MADV_DONTNEED VMA until the huge pages have actually been freed.

Didn't we decide that [1/3] and [2/3] should be cc:stable?

> The third patch in the series is more of an RFC. Using the
> invalidate_lock instead of the hugetlb_vma_lock greatly simplifies
> the code, but at the cost of turning a per-VMA lock into a lock
> per backing hugetlbfs file, which could slow things down when
> multiple processes are mapping the same hugetlbfs file.

"could slow things down" is testable-for?

This third one I'd queue up for testing for a 6.7-rc1 merge, so I'll
split the series apart. Not a problem, but it would be a little better
if things were originally packaged that way.
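
For readers who want to see the userspace pattern the cover letter describes, here is a minimal sketch of "malloc library calls MADV_DONTNEED, application touches the range again". It is an illustration only, not code from the series: the helper name refault_after_dontneed() is hypothetical, and it deliberately uses ordinary private anonymous memory (base pages) rather than hugetlbfs so that it runs without any reserved huge pages. On base pages the refault is simply satisfied with a fresh zero-filled page, which is the "usually this is harmless" case. The problematic case in the cover letter would be the same access pattern against a hugetlb mapping whose pool holds exactly as many huge pages as the application uses.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of the patch series): dirty a private
 * anonymous mapping, drop its backing pages with MADV_DONTNEED, then
 * touch the range again.
 *
 * Returns 0 if every refaulted page reads back as zero, -1 otherwise.
 */
int refault_after_dontneed(void)
{
	long page = sysconf(_SC_PAGESIZE);
	assert(page > 0);

	size_t len = 4 * (size_t)page;
	unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
				MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return -1;

	/* Fault the pages in and dirty them. */
	memset(p, 0xaa, len);

	/* What the malloc library does: hand the backing pages back. */
	if (madvise(p, len, MADV_DONTNEED) != 0) {
		munmap(p, len);
		return -1;
	}

	/*
	 * What the application does right afterwards: touch the range.
	 * Each access page-faults and, for private anonymous memory,
	 * gets a fresh zero-filled page.
	 */
	int ok = 1;
	for (size_t i = 0; i < len; i += (size_t)page) {
		if (p[i] != 0)
			ok = 0;
	}

	munmap(p, len);
	return ok ? 0 : -1;
}
```

Calling refault_after_dontneed() from a small main() and checking for a 0 return is enough to exercise it. The sketch is single-threaded, so it shows only the access pattern, not the race itself; the race needs the fault on another CPU to land between the TLB shootdown and the page free.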