Date: Tue, 29 Nov 2022 16:19:52 -0500
From: Peter Xu
To: Andrew Morton
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, James Houghton,
 Jann Horn, Andrea Arcangeli, Rik van Riel, Nadav Amit, Miaohe Lin,
 Muchun Song, Mike Kravetz, David Hildenbrand
Subject: Re: [PATCH 00/10] mm/hugetlb: Make huge_pte_offset() thread-safe for pmd unshare
References: <20221129193526.3588187-1-peterx@redhat.com>
 <20221129124944.8eff54cda65d0f5a8a089e22@linux-foundation.org>
In-Reply-To: <20221129124944.8eff54cda65d0f5a8a089e22@linux-foundation.org>

Hi, Andrew,

On Tue, Nov 29, 2022 at 12:49:44PM -0800, Andrew Morton wrote:
> On Tue, 29 Nov 2022 14:35:16 -0500 Peter Xu wrote:
>
> > Based on latest mm-unstable (9ed079378408).
> >
> > This can be seen as a follow-up series to Mike's recent hugetlb vma lock
> > series for pmd unsharing, but majorly covering safe use of huge_pte_offset.
>
> We're at -rc7 (a -rc8 appears probable this time) and I'm looking to
> settle down and stabilize things...

I targeted this series for the next release, not the current one, because
there is no known report of the problem to my knowledge.  The reproducer
needs explicit kernel delays to trigger, as mentioned in the cover letter.
So far I have not tried to reproduce it with a generic kernel; I have only
verified that the problem exists.

>
> >
> > ...
> >
> > huge_pte_offset() is always called with mmap lock held with either read or
> > write.  It was assumed to be safe but it's actually not.  One race
> > condition can easily trigger by: (1) firstly trigger pmd share on a memory
> > range, (2) do huge_pte_offset() on the range, then at the meantime, (3)
> > another thread unshare the pmd range, and the pgtable page is prone to lost
> > if the other shared process wants to free it completely (by either munmap
> > or exit mm).
>
> That sounds like a hard-to-hit memory leak, but what we have here is a
> user-triggerable use-after-free and an oops.  Ugh.

IIUC it's not a leak.  The problem is that huge_pte_offset() can walk the
(pmd-shared) pgtable page, and also try to take the pgtable lock, even
though the page can already be freed in parallel.  The walker therefore
accesses either the page or the pgtable lock after the pgtable page has
been freed.

E.g., the 1st warning was triggered by:

static inline struct lock_class *hlock_class(struct held_lock *hlock)
{
	unsigned int class_idx = hlock->class_idx;

	/* Don't re-read hlock->class_idx, can't use READ_ONCE() on bitfield */
	barrier();

	if (!test_bit(class_idx, lock_classes_in_use)) {
		/*
		 * Someone passed in garbage, we give up.
		 */
		DEBUG_LOCKS_WARN_ON(1);  <----------------------------
		return NULL;
	}
	...
}

I think it's because the spin lock got freed along with the pgtable page,
so when we want to lock the pgtable lock we see weird lock state in
dep_map, as the lock pointer is not valid at all.

-- 
Peter Xu
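
For illustration only, here is a minimal userspace sketch of why the check
in the hlock_class() excerpt above fires once lock state has been
overwritten.  It is not kernel code.  Every model_* name, the 8192-entry
table, the 13-bit index width and the 0x6b fill pattern are assumptions
made for this sketch, and the "free" is simulated by overwriting a live
object so the example itself never touches freed memory.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MODEL_MAX_KEYS 8192U	/* assumed table size for this sketch */
#define MODEL_WORD_BITS (sizeof(unsigned long) * CHAR_BIT)

struct model_lock_class {
	const char *name;
};

/* Hypothetical stand-in for lock state that lives in a pgtable page. */
struct model_held_lock {
	unsigned int class_idx : 13;
};

static unsigned long model_classes_in_use[MODEL_MAX_KEYS / MODEL_WORD_BITS];
static struct model_lock_class model_classes[MODEL_MAX_KEYS];

/* Userspace equivalent of a bitmap membership test. */
static bool model_test_bit(unsigned int nr, const unsigned long *map)
{
	return (map[nr / MODEL_WORD_BITS] >> (nr % MODEL_WORD_BITS)) & 1UL;
}

/* Mark a class index as valid, as registering a lock class would. */
static void model_register_class(unsigned int idx, const char *name)
{
	assert(idx < MODEL_MAX_KEYS);
	model_classes[idx].name = name;
	model_classes_in_use[idx / MODEL_WORD_BITS] |= 1UL << (idx % MODEL_WORD_BITS);
}

/* Same shape as the excerpt: reject an index that was never registered. */
static struct model_lock_class *model_hlock_class(const struct model_held_lock *hlock)
{
	unsigned int class_idx = hlock->class_idx;

	if (!model_test_bit(class_idx, model_classes_in_use))
		return NULL;	/* the kernel excerpt warns at this point */
	return &model_classes[class_idx];
}

/* Simulate the containing page being freed and its bytes overwritten. */
static void model_poison(struct model_held_lock *hlock)
{
	memset(hlock, 0x6b, sizeof(*hlock));
}
```

In this model, a lock whose class index was registered resolves to its
class, while the same lookup after model_poison() returns NULL, which
corresponds to the "Someone passed in garbage" branch in the excerpt.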