Date: Fri, 4 Jun 2021 19:24:02 +0300
From: "Kirill A. Shutemov" <kirill@shutemov.name>
To: Hugh Dickins
Cc: Andrew Morton, "Kirill A. Shutemov", Yang Shi, Wang Yugui,
	Matthew Wilcox, Naoya Horiguchi, Alistair Popple, Ralph Campbell,
	Zi Yan, Miaohe Lin, Minchan Kim, Jue Wang, Peter Xu, Jan Kara,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 5/7] mm/thp: fix page_vma_mapped_walk() if huge page mapped by ptes
Message-ID: <20210604162402.iclcdd3ywynkoamy@box.shutemov.name>

On Tue, Jun 01, 2021 at 02:13:21PM -0700, Hugh Dickins wrote:
> Running certain tests with a DEBUG_VM kernel would crash within hours,
> on the total_mapcount BUG() in split_huge_page_to_list(), while trying
> to free up some memory by punching a hole in a shmem huge page: split's
> try_to_unmap() was unable to find all the mappings of the page (which,
> on a !DEBUG_VM kernel, would then keep the huge page pinned in memory).
>
> Crash dumps showed two tail pages of a shmem huge page remained mapped
> by pte: ptes in a non-huge-aligned vma of a gVisor process, at the end
> of a long unmapped range; and no page table had yet been allocated for
> the head of the huge page to be mapped into.
>
> Although designed to handle these odd misaligned huge-page-mapped-by-pte
> cases, page_vma_mapped_walk() falls short by returning false prematurely
> when !pmd_present or !pud_present or !p4d_present or !pgd_present: there
> are cases when a huge page may span the boundary, with ptes present in
> the next.

Oh. My bad. I guess it was a pain to debug.

> Restructure page_vma_mapped_walk() as a loop to continue in these cases,
> while keeping its layout much as before. Add a step_forward() helper to
> advance pvmw->address across those boundaries: originally I tried to use
> mm's standard p?d_addr_end() macros, but hit the same crash 512 times
> less often: because of the way redundant levels are folded together,
> but folded differently in different configurations, it was just too
> difficult to use them correctly; and step_forward() is simpler anyway.
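For anyone reading along in the archive: from the description above, I
take step_forward() to round pvmw->address up to the next boundary of
the given size, saturating rather than wrapping at the top of the
address space. A minimal sketch of that idea, against struct
page_vma_mapped_walk from <linux/rmap.h> (my reading of the description,
not necessarily the exact hunk from the patch):

	/*
	 * Advance pvmw->address to the next 'size'-aligned boundary,
	 * where 'size' is PMD_SIZE or PUD_SIZE. Saturate at the top
	 * of the address space instead of wrapping around to zero,
	 * so the caller's "address < end" loop still terminates.
	 */
	static void step_forward(struct page_vma_mapped_walk *pvmw,
				 unsigned long size)
	{
		pvmw->address = (pvmw->address + size) & ~(size - 1);
		if (!pvmw->address)
			pvmw->address = ULONG_MAX;
	}

Unlike the p?d_addr_end() macros, this does not depend on how the
redundant page table levels happen to be folded in a given
configuration, which is presumably why it sidesteps the crash
described above.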
> Merged various other minor fixes and cleanups into page_vma_mapped_walk()
> as I worked on it: which I find much easier to enumerate here than to
> prise apart into separate commits.

But it makes it harder to review...

> Handle all of the hugetlbfs PageHuge() case once at the start,
> so we don't need to worry about it again further down.
>
> Sometimes local copy of pvmw->page was used, sometimes pvmw->page:
> just use pvmw->page throughout (and continue to use pvmw->address
> throughout, though we could take a local copy instead).
>
> Use pmd_read_atomic() with barrier() instead of READ_ONCE() for pmde:
> some architectures (e.g. i386 with PAE) have a multi-word pmd entry,
> for which READ_ONCE() is not good enough.
>
> Re-evaluate pmde after taking lock, then use it in subsequent tests,
> instead of repeatedly dereferencing pvmw->pmd pointer.
>
> Rearrange the !pmd_present block to follow the same "return not_found,
> return not_found, return true" pattern as the block above it (note:
> returning not_found there is never premature, since the existence or
> prior existence of a huge pmd guarantees good alignment).
>
> Adjust page table boundary test in case address was not page-aligned.
>
> Reset pvmw->pte to NULL after unmapping that page table.
>
> Respect the 80-column line limit.
>
> Fixes: ace71a19cec5 ("mm: introduce page_vma_mapped_walk()")
> Signed-off-by: Hugh Dickins
> Cc:

I tried to review it and superficially it looks good, but it has to be
split into a bunch of patches.

> 		/* when pud is not present, pte will be NULL */
> -		pvmw->pte = huge_pte_offset(mm, pvmw->address, page_size(page));
> +		pvmw->pte = huge_pte_offset(mm, pvmw->address,
> +				page_size(pvmw->page));

AFAICS, it fits exactly into 80 columns.

> 		if (!pvmw->pte)
> 			return false;
>
> -		pvmw->ptl = huge_pte_lockptr(page_hstate(page), mm, pvmw->pte);
> +		pvmw->ptl = huge_pte_lockptr(page_hstate(pvmw->page),
> +				mm, pvmw->pte);

And this one ends on column 79. Hm?

-- 
 Kirill A. Shutemov
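P.S. To put the pmd_read_atomic() point above into code: the pattern
being described should be roughly the following (a sketch from the
commit text, not the exact hunk):

	pmd_t pmde;

	/*
	 * On architectures with a multi-word pmd entry (e.g. i386
	 * with PAE), READ_ONCE() can tear across the words, so a
	 * dedicated helper is needed to read the entry consistently.
	 * barrier() then stops the compiler from re-reading
	 * *pvmw->pmd later and testing a different value than the
	 * one read here.
	 */
	pmde = pmd_read_atomic(pvmw->pmd);
	barrier();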