Date: Thu, 24 Sep 2020 02:30:38 -0400
From: Rafael Aquini
To: "Huang, Ying"
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org
Subject: Re: [PATCH] mm: swapfile: avoid split_swap_cluster() NULL pointer dereference
Message-ID: <20200924063038.GD1023012@optiplex-lnx>
References: <20200922184838.978540-1-aquini@redhat.com>
 <878sd1qllb.fsf@yhuang-dev.intel.com>
 <20200923043459.GL795820@optiplex-lnx>
 <87sgb9oz1u.fsf@yhuang-dev.intel.com>
 <20200923130138.GM795820@optiplex-lnx>
 <87blhwng5f.fsf@yhuang-dev.intel.com>
 <20200924020928.GC1023012@optiplex-lnx>
 <877dsjessq.fsf@yhuang-dev.intel.com>
In-Reply-To: <877dsjessq.fsf@yhuang-dev.intel.com>

On Thu, Sep 24, 2020 at 11:51:17AM +0800, Huang, Ying wrote:
> Rafael Aquini writes:
> > The bug here is quite simple: split_swap_cluster() misses checking for
> > lock_cluster() returning NULL before committing to change
> > cluster_info->flags.
>
> I don't think so. We shouldn't run into this situation firstly. So the
> "fix" hides the real bug instead of fixing it. Just like we call
> VM_BUG_ON_PAGE(!PageLocked(head), head) in split_huge_page_to_list()
> instead of returning if !PageLocked(head) silently.
>

Not the same thing, obviously; that is an apples-to-carrots comparison.
But since you mentioned it: split_huge_page_to_list() asserts (in debug
builds) that *page is locked, and later checks whether *head has the
SwapCache flag set. deferred_split_scan(), OTOH, doesn't hand down the
compound head locked, but the 2nd page in the group instead. This doesn't
necessarily mean it's a problem, but it might help in hitting the issue.

> > The fundamental problem has nothing to do with allocating, or not
> > allocating a swap cluster, but it has to do with the fact that the THP
> > deferred split scan can transiently race with swapcache insertion, and
> > the fact that when you run your swap area on rotational storage
> > cluster_info is _always_ NULL. split_swap_cluster() needs to check for
> > lock_cluster() returning NULL because that's one possible case, and it
> > clearly fails to do so.
>
> If there's a race, we should fix the race.
> But the code path for swapcache insertion is,
>
> add_to_swap()
>   get_swap_page() /* Return if fails to allocate */
>   add_to_swap_cache()
>     SetPageSwapCache()
>
> While the code path to split THP is,
>
> split_huge_page_to_list()
>   if PageSwapCache()
>     split_swap_cluster()
>
> Both code paths are protected by the page lock. So there should be some
> other reasons to trigger the bug.

As mentioned above, no, they don't seem to be protected (at least not by
the lock on the same page, depending on the case). While add_to_swap()
ensures the compound head is locked, split_huge_page_to_list() does not.

> And again, for HDD, a THP shouldn't have PageSwapCache() set at the
> first place. If so, the bug is that the flag is set and we should fix
> the setting.
>

I fail to follow your claim here. Where is the guarantee, in the code,
that you'll never have a compound head in the swapcache?

> > Run a workload that cause multiple THP COW, and add a memory hogger to
> > create memory pressure so you'll force the reclaimers to kick the
> > registered shrinkers. The trigger is not heavy swapping, and that's
> > probably why most swap test cases don't hit it. The window is tight,
> > but you will get the NULL pointer dereference.
>
> Do you have a script to reproduce the bug?
>

Nope. A convoluted set of internal regression tests we have usually
triggers it. In the wild, customers running HANA occasionally see it.

> > Regardless you find further bugs, or not, this patch is needed to
> > correct a blunt coding mistake.
>
> As above. I don't agree with that.
>

It's OK to disagree; split_swap_cluster() still misses the cluster_info
NULL check, though.
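
The missing check argued for above can be sketched as a small userspace
model. This is an illustration only, not the kernel source and not the
patch itself: the struct layouts, the `locked` field standing in for the
per-cluster spinlock, the constant values, and the simplified function
signatures are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

#define SWAPFILE_CLUSTER   512   /* slots per cluster; assumed value for this model */
#define CLUSTER_FLAG_HUGE  0x4u  /* illustrative flag bit */

/* Simplified stand-ins for the kernel structures; not the real layouts. */
struct swap_cluster_info {
	int locked;              /* stands in for the per-cluster spinlock */
	unsigned int flags;
};

struct swap_info_struct {
	/* NULL models the rotational-storage case described in the thread. */
	struct swap_cluster_info *cluster_info;
};

/* Returns the locked cluster, or NULL when there is no cluster_info array. */
static struct swap_cluster_info *lock_cluster(struct swap_info_struct *si,
					      unsigned long offset)
{
	struct swap_cluster_info *ci = si->cluster_info;

	if (ci) {
		ci += offset / SWAPFILE_CLUSTER;
		ci->locked = 1;
	}
	return ci;
}

/* Tolerates NULL so callers can pair it with lock_cluster() unconditionally. */
static void unlock_cluster(struct swap_cluster_info *ci)
{
	if (ci) {
		assert(ci->locked);
		ci->locked = 0;
	}
}

/* Clears the huge flag only if lock_cluster() actually handed back a cluster. */
static int split_swap_cluster(struct swap_info_struct *si, unsigned long offset)
{
	struct swap_cluster_info *ci = lock_cluster(si, offset);

	if (ci)
		ci->flags &= ~CLUSTER_FLAG_HUGE;
	unlock_cluster(ci);
	return 0;
}
```

In this model, a swap area with cluster_info == NULL returns from
split_swap_cluster() without touching any flags. Dropping the `if (ci)`
guard would make the model dereference NULL at the flag update. Whether
the NULL case should be tolerated or treated as a symptom of a deeper
race is exactly the point under discussion in this thread.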