Date: Mon, 18 Sep 2023 10:52:04 -0400
From: Johannes Weiner
To: Vlastimil Babka
Cc: Mike Kravetz, Andrew Morton, Mel Gorman, Miaohe Lin, Kefeng Wang,
    Zi Yan, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH V2 0/6] mm: page_alloc: freelist migratetype hygiene
Message-ID: <20230918145204.GB16104@cmpxchg.org>
References: <20230911195023.247694-1-hannes@cmpxchg.org>
    <20230914235238.GB129171@monkey>
    <20230915141610.GA104956@cmpxchg.org>
    <20230916195739.GB618858@monkey>

On Mon, Sep 18, 2023 at 09:16:58AM +0200, Vlastimil Babka wrote:
> On 9/16/23 21:57, Mike Kravetz wrote:
> > On 09/15/23 10:16, Johannes Weiner wrote:
> >> On Thu, Sep 14, 2023 at 04:52:38PM -0700, Mike Kravetz wrote:
> >> > In next-20230913, I started hitting the following BUG. Seems related
> >> > to this series. And, if series is reverted I do not see the BUG.
> >> >
> >> > I can easily reproduce on a small 16G VM. kernel command line contains
> >> > "hugetlb_free_vmemmap=on hugetlb_cma=4G". Then run the script,
> >> > while true; do
> >> >  echo 4 > /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages
> >> >  echo 4 > /sys/kernel/mm/hugepages/hugepages-1048576kB/demote
> >> >  echo 0 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
> >> > done
> >> >
> >> > For the BUG below I believe it was the first (or second) 1G page creation from
> >> > CMA that triggered: cma_alloc of 1G.
> >> >
> >> > Sorry, have not looked deeper into the issue.
> >>
> >> Thanks for the report, and sorry about the breakage!
> >>
> >> I was scratching my head at this:
> >>
> >> 	/* MIGRATE_ISOLATE page should not go to pcplists */
> >> 	VM_BUG_ON_PAGE(is_migrate_isolate(mt), page);
> >>
> >> because there is nothing in page isolation that prevents setting
> >> MIGRATE_ISOLATE on something that's on the pcplist already. So why
> >> didn't this trigger before already?
> >>
> >> Then it clicked: it used to only check the *pcpmigratetype* determined
> >> by free_unref_page(), which of course mustn't be MIGRATE_ISOLATE.
> >>
> >> Pages that get isolated while *already* on the pcplist are fine, and
> >> are handled properly:
> >>
> >> 	mt = get_pcppage_migratetype(page);
> >>
> >> 	/* MIGRATE_ISOLATE page should not go to pcplists */
> >> 	VM_BUG_ON_PAGE(is_migrate_isolate(mt), page);
> >>
> >> 	/* Pageblock could have been isolated meanwhile */
> >> 	if (unlikely(isolated_pageblocks))
> >> 		mt = get_pageblock_migratetype(page);
> >>
> >> So this was purely a sanity check against the pcpmigratetype cache
> >> operations. With that gone, we can remove it.
> >
> > With the patch below applied, a slightly different workload triggers the
> > following warnings. It seems related, and appears to go away when
> > reverting the series.
> >
> > [ 331.595382] ------------[ cut here ]------------
> > [ 331.596665] page type is 5, passed migratetype is 1 (nr=512)
> > [ 331.598121] WARNING: CPU: 2 PID: 935 at mm/page_alloc.c:662 expand+0x1c9/0x200
>
> Initially I thought this demonstrates the possible race I was suggesting in
> reply to 6/6. But, assuming you have CONFIG_CMA, page type 5 is cma and we
> are trying to get a MOVABLE page from a CMA page block, which is something
> that's normally done and the pageblock stays CMA. So yeah if the warnings
> are to stay, they need to handle this case. Maybe the same can happen with
> HIGHATOMIC blocks?

Hm I don't think that's quite it.

CMA and HIGHATOMIC have their own freelists. When MOVABLE requests dip
into CMA and HIGHATOMIC, we explicitly pass that migratetype to
__rmqueue_smallest(). This takes a chunk of e.g. CMA, expands the
remainder to the CMA freelist, then returns the page. While you get a
different mt than requested, the freelist typing should be consistent.

In this splat, the migratetype passed to __rmqueue_smallest() is
MOVABLE. There is no preceding warning from del_page_from_freelist()
(Mike, correct me if I'm wrong), so we got a confirmed MOVABLE
order-10 block from the MOVABLE list. So far so good. However, when we
expand() the order-9 tail of this block to the MOVABLE list, it warns
that its pageblock type is CMA.

This means we have an order-10 page where one half is MOVABLE and the
other is CMA.

I don't see how the merging code in __free_one_page() could have done
that. The CMA buddy would have failed the migrate_is_mergeable() test
and we should have left it at order-9s.

I also don't see how the CMA setup could have done this because
MIGRATE_CMA is set on the range before the pages are fed to the buddy.

Mike, could you describe the workload that is triggering this?

Does this reproduce instantly and reliably?

Is there high load on the system, or is it requesting the huge page
with not much else going on?

Do you see compact_* history in /proc/vmstat after this triggers?

Could you please also provide /proc/zoneinfo, /proc/pagetypeinfo and
the hugetlb_cma= parameter you're using?

Thanks!
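
To make the "one half MOVABLE, one half CMA" situation concrete, here
is a small userspace sketch of the check that fires in the splat. It
is not the kernel code: every toy_* name is made up for illustration,
the 512-page pageblock size is an assumption taken from the nr=512 in
the warning, and the numeric values 1 and 5 simply follow the reading
of the report above (1 is MOVABLE, 5 is CMA with CONFIG_CMA). The
split loop only mirrors the general shape of a buddy expand step:
halve the block, hand the upper half back to the freelist of the
passed migratetype, and compare that half's pageblock type first.

```c
#include <assert.h>
#include <stdio.h>

/* Toy values following the splat: 1 is MOVABLE, 5 is CMA (assumes CONFIG_CMA). */
enum toy_migratetype { TOY_MOVABLE = 1, TOY_CMA = 5 };

#define TOY_PAGEBLOCK_ORDER 9	/* assumption: 512-page pageblocks (nr=512) */
#define TOY_MAX_BLOCKS 8

/* One migratetype per pageblock, indexed by pfn >> TOY_PAGEBLOCK_ORDER. */
static int toy_pageblock_mt[TOY_MAX_BLOCKS];

/* Hypothetical stand-in for a pageblock migratetype lookup. */
static int toy_get_pageblock_mt(unsigned long pfn)
{
	unsigned long idx = pfn >> TOY_PAGEBLOCK_ORDER;

	assert(idx < TOY_MAX_BLOCKS);
	return toy_pageblock_mt[idx];
}

/*
 * Split a free block of order 'high' starting at 'pfn' down to order
 * 'low'. Each upper half would be returned to the freelist for 'mt',
 * so its pageblock type is compared against 'mt' first. Returns the
 * number of halves whose pageblock type disagrees with 'mt'.
 */
static int toy_expand(unsigned long pfn, int low, int high, int mt)
{
	unsigned long size = 1UL << high;
	int mismatches = 0;

	while (high > low) {
		unsigned long tail;
		int tail_mt;

		high--;
		size >>= 1;
		tail = pfn + size;	/* first pfn of the upper half */
		tail_mt = toy_get_pageblock_mt(tail);

		if (tail_mt != mt) {
			/* Message format imitates the warning quoted above. */
			printf("page type is %d, passed migratetype is %d (nr=%lu)\n",
			       tail_mt, mt, size);
			mismatches++;
		}
	}
	return mismatches;
}
```

With toy_pageblock_mt[0] = TOY_MOVABLE and toy_pageblock_mt[1] =
TOY_CMA, calling toy_expand(0, 9, 10, TOY_MOVABLE) takes the order-10
block at pfn 0, checks its order-9 tail at pfn 512, and prints "page
type is 5, passed migratetype is 1 (nr=512)". With both pageblocks
MOVABLE it prints nothing, which is the state the merging code is
expected to guarantee.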