From: David Hildenbrand
Organization: Red Hat
Date: Mon, 14 Feb 2022 12:55:02 +0100
To: Yang Yanchao, linux-mm@kvack.org
Cc: wuxu.wu@huawei.com
Subject: Re: The process hangs during memory-hotplug.
In-Reply-To: <20220213110703.2008-1-yangyanchao6@huawei.com>

On 13.02.22 12:07, Yang Yanchao wrote:
> Hello,

Hi,

> I find a hanging issue during memory-hotplug on kernel-4.18.

You actually mean memory hotunplug / memory offlining, IIUC.

> Repetition steps:
> 1. malloc for all system memory, write 'x', then free
> 2. for each removable memory block:

Note that "removable=yes" was always racy, and upstream Linux nowadays
only keeps that property around to avoid breaking older user space --
upstream Linux always reports "removable=yes" if memory offlining is
supported.

>   echo offline > /sys/devices/system/memory/memoryXXX/state
> Then during the offline process, there is a high probability of being
> stuck for more than 20 minutes to five hours.
>   cat /sys/devices/system/memory/memoryXXX/state
> The status is "going-offline"
> I try to understand it by adding some print to the kernel. The discovery
> process can't exit in this loop:
> __offline_pages
>   do_migrate_range
>     migrate_pages
>       unmap_and_move
>         move_to_new_page
>           fallback_migrate_page --> return EAGAIN
> I try to clear the cache, but it don't seems to solve the problem.
>   echo 3 > /proc/sys/vm/drop_caches
> Can I fix this problem with other Settings? Or can I see why it's stuck?

There are no real guarantees about what will happen when trying to
offline a memory block that's not onlined to ZONE_MOVABLE. You can
observe the zone e.g., via

  $ cat /sys/devices/system/memory/memory40/valid_zones
  Normal

Even with ZONE_MOVABLE, it can take quite a while (and in corner cases
potentially forever) until offlining succeeds. Now, 20 minutes is a bit
extreme. User space can always cancel offlining -- in your example, by
killing the "echo offline > /sys/devices/system/memory/memoryXXX/state"
process.

That said, as raised by Matthew, a lot has changed since 4.18, so you
should try reproducing upstream. But even there, you can just cancel
offlining if it takes too long.

If you observe similar behavior on ZONE_MOVABLE, it would be interesting
to find out how to better handle that to make offlining succeed faster.

-- 
Thanks,

David / dhildenb
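
P.S. As a rough sketch of the two points above (check the zone first,
and cancel offlining if it takes too long), a small shell helper along
these lines could be used. MEMORY_SYSFS, first_zone and try_offline are
names made up for this illustration, the 60 second default is arbitrary,
and the sketch assumes GNU coreutils timeout(1) is installed. Treat it
as an untested starting point, not a recommended procedure.

```sh
#!/bin/sh
# Sketch only: MEMORY_SYSFS, first_zone and try_offline are hypothetical names.
# MEMORY_SYSFS may be pointed at a scratch directory for a dry run.
MEMORY_SYSFS=${MEMORY_SYSFS:-/sys/devices/system/memory}

# Print the first zone listed in valid_zones for a block (e.g. "Normal").
first_zone() {
    read -r zone rest < "$MEMORY_SYSFS/$1/valid_zones" && printf '%s\n' "$zone"
}

# Try to offline one memory block, giving up after $2 seconds (default 60).
# Returns 3 if the block is skipped because it is not in ZONE_MOVABLE;
# otherwise returns the exit status of timeout(1), which for GNU timeout
# is 124 when the limit expired.
try_offline() {
    block=$1
    limit=${2:-60}
    if [ "$(first_zone "$block")" != "Movable" ]; then
        echo "$block: not in ZONE_MOVABLE, skipping" >&2
        return 3
    fi
    # When the limit expires, timeout(1) sends SIGTERM to the writing
    # shell, which is the same as killing the "echo offline" process
    # by hand to cancel offlining.
    timeout "$limit" sh -c 'echo offline > "$1"' sh "$MEMORY_SYSFS/$block/state"
}
```

For example, "try_offline memory40 120" (as root) would skip memory40 in
the valid_zones output shown earlier, because that block is in the
Normal zone. Drop the zone check if you also want to attempt
non-movable blocks and accept that they may never go offline.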