Subject: Re: [PATCH 1/2] mm: hwpoison: don't drop slab caches for offlining non-LRU page
To: Yang Shi
Cc: HORIGUCHI NAOYA(堀口 直也), Oscar Salvador, tdmackey@twitter.com, Andrew Morton, Jonathan Corbet, Linux MM, Linux Kernel Mailing List
References: <20210816180909.3603-1-shy828301@gmail.com> <08a5ad43-7922-8cf8-31ed-4f6e0c346516@redhat.com>
From: David Hildenbrand
Organization: Red Hat
Message-ID: <87385c20-78c1-ff04-7e91-f10253853994@redhat.com>
Date: Mon, 16 Aug 2021 21:40:06 +0200

On 16.08.21 21:37, Yang Shi wrote:
> On Mon, Aug 16, 2021 at 12:15 PM David Hildenbrand wrote:
>>
>> On 16.08.21 20:09, Yang Shi wrote:
>>> In the current implementation of soft offline, if non-LRU page is met,
>>> all the slab
>>> caches will be dropped to free the page and then offline it. But
>>> if the page is not a slab page, all the effort is wasted. Even
>>> if it is a slab page, it is not guaranteed that the page can be freed
>>> at all.
>>
>> ... but there is a chance it could be and the current behavior is
>> actually helpful in some setups.
>
> I don't disagree that it is kind of helpful in some cases, but the
> question is how likely it is to help and whether the cost is worth it.
> For a non-slab page (and, of course, non-LRU too), dropping slab doesn't
> make any sense. Even if it is a slab page, it must be a reclaimable
> slab. Even if it is a reclaimable slab, dropping slab can't
> guarantee that all objects on the same page are dropped.
>
> IMHO the likelihood is not worth the cost and the side effects, for
> example an unusable system.
>
>>
>> [...]
>>
>>> The lockup made the machine quite unusable. It also evicted most of
>>> the working set: the reclaimable slab caches were reduced from 12G
>>> to 300MB, and the page caches decreased from 17G to 4G.
>>>
>>> But the most disappointing thing is that all the effort doesn't make
>>> the page offline, it just returns:
>>>
>>> soft_offline: 0x1469f2: unknown non LRU page type 5ffff0000000000 ()
>>>
>>
>> In your example, yes. I had a look at the introducing commit:
>> facb6011f399 ("HWPOISON: Add soft page offline support")
>>
>> "
>> When the page is not free or LRU we try to free pages
>> from slab and other caches. The slab freeing is currently
>> quite dumb and does not try to focus on the specific slab
>> cache which might own the page. This could be potentially
>> improved later.
>> "
>>
>> I wonder if, instead of removing it altogether, we could actually
>> improve it as envisioned.
>>
>> To be precise, for alloc_contig_range() it would also make sense to be
>> able to shrink only in a specific physical memory range; this here seems
>> to be a similar thing.
>> (Actually, alloc_contig_range(), actual memory
>> offlining and hw poisoning/soft-offlining have a lot in common.)
>>
>> Unfortunately, the last time I took a brief look at teaching shrinkers
>> to be range-aware, it turned out to be a lot of work ... so maybe this
>> is really a long-term goal, to be mitigated in the meantime by disabling
>> it if it turns out to be more of a problem than an actual help.
>
> Do you mean physical page range? Yes, it would need a lot of work.
> TBH, I don't think it is quite feasible for the time being.
>
> The problem is that slabs for shrinkers are managed by objects rather
> than pages. For example, dentry and inode objects (the most consumed
> reclaimable slabs) are linked to an lru, and shrinkers traverse the lru
> to shrink the objects. The objects in a certain range of the lru cannot
> be guaranteed to be in the same range of physical pages.

Right, essentially you would have to look at each individual object and
test if it falls into the physical range of interest. Not that it can't
be done I guess, but it would be a lot of work.

-- 
Thanks,

David / dhildenb
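
To make the per-object check discussed above concrete, here is a minimal
userspace sketch, not kernel code. It assumes 4 KiB pages and uses a
hypothetical struct lru_obj as a stand-in for a dentry or inode; the names
obj_in_pfn_range() and count_objs_in_pfn_range() are made up for this
illustration and do not exist in the kernel. It also ignores objects that
straddle a page boundary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 4 KiB pages, so a page frame number is phys_addr >> 12. */
#define SKETCH_PAGE_SHIFT 12

/* Hypothetical stand-in for a reclaimable object (dentry, inode, ...). */
struct lru_obj {
	uint64_t phys_addr;   /* physical address where the object lives */
	struct lru_obj *next; /* next object in LRU order, NULL at the end */
};

/* Return 1 if the object's page frame lies in [start_pfn, end_pfn). */
int obj_in_pfn_range(const struct lru_obj *obj,
		     uint64_t start_pfn, uint64_t end_pfn)
{
	uint64_t pfn = obj->phys_addr >> SKETCH_PAGE_SHIFT;

	return pfn >= start_pfn && pfn < end_pfn;
}

/*
 * Walk the entire LRU and count the objects backed by the target range.
 * *scanned reports how many objects had to be visited, which is always
 * the full list length: LRU order says nothing about physical placement.
 */
size_t count_objs_in_pfn_range(const struct lru_obj *head,
			       uint64_t start_pfn, uint64_t end_pfn,
			       size_t *scanned)
{
	size_t hits = 0;

	assert(start_pfn < end_pfn);
	*scanned = 0;
	for (const struct lru_obj *obj = head; obj; obj = obj->next) {
		(*scanned)++;
		if (obj_in_pfn_range(obj, start_pfn, end_pfn))
			hits++;
	}
	return hits;
}
```

For the single page from the log line above (pfn 0x1469f2), the range would
be [0x1469f2, 0x1469f3). The sketch still has to visit every object on the
list to find the few that matter, which is the cost being discussed.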