From: Jann Horn
Date: Wed, 13 Jan 2021 23:37:34 +0100
Subject: Re: SLUB: percpu partial object count is highly inaccurate, causing some memory wastage and maybe also worse tail latencies?
To: Vlastimil Babka
Cc: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim, Andrew Morton, Linux-MM, kernel list, Thomas Gleixner, Sebastian Andrzej Siewior, Roman Gushchin, Johannes Weiner, Shakeel Butt, Suren Baghdasaryan, Minchan Kim, Michal Hocko

On Wed, Jan 13, 2021 at 8:14 PM Vlastimil Babka wrote:
> On 1/12/21 12:12 AM, Jann Horn wrote:
> It doesn't help that slabinfo (global or per-memcg) is also
> inaccurate as it cannot count free objects on per-cpu partial slabs and thus
> reports them as active.

Maybe SLUB could be taught to track how many objects are in the percpu
machinery, and then print that number separately so that you can at
least know how much data you're missing without having to collect data
with IPIs...

> > It might be a good idea to figure out whether it is possible to
> > efficiently keep track of a more accurate count of the free objects on
>
> As long as there are some inuse objects, it shouldn't matter much if the slab is
> sitting on per-cpu partial list or per-node list, as it can't be freed anyway.
> It becomes a real problem only after the slab become fully free. If we detected
> that in __slab_free() also for already-frozen slabs, we would need to know which
> CPU this slab belongs to (currently that's not tracked afaik),

Yeah, but at least on 64-bit systems we still have 32 completely
unused bits in the counter field that's updated via cmpxchg_double on
struct page. (On 32-bit systems the bitfields are also wider than they
strictly need to be, I think, at least if the system has 4K page
size.)
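To illustrate the bit-budget argument, here is a rough userspace sketch. This is a hypothetical layout for illustration only, not the real struct page definition: the point is just that inuse/objects/frozen fit in 32 bits, so on 64-bit the word that cmpxchg_double swaps alongside the freelist pointer has 32 spare bits where an owning-CPU id could live.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical layout, for illustration only -- not the actual
 * struct page definition. SLUB keeps inuse/objects/frozen in one
 * word so they can be updated together with the freelist pointer
 * by cmpxchg_double; on 64-bit the upper 32 bits of that word are
 * unused and could hold a CPU number.
 */
struct slub_counters {
    uint64_t inuse   : 16; /* objects currently allocated */
    uint64_t objects : 15; /* total objects in the slab */
    uint64_t frozen  : 1;  /* slab owned by some CPU's partial list */
    uint64_t cpu     : 32; /* spare bits: owning CPU at freeze time */
};

/* Pack the bitfields into the single word cmpxchg_double operates on. */
static uint64_t counters_pack(struct slub_counters c)
{
    uint64_t w;
    memcpy(&w, &c, sizeof(w));
    return w;
}

/* Recover the fields from a packed counters word. */
static struct slub_counters counters_unpack(uint64_t w)
{
    struct slub_counters c;
    memcpy(&c, &w, sizeof(c));
    return c;
}
```

With such a layout, the free path could read the `cpu` field out of the counters word it just observed, without any extra memory traffic.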
So at least on 64-bit systems, we could squeeze a CPU number in there,
and then you'd know to which CPU the page belonged at the time the
object was freed.

> and send it an
> IPI to do some light version of unfreeze_partials() that would only remove empty
> slabs. The trick would be not to cause too many IPI's by this, obviously :/

Some brainstorming:

Maybe you could have an atomic counter in kmem_cache_cpu that tracks
the number of empty frozen pages that are associated with a specific
CPU? So the freeing slowpath would do its cmpxchg_double, and if the
new state after a successful cmpxchg_double is "inuse==0 && frozen==1"
with a valid CPU number, you afterwards do
"atomic_long_inc(&per_cpu_ptr(cache->cpu_slab, cpu)->empty_partial_pages)".

I think it should be possible to implement that such that the
empty_partial_pages count, while not immediately completely accurate,
would be eventually consistent; and readers on the CPU owning the
kmem_cache_cpu should never see a number that is too large, only one
that is too small.

You could additionally have a plain percpu counter, not tied to the
kmem_cache, and increment it by 1<
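As a toy model of that brainstormed scheme (userspace C, not kernel code: the names follow the mail's kmem_cache_cpu / empty_partial_pages, the cmpxchg_double transition is reduced to a plain field update, and NR_CPUS is an arbitrary placeholder), the free slowpath credits the owning CPU's counter only on the transition to a fully-free frozen slab:

```c
#include <stdatomic.h>

#define NR_CPUS 4 /* placeholder for this toy model */

/* Toy per-CPU state; the real kmem_cache_cpu holds much more. */
struct kmem_cache_cpu {
    atomic_long empty_partial_pages; /* empty frozen slabs owned by this CPU */
};

/* Stand-in for the counters word that cmpxchg_double would update. */
struct slab_state {
    unsigned int inuse;  /* allocated objects in the slab */
    unsigned int frozen; /* slab is on some CPU's percpu partial list */
    unsigned int cpu;    /* owning CPU, from the spare counter bits */
};

static struct kmem_cache_cpu cpu_slab[NR_CPUS];

/*
 * Model of the __slab_free() slow path: after the (modeled) counters
 * update, if the slab just became empty while frozen, bump the owning
 * CPU's empty-slab counter. Because the increment happens strictly
 * after the state transition, a reader on the owning CPU may briefly
 * see a count that is too small, but never one that is too large.
 */
static void slab_free_slow(struct slab_state *s)
{
    s->inuse--; /* stands in for the cmpxchg_double transition */
    if (s->inuse == 0 && s->frozen && s->cpu < NR_CPUS)
        atomic_fetch_add(&cpu_slab[s->cpu].empty_partial_pages, 1);
}
```

The "too small, never too large" property in the comment is exactly the eventual-consistency guarantee described above: the counter lags the slab state, so the owning CPU can use it as a conservative trigger for cleaning up its partial list.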