From: Linus Torvalds
Date: Thu, 12 Dec 2019 10:29:02 -0800
Subject: Re: [PATCHSET v3 0/5] Support for RWF_UNCACHED
To: Matthew Wilcox
Cc: Jens Axboe, Linux-MM, linux-fsdevel, linux-block, Chris Mason,
 Dave Chinner, Johannes Weiner

On Thu, Dec 12, 2019 at 9:52 AM Matthew Wilcox wrote:
>
> 1. We could semi-sort the pages on the LRU list. If we know we're going
> to remove a bunch of pages, we could take a batch of them off the list,
> sort them and remove them in-order. This probably wouldn't be terribly
> effective.

I don't think the sorting is relevant. Once you batch things, you
already would get most of the locality advantage in the cache if it
exists (and the batch isn't insanely large so that one batch already
causes cache overflows).

The problem - I suspect - is that we don't batch at all. Or rather,
the "batching" does exist at a high level, but it's so high that
there's just tons of stuff going on between single pages.
It is at the shrink_page_list() level, which is pretty high up and
basically does one page at a time with locking and a lot of tests for
each page, and then we do "__remove_mapping()" (which does some more
work) one at a time before we actually get to
__delete_from_page_cache().

So it's "batched", but it's in a huge loop, and even at that huge loop
level the batch size is fairly small. We limit it to SWAP_CLUSTER_MAX,
which is just 32.

Thinking about it, that SWAP_CLUSTER_MAX may make sense in some other
circumstances, but not necessarily in the "shrink clean inactive
pages" thing. I wonder if we could just batch clean pages a _lot_ more
aggressively.

Yes, our batching loop is still very big and it might not help at an
L1 level, but it might help in the L2, at least. In kswapd, when we
have 28 GB of pages on the inactive list, a batch of 32 pages at a
time is pretty small ;)

> 2. We could change struct page to point to the xa_node that holds them.
> Looking up the page mapping would be page->xa_node->array and then
> offsetof(i_pages) to get the mapping.

I don't think we have space in 'struct page', and I'm pretty sure we
don't want to grow it. That's one of the more common data structures
in the kernel.

             Linus