Message-ID: <49434bea-3862-1052-2993-8ccad985708b@redhat.com>
Date: Tue, 2 Aug 2022 14:06:37 +0200
To: Peter Xu
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Huang Ying, Andrea Arcangeli, Andrew Morton, "Kirill A . Shutemov", Nadav Amit, Hugh Dickins, Vlastimil Babka
References: <20220729014041.21292-1-peterx@redhat.com>
From: David Hildenbrand
Organization: Red Hat
Subject: Re: [PATCH RFC 0/4] mm: Remember young bit for migration entries

On 02.08.22 00:35, Peter Xu wrote:
> On Mon, Aug 01, 2022 at 10:21:32AM +0200, David Hildenbrand wrote:
>> On 29.07.22 03:40, Peter Xu wrote:
>>> [Marking as RFC; only x86 is supported for now, plan to add a few more
>>> archs when there's a formal version]
>>>
>>> Problem
>>> =======
>>>
>>> When migrate a page, right now we always mark the migrated page as old.
>>> The reason could be that we don't really know whether the page is hot or
>>> cold, so we could have taken it a default negative assuming that's safer.
>>>
>>> However that could lead to at least two problems:
>>>
>>> (1) We lost the real hot/cold information while we could have persisted.
>>> That information shouldn't change even if the backing page is changed
>>> after the migration,
>>>
>>> (2) There can be always extra overhead on the immediate next access to
>>> any migrated page, because hardware MMU needs cycles to set the young
>>> bit again (as long as the MMU supports).
>>>
>>> Many of the recent upstream works showed that (2) is not something trivial
>>> and actually very measurable.
>>> In my test case, reading 1G chunk of memory
>>> - jumping in page size intervals - could take 99ms just because of the
>>> extra setting on the young bit on a generic x86_64 system, comparing to 4ms
>>> if young set.
>>>
>>> This issue is originally reported by Andrea Arcangeli.
>>>
>>> Solution
>>> ========
>>>
>>> To solve this problem, this patchset tries to remember the young bit in the
>>> migration entries and carry it over when recovering the ptes.
>>>
>>> We have the chance to do so because in many systems the swap offset is not
>>> really fully used. Migration entries use swp offset to store PFN only,
>>> while the PFN is normally not as large as swp offset and normally smaller.
>>> It means we do have some free bits in swp offset that we can use to store
>>> things like young, and that's how this series tried to approach this
>>> problem.
>>>
>>> One tricky thing here is even though we're embedding the information into
>>> swap entry which seems to be a very generic data structure, the number of
>>> bits that are free is still arch dependent. Not only because the size of
>>> swp_entry_t differs, but also due to the different layouts of swap ptes on
>>> different archs.
>>>
>>> Here, this series requires specific arch to define an extra macro called
>>> __ARCH_SWP_OFFSET_BITS represents the size of swp offset. With this
>>> information, the swap logic can know whether there's extra bits to use,
>>> then it'll remember the young bits when possible. By default, it'll keep
>>> the old behavior of keeping all migrated pages cold.
>>>
>>
>>
>> I played with a similar idea when working on pte_swp_exclusive() but
>> gave up, because it ended up looking too hacky. Looking at patch #2, I
>> get the same feeling again. Kind of hacky.
>
> Could you explain what's the "hacky" part you mentioned?

SWP_PFN_OFFSET_FREE_BITS :)

It's a PFN offset and we're mangling in random other bits. That's
hacky IMHO.
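
To illustrate what I mean by mangling, here is a minimal userspace
sketch of stashing a young flag in the unused upper bits of the offset.
It is not the code from the series: the 58-bit offset width, the 52-bit
PFN width and all EX_*/mig_offset_* names are assumptions made up purely
for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout, for illustration only: a 58-bit swap offset of
 * which a migration entry needs just the low 52 bits for the PFN. */
#define EX_SWP_OFFSET_BITS 58
#define EX_SWP_PFN_BITS    52

static_assert(EX_SWP_PFN_BITS < EX_SWP_OFFSET_BITS,
              "no free bits left above the PFN");

#define EX_SWP_PFN_MASK  ((UINT64_C(1) << EX_SWP_PFN_BITS) - 1)
/* The first bit above the PFN is borrowed as the "young" flag. */
#define EX_SWP_MIG_YOUNG (UINT64_C(1) << EX_SWP_PFN_BITS)

/* Build the offset of a migration entry: PFN plus the borrowed flag. */
static inline uint64_t mig_offset_make(uint64_t pfn, bool young)
{
    uint64_t offset = pfn & EX_SWP_PFN_MASK;

    if (young)
        offset |= EX_SWP_MIG_YOUNG;
    return offset;
}

/* Every reader of the PFN now has to mask the flag away first. */
static inline uint64_t mig_offset_to_pfn(uint64_t offset)
{
    return offset & EX_SWP_PFN_MASK;
}

/* Report whether the borrowed young flag is set in the offset. */
static inline bool mig_offset_young(uint64_t offset)
{
    return (offset & EX_SWP_MIG_YOUNG) != 0;
}
```

With the flag set, the raw offset no longer equals the PFN, so any code
that treats the offset as a plain PFN without masking silently gets the
wrong page frame.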

I played with the idea of converting all code to store bits in addition
to the type + offset. But that requires digging through a lot of arch
code to teach that code about additional flags, so I discarded that idea
when working on the COW fixes.

>
> I used swap entry to avoid per-arch operations. I failed to figure out a
> common way to know swp offset length myself so unluckily in this RFC I
> still needed one macro per-arch. Ying's suggestion seems to be a good fit
> here to me to remove the last arch-specific dependency.

Instead of mangling this into the PFN offset and letting the arch tell
you which bits of the PFN offset are unused ... rather remove the bits
from the offset and define them manually to have a certain meaning.
That's exactly how pte_swp_mkexclusive/pte_swp_exclusive/ is supposed to
be handled on architectures that want to support it.

I hope that makes it clearer what the hacky part is IMHO :)

>
>>
>>
>> If we mostly only care about x86_64, and it's a performance improvement
>> after all, why not simply do it like
>> pte_swp_mkexclusive/pte_swp_exclusive/ ... and reuse a spare PTE bit?
>
> Page migration works for most archs, I want to have it work for all archs
> that can easily benefit from it.

Yet we only care about x86-64 IIUC regarding performance, just the way
the dirty bit is handled?

>
> Besides I actually have a question on the anon exclusive bit in the swap
> pte: since we have that anyway, why we need a specific migration type for
> anon exclusive pages? Can it be simply read migration entries with anon
> exclusive bit set?

Not before all archs support pte_swp_mkexclusive/pte_swp_exclusive/.

As pte_swp_mkexclusive/pte_swp_exclusive/ only applies to actual swap
PTEs, you could even reuse that bit for migration entries and get at
least the most relevant 64bit architectures supported easily.

-- 
Thanks,

David / dhildenb