From: David Hildenbrand
Organization: Red Hat
To: Tiberiu Georgescu
Cc: Peter Xu, "linux-kernel@vger.kernel.org", "linux-mm@kvack.org",
 Alistair Popple, Ivan Teterevkov, Mike Rapoport, Hugh Dickins,
 Matthew Wilcox, Andrea Arcangeli, "Kirill A . Shutemov", Andrew Morton,
 Mike Kravetz, "Carl Waldspurger [C]", Florian Schmidt, Jonathan Davies
Subject: Re: [PATCH RFC 0/4] mm: Enable PM_SWAP for shmem with PTE_MARKER
Date: Wed, 18 Aug 2021 20:13:47 +0200
Message-ID: <6ab58270-c487-2a56-b522-ea5100edb13c@redhat.com>
In-Reply-To: <7F645772-1212-4F0D-88AF-2569D5BBC2CD@nutanix.com>
References: <20210807032521.7591-1-peterx@redhat.com>
 <16a765e7-c2a3-982a-e585-c04067766e3f@redhat.com>
 <7F645772-1212-4F0D-88AF-2569D5BBC2CD@nutanix.com>

>>
>>> I'm now wondering
>>> whether for Tiberiu's case mincore() can also be used. It
>>> should just still be a bit slow because it'll look up the cache too, but it
>>> should work similarly like the original proposal.
>
> I am afraid that the information returned by mincore is a little too
> vague to be of better help, compared to what the pagemap should provide
> in theory. I will have a look to see whether lseek on
> proc/map_files works as a "PM_SWAP" equivalent. However, the swap
> offset would still be missing.

Well, with mincore() you could at least decide "page is present" vs.
"page is swapped or nonexistent". At least for making pageout decisions
it shouldn't really matter, no? madvise(MADV_PAGEOUT) on a hole is a
no-op.

But I'm not 100% sure what exactly your use case is here and what you
would really need, so you know best :)

>>
>> Very right, maybe we can just avoid tampering with pagemap on shmem
>> completely (which sounds like an excellent idea to me) and document it
>> as "On shared memory, we will never indicate SWAPPED if the pages have
>> been swapped out. Further, PRESENT might be under-indicated: if a
>> shared page is currently not mapped into the page table of a
>> process.". I saw there was a related, proposed doc update, maybe we
>> can finetune that.
>>
> We could take into consideration an alternative approach to retrieving
> the shared page info in user space, like storing it in sys/fs instead
> of per process. However, just leaving the pagemap functionality
> incomplete, and not providing an alternative to retrieve the missing
> information, does not seem right. Updating the docs with a "can't do"
> should be temporary, until an alternative or fix.
>

As I stated before, making pagemap less broken is not a good idea IMHO.
Either make it really correct or just leave it all broken -- and
document that other interfaces (e.g., lseek) shall be used. It sounds
like they exist and are good enough for CRIU.

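A minimal sketch of the mincore() check discussed above, assuming a
page-aligned mapping. The helper name count_resident_pages() is
hypothetical and not part of the patch series. A clear bit only says
"not resident", so a swapped-out page and a hole look the same here.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper (illustration only): count how many pages of the
 * page-aligned range [addr, addr + len) mincore() reports as resident.
 * Returns the resident page count, or -1 on error.
 */
long count_resident_pages(void *addr, size_t len)
{
	long page_size = sysconf(_SC_PAGESIZE);
	size_t npages, i;
	unsigned char *vec;
	long resident = 0;

	if (page_size <= 0 || len == 0)
		return -1;

	/* mincore() fills one byte per page of the range. */
	npages = (len + (size_t)page_size - 1) / (size_t)page_size;
	vec = malloc(npages);
	if (!vec)
		return -1;

	if (mincore(addr, len, vec) != 0) {
		free(vec);
		return -1;
	}

	/* Only the least significant bit of each byte is defined. */
	for (i = 0; i < npages; i++)
		resident += vec[i] & 1;

	free(vec);
	return resident;
}
```

A caller making pageout decisions could presumably skip ranges where
this returns 0, since there would be nothing resident left for
madvise(MADV_PAGEOUT) to act on.
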
And TBH, if other interfaces already exist and get the job done, I'm
more than happy that we can avoid mixing more shmem stuff into pagemap
and trying to compensate for performance problems by introducing
inconsistency.

If it has an fd and we can punch that into syscalls, we should much
rather use that fd to look up stuff than go via process page tables --
if possible, of course (to be evaluated, because I haven't looked into
the CRIU details and how they use lseek with anonymous shared memory).

> Also, I think you are talking about my own doc update patch[3]. If not,
> please share a link with your next reply.
>
> [3] https://marc.info/?m=162878395426774

No, that's it.

-- 
Thanks,

David / dhildenb
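
The fd-based lookup argued for above can be sketched with lseek() and
SEEK_DATA/SEEK_HOLE. This is an illustration under assumptions, not how
CRIU does it: the helper name count_data_bytes() is hypothetical, and
the caller is assumed to already hold an fd for the shmem object. To my
understanding tmpfs reports swapped-out pages as data, so this separates
"allocated" from "hole" but not "present" from "swapped".

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Linux values, for builds where <unistd.h> did not expose them. */
#ifndef SEEK_DATA
#define SEEK_DATA 3
#endif
#ifndef SEEK_HOLE
#define SEEK_HOLE 4
#endif

/*
 * Hypothetical helper (illustration only): count the bytes in [0, size)
 * of the file behind fd that lseek() reports as data rather than holes.
 * Returns the byte count, or -1 on error.
 */
off_t count_data_bytes(int fd, off_t size)
{
	off_t pos = 0, total = 0;

	while (pos < size) {
		off_t data, hole;

		data = lseek(fd, pos, SEEK_DATA);
		if (data < 0)	/* ENXIO: no data at or after pos */
			return errno == ENXIO ? total : -1;
		if (data >= size)
			break;

		hole = lseek(fd, data, SEEK_HOLE);
		if (hole <= data)	/* error, or no forward progress */
			return -1;
		if (hole > size)
			hole = size;

		total += hole - data;
		pos = hole;
	}

	return total;
}
```

File systems without real hole support typically report the whole file
as data, so the result is an upper bound there.
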