From: Michał Mirosław
Date: Mon, 7 Nov 2022 13:26:59 +0100
Subject: Re: [PATCH v5 2/3] fs/proc/task_mmu: Implement IOCTL to get and/or the clear info about PTEs
To: Muhammad Usama Anjum
Cc: Andrei Vagin, Danylo Mocherniuk, Alexander Viro, Andrew Morton,
 Suren Baghdasaryan, Greg KH, Christian Brauner, Peter Xu, Yang Shi,
 Vlastimil Babka, "Zach O'Keefe", "Matthew Wilcox (Oracle)",
 "Gustavo A. R. Silva", Dan Williams, kernel@collabora.com,
 Gabriel Krisman Bertazi, David Hildenbrand, Peter Enderborg,
 "open list : KERNEL SELFTEST FRAMEWORK", Shuah Khan, open list,
 "open list : PROC FILESYSTEM", "open list : MEMORY MANAGEMENT"
References: <20221103145353.3049303-1-usama.anjum@collabora.com>
 <20221103145353.3049303-3-usama.anjum@collabora.com>
In-Reply-To: <20221103145353.3049303-3-usama.anjum@collabora.com>

On Thu, 3 Nov 2022 at 15:54, Muhammad Usama Anjum wrote:
> This IOCTL, PAGEMAP_SCAN can be used to get and/or clear the info about
> page table entries. The following operations are supported in this ioctl:
> - Get the information if the pages are soft-dirty, file mapped, present
>   or swapped.
> - Clear the soft-dirty PTE bit of the pages.
> - Get and clear the soft-dirty PTE bit of the pages.
>
> Only the soft-dirty bit can be read and cleared atomically. struct
> pagemap_sd_args is used as the argument of the IOCTL. In this struct:
> - The range is specified through start and len.
> - The output buffer and size is specified as vec and vec_len.
> - The optional maximum requested pages are specified in the max_pages.
> - The flags can be specified in the flags field. The PAGEMAP_SD_CLEAR
>   and PAGEMAP_SD_NO_REUSED_REGIONS are supported.
> - The masks are specified in rmask, amask, emask and return_mask.
[...]
> --- a/include/uapi/linux/fs.h
> +++ b/include/uapi/linux/fs.h
> @@ -305,4 +305,57 @@ typedef int __bitwise __kernel_rwf_t;
>  #define RWF_SUPPORTED  (RWF_HIPRI | RWF_DSYNC | RWF_SYNC | RWF_NOWAIT |\
>                          RWF_APPEND)
>
> +/* PAGEMAP IOCTL */
> +#define PAGEMAP_SCAN   _IOWR('f', 16, struct pagemap_scan_arg)
> +
> +/* Bits are set in the bitmap of the page_region and masks in pagemap_sd_args */
> +#define PAGE_IS_SD     (1 << 0)

Can we name it PAGE_IS_SOFTDIRTY? "SD" can mean so many things.

> +#define PAGE_IS_FILE   (1 << 1)
> +#define PAGE_IS_PRESENT        (1 << 2)
> +#define PAGE_IS_SWAPED (1 << 3)

PAGE_IS_SWAPPED?

> +
> +/*
> + * struct page_region - Page region with bitmap flags
> + * @start:     Start of the region
> + * @len:       Length of the region
> + * bitmap:     Bits sets for the region
> + */
> +struct page_region {
> +       __u64 start;
> +       __u64 len;
> +       __u32 bitmap;
> +       __u32 __reserved;

"u64 flags"?
If an extension is needed it would already require a new ioctl or
something in the `arg` struct.

> +
> +/*
> + * struct pagemap_scan_arg - Soft-dirty IOCTL argument

Since this is no longer a soft-dirty-specific call, it might be better to
describe it as "VM scan ioctl" or similar. BTW, the implementation is
currently guarded by CONFIG_MEM_SOFT_DIRTY, but CRIU doesn't need that;
it does need the handling of the other bits.

> + * @start:             Starting address of the region
> + * @len:               Length of the region (All the pages in this length are included)
> + * @vec:               Address of page_region struct array for output
> + * @vec_len:           Length of the page_region struct array
> + * @max_pages:         Optional max return pages (It must be less than vec_len if specified)

I think we discussed that this is not counting the same things as
vec_len, so there should not be a reference between the two. The limit
is whatever fits under both conditions (IOW: n_vecs <= vec_len &&
(!max_pages || n_pages <= max_pages)).

> + * @flags:             Special flags for the IOCTL

Just "Flags for the IOCTL".

> + * @rmask:             Required mask - All of these bits have to be set in the PTE
> + * @amask:             Any mask - Any of these bits are set in the PTE
> + * @emask:             Exclude mask - None of these bits are set in the PTE

It might be easier for developers if those were named e.g.
"required_mask", "anyof_mask", "excluded_mask".

> + * @return_mask:       Bits that have to be reported to the user in page_region

"Bits that are to be reported in page_region"?

> + */
> +struct pagemap_scan_arg {
> +       __u64 start;
> +       __u64 len;
> +       __u64 vec;
> +       __u64 vec_len;
> +       __u32 max_pages;
> +       __u32 flags;
> +       __u32 rmask;
> +       __u32 amask;
> +       __u32 emask;
> +       __u32 return_mask;
> +};
> +
> +/* Special flags */
> +#define PAGEMAP_SD_CLEAR               (1 << 0)

SD -> SOFTDIRTY

> +/* Check the individual pages if they are soft-dirty to find dirty pages faster. */
> +#define PAGEMAP_NO_REUSED_REGIONS      (1 << 1)

Please include the description from commitmsg of what this flag does
(i.e. how the behaviour differs because of the flag). I'd drop the
part about it being faster, as if so - why have the flag at all
instead of just always using the faster way?

(I only reviewed the API now. The implementation I think could be
simpler, but let's leave that to after the API is agreed on.)

Best Regards
Michał Mirosław
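
To make the points about the masks and the two limits concrete, here is
a minimal C sketch. It is only an illustration under assumptions, not
code from the patch: page_matches() and within_limits() are hypothetical
helpers, the parameter names follow the renaming suggested above, an
empty "any of" mask is assumed to mean "no constraint", and the
PAGE_IS_* values are copied from the quoted hunk as posted.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit values as they appear in the quoted uapi hunk. */
#define PAGE_IS_SD      (1u << 0)
#define PAGE_IS_FILE    (1u << 1)
#define PAGE_IS_PRESENT (1u << 2)
#define PAGE_IS_SWAPED  (1u << 3)

/*
 * Hypothetical helper: decide whether a page whose state is described
 * by `bits` passes the three filter masks.
 */
static bool page_matches(uint32_t bits, uint32_t required_mask,
                         uint32_t anyof_mask, uint32_t excluded_mask)
{
    if ((bits & required_mask) != required_mask)
        return false;   /* at least one required bit is missing */
    if (anyof_mask && !(bits & anyof_mask))
        return false;   /* assumption: an empty anyof mask matches all */
    if (bits & excluded_mask)
        return false;   /* an excluded bit is set */
    return true;
}

/*
 * Hypothetical helper: the two limits count different things (regions
 * vs. pages), so they are checked independently. max_pages == 0 means
 * "no page limit".
 */
static bool within_limits(uint64_t n_vecs, uint64_t vec_len,
                          uint64_t n_pages, uint32_t max_pages)
{
    return n_vecs <= vec_len && (!max_pages || n_pages <= max_pages);
}
```

With these helpers, a present soft-dirty page passes a filter that
requires PAGE_IS_PRESENT and excludes PAGE_IS_SWAPED, and a scan may
fill all vec_len regions regardless of max_pages as long as the page
count itself stays within max_pages.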