From: Mark Hemment
Date: Wed, 12 Jan 2022 11:38:23 +0000
Subject: Re: [PATCH v3 RESEND] mm: shmem: implement POSIX_FADV_[WILL|DONT]NEED for shmem
To: Charan Teja Kalla
Cc: Hugh Dickins, Andrew Morton, "Matthew Wilcox (Oracle)", vbabka@suse.cz, rientjes@google.com, mhocko@suse.com, Suren Baghdasaryan, Shakeel Butt, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Charan Teja Reddy
References: <1641488717-13865-1-git-send-email-quic_charante@quicinc.com>

On Mon, 10 Jan 2022 at 15:14, Charan Teja Kalla wrote:
>
> Thanks again Mark for the review comments!!
>
> On 1/10/2022 6:06 PM, Mark Hemment wrote:
> > On Thu, 6 Jan 2022 at 17:06, Charan Teja Reddy
> > wrote:
> >>
> >> From: Charan Teja Reddy
> >>
> >> Currently fadvise(2) is supported only for files that are not
> >> associated with noop_backing_dev_info, so for files like shmem,
> >> fadvise results in a NOP. But there is file_operations->fadvise(),
> >> which lets file systems implement their own fadvise
> >> implementation. Use this support to implement some of the POSIX_FADV_XXX
> >> functionality for shmem files.
> > ...
> >> +static void shmem_isolate_pages_range(struct address_space *mapping, loff_t start,
> >> +				loff_t end, struct list_head *list)
> >> +{
> >> +	XA_STATE(xas, &mapping->i_pages, start);
> >> +	struct page *page;
> >> +
> >> +	rcu_read_lock();
> >> +	xas_for_each(&xas, page, end) {
> >> +		if (xas_retry(&xas, page))
> >> +			continue;
> >> +		if (xa_is_value(page))
> >> +			continue;
> >> +		if (!get_page_unless_zero(page))
> >> +			continue;
> >> +		if (isolate_lru_page(page))
> >> +			continue;
> >
> > Need to unwind the get_page on failure to isolate.
>
> Will be done.
>
> >
> > Should PageUnevictable() pages (SHM_LOCK) be skipped?
> > (That is, does SHM_LOCK override DONTNEED?)
>
>
> Should be skipped. Will be done.
>
> >
> > ...
> >> +static int shmem_fadvise_dontneed(struct address_space *mapping, loff_t start,
> >> +				loff_t end)
> >> +{
> >> +	int ret;
> >> +	struct page *page;
> >> +	LIST_HEAD(list);
> >> +	struct writeback_control wbc = {
> >> +		.sync_mode = WB_SYNC_NONE,
> >> +		.nr_to_write = LONG_MAX,
> >> +		.range_start = 0,
> >> +		.range_end = LLONG_MAX,
> >> +		.for_reclaim = 1,
> >> +	};
> >> +
> >> +	if (!shmem_mapping(mapping))
> >> +		return -EINVAL;
> >> +
> >> +	if (!total_swap_pages)
> >> +		return 0;
> >> +
> >> +	lru_add_drain();
> >> +	shmem_isolate_pages_range(mapping, start, end, &list);
> >> +
> >> +	while (!list_empty(&list)) {
> >> +		page = lru_to_page(&list);
> >> +		list_del(&page->lru);
> >> +		if (page_mapped(page))
> >> +			goto keep;
> >> +		if (!trylock_page(page))
> >> +			goto keep;
> >> +		if (unlikely(PageTransHuge(page))) {
> >> +			if (split_huge_page_to_list(page, &list))
> >> +				goto keep;
> >> +		}
> >
> > I don't know the shmem code and the lifecycle of a shm-page, so
> > genuine questions:
> > When the try-lock succeeds, should there be a test for PageWriteback()
> > (page skipped if true)?  Also, does page->mapping need to be tested
> > for NULL to prevent races with deletion from the page-cache?
>
> I failed to envisage it. I should have considered both these conditions
> here. BTW, I am just thinking about why we shouldn't use the
> reclaim_pages(page_list) function here with an extra set_page_dirty() on
> a page that is isolated? It just calls shrink_page_list(), where all
> these conditions are properly handled. What is your opinion here?

It should be possible to use reclaim_pages() (I haven't looked closely).
It might actually be good to use this function, as it will do some
congestion throttling.  However, it will always try to unmap pages
(note: your page_mapped() test is 'unstable' as it is done without the
page locked), so it might give behaviour you want to avoid.
Note: reclaim_pages() is already used for madvise(PAGEOUT).
The shmem code would need to prepare page(s) to help shrink_page_list()
make progress (see madvise.c:madvise_cold_or_pageout_pte_range()).

Taking a step back: is fadvise(DONTNEED) really needed/wanted?  Yes, you
gave a usecase (which I cut from this thread in my earlier reply), but
I'm not familiar enough with the various shmem uses to know if this
feature is needed.  Someone else will need to answer this.

Cheers,
Mark

>
> >
> > ...
> >> +
> >> +		clear_page_dirty_for_io(page);
> >> +		SetPageReclaim(page);
> >> +		ret = shmem_writepage(page, &wbc);
> >> +		if (ret || PageWriteback(page)) {
> >> +			if (ret)
> >> +				unlock_page(page);
> >> +			goto keep;
> >> +		}
> >> +
> >> +		if (!PageWriteback(page))
> >> +			ClearPageReclaim(page);
> >> +
> >> +		/*
> >> +		 * shmem_writepage() place the page in the swapcache.
> >> +		 * Delete the page from the swapcache and release the
> >> +		 * page.
> >> +		 */
> >> +		__mod_node_page_state(page_pgdat(page),
> >> +				NR_ISOLATED_ANON + page_is_file_lru(page), compound_nr(page));
> >> +		lock_page(page);
> >> +		delete_from_swap_cache(page);
> >> +		unlock_page(page);
> >> +		put_page(page);
> >> +		continue;
> >> +keep:
> >> +		putback_lru_page(page);
> >> +		__mod_node_page_state(page_pgdat(page),
> >> +				NR_ISOLATED_ANON + page_is_file_lru(page), compound_nr(page));
> >> +	}
> >
> > The putback_lru_page() drops the last reference this code holds on
> > 'page'.  Is it safe to use 'page' after dropping this reference?
>
> True. Will correct it in the next revision.
>
> >
> > Cheers,
> > Mark
> >
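
Two of the reference-count points raised in this review (drop the reference again when isolation fails or the page is unevictable, and do not touch 'page' after putting it back) can be sketched as a small userspace toy model. This is only an illustration under stated assumptions, not kernel code and not the patch author's eventual fix: struct toy_page and every toy_* helper are hypothetical names invented here, and toy_isolate() takes no reference of its own, which is a simplification relative to the kernel's isolate_lru_page().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct page; NOT the kernel structure. */
struct toy_page {
	int refcount;     /* models the page reference count */
	bool on_lru;      /* models PageLRU */
	bool unevictable; /* models PageUnevictable, e.g. SHM_LOCK */
	bool isolated;    /* set once the page has been taken off the LRU */
};

/* Models get_page_unless_zero(): take a reference unless the count is 0. */
static bool toy_get_unless_zero(struct toy_page *p)
{
	if (p->refcount == 0)
		return false;
	p->refcount++;
	return true;
}

/* Models put_page(): drop one reference. */
static void toy_put(struct toy_page *p)
{
	assert(p->refcount > 0);
	p->refcount--;
}

/* Models isolate_lru_page(): 0 on success, nonzero on failure. */
static int toy_isolate(struct toy_page *p)
{
	if (!p->on_lru)
		return -1;
	p->on_lru = false;
	p->isolated = true;
	return 0;
}

/*
 * Isolation loop with the unwind added: every path that skips a page
 * after toy_get_unless_zero() succeeded drops that reference again.
 * Returns the number of pages isolated.
 */
static size_t toy_isolate_range(struct toy_page *pages, size_t n)
{
	size_t isolated = 0;

	for (size_t i = 0; i < n; i++) {
		struct toy_page *p = &pages[i];

		if (!toy_get_unless_zero(p))
			continue;
		if (p->unevictable || toy_isolate(p)) {
			toy_put(p); /* unwind the reference taken above */
			continue;
		}
		isolated++;
	}
	return isolated;
}

/*
 * Models the 'keep:' path with the ordering corrected: the isolated
 * counter is updated while the reference is still held, and dropping
 * the reference is the last thing done with the page.
 */
static void toy_putback(struct toy_page *p, long *nr_isolated)
{
	(*nr_isolated)--;
	p->isolated = false;
	p->on_lru = true;
	toy_put(p); /* the page must not be touched after this */
}
```

In this model, a page that is off the LRU or marked unevictable ends the loop with the same reference count it started with, and toy_putback() does its accounting before the final toy_put() rather than after it.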