From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id E1CE3C77B7F for ; Mon, 8 May 2023 20:40:12 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id EF49B6B007E; Mon, 8 May 2023 16:40:11 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id EA51C900003; Mon, 8 May 2023 16:40:11 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id D936D900002; Mon, 8 May 2023 16:40:11 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0013.hostedemail.com [216.40.44.13]) by kanga.kvack.org (Postfix) with ESMTP id CA59F6B007E for ; Mon, 8 May 2023 16:40:11 -0400 (EDT) Received: from smtpin16.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay05.hostedemail.com (Postfix) with ESMTP id 632EF408EB for ; Mon, 8 May 2023 20:40:11 +0000 (UTC) X-FDA: 80768255022.16.31D3CBD Received: from mail-yb1-f201.google.com (mail-yb1-f201.google.com [209.85.219.201]) by imf04.hostedemail.com (Postfix) with ESMTP id 9B7E040005 for ; Mon, 8 May 2023 20:40:09 +0000 (UTC) Authentication-Results: imf04.hostedemail.com; dkim=pass header.d=google.com header.s=20221208 header.b=HGgJ4D5c; spf=pass (imf04.hostedemail.com: domain of 3KF5ZZAsKCHMRTbVicVpkeXXffXcV.TfdcZelo-ddbmRTb.fiX@flex--ackerleytng.bounces.google.com designates 209.85.219.201 as permitted sender) smtp.mailfrom=3KF5ZZAsKCHMRTbVicVpkeXXffXcV.TfdcZelo-ddbmRTb.fiX@flex--ackerleytng.bounces.google.com; dmarc=pass (policy=reject) header.from=google.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1683578409; a=rsa-sha256; cv=none; b=GDZ+zrRgEZFIIybKpUK97AssoBjmqEDNCHHw3FAldLSlIBOXqroKyxWnfRKsqhUU4yPahG F7GQvMN/3g7yWw4bqml6MYAiOyq2WtstC0oW0i7RprDLM20JlvuxokggJ/974VEau8y/cl yKmqcslDqYb4GUr1ryEmlbU4m9oyXDg= ARC-Authentication-Results: i=1; imf04.hostedemail.com; dkim=pass header.d=google.com header.s=20221208 header.b=HGgJ4D5c; spf=pass (imf04.hostedemail.com: domain of 3KF5ZZAsKCHMRTbVicVpkeXXffXcV.TfdcZelo-ddbmRTb.fiX@flex--ackerleytng.bounces.google.com designates 209.85.219.201 as permitted sender) smtp.mailfrom=3KF5ZZAsKCHMRTbVicVpkeXXffXcV.TfdcZelo-ddbmRTb.fiX@flex--ackerleytng.bounces.google.com; dmarc=pass (policy=reject) header.from=google.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1683578409; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type:content-transfer-encoding: in-reply-to:in-reply-to:references:dkim-signature; bh=Zp9MF4afK7LOvc1F2U5JVnBUybefUQdIksarawJ2o+k=; b=iteA7kx7Rx7w6Z8+g7+UeiLILEUcy0oKlA5Uwx4V3Le2Td6ThKz8fJ/vaB9h88X/6lyJUI 3lJtcKKWLyyrBbia/rzoLKZQ5UtuY7trApIbDRADABAh1sWZZUnCvzwZZFzl4JXp2DdVhZ Nhuh07Yi1QZV6qko0MGyz7pbuvsD0Gw= Received: by mail-yb1-f201.google.com with SMTP id 3f1490d57ef6-b9a6f15287eso36655719276.1 for ; Mon, 08 May 2023 13:40:09 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20221208; t=1683578408; x=1686170408; h=cc:to:from:subject:message-id:mime-version:in-reply-to:date:from:to :cc:subject:date:message-id:reply-to; bh=Zp9MF4afK7LOvc1F2U5JVnBUybefUQdIksarawJ2o+k=; b=HGgJ4D5cDpYlkabNRd1PNopWqWAJ436/Ae803Xn1AOz7pwbztYV2/Tl/mrT6SBh+pI j0RI67jevmzkktR8gSWIFEXFTUvvw4K866R2Uthrqz2dQYNtIqGCOeviEQkYWBo7KlJq 7/+BmWM5owKUlDUm1b0FDW9Z989glOz5nPUAxilSrGyENKKqR3ZOwHB/xzJBH+z5oA5P OpW2NiJLSJAJ9S+ltwAfV4afVKO1RQ9qRwTrSi4hysguEfpj4czRiRvBrjrbcVX6mFVI IKQjm0+TMHIsQklQddq0AlyGV5fN99oewNxdUDPoBQFiJ0Hzwa4SsVAz16gjjJghKrDV JZpg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1683578408; x=1686170408; h=cc:to:from:subject:message-id:mime-version:in-reply-to:date :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=Zp9MF4afK7LOvc1F2U5JVnBUybefUQdIksarawJ2o+k=; b=hdLh8p1n+C6l/5vOl3b8zTbPyByKEIdTRLrXNgHx2olSaMcOYgWp3I5STVSWERQCJt gNKGk/MLO/cW1X8JzHM0reVViTYOBbHEkKC7lSC3rrQH9XPvcAdrZGo0GoLp3Lq0i13F CuTyMNa1DxLNXm13d9rZOxwNx5YIykgDlmTknoNFo81fJZpqW1GSmL6nUmO5gZ9kRO1B FnXuXVk1ykm7AdyKyoQpZXL8Kb3G4UjB1VUaofHkKu2MLQe4elfpDZNEfbWfLb8kb4wi he5CEuyH9GT2FxE3qKlsvcsNVVuJfEqXdhG8vApZJ0bBisOXlv/0M9eoP611LncKFV/t 0toA== X-Gm-Message-State: AC+VfDyB2IEgoNKWNe4axmZU41Tst2i0ke+/QBeT0WmsEKVJHFotmtsf 71rnDT3B3nfaK4x5/nx2ZpzduBsFQzC/R35RpQ== X-Google-Smtp-Source: ACHHUZ6UCLfD1sTZmb/EWhwwc/INAhyN/1c7ic2WE25k5j68yRheUtzOOBb/+fOVm4BYPuVNYnr1CfTSC+f5iScc5w== X-Received: from ackerleytng-ctop.c.googlers.com ([fda3:e722:ac3:cc00:7f:e700:c0a8:13f8]) (user=ackerleytng job=sendgmr) by 2002:a05:6902:990:b0:b8f:5b11:6d6c with SMTP id bv16-20020a056902099000b00b8f5b116d6cmr7998217ybb.1.1683578408608; Mon, 08 May 2023 13:40:08 -0700 (PDT) Date: Mon, 08 May 2023 20:40:07 +0000 In-Reply-To: (message from Ackerley Tng on Mon, 08 May 2023 16:29:29 +0000) Mime-Version: 1.0 Message-ID: Subject: Re: [PATCH 2/2] fs: hugetlbfs: Fix logic to skip allocation on hit in page cache From: Ackerley Tng To: Ackerley Tng Cc: mike.kravetz@oracle.com, willy@infradead.org, sidhartha.kumar@oracle.com, akpm@linux-foundation.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, muchun.song@linux.dev, jhubbard@nvidia.com, vannapurve@google.com, erdemaktas@google.com, rientjes@google.com Content-Type: text/plain; charset="UTF-8"; format=flowed; delsp=yes X-Rspam-User: X-Rspamd-Server: rspam02 X-Rspamd-Queue-Id: 9B7E040005 X-Stat-Signature: mq4nbgxymrq8h57583qcefn4iu6m8dzm X-HE-Tag: 1683578409-631171 X-HE-Meta: U2FsdGVkX18KaKf8MvDSJKht5eN4GSchdkVylNO5q8TdgFRko3iHJeeH6WOw4OHUPa91R+CiYZdDuHeY3xCHm1VTVja9ay5y6wkQ2LyoEzi+NWEXHHkjvRMkHSPtq/tX66fqWRWb00RtWODG/UIlfNbXbKDgPheenlW6R9b8sRchtGOr/iDhWSV99CqKreN+LxrClThAz205RcREnkUMVX8iJpPT3UA05XXKTXAjC4Vl5k4EhAwliwm9mQustM164tX7PcfaUzVUgjV7AIl+yhqVECWr/2oFDn1fF8lxhKu3HJFq7evalnxMv5PeclzBLALW6VEIxwnd/ilhVshsvUWry2fk89Xd392UWuhFyOUhncmaFam0GR90j4DWBfK54T79DBb5946+3YzXGz4r1dnKh5nvy/eFBu82R5hiJN3LymUvePGSnicRnMjpivQnr8QCbQlLX/4wvEOWtgxqbtIA4bZuHB+yrTb51Y872lS/JTqfKAbaqG8g5XDZBF/2op/2MnJejXQoL8wa8pPrG5eBZ5AYpaC60vhtRv9UEx/5SQjeTIYZE+LpSk58pN25rPXBuy8uPz8dGC+0f78880m3Vn4aXDFJp3g5lGxNF4v73BM9ie9HTpCkOX6/FKS2c7BxUFHr+WwXZyqc/GrcATn38nMEb374epc2AVkcStS08pDD6GqqkjH+E1RNJUPjvhtku79w+RNtDqGgWNWRJ0oDHkvAZAAPrUBRV8WQ5Ajs0dZX6qwWEyQdcY6/tBUpdw6O/KLx0ukkteyRa25k+aafy8Hq29ERwsb5Ecjlt/KTz2LYI3m0o5iCKjkpMPi26n5lnx9E7Ex3UDJqR4LkjKbsJyXxQ8TrdVro/nD+m0ERR7AsdcOu89xfr4ZalaL3FfKcrCc/3vyOSBcT3YUi38cb86cezQS6wbUtO9bNbfZDB/fPiojV9IrzgPuRjowngtu8rqDExdrLBkaiVgy zCjC2E0M XECgv6T9cE6iQc8vLHXMCBE3Z1LBJLcM67L+gtlWYh8Zzvx2wELO17sQZfuBgUmEUPIz72PsRHXECpfRXfH3ELzNjsNRJMfQHvLH8cxmpR8QS+jPD18HTCa518IMZfkXVQ+MqL+t6M94nLQ/ND62SiCHmjmcFi/4HcY3pqYCIOPDHeiNTjNg4GU/0qEIsd1ijfCZ3PwOGJvY+cC0jlNNAjM75UkZxz2q/76CBerzws2ddP4fiJb+9aV9XylzlFyP3KtqvfE7QUAxXAa93fOGCbcGszqz4IYkighkZyAxyx6utPoBosix+kXVEDDxaz6pq+8AdgQtmyXHP7qzOaN1wQ3yfvSFsxa0+h2cXSCE9ZkFeXfFWiXRB5koAsNKwZniH+cEOY/1v83okrfWZ0Y44x6N443xiH7KNLjpiTkSjeSjN+8vaPkKwphOpPb+0uOjSMxjq8dJGYjuDio7Fe12uM9FoJZTj5nyxAzH5RSoozSktqh+B4czWfyJ2wmWV337xj1Asce3oY4YRLm8AoMyImMhz8W95qUymhvxU0Ke/jqNWLCTbP4pNhF7xENlUF2OrnaqBqV0qZxmTen8= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: I figured it out, the bug is still in the use of page_cache_next_miss(), but my earlier suggestion that xas_next always advances the pointer is wrong. Mike is right in that xas_next() is effectively xas_load() for the first call after an XA_STATE() initialiation. However, when max_scan is set to 1, xas.xa_index has no chance to be advanced since the loop condition while(max_scan--) terminates the loop after 1 iteration. Hence, after loading happens with xas_next() regardless of the checks within the loop (!entry, or xa_is_value(entry), etc), xa.xas_index is not advanced, and the index returned always == the index passed in to page_cache_next_miss(). Hence, in hugetlb_fallocate(), it always appears to be a page cache miss. Here's code from a selftest that can be added to lib/test_xarray.c: /* Modified from page_cache_next_miss() */ static unsigned long xa_next_empty(struct xarray *xa, unsigned long start, unsigned long max_scan) { XA_STATE(xas, xa, start); while (max_scan--) { void *entry = xas_next(&xas); if (!entry) { printk("entry not present"); break; } if (xa_is_value(entry)) { printk("xa_is_value instead of pointer"); } if (xas.xa_index == 0) { printk("wraparound"); break; } } if (max_scan == -1) printk("exceeded max_scan"); return xas.xa_index; } /* Replace this function in lib/test_xarray.c to run */ static noinline void check_find(struct xarray *xa) { unsigned long max_scan; xa_init(&xa); xa_store_range(&xa, 3, 5, malloc(10), GFP_KERNEL); max_scan = 1; for (int i = 1; i < 8; i++) printk(" => xa_next_empty(xa, %d, %ld): %ld\n", i, max_scan, xa_next_empty(&xa, i, max_scan)); printk("\n"); max_scan = 2; for (int i = 1; i < 8; i++) printk(" => xa_next_empty(xa, %d, %ld): %ld\n", i, max_scan, xa_next_empty(&xa, i, max_scan)); } Result: entry not present => xa_next_empty(xa, 1, 1): 1 entry not present => xa_next_empty(xa, 2, 1): 2 exceeded max_scan => xa_next_empty(xa, 3, 1): 3 exceeded max_scan => xa_next_empty(xa, 4, 1): 4 exceeded max_scan => xa_next_empty(xa, 5, 1): 5 entry not present => xa_next_empty(xa, 6, 1): 6 entry not present => xa_next_empty(xa, 7, 1): 7 entry not present => xa_next_empty(xa, 1, 2): 1 entry not present => xa_next_empty(xa, 2, 2): 2 exceeded max_scan => xa_next_empty(xa, 3, 2): 4 exceeded max_scan => xa_next_empty(xa, 4, 2): 5 exceeded max_scan => xa_next_empty(xa, 5, 2): 6 entry not present => xa_next_empty(xa, 6, 2): 6 entry not present => xa_next_empty(xa, 7, 2): 7 Since the xarray was set up with pointers in indices 3, 4 and 5, we expect xa_next_empty() or page_cache_next_miss() to return the next index (4, 5 and 6 respectively), but when used with a max_scan of 1, we just get the index passed in. While max_scan could be increased to fix this bug, I feel that having a separate function like filemap_has_folio() makes the intent more explicit and is less reliant on internal values of struct xa_state. xas.xa_index could take other values to indicate wraparound or sibling nodes and I think it is better to use a higher level abstraction like xa_load() (used in filemap_has_folio()). In addition xa_load() also takes the locks it needs, which helps :). I could refactor the other usage of page_cache_next_miss() in hugetlbfs_pagecache_present() if you'd like! On this note, the docstring of page_cache_next_miss() is also inaccurate. The return value is not always outside the range specified if max_scan is too small.