ksummit.lists.linux.dev archive mirror
* Re: [TECH TOPIC] re-think of richACLs in AI/LLM era
@ 2025-09-19  6:16 Coly Li
From: Coly Li @ 2025-09-19  6:16 UTC
  To: Jan Kara; +Cc: Paul Moore, Randy Dunlap, Steven Rostedt, ksummit

On Wed, Sep 17, 2025 at 09:59:09AM +0800, Jan Kara wrote:
> On Wed 17-09-25 01:12:48, Coly Li wrote:
> > Users store their photos on the system, and the compact AI module processes
> > all their photos and groups them into different categories like pizza, dogs,
> > cats, foods or group photos. After the processing is done, users can see
> > their photos in the categories that the AI module thinks they belong in.
> > Then users may share the categories with photos to others. If identical
> > categories are shared by different users, the shared photos can be combined
> > all together. The AI module may continue to process the shared photos and
> > generate new categories from them, e.g. pizza in the same city, cats and
> > dogs in nearby locations, group photos containing the most common people,
> > etc. Now the different categories are implemented as different directories
> > in the publicly shared directory.
> > 
> > In each category directory, photos with a category (or attribute) can be
> > accessed as hard links to the original photo inodes and share the identical
> > inodes. All these category directories are created by the AI module,
> > although the photos are shared by individual users. Suppose a user is
> > identified in a group photo and notices that the photo is publicly shared.
> > If this user doesn't want his or her face to be shared in public, then as an
> > optional privacy protection right, this user can remove the hard link of the
> > photo which his or her face is in; that is, he or she can remove the hard
> > link (dentry) under a publicly shared directory on which this user doesn't
> > have write permission. Because this user can be identified as the owner of
> > his or her face, and the photo has that face in it, he or she should have
> > write permission to delete the photo, but no write permission to other
> > photos in the same category directory which his or her face is not in.
> 
> Well, from what you describe I'd say that the category directories should
> just be AI owned rwxrwxrwt dirs (do notice the sticky bit set). This is how
> /tmp/ is usually setup. This means that everybody can read the dir,
> everybody can delete files but only if they are their owner, everybody can
> create files - this is the part you probably don't want but *that* is
> pretty easy to restrict by a LSM (practically any one can do this).

This is quite similar to what we are doing now (self-defined rules + eBPF
hooks), but your suggestion might be a more elegant way.
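
For readers unfamiliar with the sticky-bit rule suggested above, here is a
minimal Python model of it. It is a sketch of the documented semantics, not
kernel code: the function name `may_unlink`, the uids, and the simplification
that only the "other" permission bits are consulted are assumptions made for
illustration.

```python
import stat

# Hypothetical uids, for illustration only.
AI_UID, ALICE, BOB = 1000, 1001, 1002

# An AI-owned category directory set up like /tmp: rwxrwxrwt.
CATEGORY_DIR_MODE = stat.S_IFDIR | 0o1777


def may_unlink(dir_mode, dir_uid, file_uid, caller_uid, cap_fowner=False):
    """Model the unlink check in a directory, including the sticky bit.

    Simplification: the caller is treated as being in the "other" class
    for the directory permission bits.
    """
    # Removing an entry needs write and search permission on the directory.
    can_modify_dir = (dir_mode & stat.S_IWOTH) and (dir_mode & stat.S_IXOTH)
    if not can_modify_dir:
        return False
    # Without the sticky bit, directory write permission is sufficient.
    if not (dir_mode & stat.S_ISVTX):
        return True
    # With the sticky bit, only the file owner, the directory owner, or a
    # caller holding CAP_FOWNER may remove the entry.
    return cap_fowner or caller_uid in (file_uid, dir_uid)


if __name__ == "__main__":
    # Alice uploaded the photo, so she may remove its link; Bob may not.
    print(may_unlink(CATEGORY_DIR_MODE, AI_UID, ALICE, ALICE))
    print(may_unlink(CATEGORY_DIR_MODE, AI_UID, ALICE, BOB))
```

Note that in this model only the file owner (the uploader) or the directory
owner passes the sticky-bit check.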

With the above method, our challenges are:
- Applications may treat this behavior as a bug
    Once the write/delete access is denied, a user application can't understand
  why the request was rejected. A user space application can check permission
  bits and ACLs, but cannot check the LSM rules, so it cannot understand why
  the write/delete access is rejected although all permissions appear granted.
    Currently in our products this is fine, because all applications are
  written by ourselves and we know the access denial comes from a security
  rule violation. But in the long term this might be a potential challenge.

- Cannot tell the real permission quickly
    From the web UI, users can right-click to check their permission on a
  specific file or directory. Our current rules-based access control needs to
  iterate over all the rules in reverse to determine the final permission the
  user obtains. It is very slow and inconvenient, and we don't have a proper
  method to handle the permission display yet.

- Rules store/load/management
    Currently all the rules are persisted in a database and loaded into an
  in-kernel memory table. The rules can be checked very fast, and this works
  fine for the relatively small data set and number of access rules at this
  moment. But in the worst case each shared file may have a single rule for
  its access control; as the number of shared files and control policies keeps
  growing, such a method doesn't scale and will soon stop being agile for
  store/load/management.
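
To make the "reverse iterate all the rules" cost concrete, here is a minimal
sketch of a last-match-wins rule table. The rule format, the `Rule` fields,
the paths, and the default-deny fallback are hypothetical; the thread does not
describe the actual rule format used in the product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    user: str         # user name, or "*" for everyone (assumed convention)
    path_prefix: str  # rule applies to paths starting with this prefix
    perm: str         # e.g. "read", "write", "unlink"
    allow: bool


# Rules are appended over time; later rules override earlier ones.
RULES = [
    Rule("*", "/shared/", "unlink", False),
    Rule("alice", "/shared/group/p1.jpg", "unlink", True),
]


def effective(rules, user, path, perm):
    """Return the effective decision by scanning from newest to oldest.

    Every lookup is O(number of rules); showing all permissions of one
    file in a UI repeats this scan once per permission.
    """
    for rule in reversed(rules):
        if (rule.user in (user, "*")
                and path.startswith(rule.path_prefix)
                and rule.perm == perm):
            return rule.allow
    return False  # assumption: deny when no rule matches


if __name__ == "__main__":
    print(effective(RULES, "alice", "/shared/group/p1.jpg", "unlink"))
    print(effective(RULES, "bob", "/shared/group/p1.jpg", "unlink"))
```

With one rule per shared file, as in the worst case described above, the rule
list grows linearly with the number of shared files, and so does each lookup.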

This is the view from users (both user space developers and end users).
Currently I don't see a perfect LSM-based solution that solves these
challenges from the users' point of view.

Thanks for your suggestion.

Coly Li


* Re: [TECH TOPIC] re-think of richACLs in AI/LLM era
@ 2025-09-17  2:38 Coly Li
From: Coly Li @ 2025-09-17  2:38 UTC
  To: Randy Dunlap; +Cc: Paul Moore, Steven Rostedt, Jan Kara, ksummit

On Tue, Sep 16, 2025 at 11:07:27AM +0800, Randy Dunlap wrote:
> Hi Coly,
> 
> On 9/16/25 10:12 AM, Coly Li wrote:
> > On Wed, Sep 10, 2025 at 03:11:55PM +0800, Paul Moore wrote:
> >> On Wed, Sep 10, 2025 at 9:32 AM Coly Li <colyli@fnnas.com> wrote:
> >>> On Mon, Sep 08, 2025 at 09:03:24PM +0800, Paul Moore wrote:
> >>
> >> ...
> >>
> >>>> I can't say I'm familiar with the RichACL concept, but generally
> >>>> speaking yes, the LSM framework exists as a way to implement access
> >>>> control mechanisms beyond the traditional Linux access controls (other
> >>>> things too, but those aren't really relevant here).
> >>>
> >>> Is it convenient for normal users or non-root processes (including the policy agent) to
> >>> set up the LSM rules? We need to allow normal users to set their own access control policy
> >>> for the data they own.
> >>
> >> Management of an individual LSM's configuration is generally left to
> >> the individual LSM.  Some LSMs restrict their configuration knobs
> >> behind capabilities or their own access controls, while others allow
> >> unprivileged access to the configuration; it depends on the LSM's
> >> security model.  As an unprivileged example, Landlock allows
> >> applications, run by arbitrary users, to set their own Landlock
> >> security policy via the Landlock API.
> >>
> > 
> > I forwarded the suggestion to our application team; they evaluated it and
> > replied with the result.
> > 
> > Currently they are using BPF to do the access control rules checking, and
> > an LSM access control method such as Landlock is quite similar to a
> > rules-based control method. They still need to persist all the rules on
> > disk, and load the rules during system initialization. When the number of
> > rules increases, the maintenance becomes complicated and slow.
> > 
> > The application team also gave me a use case and asked how to achieve the
> > access control efficiently. Let me describe it in the following text.
> > 
> > Users store their photos on the system, and the compact AI module processes
> > all their photos and groups them into different categories like pizza, dogs,
> > cats, foods or group photos. After the processing is done, users can see
> > their photos in the categories that the AI module thinks they belong in.
> > Then users may share the categories with photos to others. If identical
> > categories are shared by different users, the shared photos can be combined
> > all together. The AI module may continue to process the shared photos and
> > generate new categories from them, e.g. pizza in the same city, cats and
> > dogs in nearby locations, group photos containing the most common people,
> > etc. Now the different categories are implemented as different directories
> > in the publicly shared directory.
> > 
> > In each category directory, photos with a category (or attribute) can be
> > accessed as hard links to the original photo inodes and share the identical
> > inodes. All these category directories are created by the AI module,
> > although the photos are shared by individual users. Suppose a user is
> > identified in a group photo and notices that the photo is publicly shared.
> > If this user doesn't want his or her face to be shared in public, then as an
> > optional privacy protection right, this user can remove the hard link of the
> > photo which his or her face is in; that is, he or she can remove the hard
> > link (dentry) under a publicly shared directory on which this user doesn't
> > have write permission. Because this user can be identified as the owner of
> > his or her face, and the photo has that face in it, he or she should have
> > write permission to delete the photo, but no write permission to other
> > photos in the same category directory which his or her face is not in.
> 
> 
> What permissions/rights does the AI module have such that it can create
> a file in shared/photos/faces from my personal files?

By default the AI module is not installed. If an admin user installs the AI
module and a user permits the AI module to process his or her personal
information, the AI module has read-only permission to the original data.

> The shared file with my photo is still owned by the sharing user, correct?

Yes, the original files are owned by the users who share the files with the
public.

> What are the permissions/owner of the parent directory?

If a user shares selected photos with a specific category/attribute, the parent
directory of the shared photos, or the category directories of the shared
photos, are not owned by the original files' owner. All the shared photos will
be processed further to build more connections and categories.

> What permissions are required in my personal photos directory to allow
> files there to be shared?
>

The original files are not visible/accessible to other users by default.
The AI process has read access granted by POSIX ACLs. But removing the
dentry/hard link from a shared directory on which the user has no write
permission is not supported by POSIX ACLs, and currently it is implemented by
an ugly and inefficient eBPF hack. We don't like such a hack, and it won't work
in the future for a huge number of files with more complicated AI
process/share circumstances.
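
The hard-link layout discussed in this thread can be illustrated with a small
Python sketch. The directory and file names are made up, and it assumes the
temporary directory lives on a filesystem that supports hard links. It shows
that a category entry is just another name for the same inode, and that
removing the entry leaves the original file intact. Removing a name requires
write and search permission on the containing directory, not on the file
itself, which is why the shared directory's permissions are what matter here.

```python
import os
import tempfile


def link_demo():
    """Create a photo, hard-link it into a category dir, then unlink it."""
    with tempfile.TemporaryDirectory() as root:
        private_dir = os.path.join(root, "alice")                  # hypothetical
        category_dir = os.path.join(root, "shared", "group-photos")
        os.makedirs(private_dir)
        os.makedirs(category_dir)

        original = os.path.join(private_dir, "p1.jpg")
        with open(original, "wb") as f:
            f.write(b"fake image bytes")

        # The category entry is a second name (dentry) for the same inode.
        shared_name = os.path.join(category_dir, "p1.jpg")
        os.link(original, shared_name)
        same_inode = os.stat(original).st_ino == os.stat(shared_name).st_ino
        nlink_before = os.stat(original).st_nlink

        # Removing the shared name only drops one link to the inode.
        os.unlink(shared_name)
        nlink_after = os.stat(original).st_nlink
        still_there = os.path.exists(original)

        return same_inode, nlink_before, nlink_after, still_there


if __name__ == "__main__":
    print(link_demo())
```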

 
> It sounds to me like the AI was aggressive in sharing; now you (the user)
> want to correct/fix that. The AI could be trained better, but that's too
> difficult. (just playing devil's advocate there)

It is expected behavior; we have no motivation to correct/fix it. By default
the AI-processed photos are only accessible by the owner who uploaded the
photos onto the system. People do want to share the AI-processed photos with
others, e.g. all photos are categorized as 'fine food', and the user selects
all photos in this category to share with other users on the same system.
Currently those are family members and very close friends or relatives. What I
described in the above case is a policy/procedure for how a privacy protection
conflict is resolved.

Unlike public cloud AI facilities, on our product the AI module only processes
data locally and has nowhere to upload or share the processed information in
public, and users can use the compact AI capability without registering a
global and unique online account. And the AI-processed information can be
shared very conveniently among all users on the system.


> 
> I haven't looked at this in any detail, but I'm wondering if an
> intermediate directory level with "my" permissions/ownership would
> allow a fix for this issue. Then I (the user) could remove the file
> (the hardlink) from the intermediate directory. However, that might

It is almost impossible to use intermediate directories, because the photos in
a category directory may come from different users. The AI module processes the
shared photos again and again, tries to find interesting connections or
attributes among these photos, and creates different categories every several
days or weeks. This is not initiated by the owner of the photos. Our system is
free of charge to end users, and people may deploy the software on very
ordinary hardware, e.g. an old laptop bought 15 years ago. So the AI processing
is slow, and new photo categories will probably show up only after several
weeks.

> leave some dangling link in the final shared directory. I don't know
> enough about it to answer that.

Me neither :-) Our product is Debian Linux + many user space applications. The
AI applications are not open sourced yet. I maintain the kernel part and try to
post all the kernel changes back to upstream. The access control list
requirement is from our application team; the kernel team tried to implement
the requirement with existing kernel infrastructure and failed, which is why I
proposed such a topic for the kernel summit.
 
> 
> > The above example is one of the simple cases, just for photo processing and
> > sharing in the AI context. The rules of access control are created or
> > destroyed dynamically and may only exist for a short period. And the number
> > of rules is quite large.
> > 
> > Current rules-based access control is inefficient and complicated to
> > implement for the above simple case, and the application team replies that
> > they don't see how the rules-based LSM method can make it simpler.
> 
> [snip]
> 
> >> One of the important parts of the LSM framework as a whole is that
> >> LSMs can not grant access that would otherwise be blocked by the
> >> standard/discretionary access controls built into the Linux kernel; in
> >> other words, LSMs can only say "no" to an access, they can not grant
> >> access by themselves.  Yes, this is by design, and no, I see no reason
> >> to change that design decision at this point in time (doing so would
> >> require a tremendous amount of work and likely introduce a fair number
> >> of security regressions for quite some time).
> > 
> > I understand and agree with the security concern. But the reality is that
> > more and more similar or related access control requirements will spring up
> > from AI/LLM applications and use cases. We just want to solve the access
> > control challenge, and the LSM rules-based methods are not easy for
> > application developers.
> 
> Do you have control over the AI that does this photo directory magic?

If you mean the shared photo category directories: once the photos are
explicitly shared, new category directories for the photos shared among all
users are created and decided by the AI modules. Users still have write or
delete permission if they are matched by some permission check. For example, if
one user's face ID is matched in a shared group photo, this user will have
permission to delete the dentry of this photo in the shared photo category.

So users don't have full control over the shared photo directories, but they
still have control if a photo is related to them.
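
As a rough sketch of the per-photo check described above, the following Python
fragment models "a user may remove a shared entry only if the photo is related
to them". All names (`SharedPhoto`, `may_remove_link`, the face IDs) are
hypothetical, and letting the uploader remove the entry as well is an
assumption, not something stated in the thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SharedPhoto:
    name: str
    owner: str                      # user who uploaded and shared the photo
    faces: frozenset = frozenset()  # users whose face ID the AI matched


def may_remove_link(user, photo):
    """Allow removing the shared entry if the photo is related to the user."""
    # Assumption: the uploader can always withdraw their own shared photo.
    if user == photo.owner:
        return True
    # A user matched by face ID may remove the entry for this photo only.
    return user in photo.faces


if __name__ == "__main__":
    group = SharedPhoto("party.jpg", "alice", frozenset({"alice", "bob"}))
    pizza = SharedPhoto("pizza.jpg", "alice")
    print(may_remove_link("bob", group))  # bob's face is in the group photo
    print(may_remove_link("bob", pizza))  # unrelated photo, same category
```

The decision depends on a property of each individual photo rather than on the
directory, which is what makes it hard to express with directory-level
permission bits.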
 
> Anyway, it sounds like a reasonable topic for the kernel summit (as a
> Tech Topic) instead of the maintainer's summit (process stuff).

Yes, I changed the email subject to TECH TOPIC. I also sent the proposal as a
TECH TOPIC again to the kernel summit mailing list, and submitted the proposal
on the web page as well.

Thank you for the discussion.

Coly Li


* [TECH TOPIC] Re-think of richACLs in AI/LLM era
@ 2025-09-08 13:57 Coly Li
From: Coly Li @ 2025-09-08 13:57 UTC
  To: ksummit

[Resend proposal for Kernel Summit]

Hi folks,

This is Coly Li. I’ve been maintaining bcache for a while and have met Linus,
Greg, Ted, and other maintainers in person at many conferences. Yes, I am a
sustained and reliable kernel developer.

Recently, I joined a startup (https://fnnas.com) that provides AI/LLM
capabilities for personal or micro-enterprise storage. We help users share and
communicate AI/LLM-processed information from their stored data more
conveniently.

Our users can run highly compact LLMs on their own normal and inexpensive
hardware to process photos, videos, and documents using AI. Of course, it’s slow
but that’s expected and acceptable. They can even come back to check the results
weeks later.

In our use case, different people or roles store their personal and sensitive
data in the same storage pool, with different access controls granted to AI/LLM
processing tasks. When they share specific information or data with others
within the same machine or over the internet, the access control hierarchy or
rules become highly complicated and impossible to handle with POSIX ACLs.

We tried delegating access control to user space, which worked well except for
scalability and performance:
- As the number and size of files increase, storing all access control rules in
 user space memory doesn’t scale—especially on normal machines without huge
 memory resources.
- For some hot data sets (a group of files and directories), checking access
 control rules in user space and hooking back to the kernel is highly
 inefficient.

Therefore, the RichACL project comes back to mind. Of course, RichACL alone
isn’t enough. A high-level policy agent (in user space) is still needed for
task/session-oriented access and sharing policy control, but RichACL can help
implement file system-level access control. This would give us a context-aware
and highly efficient access control implementation.

What I’d like to discuss is:
- After almost 10 years, should we reconsider RichACL in the AI/LLM era?
- What are the major barriers or remaining work needed to get RichACLs into
 upstream?

Since our first public beta was released 13 months ago, we now have over one
million active installations running daily. This is a real workload for RichACL
and represents real feature demand from end users. If you’re interested in this
topic, we’d be happy to provide more details about the access control
requirements in AI workloads and even show a live demo of the use case.

Thanks in advance.

Coly Li
