
> And listing files is slow. While the joy of Amazon S3 is that you can read and write at extremely, extremely, high bandwidths, listing out what is there is much much slower. Slower than a slow local filesystem.

I was taken aback by this recently. At my coworker’s request, I was putting some work into a script we have to manage assets in S3. It has a cache for the file listing, and the coworker who wrote it sent me his pre-populated cache. My initial thought was “this can’t really be necessary”, so I started poking.

We have ~100,000 root level directories for our individual assets. Each of those has five or six directories with a handful of files. Probably fewer than a million files total, maybe 3 levels deep at its deepest.

Recursively listing these files takes literally fifteen minutes. I poked and prodded at suggestions from Stack Overflow and ChatGPT for potential ways to speed up the process and got nothing notable. That’s absurdly slow. Why on earth is it so slow?

Why is this something Amazon has not fixed? From the outside really seems like they could slap some B-trees on the individual buckets and call it a day.

If it is a difficult problem, I’m sure it would be for fascinating reasons I’d love to hear about.



S3 is fundamentally a key value store. The fact that you can view objects in “directories” is nothing more than a prefix filter. It is not a file system and has no concept of directories.


Directories make up a hierarchical filesystem, but they’re not a necessary condition. A filesystem at its core is just a way of organizing files. If you’re storing and organizing files in S3 then it’s a filesystem for you. Saying it’s “fundamentally a key value store” as if that were something different is confusing, because a filesystem is just a key value store of path to contents of file.

Indeed there’s every reason to believe that a modern file system would perform significantly faster if the hierarchy were implemented as a prefix filter rather than by actually maintaining hierarchical data structures (at least for most operations). You can guess this might be the case from the fact that file creation is extremely slow on modern file systems (on the order of hundreds or maybe thousands per second on a modern NVMe disk that can otherwise do millions of IOPS), and that listing the contents of an extremely large directory is exceedingly slow.


In the context of the comment I was addressing, it’s clear that filesystem means more than just a key value store. I’d argue that this is generally true in common vernacular.


This is a technical website discussing the nuances of filesystems. Common vernacular is how you choose to define it but even the Wikipedia definition says that directories and hierarchy are just one property of some filesystems. That they became the dominant model on local machines doesn’t take away from the more general definition that can describe distributed filesystems.


I'm kind of chuckling at this thread because you're working so hard to not understand.

I think the previous poster could/should have said, "It is not a hierarchical file system and has no concept of directories." where I added the word "hierarchical".

But it's also pretty obvious that was the point.


I disagree with that characterization because the contrast drawn by OP was that S3 is “just a KV store”, implying it doesn’t meet the criteria for being considered a filesystem.

For example, you could implement POSIX directory semantics on top of S3. About the only POSIX filesystem API you couldn’t implement is append / overwrite (well, you could, but it might be prohibitively expensive).


A real hierarchy makes global constraints easier to scale, e.g. globally unique names or hierarchical access controls. These policies only need to scale to a single node rather than to the whole namespace (via some sort of global index).


No - a filesystem implementation on an ordinary OS has more than what you mention, including interfaces to disk device drivers.


If I wanted to use S3 as a filesystem in the manner people are describing, I would probably start by storing filesystem metadata in a sidecar database, so you get directory listings, permission bits, and xattrs, and only have to round-trip to S3 when you need the content.


Isn't this essentially what systems like Minio and SeaweedFS do with their S3 integrations/mirroring/caching? What you describe sounds a lot like SeaweedFS Filer when backed by S3


The way that you said "recursively" and spent a lot of time describing "directories" and "levels" worries me. The fastest way to list objects in S3 wouldn't involve recursion at all; you just list all objects under a prefix. If you're using the path delimiter to pretend that S3 keys are a folder structure (they're not) and go "folder by folder", it's going to be way slower. When calling ListObjectsV2, make sure you are NOT passing "delimiter". The "directories" and "levels" have no impact on performance when you're not using the delimiter functionality. Split the one list operation into multiple parallel lists on separate prefixes to attain any total time goal you'd like.
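In boto3 terms, the flat listing that comment describes looks something like this (a sketch, not the OP’s actual script; the bucket name is hypothetical, and the client is passed in so it can be stubbed). The whole point is the absence of the Delimiter argument:

```python
def list_all_keys(s3, bucket, prefix=""):
    """Flat listing of every key under `prefix`.

    `s3` is a boto3 S3 client. Crucially, we do NOT pass Delimiter:
    with Delimiter="/" S3 groups keys into CommonPrefixes ("folders")
    and you end up paginating folder by folder; without it, S3 streams
    all keys under the prefix in lexicographic order, 1,000 per page.
    """
    keys = []
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        keys.extend(obj["Key"] for obj in page.get("Contents", []))
    return keys
```

Usage would be `list_all_keys(boto3.client("s3"), "my-asset-bucket")` with a prefix of "" to sweep the whole bucket in one paginated pass.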


All these comments saying merely "S3 has no concept of directories" without an explanation (or at least a link to an explanation) are pretty unhelpful, IMO. I dismissed your comment, but then I came upon this later one explaining why: https://news.ycombinator.com/item?id=39660445

After reading that, I now understand your comment.


I appreciate you sharing that point of view. There's a "curse of knowledge" effect with AWS where its card-carrying proponents (myself included) lose perspective on how complex it actually is.


Yes, this is very good advice and will likely solve their problem


A fun corollary of this issue:

Deleting an S3 bucket is nontrivial!

You can't delete a bucket with objects in it. And you can't just tell S3 to delete all the objects. You need to send individual API requests to S3 to delete each object. Which means sending requests to S3 to list out the objects, 1000 at a time. Which takes time. And those list calls cost money to execute.
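The list-then-batch-delete loop sketched below shows the shape of that work (my own illustration, with an injected boto3 client so it can be stubbed; each DeleteObjects call accepts at most 1,000 keys, which conveniently matches a list page):

```python
def empty_bucket(s3, bucket):
    """Delete every object in `bucket`, 1,000 at a time.

    `s3` is a boto3 S3 client. Each page from list_objects_v2 holds at
    most 1,000 keys, which is also the per-call limit of DeleteObjects,
    so one delete request pairs naturally with one list page.
    """
    deleted = 0
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket):
        contents = page.get("Contents", [])
        if not contents:
            continue
        s3.delete_objects(
            Bucket=bucket,
            Delete={"Objects": [{"Key": o["Key"]} for o in contents]},
        )
        deleted += len(contents)
    return deleted
```

Only once this returns can `s3.delete_bucket(Bucket=bucket)` succeed, and you pay for every LIST along the way.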

This is a good summary of the situation: https://cloudcasts.io/article/deleting-an-s3-bucket-costs-mo...

The fastest way to dispose of an S3 bucket turns out to be to delete the AWS account it belongs to.


No, don't do that. Set up a lifecycle rule that expires all of the objects and wait 24 hours. You won't pay for API calls and even the cost of storing the objects themselves is waived once they are marked for expiration.

The article has a mistake about this too: expirations do NOT count as lifecycle transitions and you don't get charged as such. You will, of course, get charged if you prematurely delete objects that are in a storage class with a minimum storage duration that they haven't reached yet. This is what they're actually talking about when they mention Infrequent Access and other lower tiers.
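For reference, the lifecycle rule in question is a one-shot API call, roughly like this (a sketch with a hypothetical rule ID and an injected client; the empty Filter makes the rule apply to every object):

```python
def expire_everything(s3, bucket):
    """Install a lifecycle rule that expires all current objects after
    1 day and cleans up noncurrent versions and incomplete multipart
    uploads. `s3` is a boto3 S3 client."""
    s3.put_bucket_lifecycle_configuration(
        Bucket=bucket,
        LifecycleConfiguration={
            "Rules": [
                {
                    "ID": "expire-everything",  # hypothetical rule name
                    "Status": "Enabled",
                    "Filter": {},  # empty filter = applies to all objects
                    "Expiration": {"Days": 1},
                    "NoncurrentVersionExpiration": {"NoncurrentDays": 1},
                    "AbortIncompleteMultipartUpload": {
                        "DaysAfterInitiation": 1
                    },
                }
            ]
        },
    )
```

After the rule has run (roughly a day later), the bucket is empty and can be deleted with a single DeleteBucket call.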


Still counts as nontrivial.


This is really easy; much easier than trying to delete them by hand. AWS does all the work for you. It takes longer to log into the AWS Management Console than it does to set up this lifecycle rule.


Literally 1 API call.


Two. The one to set up the lifecycle rule. Then the one to delete the bucket, some number of hours later.


Incorrect. One call to trigger a step function that sets up the lifecycle rule, sleeps for 24 hours and then deletes the bucket.

Stop being silly, as if 1 vs 2 API calls matters. You should empty large buckets with lifecycle policies. It's trivial.


Imagine for a second you’re a Unix user, familiar with the rm command.

Imagine you are using windows for the first time and you want to delete a directory, so you find an answer on Serverfault that explains that to do so you need to spin up a COM object that marks the directory for deletion, then the next day comes back and deletes it.

You might be inclined to say ‘that seems overly complicated’.

The original answerer is confused though. ‘It’s trivial, stop being silly. Can you think of a simpler way to delete a directory?’

Do you see now why I thought the ‘non triviality’ of deleting an S3 bucket was perhaps relevant in a discussion on an article about why S3 is both simpler and more complex than a file system?

And why your approach might not actually be making the case for it being as simple as you think?


Right click, move to recycle bin, wait for the progress bar to finish. Except the progress bar takes a day or so.

This is only needed if you have a huge (100 million+) bucket, at which point you should be experienced with s3, otherwise you can just click the big, clear and obvious “empty bucket” button on the console.


I think it’s far more mundane a reason. You can list 1,000 objects per request, and getting the next 1,000 requires the continuation token from the previous request, so it’s all serial. That means to list 1M files, you’re looking at 1,000 back-to-back requests. Assuming a ping time of 50ms, that’s easily 50s of just going back and forth, not including the cost of doing the listing itself on a flat iteration. The cost of each list is about the cost of a write, which is kinda slow. Additionally, I suspect each listing is a strongly consistent snapshot, which adds to the cost of the operation (it can be hard to provide a consistent view).

I don’t think btrees would help unless you’re doing directory traversals, but even then I suspect that’s not that beneficial as your bottleneck is going to be the network operations and exposed operations. Ultimately, file listing isn’t that critical a use case and typically most use cases are accomplished through things like object lifecycles where you tell S3 what you want done and it does it efficiently at the FS layer for you.


That's under a minute of a 15m duration. I don't think it matters in the least.


Depends on how you’re iterating. If you’re iterating by hierarchy level, then you could easily see this being several orders of magnitude more requests.


It's not a good model to think of S3 as having directories in a bucket. It's all objects. The web interface has a visual way of representing prefixes separated by slashes. But that's just a nice way to present the objects. Each object has a key, and that key can contain slashes, and you can think of each segment as a directory for your ease of mind.

But that illusion breaks when you try to do operations you usually do with/on directories.


Are you performing list calls sequentially? If you have O(100k) directories and are doing O(100k) requests sequentially, 15 minutes works out at O(10ms) per request which doesn’t seem that bad? (assuming my math is correct…)


At risk of being pedantic, you seem to be using big O to mean “approximately” or “in the order of”, but that’s not what it means at all. Big O is an expression of the growth rate of a function. Any constant value has a growth rate of 0, so O(100k) isn’t meaningful: It’s exactly the same as O(1).


You're right technically, it's an abuse of notation that isn't uncommon. My physics profs would do it in college.


Fair point, I guess the notation ~100k, ~10ms would be better.


I implemented a solution by threading the listing. Get the files in the root, then spin up a separate process to do the recursion for each directory.
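A sketch of that fan-out with a thread pool (names are hypothetical; a client factory is injected so each worker gets its own boto3 client and so the code can be stubbed for testing). Each worker does one flat, delimiter-free listing under its top-level prefix:

```python
from concurrent.futures import ThreadPoolExecutor


def list_keys_parallel(make_client, bucket, top_prefixes, workers=32):
    """List a bucket by fanning out one flat listing per top-level
    prefix. `make_client` is a zero-arg factory returning a boto3 S3
    client (one per worker); `top_prefixes` is e.g. the ~100k asset
    IDs you already know about."""
    def list_one(prefix):
        s3 = make_client()
        keys = []
        paginator = s3.get_paginator("list_objects_v2")
        for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
            keys.extend(o["Key"] for o in page.get("Contents", []))
        return keys

    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(list_one, top_prefixes)
    # Flatten the per-prefix chunks into one key list.
    return [k for chunk in results for k in chunk]
```

With real clients this would be called as `list_keys_parallel(lambda: boto3.client("s3"), "my-asset-bucket", asset_ids)`.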


> Why is this something Amazon has not fixed?

It's common to store the metadata in DynamoDB, where it can be queried, and just keep whatever arbitrary links to the values in the buckets.


> Why is this something Amazon has not fixed? From the outside really seems like they could slap some B-trees on the individual buckets and call it a day.

They fixed it already, it's called DynamoDB. With some SQS and Lambda glue you can index your S3 content in any way you want for later retrieval.
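A minimal sketch of that glue, assuming S3 event notifications invoke a Lambda directly (the table name is hypothetical, the event shape follows the standard S3 notification format, and the table handle is injectable so the handler can be tested offline):

```python
def handler(event, context, table=None):
    """Lambda entry point for S3 ObjectCreated/ObjectRemoved events.

    Mirrors each event into a DynamoDB table keyed by (bucket, key),
    so "listings" become DynamoDB queries instead of S3 LIST calls.
    `table` is a boto3 DynamoDB Table resource.
    """
    if table is None:
        import boto3  # only needed when running inside Lambda
        table = boto3.resource("dynamodb").Table("s3-index")  # hypothetical

    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        if record["eventName"].startswith("ObjectCreated"):
            table.put_item(Item={
                "bucket": bucket,
                "key": key,
                "size": record["s3"]["object"].get("size", 0),
            })
        elif record["eventName"].startswith("ObjectRemoved"):
            table.delete_item(Key={"bucket": bucket, "key": key})
    return {"processed": len(event["Records"])}
```

An SQS queue between the notification and the Lambda (as the comment suggests) adds buffering and retries, but the handler body stays the same.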


Take this opportunity to read the docs and discard assumptions. Enumerating buckets as though they’re directories will seem peculiar when you understand it is designed for billions of items and up. Index your objects separately, in whatever form makes sense to your application.


It's not "fixed" because it's not a problem. You're just using it wrong.


> Recursively listing these files

There's no "recursive" nature to S3 buckets. "Listing a directory" is simply listing keys by a prefix.

So list by the upper-most prefix that you want. If you have 1,000,000 files, it will take 1,000 API calls to list everything.

If each call takes 1s (I have no idea what your latency to the S3 bucket region is), then it will indeed take 15 min.

https://docs.aws.amazon.com/AmazonS3/latest/API/API_ListObje...



