DEV Community


Posted on • Updated on

How do you manage not used images in website like cms?

I'm making website and I have different entities: users, comments, forum threads. It's something like wordpress with simple forum but I built it myself with laravel.

I use storage like s3.

Everything is great but users can upload images in posts, comments, forum threads and have avatar.

With avatar everything is fine:
Event: user uploads avatar
Action: system (cms) will upload image to s3 storage and on success delete previous avatar and change user row in database to use new avatar. Perfecto!

The hardest part is for forum comments, threads.
Event: user uploads image on comment form or thread form.
Action: system (cms) will upload image to s3 storage, add bbcode to textarea in form and add to images table row for uploaded image.

User can after leave page without posting.
User can remove markdown code if he doesn't like image.

So there are some zombie images. Every week I have a cron job to remove zombie images.
System will foreach images in my images table will search that image in comments, threads using lame

SELECT COUNT(*) FROM posts WHERE body LIKE %{image}%
SELECT COUNT(*) FROM comments WHERE body LIKE %{image}%

And if it doesn't find it will delete it.

I'm considering to improve it - to loop through posts and comments updated after NOW() and parse with preg_match_all all images used, add to array and then foreach images in my images table will check if image in_array, otherwise delete image.

I'm not sure but this still feels not perfecto.

How do yall manage this kind of things?

Top comments (3)

mykezero profile image
Mykezero • Edited

The only approach that I can think of right now is:

  1. When a user uploads an image, that file lands in a temp directory.
  2. When the user finally posts the comment/thread, move the image from the temp to a perm directory.

Then, you can apply a S3 lifecycle policy to the temp directory and have the non-posted images expire after X days, with an expiration action that deletes them. That would make the temp directory self cleaning and all the in-use files would be in the perm directory.

Now, if you also wanted to clean up the old image rows in the database, you could add a DateCreated and IsPosted columns to the image table and set those when the user posts the comment/thread.

Then create a computed column IsDeleted which uses both the IsPosted flag and DateCreated datetime column to return true when IsPosted is false and X days (from the S3 lifecycle policy) have past.

Now you'd only need a cron job to clean up the unused image rows. Or if you didn't care about cleaning up the image rows, use the IsDeleted flag in your SQL queries, so that you could ignore the deleted images when querying the images table.

Just make sure to add an index on IsPosted and DateCreated, so that the cron job is super fast in finding the image rows it needs to delete.

cicirello profile image
Vincent A. Cicirello

If you want additional suggestions on this, maybe add the #webdev tag to the post. This seems like something web developers would encounter frequently. You would need to remove a tag to add one since limit is 4. Maybe remove #algorithms since this isn't really an algorithms thing, and then add #webdev. That might help get your question into feeds of more people likely to have relevant suggestions on how to solve this issue.

cicirello profile image
Vincent A. Cicirello

I agree with the suggestion in mykezero's comment. I was about to suggest the same. Put images in a temporary location while user drafting post. If they abandon the post then remove them. If they post it, move images to a more long term location.