Which language is more suitable for heavy file tasks?

Tags: c, file-systems, linux, programming-languages, shell

I need to write a script (based on basic functions) to process image/audio/video files. The processing is mainly filesystem tasks and conversions. The database of files is stored in MySQL. The script is simple but puts a heavy load on the system; for example, renaming/converting/copying thousands of files in a single run. The script does not read the contents of files into memory; it just manages the commands for sub-processes. The main load is the communication with the filesystem. The script will be run regularly for new files. My concern is performance. I am considering:

  1. a shell script
  2. a compiled language like C

Which programming language is more suitable for this purpose, and why?

UPDATE: An example is to scan a folder for images, convert them with ImageMagick, move the files to a destination folder, get file info, then update the database. As you can see, the process has no room for optimization, and most languages have similar APIs for popular programs like ImageMagick, MySQL, etc. Thus, it can be written in any language. I just want to reduce resource usage by speeding up the long loop.
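The workflow in the update maps onto a short loop in almost any language. Below is a minimal Python sketch of it, assuming ImageMagick's `convert` command is installed and using the standard-library `sqlite3` module as a stand-in for MySQL so it runs without a server (a MySQL DB-API driver would look similar but uses its own placeholder style). The table `files`, its columns, the extension list, and the function names are all hypothetical.

```python
from __future__ import annotations

import shutil
import sqlite3
import subprocess
import tempfile
from pathlib import Path

# Assumption: which extensions count as "images" for this job.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".tif", ".tiff"}


def imagemagick_cmd(src: Path, dst: Path) -> list[str]:
    # Assumes ImageMagick's `convert` is on PATH; it picks the output
    # format from dst's file extension.
    return ["convert", str(src), str(dst)]


def process_folder(src_dir: Path, dst_dir: Path, db: sqlite3.Connection,
                   make_cmd=imagemagick_cmd, out_suffix: str = ".png") -> int:
    """Convert each image in src_dir, move it to dst_dir, record it in db.

    Assumes file stems are unique, since the output name is stem + out_suffix.
    """
    dst_dir.mkdir(parents=True, exist_ok=True)
    # List the folder up front so files created later are not picked up.
    sources = sorted(p for p in src_dir.iterdir()
                     if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES)
    rows = []
    with tempfile.TemporaryDirectory() as work:
        for src in sources:
            tmp = Path(work) / (src.stem + out_suffix)
            # The expensive work happens in the child process, not here.
            subprocess.run(make_cmd(src, tmp), check=True)
            final = dst_dir / tmp.name
            shutil.move(str(tmp), str(final))
            info = final.stat()  # "get file info"
            rows.append((str(final), info.st_size, int(info.st_mtime)))
    # One transaction for the whole batch instead of a commit per file.
    with db:
        db.executemany(
            "INSERT INTO files (path, size, mtime) VALUES (?, ?, ?)", rows)
    return len(rows)
```

The converter is passed in as `make_cmd`, so the loop can be tried with any command. On ImageMagick 7 the preferred entry point is `magick`, so `imagemagick_cmd` may need adjusting.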

NOTE: I know that questions comparing languages are not well received, but I am genuinely having trouble choosing, because problems may only show up in practice.

Best Answer

It sounds to me like you will simply be handing these files to another piece of software that actually reads them. If that's the case, use Python, Ruby, or whatever easy-to-use high-level language you have on hand, because this program isn't actually I/O-intensive.
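A quick way to check this on your own workload is to compare wall-clock time with the CPU time used by the orchestrating script itself while a child process does the work. A minimal Python sketch; the sleeping child is a hypothetical stand-in for a real converter such as ImageMagick. Note that `time.process_time()` counts only the current process, not its children.

```python
import subprocess
import sys
import time


def measure(cmd):
    """Return (wall_seconds, parent_cpu_seconds) for running cmd as a child."""
    wall0, cpu0 = time.perf_counter(), time.process_time()
    subprocess.run(cmd, check=True)
    return time.perf_counter() - wall0, time.process_time() - cpu0


# Hypothetical stand-in for an external converter: a child that waits 0.2 s.
CHILD = [sys.executable, "-c", "import time; time.sleep(0.2)"]

wall, cpu = measure(CHILD)
print(f"wall {wall:.3f}s, orchestrating script's own CPU {cpu:.3f}s")
```

If the script's own CPU time is a small fraction of the wall time, rewriting the loop in C would mostly speed up the part that was already cheap.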

Now, if you are actually reading the contents of the files yourself to process them, then I would say: if the language you normally use has no interoperability for making OS-level calls, don't use that language.

Ideally, in this case you'd want a high-level language with low-level facilities.

C#, for instance, does this well: it allows high-level handling of the simple stuff, like processing user input from the command line and organizing the steps of your task, while still allowing OS calls and direct memory management for the highest possible performance (if necessary!). Java may do similarly? Not certain. Haskell is very high-level and also has facilities for direct memory manipulation, though it has an extremely high learning barrier if you don't already know it. C++ is probably the most commonly used language for exactly this type of task: it is a high-level language, and with C at its roots it has complete low-level facilities available.
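Even a scripting language can reach some of these OS facilities. For instance, Python's standard `mmap` module wraps the operating system's memory-mapping call. A minimal sketch that hashes a file through a read-only mapping instead of explicit reads; the function name is hypothetical, and whether mapping beats plain buffered reads depends on the OS and file sizes, so treat it as something to measure rather than a guaranteed win.

```python
import hashlib
import mmap
import os


def sha256_of_file(path):
    """Hash a file by memory-mapping it instead of reading it in chunks."""
    with open(path, "rb") as f:
        if os.fstat(f.fileno()).st_size == 0:
            # An empty file cannot be mapped, so hash it directly.
            return hashlib.sha256(b"").hexdigest()
        # Length 0 maps the whole file; ACCESS_READ makes the mapping read-only.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # mmap objects expose the buffer protocol, so hashlib reads the
            # mapped pages directly without an intermediate bytes copy.
            return hashlib.sha256(mm).hexdigest()
```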

That said, beware of premature optimization. A program written in a language that isn't your strongest will likely underperform one written in the language you know best, because you wouldn't be aware of the available optimizations or be using the language idiomatically. Further, the only way to know whether the quickest, most robust route (using your main language) won't work is to give it a go, which would be far quicker than going all in on a language you don't know. So prototype something and see how it performs; if you don't think it does well enough, whip up quick prototypes in other languages and compare.
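For the shell-versus-compiled question, one way to follow this advice is to time two small prototypes of the same step on the same data. A hedged Python sketch: prototype A copies files in-process, prototype B spawns one `cp` per file the way a naive shell loop would. It assumes a POSIX `cp` on PATH, all function names are hypothetical, and it makes no prediction about which wins on your system.

```python
import shutil
import subprocess
import tempfile
import time
from pathlib import Path


def copy_in_process(srcs, dst_dir):
    # Prototype A: copy from inside the script, no child processes.
    for src in srcs:
        shutil.copy2(src, dst_dir / src.name)


def copy_with_cp(srcs, dst_dir):
    # Prototype B: one `cp` child per file, as a naive shell loop would do.
    for src in srcs:
        subprocess.run(["cp", "-p", str(src), str(dst_dir / src.name)], check=True)


def best_of(fn, *args, repeats=3):
    """Return the fastest wall-clock time of `repeats` runs of fn(*args)."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        best = min(best, time.perf_counter() - start)
    return best


def compare(n_files=100, size=4096):
    """Time both prototypes on the same synthetic files; check outputs match."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        src_dir, out_a, out_b = root / "src", root / "a", root / "b"
        for d in (src_dir, out_a, out_b):
            d.mkdir()
        srcs = []
        for i in range(n_files):
            p = src_dir / f"file{i:04d}.bin"
            p.write_bytes(bytes(size))
            srcs.append(p)
        t_a = best_of(copy_in_process, srcs, out_a)
        t_b = best_of(copy_with_cp, srcs, out_b)
        same = all((out_a / p.name).read_bytes() == (out_b / p.name).read_bytes()
                   for p in srcs)
        return t_a, t_b, same


if __name__ == "__main__":
    t_a, t_b, same = compare()
    print(f"in-process: {t_a:.3f}s  cp per file: {t_b:.3f}s  identical: {same}")
```

Taking the best of several runs reduces noise from filesystem caching; replacing the copy step with your real conversion step gives the comparison that actually matters.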

Prototyping as part of your technical solution analysis is an ever-important skill for all programmers, so take advantage of this task to practice it. In the scientific method we don't try to prove our hypotheses, as that leads to bias; rather, we try to disprove them. Your hypothesis is that your normal language won't perform well enough for this task, so start by trying to disprove that.

Edit: Based on your update, your program is absolutely not I/O-intensive, so write it in the language you are most comfortable with.
