I did this type of thing at a previous company. We had a very large C application that took ages to do a complete build: a 45-minute compile time, which we managed to get down to less than 6 minutes.
Measure:
Firstly, as you are already doing, I would measure each part of the process:
- Checkout: 15 minutes
- Compiling the Java branch: 20 minutes
- Compiling the JSP branch: 40 minutes
Total: 75 minutes
However, you state in the question that it takes an hour and a half, so we have a missing 15 minutes. Where is this in the build process?
Running the ANT script, downloading all files off Version Control,
compiling JSP's into JAVAs and then compiling those JAVAs into Class
files takes an hour and a half.
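The arithmetic above is small enough to keep in a script next to the build, so the gap can be re-checked after every change. A minimal sketch, using the figures quoted in this answer; the function name and phase labels are made up for illustration:

```python
def unaccounted_minutes(phases, total):
    """Return the part of the total build time not covered by measured phases."""
    return total - sum(phases.values())

# Figures from the measurements above, in minutes.
phases = {"checkout": 15, "compile_java": 20, "compile_jsp": 40}
measured = sum(phases.values())
missing = unaccounted_minutes(phases, total=90)
print(f"measured: {measured} min, unaccounted: {missing} min")
```

Whatever shows up as unaccounted is the next thing to put a timer around.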
So the first thing is to do an SVN update rather than a checkout, as you suggest in the comments; this reduces the build time to 62 minutes. (Average 6 builds a day)
Next I would look at the build scripts: is it possible to build your Java code or JSPs in parallel? As it's legacy code, it might never have been set up to build over multiple CPUs. Our project was still sitting on a single CPU. You mention Ant; it has the ability to build in parallel (see Parallel Tasks).
If it is already doing this, does writing the output to disk pose an issue? I ask because if you are creating a lot of objects, the disk could be hitting its bandwidth limit. Try building on an SSD. I mention this because it's a low-cost solution to implement.
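To illustrate why this helps: when two steps do not depend on each other, running them side by side makes the wall time roughly that of the slower step rather than the sum of both. The sketch below is not Ant; it is a plain Python stand-in, and `compile_java` and `compile_jsp` are hypothetical placeholders that only sleep.

```python
from concurrent.futures import ThreadPoolExecutor
import time

def run_in_parallel(steps):
    """Run independent build steps concurrently; return (results, elapsed seconds)."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=len(steps)) as pool:
        futures = {name: pool.submit(step) for name, step in steps.items()}
        results = {name: future.result() for name, future in futures.items()}
    return results, time.perf_counter() - start

# Stand-ins for the real compile steps (hypothetical; a real step would
# invoke the compiler, for example through subprocess.run).
def compile_java():
    time.sleep(0.2)
    return "java ok"

def compile_jsp():
    time.sleep(0.4)
    return "jsp ok"

results, elapsed = run_in_parallel({"java": compile_java, "jsp": compile_jsp})
print(results, f"{elapsed:.1f}s")
```

This only pays off if the steps really are independent; if the JSP compile needs the Java classes first, the two cannot overlap.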
Other things to consider:
You want fast feedback on a change, but does it need to be full feedback?
- Linting the code for faster feedback for some errors.
- Unit tests.
- CI server to distribute the load over several machines.
Long Term:
- Split the code out into multiple parts. Do you have utilities that could be compiled independently, then included later?
- PMD can show you where duplicate code is being used. Legacy projects normally have this issue where the same code is included in many places.
A piece of advice: Do not think of your solution as a monolith. Be open to the idea of it being more than one program.
Pros and Cons
Honestly, I am unsure if these are pros or cons; I leave it to your judgement...
- Option 3: Store all scripts in Db and provide them to interpreters via command line (for example python.exe -c ).
  - The scripts cannot use the file system.
  - Not metaprogramming friendly.
- Option 2: Store all scripts in Db and create temporary file when execution is required. Provide tmp file to interpreter and remove it once job is done.
  - You can allow the scripts to use the file system.
  - Metaprogramming friendly.
- Option 1: Store all scripts in files and provide script files to interpreters.
  - You can allow the scripts to use the file system.
  - Not metaprogramming friendly (if you want to protect scripts from each other).
Errata: It is possible to allow scripts to communicate/interfere with each other in any option. It is easier on option 1.
How safe are they? Well, they are about the same. The measures you need to take to make your system secure have nothing to do with how the scripts are stored, but with how you run them... except that doesn't mean what you are thinking.
Security Concerns:
Just running arbitrary third-party code is a security concern. Two concerns, actually:
- Do not allow scripts to mess with other scripts
- Do not allow scripts to mess with the underlying system
For the first concern, it could appear that taking advantage of a database engine and withholding access to it will prevent a malicious script from messing with other scripts. However, the malicious script can still mess with the rest of the system (which might or might not include tampering with the database engine).
The scripts could cause a lot of damage to your system. Just as an example, one could download malware and configure it to run as a scheduled task or on reboot.
In terms of the second concern, the database is not buying you much security.
In addition, passing the code via command line will not make it less harmful.
For that second concern, your first real option is to run the code as an operating system user with low privileges. Besides that, I would suggest virtualization solutions (containers, virtual machines, or other generic sandboxes).
Now, if you consider running your server code under a user with low privileges, you will have to use option 3:
Store all scripts in Db and provide them to interpreters via command line (for example python.exe -c ).
That is because, if you have the right to write files, so will the scripts. Thus, it is better not to have that right.
Note: You can't really block them from the network with this method.
The database here might be of any kind. In fact, it could simply be the scripts stored as plain files in your file system. The difference is that you do not pass the script file to the interpreter; instead you read the code into memory and pass it to the interpreter via the command line.
Errata: The database might be of any kind, but it must run as a service, free from the constraint of not being able to write.
Speaking of reading, grant read access only to what you need.
I remind you that we are using command line because we have to. Using the command line is not what makes it safe.
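For illustration, option 3 could look like the sketch below. It assumes the script is Python and that its text has already been read into memory; the function name is made up, and the low-privilege user, which is the actual security measure, is not shown.

```python
import subprocess
import sys

def run_script_from_memory(code, timeout=10):
    """Run script text with `python -c`, never writing it to disk."""
    # An argument list (no shell) means the code needs no shell escaping.
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return result.returncode, result.stdout, result.stderr

# `code` would come from the database; a literal stands in for it here.
returncode, stdout, stderr = run_script_from_memory("print(1 + 1)")
print(returncode, stdout.strip())
```

Operating systems cap the length of a command line, so very long scripts may not fit through this route.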
Addendum: Due to the comments, I have considered whether it is possible to use option 3 with file write access, without going into what I describe below (using two operating system users). It is viable to use option 3, with a single operating system user, with file write access, as long as that operating system user only has rights over a constrained area. That would also allow the scripts to communicate/interfere with each other (the same would apply for option 2).
On the other hand, if you need to grant write access to the scripts (or if you don't want to use the command line), you will have to split your server solution across two operating system users. Run code under the first user to take requests, manage scripts, call the code that runs as the second user, and respond. Run code under the second user to run the scripts.
Have the code of the first user place the script in a temporary folder, grant write access on the folder to the second user. Make sure that folder is the current path for the script you run.
The script will be able to use the file system inside that folder, and you can use the first user to wipe it.
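The folder handling described above could be sketched as follows, assuming Python scripts. Switching to the second operating system user and granting it rights on the folder are deliberately left out, and the file name `job.py` is just a placeholder.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

def run_in_scratch_dir(code, timeout=10):
    """Write the script into a fresh folder, run it there, then wipe the folder."""
    with tempfile.TemporaryDirectory() as scratch:
        script = Path(scratch) / "job.py"  # hypothetical file name
        script.write_text(code)
        # cwd=scratch makes relative paths in the script land inside the folder.
        # Running as the second, low-privilege user is NOT shown here.
        result = subprocess.run(
            [sys.executable, str(script)],
            cwd=scratch,
            capture_output=True,
            text=True,
            timeout=timeout,
        )
        leftovers = sorted(p.name for p in Path(scratch).iterdir())
    # The folder and everything in it is removed when the with-block exits.
    return result.stdout, leftovers

out, files = run_in_scratch_dir("open('out.txt', 'w').write('hi'); print('done')")
print(out.strip(), files)
```

Note that a current directory is only a default: without the permission setup, a script can still open absolute paths elsewhere.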
There is still a problem: they can try to fill the disk with a script. Even if you set a small quota for the second user, they may cause a denial of service. To mitigate this, you may have code running as the first user monitor the file system (not by polling: register for notifications from the OS; since this is ASP.NET, that would mean using FileSystemWatcher), and if a script is using too much space you can kill it and flag it as dangerous.
You can continue to use option 3 with that setup. Yet, you do not need to. You can use option 2:
Store all scripts in Db and create temporary file when execution is required. Provide tmp file to interpreter and remove it once job is done.
If you do, you may or may not grant write access on the scripts themselves. It is possible that a script has code that reads or writes itself as some sort of metaprogramming technique. You need to consider whether you want to allow this.
You need metadata. Either a document oriented database or a relational database will work.
In fact, if you have the two operating system users solution, you can even use option 1:
Store all scripts in files and provide script files to interpreters.
Just do not grant the second user the right to write to the scripts. That prevents the scripts from messing with other scripts.
As I said at the start, in option 1 you can allow the scripts to communicate/interfere with each other. This happens when one script writes to a file and another one reads it. If you want to prevent this, either don't grant them write access or don't use option 1.
Addendum: There is a possible side effect of option 1 with write access: scripts could create other scripts. If this is a problem, it would be better to keep a list of the "valid" scripts somewhere else.
Other concerns
You have said (regarding option 1):
This also allows to change scripts externally using user's favorite text editor and allows to debug scripts.
That should not be a concern, because you are providing a web GUI, which the user should be using. If you need to provide a text editor, build it in the GUI. Debugging is probably out of the question.
(...) multiple users could be accessing to and executing same files at the same time, since they operate from web GUI
You can control that from your web code. You may, for example, queue requests to prevent too many scripts running at once (or the same one running multiple times, if that is really a concern).
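One way to picture that throttling is a gate that lets only a fixed number of runs proceed while the rest wait. This is a sketch in Python rather than ASP.NET; the class name `ScriptGate` and the limit of 2 are made up, and the jobs are trivial stand-ins for real script runs.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

class ScriptGate:
    """Let at most `limit` script runs proceed at once; others wait their turn."""

    def __init__(self, limit):
        self._slots = threading.BoundedSemaphore(limit)
        self._lock = threading.Lock()
        self._running = 0
        self.peak = 0  # highest number of simultaneous runs seen so far

    def run(self, job):
        with self._slots:  # blocks while `limit` jobs are already running
            with self._lock:
                self._running += 1
                self.peak = max(self.peak, self._running)
            try:
                return job()
            finally:
                with self._lock:
                    self._running -= 1

gate = ScriptGate(limit=2)
with ThreadPoolExecutor(max_workers=8) as pool:
    # Eight simulated web requests, each asking to run a (stand-in) script.
    results = list(pool.map(lambda n: gate.run(lambda: n * n), range(8)))
print(results, "peak concurrency:", gate.peak)
```

To stop the same script from running twice at once, the same idea works with one gate of limit 1 per script.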
You have said (regarding option 3):
Problem with this approach is that I need to heavily escape the script code in order to paste it as command line argument.
You could create an executable that reads the code (from wherever you have it stored) and writes it (with whatever preprocessing you need to do) to standard output. Then redirect the standard input of the interpreter to take the output of that executable. This is known as piping.
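A simplified variant of that idea, assuming a Python interpreter: instead of a separate executable, the parent process writes the script straight to the interpreter's standard input. The function name is made up for illustration.

```python
import subprocess
import sys

def run_script_via_stdin(code, timeout=10):
    """Feed script text to the interpreter on standard input."""
    # `python -` tells the interpreter to read the program from stdin.
    result = subprocess.run(
        [sys.executable, "-"],
        input=code,
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return result.returncode, result.stdout

# Quotes and newlines in the script need no escaping on this path.
code = 'print("it\'s piped")\nprint(2 * 21)\n'
returncode, stdout = run_script_via_stdin(code)
print(returncode, stdout.splitlines())
```

The trade-off is that standard input is now taken by the program text, so the script cannot also read user input from it.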
Best Answer
It's not uncommon to do what you described. In fact, when you create a new database for your user, you are basically creating a new file for that user. So you are just adding one more file to the set of per-user files.
The choice of when to do the work deserves explanation though. Doing the sign-up process periodically in batches can create load spikes on the servers. It's better and simpler to copy all the files for new users straight away. If the process takes long enough (e.g. creating a VM for the user), show them a "setup in progress" page.