I've been developing web apps for a while now and it is standard practice in our team to use agile development techniques and principles to implement the software.
Recently, I've also become involved in Machine Learning and Natural Language Processing. I've heard that people primarily use MATLAB for developing ML and NLP algorithms. Does agile development have a place there, or is that skill completely redundant?
In other words, when you develop ML and NLP algorithms as a job, do you use agile development in the process?
Best Answer
Machine Learning and Natural Language Processing are somewhat data-driven. Without a continuous supply of high-quality data (which must be re-captured whenever new criteria are added), the software development may miss its intended target.
The customer and product owner may devote a bigger fraction of their time toward test data collection.
The adaptations will depend on:
There will be two feedback loops:
Thus, we see that "data" replaces "features" as the main definition of progress.
Because "research / spike" work matters more in ML/NLP development, spikes need a more organized approach, something you may already have learned on a graduate research team. Treat spikes as "mini-tasks" that take from hours to days. Implementing one algorithm suite takes longer, in some cases weeks. Because of this difference in task size, prioritize spikes and implementations separately. Implementations are costly and should be avoided where possible, which is one reason to use canned algorithms and existing libraries.
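As a rough illustration of prioritizing spikes and implementations separately, here is a minimal Python sketch that files tasks into two independent backlogs by estimated size. The `Task` and `Backlog` classes, the `SPIKE_LIMIT_HOURS` threshold, and the example task names are all hypothetical, chosen only for illustration; a real team would pick its own cut-off and tooling.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class Task:
    priority: int                                  # lower number = more urgent
    name: str = field(compare=False)
    estimate_hours: float = field(compare=False)


class Backlog:
    """A minimal priority queue of tasks (hypothetical helper)."""

    def __init__(self):
        self._heap = []

    def add(self, task):
        heapq.heappush(self._heap, task)

    def pop_next(self):
        return heapq.heappop(self._heap)

    def __len__(self):
        return len(self._heap)


# Assumed cut-off: anything estimated above ~3 working days counts as an implementation.
SPIKE_LIMIT_HOURS = 24

spikes = Backlog()
implementations = Backlog()


def file_task(task):
    """Route a task to the spike or implementation backlog based on its estimate."""
    target = spikes if task.estimate_hours <= SPIKE_LIMIT_HOURS else implementations
    target.add(task)
    return target


file_task(Task(2, "Try off-the-shelf tokenizer on new corpus", 4))
file_task(Task(1, "Check whether stemming helps recall", 6))
file_task(Task(1, "Implement custom sequence tagger", 80))

# Each backlog is ordered on its own, so a large implementation never crowds out a spike.
next_spike = spikes.pop_next()
print(next_spike.name)
```

Keeping two queues means a multi-week implementation is never ranked directly against an afternoon-sized spike; each list is ordered only against tasks of comparable size.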
The Scrum Master will need to constantly remind everyone to: (1) note down every observation, including "passing thoughts" and hypotheses, and exchange notes often (daily); (2) spend more time on spikes; (3) use existing libraries as much as possible; (4) not worry about execution time, since this can be optimized later.
If you do decide to implement something that's missing from the libraries, implement it well.
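To make the advice about leaning on existing libraries and deferring optimization concrete, here is a small sketch of what a spike might look like. It assumes a hypothetical question ("are there near-duplicate sentences in our data?") and answers it with Python's standard-library `difflib.SequenceMatcher` rather than a hand-written string-distance algorithm. The function names, the sample sentences, and the 0.8 threshold are illustrative assumptions, not part of the original answer.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity ratio in [0, 1], using a canned stdlib matcher."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def near_duplicates(sentences, threshold=0.8):
    """Brute-force O(n^2) pairwise comparison.

    Deliberately unoptimized: for a spike, getting an answer quickly
    matters more than execution time.
    """
    pairs = []
    for i in range(len(sentences)):
        for j in range(i + 1, len(sentences)):
            score = similarity(sentences[i], sentences[j])
            if score >= threshold:
                pairs.append((i, j, round(score, 2)))
    return pairs


# Made-up sample data for illustration only.
samples = [
    "The cat sat on the mat.",
    "The cat sat on the mat!",
    "Stock prices fell sharply today.",
]
found = near_duplicates(samples)
print(found)
```

If the spike shows the idea is worth pursuing, the quadratic loop can be replaced later; if not, only a few minutes of work are discarded.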
Daily activities:
Sprint activities:
About the note on deferring optimization: the thought-to-code ratio is much higher in ML/NLP than in business software. Thus, once you have a working idea, rewriting the algorithm for an ML/NLP application is easier than rewriting business software. This means it is easier to get rid of inefficiencies inherent in the architecture (in the worst case, simply do a rewrite).