Small module pipelines

Something I’ve noticed over my years of developing stuff for people is the split between those who love small module pipelines and those who prefer monolithic, large app pipelines.

An example of a small app pipeline is one made up of lots of little apps, each app modifying data and passing it on to the next app. It usually runs under the control of some master calling process that marshals the data and ensures that only the data that *needs* to be processed actually gets processed – it handles dependency checking and so on. Good examples of a control process are SCons, a Python-based build system, or KJam.
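To make the "only process what needs processing" idea concrete, here is a minimal sketch of a control process. It is a toy, not how SCons or KJam actually work: the stages are plain Python functions standing in for separate apps, and the names `run_pipeline`, `digest` and the `.out` / `.stamp` file naming are all made up for illustration. Each stage writes its result to a file, and a stamp file records a hash of the input it was built from, so a stage is skipped when its input hasn't changed.

```python
import hashlib
from pathlib import Path


def digest(path: Path) -> str:
    """Fingerprint a file by its contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def run_pipeline(stages, first_input: Path, work_dir: Path) -> list:
    """Run (name, transform) stages in order and return the names that ran.

    Each stage reads the previous stage's file and writes <name>.out.
    A <name>.stamp file remembers the digest of the input it was built from,
    so a stage whose input has not changed is skipped.
    """
    ran = []
    src = first_input
    for name, transform in stages:
        dst = work_dir / f"{name}.out"
        stamp = work_dir / f"{name}.stamp"
        src_digest = digest(src)
        fresh = dst.exists() and stamp.exists() and stamp.read_text() == src_digest
        if not fresh:
            dst.write_text(transform(src.read_text()))
            stamp.write_text(src_digest)
            ran.append(name)
        src = dst  # the next stage consumes this stage's output file
    return ran
```

Called with something like `[("upper", str.upper), ("exclaim", lambda s: s + "!")]`, the first run executes both stages and a second run with the same input executes none. Note that every intermediate result is left on disk, which is exactly the trail of data that makes this style of pipeline easy to debug.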

A more monolithic approach is a large app that basically does all your processing for you – everything that does data transformation is within the app. You just call that one big app and sit back and wait till it’s done.

Now, the main advantage of the small app approach is flexibility. Want to change how shaders are generated? Sure, just replace that module with another one. As long as it can handle the file formats as they are forwarded on, you are good to go.
Another good thing about this process is that the data itself is usually held in files between applications, which makes debugging that much easier. If one node in your tree is doing bad things you have the trail of data to look at to work out which node is doing The Bad Stuff(tm).
Another advantage is that the module code can usually be incorporated into other modules relatively easily. Have a materials processing module? Great, you can probably wrap that up into a Maya plugin without much trouble because the code was designed as standalone code in the first place.

Cons of this approach include speed issues – since each app is an individual app, it needs to start up, and data needs to be loaded, processed, saved and then passed on to the next app. This is both time and bandwidth consuming, with so many temporary files being loaded and saved. It’s also loaded with points of failure – any node could be the wrong version of the application, or someone could have modified a node and not tested the entire pipeline to ensure it plays well with others.
Another issue is lack of consistency. While a well run project has very defined parameters for how modules are built, there is often enough vagueness that developers create their own ways of logging errors, or pick their own languages for a module (“Oh, this bit is Python, but that calls this Perl module that then accesses this other website”). The lack of an overall framework often results in each module having its own very specific set of dependencies on 3rd party code / feature sets. Sometimes these dependencies can even be at odds with what *other* nodes in the network require.
One last problem that can crop up is an extension of the internal dependencies problem – what works in isolation doesn’t work in combination. For example, a module that’s written to work on its own can’t be compiled into another module simply because it uses the same libraries as the larger module but a different version. There’s an external library collision – then what do you do? Because everything is built in isolation, there is far less pressure to conform on library usage. The classic “Well, it works on my machine” problem.
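The library collision problem is at least easy to detect up front if every module declares what it needs. Here is a small sketch of that check, assuming each module can hand you a simple library-to-version mapping. The function name `find_conflicts` and the module and library names in the usage note are hypothetical; real dependency metadata is rarely this tidy.

```python
from collections import defaultdict


def find_conflicts(module_requirements):
    """Report libraries that different modules pin to different versions.

    module_requirements maps module name -> {library: version}.
    Returns {library: {version: [modules wanting it]}} for clashing libraries only.
    """
    wanted = defaultdict(lambda: defaultdict(list))
    for module, libs in module_requirements.items():
        for lib, version in libs.items():
            wanted[lib][version].append(module)
    return {
        lib: {version: sorted(modules) for version, modules in versions.items()}
        for lib, versions in wanted.items()
        if len(versions) > 1  # more than one version requested means a collision
    }
```

Given a hypothetical `material_tool` wanting `libpng` 1.2 and a `texture_tool` wanting `libpng` 1.6, the result names `libpng` and lists which module wants which version. It won't tell you what to do about the collision, but it does turn "works on my machine" into something you can see before the pipeline falls over.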

The advantage of the large app is that data is generally passed from one internal process to the next in memory, which makes it a lot faster than the small node pipeline. Also, all the code is in one place, which makes dependency issues far less common – you *have* to load the entire pipeline in order to test new code because, well, it’s all in one place.
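For contrast with the file-passing style, the in-memory version boils down to something like the sketch below. The stage functions `strip_blank` and `normalise` and the driver `run_in_memory` are invented for illustration; the point is only that each stage hands its result straight to the next one, with no process startup and no temporary files in between.

```python
def strip_blank(lines):
    """Drop lines that are empty or only whitespace."""
    return [line for line in lines if line.strip()]


def normalise(lines):
    """Trim surrounding whitespace and lower-case every line."""
    return [line.strip().lower() for line in lines]


def run_in_memory(stages, data):
    """Feed data through each stage in turn without touching disk."""
    for stage in stages:
        data = stage(data)  # the result lives only in memory
    return data
```

The flip side is visible here too: once a stage has run, its intermediate result is gone unless you deliberately keep it, so there is no trail of files to inspect afterwards.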

The disadvantage of this approach is that code reuse tends to be at a minimum – when the code for a particular operation is inside a larger application it tends to get targeted at that specific application and molded for it – the idea of an independent module with no dependencies tends to get lost. The code itself also tends to be way more mission specific and less flexible than code written for smaller modules, and certainly there is less in the way of error checking internally because you tend to trust the data that is fed in more than you would as an independent module (although that can also have the plus of speeding the code up a bit – lacking all that value range checking it just Is Faster).

It’s also way harder to debug – you get all the logging output from every step rather than just the one you want and generally have to sit through gobs and gobs of other code to get to the part you want.
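One way a monolithic app might soften the logging problem, assuming it's written in Python and each internal step logs through its own named logger, is to turn the volume down everywhere except the step you care about. This is a sketch using the standard `logging` module; the `pipeline.<stage>` naming scheme, the `focus_on` helper and the `ListHandler` class are my own inventions for the example.

```python
import logging


class ListHandler(logging.Handler):
    """Collect formatted messages in a list so they can be inspected."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(f"{record.name}: {record.getMessage()}")


def focus_on(stage, handler, root="pipeline"):
    """Show DEBUG output for one stage and only warnings from the rest."""
    base = logging.getLogger(root)
    base.addHandler(handler)
    base.propagate = False           # keep output out of the global root logger
    base.setLevel(logging.WARNING)   # default for every stage under `root`
    logging.getLogger(f"{root}.{stage}").setLevel(logging.DEBUG)
```

After `focus_on("shaders", handler)`, a debug message sent to `pipeline.shaders` gets through, while debug chatter from `pipeline.textures` is dropped and its warnings still appear. It only helps if every step was disciplined about using its own logger in the first place, which is its own consistency problem.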

Something I’ve observed is that an individual engineer’s preference for one type of pipeline over the other tends to come from their platform of choice.
Linux is a small app driven environment – lots of small console apps all strung together to make an operating system. If you’ve ever seen Linux users string together commands on the command line, piping the data from one into another, you are seeing a microcosm of what the small app pipeline looks like.
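The command line habit translates directly into code. Below is a rough sketch of chaining small programs from Python with the standard `subprocess` module, where each program reads stdin and writes stdout. The `pipe` helper and the two one-line child programs `UPPER` and `REVERSE` are made up for the example, and unlike a real shell pipe this runs the programs one after another rather than concurrently.

```python
import subprocess
import sys

# Two tiny stand-in "apps": each reads stdin, transforms it, writes stdout.
UPPER = [sys.executable, "-c",
         "import sys; sys.stdout.write(sys.stdin.read().upper())"]
REVERSE = [sys.executable, "-c",
           "import sys; sys.stdout.write(sys.stdin.read()[::-1])"]


def pipe(commands, text):
    """Run each command in order, feeding its stdout to the next one's stdin."""
    data = text.encode()
    for cmd in commands:
        # check=True raises if any program in the chain exits with an error
        done = subprocess.run(cmd, input=data, stdout=subprocess.PIPE, check=True)
        data = done.stdout
    return data.decode()
```

Calling `pipe([UPPER, REVERSE], "abc")` starts a fresh interpreter for each step, which also shows the startup cost of the small app approach: two process launches to transform three characters.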

Windows users tend to go for a more monolithic application approach, since that’s what dialogs and so on are built around. Windows does have DLLs, it’s true, so each module could be built as a small app, but as anyone who’s done any extensive work with DLLs can tell you, versioning can get out of hand very, very quickly under Windows, and diagnosing it can be a great pain.

My personal feeling leans towards the monolithic app, simply because I’m a Windows user – the small app pipeline just has too many points of failure and too many smaller internal dependencies that all have to be set perfectly for it to work.

Whatever else you may say about Windows, it does at least have far more graceful legacy handling of older formats. Smaller hand-built modules tend to be far less fault tolerant, and they report less, so when they do fail you have no idea why. The small module approach is definitely The Way To Go in certain situations, but too much dependency on it means you end up with large pipelines with god-awful sets of dependencies within them. One change that was only tested in isolation can bring the whole thing down.

Just something to think about when you are designing your next pipeline.
