Databases in VFX Pipelines (Prism-Pipeline)


I’ve developed a pipeline (Prism), which is based on a folder structure. At the moment I’m thinking about adding a database (MySQL) to it, but I have some concerns.

One thing that works really nicely with Prism is that the team can work from different locations without being on the same network and without a global server. The project folder is synchronized with a cloud service like Dropbox or Google Drive. So even if someone has no internet connection (or the connection is saturated by other up-/downloads), they can create new shots, and when the connection works again, Dropbox will sync all new files and folders to the other team members.

Does this workflow work with a database, too? If two people create two different shots on their PCs without internet, will Dropbox sync the database so that both shots exist afterwards?
Are there other advantages/disadvantages for databases specifically related to VFX Pipelines?


You’d probably want to have people connect to the database remotely rather than sharing the files. A MySQL database is one big blob of data that is not easily mergeable, so you would not be able to resolve conflicts between changes made by two people in different locations.

You can, however, run a database server over the internet so that everyone gets up-to-date info from it. Typically, production data is not so high-volume that it will saturate even a smallish network. A Python web framework like Django will let you provide a UI to your users through a browser, or handle requests sent over HTTP by programs.

For myself, I’d prefer a database over the network, possibly with a local replica that could be downloaded for people working offline. I don’t know if it even makes sense to worry much about the offline case anymore; the number of people who can actually work effectively without an internet connection gets smaller all the time. The main reason to support offline work as a regular feature would be if you were supporting studios in India or China, where a lot of contract studios don’t like to give people live internet access.


You could consider SQLite. Whether it’s a solution obviously depends strongly on your use case. However, consider SQLite over running queries against huge JSON dicts or huge directory structures. One benefit is that repeated queries will be quicker. Another is that an SQL query is easier to read and maintain than a complex query written in Python.
SQLite supports both in-memory and disk-based operation, so you could share a SQLite database file via Dropbox. However, you will still have to merge it manually (if that’s what you want/need). This may be difficult or straightforward, depending largely on your database design and amount of data.
SQLite ships with Python by default.
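To make the manual-merge idea concrete, here is a minimal sketch using only Python’s built-in sqlite3 module. The schema (a `shots` table with hypothetical `name` and `artist` columns) and file names are made up for illustration; the merge uses `ATTACH` plus `INSERT OR IGNORE`, so a shot name that already exists in the target is simply skipped rather than causing a conflict.

```python
import sqlite3

def create_shot_db(path, shots):
    """Create a minimal shot table (hypothetical schema) and insert shots."""
    con = sqlite3.connect(path)
    con.execute("CREATE TABLE IF NOT EXISTS shots (name TEXT PRIMARY KEY, artist TEXT)")
    con.executemany("INSERT OR IGNORE INTO shots VALUES (?, ?)", shots)
    con.commit()
    con.close()

def merge_shot_dbs(target_path, source_path):
    """Copy rows from source into target, skipping names that already exist."""
    con = sqlite3.connect(target_path)
    con.execute("ATTACH DATABASE ? AS src", (source_path,))
    con.execute("INSERT OR IGNORE INTO shots SELECT * FROM src.shots")
    con.commit()
    con.close()

# Two artists create shots offline in their own copies of the file:
create_shot_db("artist_a.db", [("sh010", "alice")])
create_shot_db("artist_b.db", [("sh020", "bob")])

# After Dropbox has synced both files, merge one into the other:
merge_shot_dbs("artist_a.db", "artist_b.db")

con = sqlite3.connect("artist_a.db")
names = sorted(name for (name,) in con.execute("SELECT name FROM shots"))
con.close()
print(names)  # → ['sh010', 'sh020']
```

Note that this only resolves the easy case (two different shots created offline); if two people edit the *same* row, you still have to decide whose version wins.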

If you want to go with a central server, you could also consider MongoDB. You are not restricted to tables; data instead looks very much like JSON or Python dicts, which is great if the structure of your data keeps changing. While MongoDB still has very powerful queries, it’s not at the same level as SQL. SQL is king when it comes to complex queries.
PyMongo is a very solid module to work with MongoDB - it’s written in Python (no C parts) and runs on most (all?) DCC apps. PyMongo automatically detects connection drops and re-connects, which is a nice feature.
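To illustrate the difference in data shape without needing a MongoDB server, here is a plain-Python sketch (field names are hypothetical): documents in the same collection can carry different fields, which a fixed SQL table cannot easily do. The filter is done client-side here; in MongoDB the equivalent would be a server-side query like `{"status": "done"}`.

```python
# Shot records as MongoDB-style documents (plain dicts; fields hypothetical).
# Unlike rows in a SQL table, each document can have a different shape.
shots = [
    {"name": "sh010", "status": "wip"},
    {"name": "sh020", "status": "done", "render_layers": ["beauty", "shadow"]},
]

# Client-side equivalent of a MongoDB filter on {"status": "done"}:
done = [s["name"] for s in shots if s.get("status") == "done"]
print(done)  # → ['sh020']
```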

Generally, I would not recommend connecting to database servers over the internet - they are very much designed for handling connections from a local network. If you need to connect to your database from far away, you should consider writing an HTTP server that accepts data from remote clients and then queries the database.
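As a rough sketch of that pattern using only the standard library (a real setup would use a proper framework, authentication, and a real database instead of the in-memory SQLite stand-in used here), only the server process touches the database, while remote clients speak plain HTTP/JSON:

```python
import json
import sqlite3
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# In-memory SQLite stands in for the real production database.
# check_same_thread=False lets the handler thread reuse this connection.
db = sqlite3.connect(":memory:", check_same_thread=False)
db.execute("CREATE TABLE shots (name TEXT PRIMARY KEY)")
db.executemany("INSERT INTO shots VALUES (?)", [("sh010",), ("sh020",)])
db.commit()

class ShotHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Only the server queries the database; clients never see it directly.
        names = [row[0] for row in db.execute("SELECT name FROM shots ORDER BY name")]
        body = json.dumps(names).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep the demo quiet
        pass

server = HTTPServer(("127.0.0.1", 0), ShotHandler)  # port 0 = pick a free port
threading.Thread(target=server.serve_forever, daemon=True).start()

# A "remote" client fetching shot names over HTTP:
url = f"http://127.0.0.1:{server.server_port}/shots"
with urllib.request.urlopen(url) as resp:
    shot_names = json.loads(resp.read())
server.shutdown()
print(shot_names)  # → ['sh010', 'sh020']
```

This also gives you one place to put access control and validation, instead of exposing the database port itself to the internet.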

For MySQL / MariaDB, the PyMySQL driver is good. It’s written entirely in Python (i.e. it runs in most DCC apps) and can be installed via pip on all major OSes.


A central server is very nice - but it’s also an ongoing maintenance commitment, since you have to worry about internet security issues. A lot depends on whether you have access to high-quality IT support.