Handling/tracking asset paths and dependencies

So our current pipeline is pretty fractured: most tools exist for specific tasks, are fairly isolated, and work on files that an artist tracks down and opens manually.

Part of the struggle in changing that is developing a consistent method to track and retrieve assets of different types, and to access the different dependencies and outputs that go along with each type. I’m looking for any input people are willing to share on ways they handle (or have handled) that.

Do you have a hard-coded master list in a DB with entries for every asset, its type, and related information?
Do you base it largely on the file system, building a list from the existing folders, with a consistent file structure for each asset (inputs [models/textures/blend targets]; outputs [rigs]; etc.)?
Do you have a module with definitions for each asset type that knows the particulars of that type, with file structures kept consistent within a type? That way there could be a method like get_assets(‘props’) that returns all props in a data structure with a common interface for getting paths/dependencies.
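
To make that last option concrete, something like this is what I have in mind (just a sketch – the class names, folder layout, and get_assets() signature are all placeholders):

```python
import os

ASSET_ROOT = "//projects/game/assets"  # placeholder root

class Prop(object):
    """One asset type; knows its own folder layout and dependencies."""
    folder = "props"

    def __init__(self, name):
        self.name = name
        self.root = os.path.join(ASSET_ROOT, self.folder, name)

    def inputs(self):
        # models, textures, blend targets live under the asset folder
        return {
            "model": os.path.join(self.root, "model", self.name + ".mb"),
            "textures": os.path.join(self.root, "textures"),
        }

    def outputs(self):
        return {"rig": os.path.join(self.root, "rig", self.name + "_rig.mb")}

ASSET_TYPES = {"props": Prop}

def get_assets(asset_type):
    """Return every asset of a given type, each with the same path/dependency interface."""
    cls = ASSET_TYPES[asset_type]
    type_root = os.path.join(ASSET_ROOT, cls.folder)
    return [cls(d) for d in sorted(os.listdir(type_root))
            if os.path.isdir(os.path.join(type_root, d))]
```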

Any input is appreciated.

I’ve done it lots of ways, and they all have flaws. However, doing something is way, way better than doing nothing. Turn all of your knowledge about how things work into data structures and code instead of weird inferences (“up two folders, down into ‘animation’, and find the file with the same first six characters and today’s date”). Allow tools to work on explicit knowledge instead of naming conventions, location rules, or bugging users for input: “I want to build a level - what assets do I need? Please sync them and build!” rather than “sync these four folders and those two files before rebuilding the level”.
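
Totally schematic example of the difference (none of these paths or names are real, it’s just the shape of it):

```python
import datetime
import glob
import json
import os

# Fragile: the knowledge lives in naming and location conventions
def find_anim_by_convention(asset_path):
    anim_dir = os.path.join(asset_path, "..", "..", "animation")
    stamp = datetime.date.today().strftime("%Y%m%d")
    prefix = os.path.basename(asset_path)[:6]
    return glob.glob(os.path.join(anim_dir, prefix + "*" + stamp + "*"))

# Explicit: the knowledge lives in data that tools (and people) can query
def find_anim_by_manifest(asset_name, manifest_path="asset_manifest.json"):
    with open(manifest_path) as f:
        manifest = json.load(f)
    return manifest[asset_name]["animations"]  # dependencies are listed, not guessed
```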

The tricky question is where the connective tissue - the metadata - is going to live.

Disk-based tracking files – like Unity .meta files – are easy to do and work well with source control; plus they can be versioned themselves, so you can see when the topology of your project changes. However, they can be subverted by knowledgeable users (not always a bad thing, if you’re doing some isolated experimentation) and you have to have a strong toolchain to make sure the metadata stays in lockstep with the assets you’re tracking. If you do this, use a text-based, human-readable format (YAML, JSON, or XML, in order of preference) and write good libraries to abstract the details of reading and writing the actual disk files.
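
Even a tiny abstraction layer goes a long way – something along these lines (a sketch only; the sidecar suffix and the JSON choice are arbitrary, swap in YAML if you prefer):

```python
import json
import os

META_SUFFIX = ".meta.json"  # sidecar convention, pick whatever you like

def meta_path(asset_path):
    return asset_path + META_SUFFIX

def read_meta(asset_path):
    """Return the metadata dict for an asset, or an empty dict if none exists."""
    path = meta_path(asset_path)
    if not os.path.exists(path):
        return {}
    with open(path) as f:
        return json.load(f)

def write_meta(asset_path, data):
    """Write metadata via a temp file so tools never see a half-written sidecar."""
    tmp = meta_path(asset_path) + ".tmp"
    with open(tmp, "w") as f:
        json.dump(data, f, indent=2, sort_keys=True)
    os.replace(tmp, meta_path(asset_path))
```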

Databases have a steeper learning curve (unless you have prior experience) and they introduce a potential point of failure – what happens if the server goes down? – although nowadays you can just host it all on Amazon or whatever and get 99.9% uptime. They are great for collecting stats and tracking progress - the system we used for State of Decay let me produce realtime charts for memory in all the major areas of the game and publish them to our wiki as a web page … until Confluence broke all their SQL plugins, anyway…

The main problem with DBs is making sure they reflect the state of the actual game. The best way I’ve ever come up with is to update the DB from a Perforce checkin trigger, so the DB only changes when the ‘official’ state of the game changes. However, that can be tricky, since you may not have the knowledge you need when the trigger runs – if, say, you want to track the memory size of assets in the DB, you need a way to collect that info from the model file when it’s checked in, and that tracking code will be running on the Perforce server, not a dev machine with all of the latest tools/code, etc. SQL is also not the greatest tool for managing trees of data - while it’s certainly doable to have the DB tell you “this level includes this model, which uses these textures and this shader”, you need some careful table design to make that snappy – a good SQL consultant is a great help, although you can teach yourself if you’ve got the usual TA gumption.
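
The shape of that trigger is roughly this – not production code, and the depot path, table layout, and DB location are all placeholders:

```python
# update_db.py -- run from a Perforce change-commit trigger, e.g. a triggers table entry like:
#   asset_db change-commit //depot/assets/... "python /p4/triggers/update_db.py %change%"
import sqlite3
import subprocess
import sys

def files_in_change(change):
    # 'p4 describe -s' lists the files in the submitted changelist without the diffs
    out = subprocess.check_output(["p4", "describe", "-s", change], text=True)
    return [line.split("#")[0].replace("... ", "", 1)
            for line in out.splitlines() if line.startswith("... //")]

def main(change):
    db = sqlite3.connect("/p4/triggers/assets.db")  # placeholder path
    db.execute("CREATE TABLE IF NOT EXISTS assets "
               "(depot_path TEXT PRIMARY KEY, last_change INTEGER)")
    for path in files_in_change(change):
        db.execute("INSERT OR REPLACE INTO assets VALUES (?, ?)", (path, int(change)))
    db.commit()

if __name__ == "__main__":
    main(sys.argv[1])
```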

All that said, this is still my preferred solution, since the data is so accessible in lots of ways - from inside tools, from the web, and from dedicated SQL tools. Plus, if you’re doing it in Python, Django makes this stuff very high-productivity for you as a coder. Check out Adam Pletcher’s GDC 2011 talk for some cool ideas of things you can do with a good database.

I’ve never tried Adam’s use of Perforce attributes as a way of pushing the metadata into Perforce directly, but that looks like a cool way to guarantee the lockstep of metadata and data. If I were going that route today, I’d make all my key metadata into something like YAML nuggets and stick them into a single attribute on all my items, effectively making it a version of disk-based meta files.
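
If anyone wants to experiment, the mechanics are roughly this (a sketch – the attribute name is arbitrary, and note that ‘p4 attribute’ normally wants the file open for add/edit unless you force it):

```python
import json
import subprocess

def set_asset_metadata(depot_path, metadata, attr_name="asset_meta"):
    """Stick a metadata blob into a single Perforce attribute on a file.

    The file normally has to be open (add/edit) for 'p4 attribute' to apply;
    '-f' can set attributes on submitted files but requires admin rights.
    """
    blob = json.dumps(metadata, sort_keys=True)  # or a YAML dump, if you prefer YAML nuggets
    subprocess.check_call(["p4", "attribute", "-n", attr_name, "-v", blob, depot_path])
```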

Shameless plug: the book I worked on last year has a couple of chapters on metadata and asset management that might be of use as you ponder this.

We’re still doing this, still loving it. One key detail I doubt I mentioned… while putting metadata in attributes on Perforce files is sweet, it’s quite slow to do Perforce-based searches on them. Getting those values requires a Perforce fstat… it’s just not built for database-style queries. To counter this, our tools framework indexes all Perforce attributes into a little SQLite database on each user’s machine, and tools use that as a basis for all queries/searches. Works great.
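
Conceptually the indexing pass is something like this – not our actual framework code, and the parsing is simplified:

```python
import sqlite3
import subprocess

def index_attributes(depot_pattern, db_path="p4_attr_index.db"):
    """Index 'p4 attribute' values into a local SQLite DB for fast searching."""
    # -Oa tells fstat to include attributes in its tagged output
    out = subprocess.check_output(["p4", "fstat", "-Oa", depot_pattern], text=True)
    db = sqlite3.connect(db_path)
    db.execute("CREATE TABLE IF NOT EXISTS attrs "
               "(depot_path TEXT, name TEXT, value TEXT, "
               "PRIMARY KEY (depot_path, name))")
    depot_path = None
    for line in out.splitlines():
        if line.startswith("... depotFile "):
            depot_path = line[len("... depotFile "):]
        elif line.startswith("... attr-") and depot_path:
            field, _, value = line[len("... "):].partition(" ")
            db.execute("INSERT OR REPLACE INTO attrs VALUES (?, ?, ?)",
                       (depot_path, field[len("attr-"):], value))
    db.commit()
```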

Adam: any experience with P4ToDB?

I remember reading about that when it came out, but I did not give it a test drive. The key reason was that the data you get from the DB only reflects what’s on the Perforce server, not what’s locally in the user’s workspace. At least that’s how it appeared to me. Having locally-changed values indexed into the DB was crucial for us. That enables tools like our asset browser, for instance, to show current attribute values the user has modified but not submitted yet (just like the contents of the actual files).

There’s also only one mention of Perforce attributes in the release notes for P4ToDB, and it’s only a note about some attribute data getting incorrectly converted when using this tool.

On the other hand, I did mean to have a closer look at that thing. It’s possible my impressions were incorrect. When P4ToDB came out our system was already mature, so my motivation to dig into it was low.

I’m also looking into this at the moment for our pipeline.

One problem I found with disk-based tracking files (if I understand them correctly!) is that you always need an interface to the files. You can’t have a user just access files over a network share, for example, or you lose track of what is happening when people move/add stuff. It also means writing hooks for all applications so that the metadata is stored correctly on save/load. That sadly isn’t possible in all the apps we use in our pipeline.

I’ve been leaning towards the DB way, but the Perforce attributes look very interesting. I will have to investigate that a bit more.
Anyone using NoSQL (like MongoDB) for this sort of thing?

The ‘nice’ thing about disk-based files is that they are (relatively) simple to get at from lots of places - you could write them from Maya Python, Photoshop JS, and internal tools interchangeably.

The bad thing is that you need to maintain as many separate codebases as you have languages to support. That’s one reason why an open, text-based format like YAML or JSON is a must.

You could do what, e.g., Unity does and mandate a 1:1 file-to-meta-file relationship, which means tons of files that are mostly uninteresting, but it does mean you can always know that //network/folder/file.mb will be tracked by //network/folder/file.metadata.

FWIW some companies (ArenaNet is one I’m sure of) use Mongo, and they at least are happy with it. It depends largely on how well you think you know the problem domain. Mongo is more flexible; SQL is better for analytics because it enforces apples-to-apples comparisons.

Can Unity’s metadata files be customized with pipeline tools?
For example, inserting a JSON string of our own custom asset data into them?
It might be a useful way of piggybacking pipeline data on the Unity project data.
There could be issues I’m not thinking of…

Yep! That’s what I do. I stuff the metadata I want to preserve into a JSON string on a custom attribute before exporting, and then the Unity asset postprocessor finds it in OnPostprocessGameObjectWithUserProperties(). I use this both for controlling import settings (what kind of animation controller, for example, or no second UVs) and for project info. You can stick anything you want into the user data field of a meta file, so I just put my JSON in there too.

The only big issue is that the code should be very bulletproof - if it crashes, you can’t import assets until you fix it!
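
On the Maya side the export step is basically this (simplified sketch – the attribute name is whatever you pick, as long as your Unity postprocessor looks for the same one, and user-defined attribute export has to be enabled in your FBX settings):

```python
import json
import maya.cmds as cmds

ATTR = "unityUserData"  # arbitrary name; the Unity postprocessor just needs to match it

def stamp_metadata(node, metadata):
    """Stuff a JSON blob into a string attribute so the FBX carries it as a user property."""
    if not cmds.attributeQuery(ATTR, node=node, exists=True):
        cmds.addAttr(node, longName=ATTR, dataType="string")
    cmds.setAttr(node + "." + ATTR, json.dumps(metadata), type="string")

# e.g. import instructions plus arbitrary project info, all in one blob
stamp_metadata("root_joint", {"collider": "mesh", "secondUVs": False, "owner": "env_team"})
```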

Theodox, sorry to resurrect an old thread but, if I follow you correctly:
[ul]
[li]The metadata you want is a JSON string[/li]
[li]saved to a custom attribute (in Maya/Max)[/li]
[li]I know custom attributes will export with FBX[/li]
[li]Then using OnPostprocessGameObjectWithUserProperties() you extract the JSON from the custom attribute[/li]
[li]and add it to a userData attribute in the <asset>.meta file? (I assume by setting AssetImporter.userData?)[/li]
[/ul]

Do I have that right?
Also, where are texture assignments tracked in this? Does a mesh asset keep a list of its texture assignments in its custom attribute?

We used the JSON blob for two different purposes: it could contain import instructions (like “add a bounding sphere component to this bone” or “make this a mesh collider”) or arbitrary metadata, which was forwarded into the YAML .meta file using userData.

We didn’t do anything with texture assignments, although we thought about it. Since the .meta files are YAML and userData isn’t critical path for default Unity, it would be pretty easy to, say, hack up a Photoshop export script that created and updated metas for you.