Storing/Retrieving Extension Data

rank13 · September 5, 2023, 12:06pm

The Timeline Actions extension I was working on earlier this year was at the extreme end (for me). I ended up using ValueTree as the mechanism to serialise to an xml file. But it had a hierarchy with multiple properties at each level:

Songs
- Song
  - Tracks
    - Track
  - Clips
    - Clip

It sounds like I would be better off with the blob? Or would I store the xml as a single dictionary value?

DaveBoulden · September 5, 2023, 12:19pm

@rank13 In which case you would store your serialized value tree in one dictionary entry. Others might use song UUIDs to store individual values or blobs, a different dictionary entry for each song if that is more suitable to their situation. I think we may all be talking about the same solution, but applying it in different ways.

simon · September 5, 2023, 1:37pm

Yes, you can use a key/value interface to store and retrieve a single blob of data. As @rank13 said, one would just use a single key.

My point is that a key/value interface is much more complex than a single store/retrieve interface. If we start with the two fairly simple functions above, we will soon run into more questions:

Should there be a function that just tells me whether there is any data under a specific key?
How would I delete a key/value pair?
Should GP offer a possibility for the extension to iterate over all keys that the extension has stored? Or should the extension keep track of those keys itself?
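
To make the difference in surface area concrete, here is a rough sketch of the two interface shapes, backed by simple in-memory containers. Every name in it (`BlobStore`, `KeyValueStore`, `contains`, `remove`, `keys`) is hypothetical and chosen for illustration; none of it is part of the actual Gig Performer SDK.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical names throughout; this is not the real Gig Performer SDK.
using Blob = std::vector<std::uint8_t>;

// Shape A: one opaque blob per extension. Two operations cover everything.
struct BlobStore
{
    Blob data;

    void store(const std::uint8_t* bytes, std::size_t length)
    {
        data.assign(bytes, bytes + length);
    }

    const Blob& retrieve() const { return data; }
};

// Shape B: a key/value store. The three questions above each become
// an extra operation that has to be specified, implemented and kept stable.
struct KeyValueStore
{
    std::map<std::string, Blob> entries;

    void store(const std::string& key, const std::uint8_t* bytes, std::size_t length)
    {
        entries[key].assign(bytes, bytes + length);
    }

    // Returns nullptr when nothing is stored under the key.
    const Blob* retrieve(const std::string& key) const
    {
        const auto it = entries.find(key);
        return it == entries.end() ? nullptr : &it->second;
    }

    // "Is there any data under a specific key?"
    bool contains(const std::string& key) const { return entries.count(key) != 0; }

    // "How would I delete a key/value pair?" Returns false if the key was unknown.
    bool remove(const std::string& key) { return entries.erase(key) != 0; }

    // "Can the extension iterate over all keys it has stored?"
    std::vector<std::string> keys() const
    {
        std::vector<std::string> names;
        for (const auto& entry : entries)
            names.push_back(entry.first);
        return names;
    }
};
```

Shape B is not hard to write in C++, but each additional member would also need a C-level equivalent at the GP/SDK boundary, which is where the cost shows up.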

Getting this right is not trivial and thus (as far as I am concerned) not a good fit for the interface between GP and extension, where you have a technical bottleneck (anything needs to be expressed in fairly trivial C structures, you cannot pass any actual Strings or smart pointers between GP and SDK) as well as a human divide (David writes the code for GP, somebody else writes the code that uses the SDK).

Think of your song chooser extension: You might need HTML/CSS parsing or even just the ability to open up a window and draw into it, but none of that functionality comes with Gig Performer or the extension SDK. Instead, you have the burden of either adapting an example or choosing another way (e.g. a 3rd party lib) to do that. But you have the massive advantage that the interface between the extension and GP stays fairly small and does not change too much, so it was very well doable to backport the extension to GP 4.5. The larger and more complex that interface gets, the more you are tied to a specific GP version.

simon · September 5, 2023, 1:45pm

(emphasis mine)

Note that in any non-trivial case (i.e. anything beyond storing a single number, string, etc.), you will have to deal with the serialization of some data structure here. And if that is the case, I think the gain of a single nesting level (via the keys) is almost negligible.

dhj · September 5, 2023, 1:45pm

Yes….the idea is to just implement a dictionary. Whether you use a single name and manage the data yourself or you use multiple names with different items in each, is totally up to the developer. As long as I receive a string (std::string), it doesn’t matter to me what’s in it.

Now, there is another approach. You could just send arbitrary data along with a length, i.e. a blob, which I will compress and save much like we save plugin state. Then you can send whatever you want and you’re not limited to null-terminated strings.
That said, it seems to me that a string optionally representing a chunk of XML or even JSON is the way to go. The JUCE system has a nice set of functions for creating XML data structures from a string representation and back. Once you are in XML land, JUCE also has a nice set of functions for saving and restoring attributes as well as a nested tree structure.
Or you could just have a sequence of a=b values and then turn that into a single dictionary entry.
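
As a minimal sketch of the a=b idea, the two helpers below turn a set of name/value pairs into one newline-separated string and back. The function names `joinPairs` and `splitPairs` are made up for this example, and the format assumes that names contain no `=` and that neither names nor values contain a newline.

```cpp
#include <map>
#include <sstream>
#include <string>

// Join name/value pairs into one "name=value\n" string that could be
// stored as a single dictionary entry.
// Assumption: names contain no '=', and nothing contains '\n'.
std::string joinPairs(const std::map<std::string, std::string>& pairs)
{
    std::string out;
    for (const auto& entry : pairs)
        out += entry.first + "=" + entry.second + "\n";
    return out;
}

// Parse the string produced by joinPairs back into a map.
// Only the first '=' on a line separates name from value.
std::map<std::string, std::string> splitPairs(const std::string& text)
{
    std::map<std::string, std::string> pairs;
    std::istringstream lines(text);
    std::string line;
    while (std::getline(lines, line))
    {
        const auto eq = line.find('=');
        if (eq == std::string::npos)
            continue; // skip lines without a separator
        pairs[line.substr(0, eq)] = line.substr(eq + 1);
    }
    return pairs;
}
```

Anything with nesting (such as the Songs/Tracks/Clips hierarchy mentioned earlier) outgrows this quickly, which is where XML or JSON becomes the more comfortable choice.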

I don’t think it’s worth getting too complicated

simon · September 5, 2023, 1:50pm

That’s what I would use in practice

But why would you want to enforce strings (instead of binary data)? As I said, you cannot pass a string object directly between GP and SDK anyway. The only thing I see that one could gain is the fact that a null-terminator would save you from passing along an explicit length.

dhj · September 5, 2023, 2:09pm

I don’t have strong feelings about this – however, from the developer perspective (and having had to deal with this stuff myself in GP itself), it’s a heck of a lot easier to use something like XML and string handling rather than having to use MemoryBlocks and do your own conversion. In particular, if you’re using the JUCE XmlElement class, it’s trivial to just do things like this

  XmlElement* xml = new XmlElement("SomeTag"); // Create a new Xml element (tag names must be valid XML names, so no spaces)
  xml->setAttribute("foo", 42);      // Set a few attributes
  xml->setAttribute("pi", 3.14159);
  xml->setAttribute("name", "David");

  // Add child elements if you need them
  // .... blah blah blah

  String asString = xml->createDocument("");  // Convert the xml document into a string

  Save("MyData", asString);  // Save the string back into Gig Performer

  delete xml;  // Or of course, just use a std::unique_ptr

Or of course if your needs are trivial then just

   Save("foo", 42);
   Save("pi", 3.14159);
   Save("name", "David");

simon · September 5, 2023, 2:22pm

Yes, I am familiar with these APIs

That’s fairly trivial indeed. But as opposed to saving, loading is a bit more involved, since now you have to deal with what data type you expect, who allocates memory for the string, etc. (By “you”, I mean obviously you as SDK and GP developer, but extension developers are involved here too of course!)

In my proposal, as an extension developer you would just always go through something like JUCE (or any JSON library, or Protobuf or whatever floats your boat) which have mature APIs for this kind of stuff and anything related to typing or memory management does not need to go through the C FFI.

DaveBoulden · September 5, 2023, 2:38pm

In my instance, I tend to use nlohmann/json to package up and serialize my structured data since I am using JavaScript within my stack, so the simple dictionary works very well for my needs whilst still being equally usable for anyone using JUCE Xml handling or any technique of their own choosing to represent/serialize their data as a simple string.

A dev can choose to read their data as a single entity upon instantiation and store it when closing, or create their own structuring by choosing their own format for the dictionary keys and read and store them upon rackspace/song activation or change. I think it offers the widest possible appeal to potential extension developers from beginners to seasoned pros.

dhj · September 5, 2023, 2:43pm

It seems to me that if the developer doesn’t know the type of what he/she has saved, all bets are off. Even in an XML document, all attributes are just saved as strings. The developer either needs to know how to interpret them or depend on some other description such as XSL, etc

pianopaul · September 5, 2023, 3:00pm

and XSD-Definition

simon · September 5, 2023, 3:10pm

To put it into more technical terms: You need one save and one restore function per data type (three in your example) that you want to offer. And I assume it would not be wise for Gig Performer to assume that an extension always calls the right restore method for the key that it is requesting, so Gig Performer of course needs to track which type is stored behind each key. And you need some well-defined way of handling restores through the function with the wrong type. All of that is certainly doable, but now your API is already quite a bit larger than two functions; it’s basically a copy of the GP_VM_Push/Pop* functions now.

The next hurdle is memory management: For GP_VM_PopString, I need to allocate a buffer myself as extension developer. Especially for the use cases that come to mind in extensions (e.g. storing user-defined data, such as pictures), the size of the data might vary from < 1 KB to > 10 MB, so allocating with a constant size is not sensible. So what I need to do instead is store the data size in another key, take care to keep that up to date, always read that key first, then allocate a buffer and restore. Do you see what I am getting at? There’s lots of potential for bugs, especially for programmers with less experience. So we can either put even more work into this API (and likely change it frequently) or make the API for this really primitive and advise people to use some standard serialization/deserialization lib (which they can pick and upgrade themselves, find tutorials on the Internet for, …).
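
The "size in a second key" workaround can be sketched as follows. The host side is simulated with an in-memory map, and `mockSave`/`mockRestore` are hypothetical stand-ins for a C-style interface where the caller supplies the buffer; they are not real SDK functions, and the `.size` key suffix is just a convention invented for this example.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <map>
#include <string>
#include <vector>

// In-memory stand-in for the host's storage (hypothetical).
static std::map<std::string, std::string> g_hostStore;

void mockSave(const char* key, const char* data, std::size_t length)
{
    g_hostStore[key] = std::string(data, length);
}

// Copies at most bufferSize bytes into the caller's buffer and returns the
// number of bytes copied. Data that does not fit is silently truncated.
std::size_t mockRestore(const char* key, char* buffer, std::size_t bufferSize)
{
    const auto it = g_hostStore.find(key);
    if (it == g_hostStore.end())
        return 0;
    const std::size_t n = std::min(bufferSize, it->second.size());
    if (n > 0)
        std::memcpy(buffer, it->second.data(), n);
    return n;
}

// Workaround: record the payload size under "<key>.size" on every save.
void saveWithSize(const std::string& key, const std::vector<char>& data)
{
    const std::string size = std::to_string(data.size());
    mockSave((key + ".size").c_str(), size.data(), size.size());
    mockSave(key.c_str(), data.data(), data.size());
}

// Read the size key first, allocate accordingly, then restore the payload.
// A missing size key parses as 0 and yields an empty result.
std::vector<char> restoreWithSize(const std::string& key)
{
    char sizeText[32] = {};
    mockRestore((key + ".size").c_str(), sizeText, sizeof(sizeText) - 1);
    std::vector<char> data(std::strtoull(sizeText, nullptr, 10));
    const std::size_t copied = mockRestore(key.c_str(), data.data(), data.size());
    data.resize(copied);
    return data;
}
```

The fragile part is that the two keys must always be written together: if any code path updates the payload without updating the size key, `restoreWithSize` returns truncated data without reporting an error.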

simon · September 5, 2023, 3:12pm

Enterprise Quality Extension Development™©

DaveBoulden · September 5, 2023, 3:48pm

Unless those calls were actually

   Save("foo", "42")
   Save("pi", "3.14159")
   Save("name", "David")

We also shouldn’t assume the extension is going to be written in C or C++, so requiring the extension developer to serialize to text would seem to me to be the safest option.

Frank1119 · September 5, 2023, 6:07pm

I’ve been busy with saving random strings/doubles/integers/booleans in the gigfile: what I did was a combo of an extension and a VST plugin. Due to how things work in Windows (I don’t know if it would work for macOS), the VST can make calls to the extension.

The extension I use, creates, sets and returns label-value pairs and exposes these functions by adding script functions
When gp retrieves the vst-state for saving, the vst serializes the label-values associated with the rackspace it is in
When gp loads a gig file and presents the saved state to the vst, it reapplies the state in the extension

Luckily to implement such a scheme, gp doesn’t have to depend on platform possibilities.

The nice thing about this approach is that the state is easily copied to another rackspace by copying the plugin, and when duplicating the rackspace the state is also copied.

Deleting the plugin also deletes the label-value pairs from the associated rackspace, so no dangling stuff (after saving the gig)

dhj · September 5, 2023, 7:40pm

OK - let’s not confuse things here ---- that function call is going the other way, the extension doesn’t control it and so we need to have a way for the extension to know how much space to allocate before invoking GP_VM_PopString() so you don’t have to guess.

It may be that I need to just add a function that will let you know the size of the string that is sitting on the stack ready to be popped. If that’s useful, submit a bug report and I’ll add it. I certainly wasn’t thinking about images in strings as use cases, just, you know, regular strings.

The above example has nothing to do with saving and restoring data since that activity is completely under the control of the extension.

I do not think you want to be storing 10MB blobs of data in a user’s gigfile. That’s going to have a disastrous impact on gig loading and RAM. And in fact this leaves me rather concerned about the notion of actually storing extension data in a gig file in the first place. Images should probably be stored in the file system or external database and we just use filenames or database queries to reference them.

That’s a red herring. The mechanism I was proposing would just save a string (defined as a C null-terminated type) with a name. The C++ wrapper would provide the overloading mechanisms so you could save and restore values of different types. That’s just for convenience.
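
A minimal sketch of that layering might look like the following: two C-style functions that only ever see null-terminated strings, and a thin C++ layer of overloads on top. The names `mockSaveString` and `mockLoadString` and the in-memory map are assumptions made for this example; the actual SDK functions, if they get added, may look different. The `Load*` helpers with a fallback value are likewise just one possible way to handle a missing or malformed entry.

```cpp
#include <exception>
#include <iomanip>
#include <locale>
#include <map>
#include <sstream>
#include <string>

// Hypothetical stand-ins for the C-level functions: names and values are
// plain null-terminated strings, nothing else crosses this boundary.
static std::map<std::string, std::string> g_dictionary;

void mockSaveString(const char* name, const char* value) { g_dictionary[name] = value; }

// Returns "" for unknown names. The pointer stays valid until that entry changes.
const char* mockLoadString(const char* name)
{
    const auto it = g_dictionary.find(name);
    return it == g_dictionary.end() ? "" : it->second.c_str();
}

// C++ convenience layer: overloads convert to text, then call the C function.
inline void Save(const std::string& name, const std::string& value)
{
    mockSaveString(name.c_str(), value.c_str());
}

inline void Save(const std::string& name, int value)
{
    Save(name, std::to_string(value));
}

inline void Save(const std::string& name, double value)
{
    std::ostringstream text;
    text.imbue(std::locale::classic());      // always '.' as decimal separator
    text << std::setprecision(17) << value;  // enough digits to round-trip a double
    Save(name, text.str());
}

inline std::string LoadString(const std::string& name)
{
    return mockLoadString(name.c_str());
}

// Typed loads return the fallback when the text is missing or not a number.
inline int LoadInt(const std::string& name, int fallback = 0)
{
    try { return std::stoi(LoadString(name)); }
    catch (const std::exception&) { return fallback; }
}

inline double LoadDouble(const std::string& name, double fallback = 0.0)
{
    try { return std::stod(LoadString(name)); }
    catch (const std::exception&) { return fallback; }
}
```

With this in place, `Save("foo", 42)`, `Save("pi", 3.14159)` and `Save("name", "David")` all compile, and the type interpretation on the way back is entirely the extension's responsibility.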

Assuming that serialization is optional, that would be fine. A developer who just needs to store a few integer and floating point values (say), shouldn’t have to go to the trouble of serializing stuff.

However, with all of the above taken into consideration, I am now wondering whether we should even have such a dictionary. It might be better to use the same mechanism as the host uses for dealing with plugin state. So if we use that model, then the extension does not explicitly save or load state. Instead, it responds to two requests

getState - the host will call this to request the state from the extension. Typically the host will do this when the user requests that the gig file be saved
restoreState - the host will call this, passing in previously saved state and the extension can then deal with it any way it wants.

The “state” can either be a blob (i.e, just a pointer to some data along with a length, typically implemented by JUCE using the MemoryBlock type) or it could just be a string. The string could just be a set of name=value pairs (like the old INI file stuff) or it could represent an XML document. The host won’t care.
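
As a rough illustration of that callback model, the sketch below defines a hypothetical `ExtensionState` interface and one extension that implements it using name=value text packed into a byte blob. The class names, the member fields and the text format are all invented for this example; only the two callbacks, `getState` and `restoreState`, come from the description above.

```cpp
#include <cstdint>
#include <cstdlib>
#include <sstream>
#include <string>
#include <vector>

using StateBlob = std::vector<std::uint8_t>;

// Hypothetical interface: the host calls these, the extension never
// initiates saving or loading on its own.
class ExtensionState
{
public:
    virtual ~ExtensionState() = default;
    virtual StateBlob getState() const = 0;                 // host is saving the gig
    virtual void restoreState(const StateBlob& state) = 0;  // host has loaded a gig
};

// Example extension with two settings, serialized as name=value lines.
class SongChooser : public ExtensionState
{
public:
    std::string lastSong;
    int fontSize = 12;

    StateBlob getState() const override
    {
        const std::string text = "fontSize=" + std::to_string(fontSize) + "\n"
                               + "lastSong=" + lastSong + "\n";
        return StateBlob(text.begin(), text.end());
    }

    void restoreState(const StateBlob& state) override
    {
        const std::string text(state.begin(), state.end());
        std::istringstream lines(text);
        std::string line;
        while (std::getline(lines, line))
        {
            const auto eq = line.find('=');
            if (eq == std::string::npos)
                continue; // ignore anything that is not name=value
            const std::string name = line.substr(0, eq);
            const std::string value = line.substr(eq + 1);
            if (name == "fontSize")
                fontSize = std::atoi(value.c_str());
            else if (name == "lastSong")
                lastSong = value;
        }
    }
};
```

The host only ever handles the opaque `StateBlob`, so swapping the text format for XML or JSON later would not change the interface.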

The question then becomes whether to store this in the gigfile or to use a separate file.

Thoughts?

simon · September 5, 2023, 8:13pm

A separate file that is still 1:1 related to the gig file or just any external file, regardless of what gig is loaded?

if 1:1 related to a gig file, how would we handle copying/moving the gig file?
if it’s just any external file, why would extensions need to go through Gig Performer instead of accessing the file system themselves?

Storage inside the gig file seems sensible to me.

dhj · September 5, 2023, 8:27pm

And that’s another good question I already wondered about. I can easily imagine that there would be a need to save state that is independent of particular gig files. So in fact, there might need to be two files involved (if we’re not storing inside the gig file), one for global settings.

It may be that the right way to organize this stuff is to have a folder hierarchy under the Gig Performer top-level folder (where we store other stuff) and then there would be a folder per extension and inside that folder would be a “global” settings file and then individual files with the same names as gigfiles.

simon · September 5, 2023, 8:55pm

I like the idea of having a subfolder per extension as the canonical location to put any global data

Wouldn’t that system break if you have two gigfiles with the same name (but in different folders, e.g. ~/project-1/final-show.gig and ~/project-2/final-show.gig )?
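
The collision is easy to reproduce with a small sketch of the proposed layout. The folder names, the `.xml` extension and the helper functions below are assumptions for illustration only; the thread does not specify actual paths or file formats.

```cpp
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Assumed layout (hypothetical):
//   <root>/<extension>/global.xml      settings independent of any gig
//   <root>/<extension>/<gig name>.xml  settings for one gig file
fs::path globalSettingsPath(const fs::path& root, const std::string& extension)
{
    return root / extension / "global.xml";
}

// Derives the per-gig settings file from the gig file's name only,
// ignoring the folder the gig file lives in.
fs::path gigSettingsPath(const fs::path& root,
                         const std::string& extension,
                         const fs::path& gigFile)
{
    return root / extension / (gigFile.stem().string() + ".xml");
}
```

Calling `gigSettingsPath` with `~/project-1/final-show.gig` and with `~/project-2/final-show.gig` produces the same path ending in `final-show.xml`, so both gigs would share one settings file. Avoiding that would need some extra identity beyond the file name, which in turn raises the copy/move question again.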

As far as I see:

Arguments for storing gig-specific data in a separate file

The gig file itself stays small(er).
Users can see from the file system which extensions store data for a specific gig (and how much), as opposed to a management dialog in GP which would need to be built for that.

Arguments for storing gig-specific data directly in the gig file

Everything users might want to do with files (copy, rename, move to a different computer, …) works just as expected.
Even though the gig files are larger by the amount of gig-specific extension data, the total amount of data needed on disk is still the same.

dhj · September 5, 2023, 9:01pm

That is not a good argument — the entire gigfile has to be loaded into Gig Performer — keeping the size down in RAM is relevant.