
This page documents the change history of MGM schema formats (as implemented in the Galaxy MGM code), MGM tool adapters, and input/output data types. As we continue development, adding new features and fixing bugs, we may need to change previously implemented schema formats, adapter code, and data types used by the MGMs. Such changes could result in slightly different output formats and/or data types, which could affect external systems that parse these outputs. Our strategy is to provide handlers that parse every format and data type that has been in use, so that previously generated outputs do not become unusable.

On top of this, we hope to add versioning to the MGM tools so that existing workflows that reference older tool versions can still be uploaded. Because Galaxy allows only one active version per tool ID, the older tool versions will be hidden from the tool panel, and users won't be able to use them in newly created workflows.

MGM Schema Changes

  • 2020-10-01: a typo in the STT schema is fixed. Prior to this fix, the implemented AMP STT schema used "result" instead of "results" in the JSON file. This means all previously generated STT outputs of the AMP Transcript type carry this typo. AMP Transcript outputs generated after staging is upgraded have the typo fixed.
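
Because older AMP Transcript files keep the misspelled key, a consumer can accept both spellings. The following is a minimal sketch rather than the actual AMP handler: the function name `load_stt_results` is hypothetical, and it assumes the key sits at the top level of the JSON document, which this page does not state.

```python
import json


def load_stt_results(text):
    """Return the transcript payload from an AMP Transcript JSON string.

    Assumption: "results" / "result" is a top-level key.
    """
    doc = json.loads(text)
    if "results" in doc:   # schema after the 2020-10-01 fix
        return doc["results"]
    if "result" in doc:    # outputs generated before the fix
        return doc["result"]
    raise KeyError("neither 'results' nor 'result' found")
```

Checking the corrected key first means a file that somehow contains both keys is read according to the current schema.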


MGM Data Type Changes

  • 2020-10-01: new data types are added to Galaxy:
    • av is now the supertype of audio and video
    • wav is now a subtype of audio
    • music/speech are now subtypes of wav
    • segments is renamed to segment
    • transcript, ner, vocr, shot, face, vtt are added as subtypes of json

Note

When a new AMP data type is added to Galaxy and it has a non-standard extension (such as video, music, speech, transcript, ner), we need to add it to the data type list that AMP manages (MediaServiceImpl). These extensions will be converted to standard extensions such as txt, html, json, wav etc., which browsers can handle.
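
One way to picture the conversion is to walk up the data type hierarchy until a standard extension is reached. The sketch below is illustrative only and is not the MediaServiceImpl code: the names `PARENT`, `STANDARD`, and `standard_extension` are hypothetical, the parent relations are copied from the 2020-10-01 data type change above, and the standard set contains only the four extensions named in this note.

```python
# Parent of each AMP data type, as listed in the 2020-10-01 change above.
PARENT = {
    "audio": "av", "video": "av",
    "wav": "audio",
    "music": "wav", "speech": "wav",
    "transcript": "json", "ner": "json", "vocr": "json",
    "shot": "json", "face": "json", "vtt": "json",
}

# Extensions this note names as ones browsers can handle.
STANDARD = {"txt", "html", "json", "wav"}


def standard_extension(ext):
    """Walk up the hierarchy until a standard extension is found, else None."""
    while ext is not None:
        if ext in STANDARD:
            return ext
        ext = PARENT.get(ext)   # None once the top of the hierarchy is passed
    return None
```

Under these assumptions music and speech resolve to wav, and transcript and ner resolve to json. A type such as video has no standard ancestor in this hierarchy, so it would need an explicit entry in the managed list.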

MGM Adapter Changes

  • 2020-10-01:
    • Tool sections are reorganized so that each section corresponds to a category listed on the MGM page, and all tools belonging to that category are listed under that section.
    • Tool labels and a few tool IDs, input/output labels and data types, as well as help info are added/updated to make the display more user friendly and informative.

MGM Tool Versioning Strategy

Until we investigate further how to use the Galaxy Tool Shed for versioning, we use the following manual versioning strategy:

  • If a tool changes but its tool ID and input/output names do not, edit the tool config XML directly and save it.
  • Otherwise, attach a version number to the existing tool ID, or increase the current latest version by 1 (e.g., tool_id = aws_transcribe_1 and so on), and save the config to a new file named with the same filename plus the current version number (e.g., aws_transcribe_1.xml).
  • Furthermore, if the new version also uses updated dependencies, such as a Python script it calls, add the same version number to the Python filename as well, and leave the original Python file intact for the original tool.
  • In tool_conf.xml, replace the old tool xml file with the newest version, and add the original tool to the section "Obsolete MGMs".
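
The renaming rule in the second step can be sketched as a small helper. This is an illustration under stated assumptions, not an existing AMP script: the function name `next_version` is hypothetical, it treats any trailing "_<digits>" on a tool ID as a version suffix, and it assumes the config filename matches the tool ID.

```python
import re


def next_version(tool_id):
    """Return (new_tool_id, new_config_filename) for the next manual version.

    Assumption: a trailing "_<digits>" on the ID is always a version suffix.
    """
    match = re.fullmatch(r"(.+)_(\d+)", tool_id)
    if match:
        # Already versioned: increase the current latest version by 1.
        base, version = match.group(1), int(match.group(2)) + 1
    else:
        # Unversioned tool: attach the first version number.
        base, version = tool_id, 1
    new_id = f"{base}_{version}"
    return new_id, f"{new_id}.xml"
```

For example, `next_version("aws_transcribe")` gives the ID aws_transcribe_1 with the file aws_transcribe_1.xml. A tool whose original ID already ends in an underscore and digits would be misread as versioned, so such IDs would need special handling.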

With tool versioning, we guarantee that old tools and workflows still function after an update, so unfinished invocations can resume automatically after a deploy and can be rerun if desired. Meanwhile, the old versions are hidden from the tool panel, so users cannot use them in newly created workflows or jobs and need to use the new version instead.

MGM Data Type Change Strategy

If we want to change the file extension field of an existing data type that is already used by existing datasets in Galaxy history, precautions should be taken to avoid exceptions when old datasets are accessed:

  • Instead of modifying the existing data type, say D, add a new data type, say D', that inherits from the same parent as D.
  • Add a comment in D's class to indicate that it is deprecated and that D' should be used instead. Also, add "Deprecated" to its label. Leave the file extension unchanged.
  • Override D's sniffer to always return false, and remove D from the sniffers section in tool_conf.xml, so no new datasets will be identified as the old type.
  • In the new data type D', set the file extension to the new value, and set the label to the original label of D.
  • Define a sniffer for D' (most likely reusing the same logic as the original sniffer of D), and add D' to the sniffers section in tool_conf.xml.
  • Replace D with D' in tools that reference D.

This way, old datasets referencing the deprecated data type will still work, but new datasets will never be identified as the old one.
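
The steps above can be sketched with plain Python classes. This is a simplified illustration and does not use Galaxy's real data type base classes or configuration files: `Parent`, `D`, `DPrime`, `REGISTRY`, `SNIFFERS`, `detect`, the extensions, the labels, and the sniffing rule are all hypothetical stand-ins.

```python
class Parent:
    """Stand-in for the parent data type shared by D and D'."""
    file_ext = "data"
    label = "Data"

    def sniff(self, text):
        return False


class D(Parent):
    # Deprecated: use DPrime instead.
    file_ext = "old_ext"              # left unchanged so old datasets resolve
    label = "My Type (Deprecated)"

    def sniff(self, text):
        return False                  # never claim a new dataset


class DPrime(Parent):
    file_ext = "new_ext"              # the new extension value
    label = "My Type"                 # original label of D

    def sniff(self, text):
        # Reuses what would have been D's original detection logic.
        return text.startswith("MYTYPE")


# Both types stay registered by extension; only DPrime is listed as a sniffer.
REGISTRY = {cls.file_ext: cls for cls in (D, DPrime)}
SNIFFERS = [DPrime]


def detect(text):
    """Return the extension of the first sniffer that matches, else None."""
    for cls in SNIFFERS:
        if cls().sniff(text):
            return cls.file_ext
    return None
```

In this sketch an old dataset stored with old_ext still finds its class through `REGISTRY`, while `detect` can only ever return new_ext for newly uploaded content.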
