This page contains miscellaneous scripts that were used for data verification and correction during and after the upgrade from R5 to R6. These scripts are for reference only and are not generally applicable; rather, they serve as examples of how to work with the data in R5 and R6.

Missing Posters

After migration we noticed that some of our videos didn't have thumbnails or posters. We ran these scripts to check the status of these items in R6 and found that they either had no derivatives or had a duration of 0. We then checked the status of these items in R5 and found that they had the same issues there.

Scripts I ran to determine the state of the problem MasterFiles:

# On Fedora 4 system, find migrated MasterFiles that don't have posters or durations 

mfs = [] 
durations = [] 
count = 0 
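MasterFile.find_each({}, { batch_size: 5 }) do |mf|
  count += 1
  # Audio files never have posters, so only flag non-audio files without one
  mfs << mf.id unless mf.file_format == 'Sound' || mf.has_poster?
  durations << mf.id if mf.duration.nil?
  # Progress output: a dot per item, and a summary line every 200 items
  if count % 200 == 0
    puts " #{count} - #{mfs.count}, #{durations.count}"
  else
    print "."
  end
end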

# Get the Fedora 3 pids for those items 

f3s = []
statuses = MigrationStatus.where(f4_pid: mfs | durations)
statuses.each do |status|
  f3s << status.f3_pid
end


# On Fedora 3 system, see if those MasterFiles had derivatives/duration before migrating 

f3s.each do |id|
  mf = MasterFile.find(id)
  puts "#{id}: #{mf.duration} #{mf.derivatives.count}"
end
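
The filtering logic in the first script can be tried outside a Rails console. The following is a minimal sketch, assuming a hypothetical `FakeMasterFile` struct that stands in for the real `MasterFile` model and a hypothetical helper named `find_problem_ids`; neither is part of Avalon.

```ruby
# Hypothetical stand-in for MasterFile so the filter can run without Rails.
FakeMasterFile = Struct.new(:id, :file_format, :duration, :poster) do
  def has_poster?
    !poster.nil?
  end
end

# Returns two lists: ids lacking a poster (audio excluded) and ids lacking a duration.
def find_problem_ids(master_files)
  no_poster = master_files.reject { |mf| mf.file_format == 'Sound' || mf.has_poster? }.map(&:id)
  no_duration = master_files.select { |mf| mf.duration.nil? }.map(&:id)
  [no_poster, no_duration]
end

# Made-up sample data for illustration only.
sample = [
  FakeMasterFile.new('a1', 'Moving image', '1000', 'poster.jpg'),
  FakeMasterFile.new('b2', 'Moving image', nil, nil),
  FakeMasterFile.new('c3', 'Sound', nil, nil)
]

no_poster, no_duration = find_problem_ids(sample)
puts "missing poster: #{no_poster.inspect}"
puts "missing duration: #{no_duration.inspect}"
# The union of both lists is what gets looked up in MigrationStatus.
puts "union: #{(no_poster | no_duration).inspect}"
```

The union with `|` removes duplicates, so an item that lacks both a poster and a duration is looked up only once.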

Permalink Collisions

For some reason, after migrating we had permalinks that pointed to more than one MasterFile. We ran this script to identify those collisions and then fix them.

require 'open-uri'
require 'nokogiri'
# On Ruby 3.0+, call URI.open instead of open: Kernel#open no longer accepts URLs there.
# Facet on identifier_ssim and keep only values that appear on two or more documents
result = Nokogiri::XML(open("http://localhost:8983/solr/avalon/select?q=*&rows=0&facet=on&facet.field=identifier_ssim&facet.limit=-1&facet.mincount=2"))
collided_ids = result.xpath('//lst[@name="identifier_ssim"]/int/@name').collect(&:value)
collided_ids.each do |id|
  fields = ["id", "has_model_ssim", "system_create_dtsi", "system_modified_dtsi"]
  # List every document that shares this permalink
  collided_docs = Nokogiri::XML(open("http://localhost:8983/solr/avalon/select?q=identifier_ssim:#{id}&fl=#{fields.join(',')}"))
  collided_docs.xpath('//doc').each do |doc|
    field_values = fields.collect {|f| {f.to_sym => doc.xpath("*[@name='#{f}']").text} }
    puts "#{id} -> (#{field_values.join(",")})"
  end
end
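
The facet query can also be read as JSON rather than XML. The sketch below is not what we ran; it assumes the same request is made with `wt=json`, in which case Solr's default layout returns each facet field as a flat array of alternating value and count. The sample body, its permalink values, and the helper name `collided_ids_from_json` are made up for illustration.

```ruby
require 'json'

# Made-up sample response body in Solr's default (flat) JSON facet layout.
body = <<~JSON
  {
    "response": { "numFound": 5, "docs": [] },
    "facet_counts": {
      "facet_fields": {
        "identifier_ssim": ["http://example.edu/purl/1", 2, "http://example.edu/purl/2", 3]
      }
    }
  }
JSON

# Pair up the flat [value, count, value, count, ...] array and keep the values
# that occur on at least min_count documents.
def collided_ids_from_json(json_body, field: 'identifier_ssim', min_count: 2)
  flat = JSON.parse(json_body).dig('facet_counts', 'facet_fields', field) || []
  flat.each_slice(2).select { |_value, count| count >= min_count }.map(&:first)
end

ids = collided_ids_from_json(body)
puts ids
```

With `facet.mincount=2` already in the query, the `min_count` filter is redundant; it is kept so the helper still behaves sensibly if that parameter is dropped.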

# Use the above to generate a list of collisions, then munge them into this form:
# h = { permalink_id1: [masterfile_id1, masterfile_id2], ... }
# Once your h hash looks good, pass it to split_mfs to correct collisions.
def split_mfs(h)
  mo_cache = {}
  ids_cache = {}
  h.values.each do |vals|
    mf1, mf2 = vals
    m1 = MasterFile.find(mf1) rescue nil
    m2 = MasterFile.find(mf2) rescue nil
    good_mf = nil
    bad_mf = nil
    mo = nil
    print "#{mf1} (1) / #{mf2} (2): MediaObject "

    # The MasterFile that has derivatives is the one to keep
    if m1.present? && m1.derivatives.count > 0
      good_mf = m1
      bad_mf = m2
    elsif m2.present? && m2.derivatives.count > 0
      good_mf = m2
      bad_mf = m1
    end

    if good_mf.present? && bad_mf.present?

      mo_id = good_mf.media_object_id || bad_mf.media_object_id
      mo_cache[mo_id] ||= MediaObject.find(mo_id) rescue nil
      mo = mo_cache[mo_id]

      if mo.present?
        print "#{mo_id} "
        ids_cache[mo_id] ||= mo.master_files.collect(&:id)
        if ids_cache[mo_id].include?(good_mf.id)
          puts " correctly associated with good mf #{good_mf.id}, deleting bad_mf #{bad_mf.id}"
        elsif ids_cache[mo_id].include?(bad_mf.id)
          # Swap the good MasterFile into the position held by the bad one
          mf_index = mo.ordered_master_file_ids.index(bad_mf.id)
          puts " dropping association with and deleting bad_mf #{bad_mf.id} at index #{mf_index}. Associating good_mf #{good_mf.id}"
          good_mf.media_object_id = mo_id
          mo.ordered_master_files.delete_at( mf_index )
          mo.ordered_master_files.insert_at( mf_index, good_mf )
          mo.master_files -= [bad_mf]
        else
          puts " not associated with either mf #{good_mf.id} or #{bad_mf.id}"
        end
      else
        puts " media_object not found"
      end
    else
      puts " derivatives not found "
    end
  end
end
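
The munging step that produces the `h` hash can be sketched as follows. This is an illustration rather than the code we used: it assumes the collision listing has been reduced to hypothetical `[permalink, masterfile_id]` pairs, and `build_collision_hash` is a made-up helper name.

```ruby
# Hypothetical [permalink, masterfile_id] pairs taken from the collision listing.
pairs = [
  ['permalink_id1', 'masterfile_id1'],
  ['permalink_id1', 'masterfile_id2'],
  ['permalink_id2', 'masterfile_id3'],
  ['permalink_id2', 'masterfile_id4'],
  ['permalink_id3', 'masterfile_id5']
]

# Groups MasterFile ids under their permalink and keeps only permalinks shared
# by exactly two MasterFiles, since split_mfs compares one pair at a time.
def build_collision_hash(pairs)
  grouped = pairs.each_with_object(Hash.new { |hash, key| hash[key] = [] }) do |(permalink, mf_id), acc|
    acc[permalink.to_sym] << mf_id
  end
  grouped.select { |_permalink, mf_ids| mf_ids.size == 2 }
end

h = build_collision_hash(pairs)
h.each { |permalink, mf_ids| puts "#{permalink}: #{mf_ids.join(', ')}" }
```

Permalinks with three or more MasterFiles are dropped by this filter and would need to be reviewed by hand.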