This page contains miscellaneous scripts that were used for data verification and correction during and after the upgrade from R5 to R6. These scripts are for reference only and may not be generally applicable; they are meant to serve as examples of how to work with the data in R5 and R6.

Missing Posters

After migration we noticed that some of our videos didn't have thumbnails or posters. We ran these scripts to check the status of these items in R6. We found that they either had no derivatives or had a duration of 0. We then checked the status of these items in R5 and found that they had the same issues there.

Scripts I ran to determine the state of the problem MasterFiles:

# On the Fedora 4 (R6) system, find migrated MasterFiles that have no poster
# (audio files are skipped) or no duration

mfs = []        # ids of MasterFiles with no poster
durations = []  # ids of MasterFiles with no duration
count = 0
MasterFile.find_each({}, { batch_size: 5 }) do |mf|
  count += 1
  mfs << mf.id unless mf.file_format == 'Sound' || mf.has_poster?
  durations << mf.id if mf.duration.nil?
  # Progress: a summary line every 200 MasterFiles, a dot otherwise
  if count % 200 == 0
    puts " #{count} - #{mfs.count}, #{durations.count}"
  else
    print "."
  end
end

# Get the Fedora 3 pids for those items

f3s = []
MigrationStatus.where(f4_pid: mfs | durations).each do |status|
  f3s << status.f3_pid
end

f3s  # echo the list in the console so it can be reused on the Fedora 3 system

# On the Fedora 3 (R5) system, with f3s set to the list from above, see if those MasterFiles had derivatives/duration before migrating

f3s.each do |id| 
  mf = MasterFile.find(id) 
  puts "#{id}: #{mf.duration} #{mf.derivatives.count}" 
end 
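The filtering and lookup logic above can be tried outside of Avalon. The sketch below is a minimal, hypothetical stand-in. `MasterFileStub` and `MigrationRow` are plain Ruby structs assumed to mirror the fields the scripts read (`file_format`, `has_poster?`, `duration`, `f3_pid`, `f4_pid`). They are not the real ActiveFedora or ActiveRecord classes. Unlike the script, the sketch also treats a duration of 0 as a problem, because that is what turned up in R5. That extra check is an assumption.

```ruby
# Hypothetical stand-ins for the R6 MasterFile fields and the MigrationStatus
# rows read by the scripts above; these are NOT the Avalon classes.
MasterFileStub = Struct.new(:id, :file_format, :has_poster, :duration)
MigrationRow   = Struct.new(:f3_pid, :f4_pid)

# Ids of MasterFiles that lack a poster (audio is skipped, as in the script)
# or whose duration is nil or zero (nil.to_i and "0".to_i are both 0).
def problem_ids(master_files)
  missing_posters = master_files.reject { |mf| mf.file_format == 'Sound' || mf.has_poster }.map(&:id)
  bad_durations   = master_files.select { |mf| mf.duration.to_i.zero? }.map(&:id)
  missing_posters | bad_durations
end

# Maps Fedora 4 ids back to Fedora 3 pids via the migration rows.
def f3_pids_for(f4_ids, migration_rows)
  migration_rows.select { |row| f4_ids.include?(row.f4_pid) }.map(&:f3_pid)
end

files = [
  MasterFileStub.new('f4-a', 'Moving image', false, 1000),
  MasterFileStub.new('f4-b', 'Sound', false, nil),
  MasterFileStub.new('f4-c', 'Moving image', true, 5000)
]
rows = [
  MigrationRow.new('avalon:1', 'f4-a'),
  MigrationRow.new('avalon:2', 'f4-b'),
  MigrationRow.new('avalon:3', 'f4-c')
]
p f3_pids_for(problem_ids(files), rows) # prints ["avalon:1", "avalon:2"]
```

The sample pids and file formats are made up for illustration. In the real scripts, the two lists come from `MasterFile.find_each` and `MigrationStatus.where`.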

Permalink Collisions

For some reason, after migrating we had permalinks that pointed to more than one MasterFile. We ran this script to identify those collisions and then fix them.

require 'open-uri'
require 'nokogiri'

solr = "http://localhost:8983/solr/avalon/select"

# Facet on identifier_ssim and keep only values shared by two or more documents
result = Nokogiri::XML(URI.open("#{solr}?q=*:*&rows=0&wt=xml&facet=on&facet.field=identifier_ssim&facet.limit=-1&facet.mincount=2"))
collided_ids = result.xpath('//lst[@name="identifier_ssim"]/int/@name').collect(&:value)

# For each shared identifier, print the documents that carry it
fields = ["id", "has_model_ssim", "system_create_dtsi", "system_modified_dtsi"]
collided_ids.each do |id|
  query = URI.encode_www_form_component(%Q{identifier_ssim:"#{id}"})
  collided_docs = Nokogiri::XML(URI.open("#{solr}?q=#{query}&wt=xml&fl=#{fields.join(',')}"))
  collided_docs.xpath('//doc').each do |doc|
    field_values = fields.collect { |f| "#{f}=#{doc.xpath("*[@name='#{f}']").text}" }
    puts "#{id} -> (#{field_values.join(', ')})"
  end
end
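The facet query above returns every `identifier_ssim` value that is shared by two or more documents (`facet.mincount=2`). Turning that output into the `h` hash that `split_mfs` expects amounts to grouping by identifier. The following is a minimal sketch. It assumes the printed lines have already been parsed into a hypothetical `CollidedDoc` struct and that MasterFile documents have `has_model_ssim` equal to `MasterFile`.

```ruby
# Hypothetical record for one line of the Solr script's output.
CollidedDoc = Struct.new(:identifier, :id, :model)

# Returns { identifier => [master_file_id, ...] } for identifiers shared by
# two or more MasterFile documents; documents of other models are ignored.
def collisions_by_identifier(docs)
  docs.select { |doc| doc.model == 'MasterFile' }
      .group_by(&:identifier)
      .transform_values { |group| group.map(&:id).uniq }
      .select { |_identifier, ids| ids.size >= 2 }
end

docs = [
  CollidedDoc.new('p1', 'mf-1', 'MasterFile'),
  CollidedDoc.new('p1', 'mf-2', 'MasterFile'),
  CollidedDoc.new('p2', 'mo-9', 'MediaObject'),
  CollidedDoc.new('p2', 'mf-3', 'MasterFile')
]
# 'p1' maps to ['mf-1', 'mf-2']; 'p2' is dropped because only one MasterFile carries it
h = collisions_by_identifier(docs)
```

`split_mfs` only reads the first two ids of each entry. Any identifier mapped to three or more MasterFiles is therefore worth checking by hand first.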


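# Use the output of the script above to build a list of collisions, then munge it into this form:
# h = { permalink_id1: [masterfile_id1, masterfile_id2], ... }
# Once the h hash looks right, pass it to split_mfs to correct the collisions.
# split_mfs keeps the MasterFile that has derivatives and deletes the other one;
# it only looks at the first two MasterFile ids for each permalink.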
def split_mfs h
  mo_cache = {}
  ids_cache = {}
  h.values.each do |vals|
    mf1, mf2 = vals
    m1 = MasterFile.find(mf1) rescue nil
    m2 = MasterFile.find(mf2) rescue nil
    good_mf = nil
    bad_mf = nil
    mo = nil
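    print "#{m1&.id || mf1} (1) / #{m2&.id || mf2} (2): MediaObject "

    # Keep whichever MasterFile has derivatives; a MasterFile that was not found is never kept
    if m1.present? && m1.derivatives.count > 0
      good_mf = m1
      bad_mf = m2
    elsif m2.present? && m2.derivatives.count > 0
      good_mf = m2
      bad_mf = m1
    end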

    if good_mf.present? && bad_mf.present?

      mo_id = good_mf.media_object_id || bad_mf.media_object_id
      mo_cache[mo_id] ||= (MediaObject.find(mo_id) rescue nil)
      mo = mo_cache[mo_id]

      if mo.present?
        print "#{mo_id} "
        ids_cache[mo_id] ||= mo.master_files.collect(&:id)
        if ids_cache[mo_id].include? good_mf.id
          puts " correctly associated with good mf #{good_mf.id}, deleting bad_mf #{bad_mf.id}"
          bad_mf.delete
        elsif ids_cache[mo_id].include? bad_mf.id
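          # Put the good MasterFile in the bad one's slot so the section order is preserved
          mf_index = mo.ordered_master_file_ids.index bad_mf.id
          puts " dropping association with and deleting bad_mf #{bad_mf.id} at index #{mf_index}. Associating good_mf #{good_mf.id}"
          good_mf.media_object_id = mo_id
          good_mf.save!
          mo.ordered_master_files.delete_at(mf_index)
          mo.ordered_master_files.insert_at(mf_index, good_mf)
          mo.master_files -= [bad_mf]
          mo.save!
          # Keep the cached id list in sync for later collisions on the same MediaObject
          ids_cache[mo_id] = ids_cache[mo_id] - [bad_mf.id] + [good_mf.id]
          bad_mf.delete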
        else
          puts " not associated with either mf #{good_mf.id} or #{bad_mf.id}"
        end
      else
        puts " media_object not found"
      end
    else
      puts " derivatives not found "
    end
  end
end
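`split_mfs` comes down to two decisions: which MasterFile to keep, and how to put it into the bad one's slot without disturbing section order. The sketch below models both decisions with plain Ruby arrays. `MasterFileSketch`, `pick_good_and_bad`, and `swap_in_place` are hypothetical names. `swap_in_place` only mimics the `delete_at`/`insert_at` pair used on `ordered_master_files`; it does not call ActiveFedora.

```ruby
# Hypothetical plain-Ruby model: a MasterFile is reduced to its id and the
# number of derivatives it has.
MasterFileSketch = Struct.new(:id, :derivative_count)

# Returns [good, bad], preferring the first MasterFile when it has derivatives,
# or nil when either is missing or neither has derivatives.
def pick_good_and_bad(mf1, mf2)
  return nil if mf1.nil? || mf2.nil?

  if mf1.derivative_count > 0
    [mf1, mf2]
  elsif mf2.derivative_count > 0
    [mf2, mf1]
  end
end

# Returns a copy of ordered_ids with bad_id replaced by good_id at the same
# position; the copy is unchanged if bad_id is not in the list.
def swap_in_place(ordered_ids, bad_id, good_id)
  result = ordered_ids.dup
  index = result.index(bad_id)
  return result if index.nil?

  result.delete_at(index)
  result.insert(index, good_id)
  result
end

good, bad = pick_good_and_bad(MasterFileSketch.new('mf-1', 0), MasterFileSketch.new('mf-2', 3))
p swap_in_place(%w[mf-0 mf-1 mf-5], bad.id, good.id) # prints ["mf-0", "mf-2", "mf-5"]
```

Working on a copy keeps the sketch free of side effects. The real script instead mutates the MediaObject's ordered list and then saves it.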