
Jhove (JHOVE, the JSTOR/Harvard Object Validation Environment) is a Harvard project for automatically identifying, validating, and describing various content types. It was developed jointly with JSTOR and is also related to Portico. More information on these relationships can be found at the Portico Communities page.

VTLS has a SOP interface for Jhove, which I have installed for testing on Rhyme.

General Jhove documentation is available at the main Jhove site.

My focus with Jhove was on automatically extracting a MIX metadata record from images. A sample of my testing code can be seen here:

	// ------- ------- ------- ------- -------
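	// Jhove requires an App object (name, release, date, usage, rights);
	// empty placeholder values are enough for this test: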
	App app = new App("", "", new int[] {0, 0, 0}, "", "");
	JhoveBase je = new JhoveBase();
	je.setLogLevel("ERROR");

	// ------- ------- ------- ------- -------
	// The configuration file can come from the Jhove properties or be given explicitly.
	// Here the explicit path overrides whatever the properties supplied:
	String configFile = JhoveBase.getConfigFileFromProperties();
	configFile = "jhove\\jhove.conf";
	String saxClass = JhoveBase.getSaxClassFromProperties();
	
	je.init(configFile, saxClass);
        
	// ------- ------- ------- ------- -------
	// This is an output location for the generated XML:
	File f = File.createTempFile("mix-", ".tmp");
	String file = f.getPath();

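	// Pick a module from the file extension. "filename" is the path of the
	// input file and is defined outside this excerpt:
	Module module = null;
	String tmp = filename.toLowerCase();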
	if(tmp.endsWith(".jpg") || tmp.endsWith(".jpeg"))
	{
		module = je.getModule("JPEG-hul");
	}
	else if(tmp.endsWith(".tif") || tmp.endsWith(".tiff"))
	{
		module = je.getModule("TIFF-hul");
	}
	else if(tmp.endsWith(".gif"))
	{
		module = je.getModule("GIF-hul");
	}
	else if(tmp.endsWith(".pdf"))
	{
		module = je.getModule("PDF-hul");
	}
	// JPEG 2000 files normally carry the .jp2 (JP2) or .jpx (JPX) extension:
	else if(tmp.endsWith(".jp2") || tmp.endsWith(".jpx"))
	{
		module = je.getModule("JPEG2000-hul");
	}
        
	OutputHandler handler = je.getHandler("XML");
        
	// ------- ------- ------- ------- -------
	// This processes the input(s), and puts the output in a file (or wherever):
	je.dispatch(app, module, handler, handler, file, new String[] {filename});

	// ------- ------- ------- ------- -------
	// The rest of this is post-processing the XML for my use:
	TransformerFactory factory = TransformerFactory.newInstance();
	Templates pss = factory.newTemplates(new StreamSource(new File("jhove\\mix.xsl")));
	Transformer transformer = pss.newTransformer();

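	DocumentBuilderFactory domFactory = DocumentBuilderFactory.newInstance();
	// Jhove's XML output is namespaced, so parse it namespace-aware for the XSLT step:
	domFactory.setNamespaceAware(true);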
	DocumentBuilder builder = domFactory.newDocumentBuilder();
	Document document = builder.parse(f);
    	
	DOMSource src = new DOMSource(document);
	DOMResult dst = new DOMResult();
	transformer.transform(src, dst);
    	
	Node node = dst.getNode();

	// ------- ------- ------- ------- -------
	// I'm sure this could be improved, but basically rip the desired content
	// from the XML via XPath
	XPathFactory xpFactory = XPathFactory.newInstance();
	XPath xp = xpFactory.newXPath();
	xp.setNamespaceContext(new MetadataNamespaceContext());
    	
	setMimeType(xp.evaluate("/mix:mix/mix:BasicImageParameters/mix:Format/mix:MIMEType/text()", node));
	setByteOrder(xp.evaluate("/mix:mix/mix:BasicImageParameters/mix:Format/mix:ByteOrder/text()", node));
	setCompressionScheme(xp.evaluate("/mix:mix/mix:BasicImageParameters/mix:Format/mix:Compression/mix:CompressionScheme/text()", node));
	setColorSpace(xp.evaluate("/mix:mix/mix:BasicImageParameters/mix:Format/mix:PhotometricInterpretation/mix:ColorSpace/text()", node));
	setStripOffsets(xp.evaluate("/mix:mix/mix:BasicImageParameters/mix:Format/mix:Segments/mix:StripOffsets/text()", node));
	setRowsPerStrip(xp.evaluate("/mix:mix/mix:BasicImageParameters/mix:Format/mix:Segments/mix:RowsPerStrip/text()", node));
	setStripByteCounts(xp.evaluate("/mix:mix/mix:BasicImageParameters/mix:Format/mix:Segments/mix:StripByteCounts/text()", node));
	setPlanarConfiguration(xp.evaluate("/mix:mix/mix:BasicImageParameters/mix:Format/mix:PlanarConfiguration/text()", node));
	setOrientation(xp.evaluate("/mix:mix/mix:BasicImageParameters/mix:File/mix:Orientation/text()", node));
	setSamplingFrequencyUnit(xp.evaluate("/mix:mix/mix:ImagingPerformanceAssessment/mix:SpatialMetrics/mix:SamplingFrequencyUnit/text()", node));
	setXSamplingFrequency(xp.evaluate("/mix:mix/mix:ImagingPerformanceAssessment/mix:SpatialMetrics/mix:XSamplingFrequency/text()", node));
	setYSamplingFrequency(xp.evaluate("/mix:mix/mix:ImagingPerformanceAssessment/mix:SpatialMetrics/mix:YSamplingFrequency/text()", node));
	setImageWidth(xp.evaluate("/mix:mix/mix:ImagingPerformanceAssessment/mix:SpatialMetrics/mix:ImageWidth/text()", node));
	setImageLength(xp.evaluate("/mix:mix/mix:ImagingPerformanceAssessment/mix:SpatialMetrics/mix:ImageLength/text()", node));
	setBitsPerSample(xp.evaluate("/mix:mix/mix:ImagingPerformanceAssessment/mix:Energetics/mix:BitsPerSample/text()", node));
	setSamplesPerPixel(xp.evaluate("/mix:mix/mix:ImagingPerformanceAssessment/mix:Energetics/mix:SamplesPerPixel/text()", node));
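
The chain of extension checks in the sample could also be written as a lookup table. What follows is only a minimal sketch: the class `ModuleNames` and its method `forFile` are hypothetical names, the module names are the ones used in the sample, and the `.jp2`/`.jpx` entries are an assumption about typical JPEG 2000 extensions.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical helper: maps a file name to the name of a Jhove module. */
final class ModuleNames {
    // Extension -> module name, in the order the sample checks them.
    private static final Map<String, String> BY_EXTENSION = new LinkedHashMap<>();
    static {
        BY_EXTENSION.put(".jpg", "JPEG-hul");
        BY_EXTENSION.put(".jpeg", "JPEG-hul");
        BY_EXTENSION.put(".tif", "TIFF-hul");
        BY_EXTENSION.put(".tiff", "TIFF-hul");
        BY_EXTENSION.put(".gif", "GIF-hul");
        BY_EXTENSION.put(".pdf", "PDF-hul");
        // Assumed JPEG 2000 extensions; adjust to whatever your files use.
        BY_EXTENSION.put(".jp2", "JPEG2000-hul");
        BY_EXTENSION.put(".jpx", "JPEG2000-hul");
    }

    private ModuleNames() {}

    /** Returns the module name for the file, or null if the extension is unknown. */
    static String forFile(String filename) {
        String lower = filename.toLowerCase(Locale.ROOT);
        for (Map.Entry<String, String> entry : BY_EXTENSION.entrySet()) {
            if (lower.endsWith(entry.getKey())) {
                return entry.getValue();
            }
        }
        return null;
    }
}
```

In the sample, the result would be passed to `je.getModule(...)` whenever it is not null, which keeps the list of supported formats in one place.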

I've attached a sample Jhove output, as well as the MIX XSL and the Jhove configuration I referred to in the code sample.
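
The code sample relies on a `MetadataNamespaceContext` class that is not shown on this page. As a rough sketch of what such a class might look like, the example below binds the `mix` prefix for XPath evaluation. The names `MixNamespaceContext` and `MixXPathDemo` are hypothetical, and the namespace URI `http://www.loc.gov/mix/` is an assumption that should be checked against the root element of your own MIX output.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;

/** Hypothetical stand-in for the MetadataNamespaceContext used in the sample. */
final class MixNamespaceContext implements NamespaceContext {
    // Assumed namespace URI; verify it against your MIX documents.
    static final String MIX_NS = "http://www.loc.gov/mix/";

    @Override
    public String getNamespaceURI(String prefix) {
        if (prefix == null) {
            throw new IllegalArgumentException("prefix must not be null");
        }
        return "mix".equals(prefix) ? MIX_NS : XMLConstants.NULL_NS_URI;
    }

    @Override
    public String getPrefix(String namespaceURI) {
        return MIX_NS.equals(namespaceURI) ? "mix" : null;
    }

    @Override
    public Iterator<String> getPrefixes(String namespaceURI) {
        String prefix = getPrefix(namespaceURI);
        return prefix == null
                ? Collections.<String>emptyIterator()
                : Collections.singletonList(prefix).iterator();
    }
}

/** Hypothetical demo: reads the MIME type from a MIX document given as a string. */
final class MixXPathDemo {
    private MixXPathDemo() {}

    static String mimeType(String mixXml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            // Parse namespace-aware so the prefixed XPath steps can match.
            dbf.setNamespaceAware(true);
            Document doc = dbf.newDocumentBuilder().parse(
                    new ByteArrayInputStream(mixXml.getBytes(StandardCharsets.UTF_8)));

            XPath xp = XPathFactory.newInstance().newXPath();
            xp.setNamespaceContext(new MixNamespaceContext());
            return xp.evaluate(
                    "/mix:mix/mix:BasicImageParameters/mix:Format/mix:MIMEType/text()", doc);
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate MIX XPath", e);
        }
    }
}
```

The prefix used in the XPath expressions only has to match the namespace context, not the prefix used in the XML file itself, so the same expressions keep working if the MIX output declares the namespace under a different prefix.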
