Jhove


Jhove (the JSTOR/Harvard Object Validation Environment) is a project from Harvard and JSTOR for automatically identifying, validating, and describing digital objects in various formats. More information on its relationship to JSTOR and Portico can be found at the Portico Communities page.

VTLS has a SOAP interface for Jhove, which I have installed for testing on Rhyme.

General Jhove documentation is available at the main Jhove site.

My focus with Jhove was on automatically extracting a MIX metadata record from images. A sample of my testing code follows:

	// ------- ------- ------- ------- -------
	// Jhove requires this:
	App app = new App("", "", new int[] {0, 0, 0}, "", "");
	JhoveBase je = new JhoveBase();
	// Jhove logs through java.util.logging, whose level names have no "ERROR":
	je.setLogLevel("SEVERE");

	// ------- ------- ------- ------- -------
	// The configuration file can be located via system properties or given
	// explicitly; here the explicit path overrides the value from properties:
	String configFile = JhoveBase.getConfigFileFromProperties();
	configFile = "jhove\\jhove.conf";
	String saxClass = JhoveBase.getSaxClassFromProperties();
	
	je.init(configFile, saxClass);
        
	// ------- ------- ------- ------- -------
	// This is an output location for the generated XML:
	File f = File.createTempFile("mix-", ".tmp");
	String file = f.getPath();

	Module module = null;
	String tmp = filename.toLowerCase();  
	if(tmp.endsWith(".jpg") || tmp.endsWith(".jpeg"))
	{
		module = je.getModule("JPEG-hul");
	}
	else if(tmp.endsWith(".tif") || tmp.endsWith(".tiff"))
	{
		module = je.getModule("TIFF-hul");
	}
	else if(tmp.endsWith(".gif"))
	{
		module = je.getModule("GIF-hul");
	}
	else if(tmp.endsWith(".pdf"))
	{
		module = je.getModule("PDF-hul");
	}
	// JPEG 2000 files normally use .jp2 (or .jpx for the extended format):
	else if(tmp.endsWith(".jp2") || tmp.endsWith(".jpx"))
	{
		module = je.getModule("JPEG2000-hul");
	}
        
	OutputHandler handler = je.getHandler("XML");
        
	// ------- ------- ------- ------- -------
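	// This processes the input(s), and puts the output in a file (or wherever).
	// If no extension matched, module is still null and Jhove tries each
	// configured module in turn: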
	je.dispatch(app, module, handler, handler, file, new String[] {filename});

	// ------- ------- ------- ------- -------
	// The rest of this is post-processing the XML for my use:
	TransformerFactory factory = TransformerFactory.newInstance();
	Templates pss = factory.newTemplates(new StreamSource(new File("jhove\\mix.xsl")));
	Transformer transformer = pss.newTransformer();

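	DocumentBuilderFactory domFactory = DocumentBuilderFactory.newInstance();
	// Namespace awareness is off by default, but the XSL and XPath steps need it:
	domFactory.setNamespaceAware(true);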
	DocumentBuilder builder = domFactory.newDocumentBuilder();
	Document document = builder.parse(f);
    	
	DOMSource src = new DOMSource(document);
	DOMResult dst = new DOMResult();
	transformer.transform(src, dst);
    	
	Node node = dst.getNode();

	// ------- ------- ------- ------- -------
	// I'm sure this could be improved, but basically rip the desired content
	// from the XML via XPath
	XPathFactory xpFactory = XPathFactory.newInstance();
	XPath xp = xpFactory.newXPath();
	xp.setNamespaceContext(new MetadataNamespaceContext());
    	
	setMimeType(xp.evaluate("/mix:mix/mix:BasicImageParameters/mix:Format/mix:MIMEType/text()", node));
	setByteOrder(xp.evaluate("/mix:mix/mix:BasicImageParameters/mix:Format/mix:ByteOrder/text()", node));
	setCompressionScheme(xp.evaluate("/mix:mix/mix:BasicImageParameters/mix:Format/mix:Compression/mix:CompressionScheme/text()", node));
	setColorSpace(xp.evaluate("/mix:mix/mix:BasicImageParameters/mix:Format/mix:PhotometricInterpretation/mix:ColorSpace/text()", node));
	setStripOffsets(xp.evaluate("/mix:mix/mix:BasicImageParameters/mix:Format/mix:Segments/mix:StripOffsets/text()", node));
	setRowsPerStrip(xp.evaluate("/mix:mix/mix:BasicImageParameters/mix:Format/mix:Segments/mix:RowsPerStrip/text()", node));
	setStripByteCounts(xp.evaluate("/mix:mix/mix:BasicImageParameters/mix:Format/mix:Segments/mix:StripByteCounts/text()", node));
	setPlanarConfiguration(xp.evaluate("/mix:mix/mix:BasicImageParameters/mix:Format/mix:PlanarConfiguration/text()", node));
	setOrientation(xp.evaluate("/mix:mix/mix:BasicImageParameters/mix:File/mix:Orientation/text()", node));
	setSamplingFrequencyUnit(xp.evaluate("/mix:mix/mix:ImagingPerformanceAssessment/mix:SpatialMetrics/mix:SamplingFrequencyUnit/text()", node));
	setXSamplingFrequency(xp.evaluate("/mix:mix/mix:ImagingPerformanceAssessment/mix:SpatialMetrics/mix:XSamplingFrequency/text()", node));
	setYSamplingFrequency(xp.evaluate("/mix:mix/mix:ImagingPerformanceAssessment/mix:SpatialMetrics/mix:YSamplingFrequency/text()", node));
	setImageWidth(xp.evaluate("/mix:mix/mix:ImagingPerformanceAssessment/mix:SpatialMetrics/mix:ImageWidth/text()", node));
	setImageLength(xp.evaluate("/mix:mix/mix:ImagingPerformanceAssessment/mix:SpatialMetrics/mix:ImageLength/text()", node));
	setBitsPerSample(xp.evaluate("/mix:mix/mix:ImagingPerformanceAssessment/mix:Energetics/mix:BitsPerSample/text()", node));
	setSamplesPerPixel(xp.evaluate("/mix:mix/mix:ImagingPerformanceAssessment/mix:Energetics/mix:SamplesPerPixel/text()", node));
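
The extension checks in the sample above could also be written as a lookup table, which keeps the list of formats in one place. This is only a sketch: the class name ModuleNames and the method forFile are made up for illustration, and the .jp2/.jpx entries are my assumption about which JPEG 2000 extensions matter. It returns the module name, which would then be passed to je.getModule(...).

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class ModuleNames {
    // File extension -> Jhove module name. The module names are the ones
    // used in the sample; the .jp2/.jpx entries are an assumption.
    private static final Map<String, String> BY_EXTENSION = new HashMap<>();
    static {
        BY_EXTENSION.put(".jpg", "JPEG-hul");
        BY_EXTENSION.put(".jpeg", "JPEG-hul");
        BY_EXTENSION.put(".tif", "TIFF-hul");
        BY_EXTENSION.put(".tiff", "TIFF-hul");
        BY_EXTENSION.put(".gif", "GIF-hul");
        BY_EXTENSION.put(".pdf", "PDF-hul");
        BY_EXTENSION.put(".jp2", "JPEG2000-hul");
        BY_EXTENSION.put(".jpx", "JPEG2000-hul");
    }

    /** Returns the module name for a file name, or null if the extension is unknown. */
    public static String forFile(String filename) {
        String lower = filename.toLowerCase(Locale.ROOT);
        int dot = lower.lastIndexOf('.');
        if (dot < 0) {
            return null; // no extension at all
        }
        return BY_EXTENSION.get(lower.substring(dot));
    }
}
```

With this in place the if/else chain would reduce to something like looking up the name with ModuleNames.forFile(filename) and calling je.getModule(name) only when the name is not null.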

I've attached a sample Jhove output, as well as the MIX XSL and the Jhove configuration I referred to in the code sample.
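
The code sample relies on a MetadataNamespaceContext class that is not shown. A minimal sketch of what such a class might look like follows. It assumes the mix prefix is bound to http://www.loc.gov/mix/; check that against the xmlns:mix declaration in your own Jhove output, since the URI differs between MIX versions. The wrapper class MixXPathDemo, the evaluate helper, and the inline XML are hypothetical and exist only to make the example runnable on its own.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class MixXPathDemo {
    // ASSUMPTION: namespace URI of the MIX schema in use; verify locally.
    public static final String MIX_NS = "http://www.loc.gov/mix/";

    /** Hypothetical stand-in for the MetadataNamespaceContext used in the sample. */
    public static class MetadataNamespaceContext implements NamespaceContext {
        @Override
        public String getNamespaceURI(String prefix) {
            if (prefix == null) {
                throw new IllegalArgumentException("prefix must not be null");
            }
            return "mix".equals(prefix) ? MIX_NS : XMLConstants.NULL_NS_URI;
        }

        @Override
        public String getPrefix(String namespaceURI) {
            return MIX_NS.equals(namespaceURI) ? "mix" : null;
        }

        @Override
        public Iterator<String> getPrefixes(String namespaceURI) {
            String prefix = getPrefix(namespaceURI);
            return prefix == null
                    ? Collections.<String>emptyIterator()
                    : Collections.singleton(prefix).iterator();
        }
    }

    /** Parses the XML with namespaces enabled and evaluates one XPath expression. */
    public static String evaluate(String xml, String expression) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // required for prefixed XPath steps
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xp = XPathFactory.newInstance().newXPath();
            xp.setNamespaceContext(new MetadataNamespaceContext());
            return xp.evaluate(expression, doc);
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<mix:mix xmlns:mix='" + MIX_NS + "'>"
                + "<mix:BasicImageParameters><mix:Format>"
                + "<mix:MIMEType>image/tiff</mix:MIMEType>"
                + "</mix:Format></mix:BasicImageParameters></mix:mix>";
        System.out.println(evaluate(xml,
                "/mix:mix/mix:BasicImageParameters/mix:Format/mix:MIMEType/text()"));
    }
}
```

An XPath that matches nothing evaluates to an empty string rather than null, so the setters in the sample receive "" for any element missing from the MIX record.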
