Reading Image Data with Metadata Extractor

Over the past decade, digital photography has grown by leaps and bounds. When I started a company in the photo album industry ten years ago, my partners were scanning negatives and working with 30MB raw image files until they switched over to shooting digitally. One of the powerful things that happened a few years ago was that metadata was added to digital images, allowing the camera to record more information about each picture. Not only can you extract the date that a picture was taken, but you can also extract information such as the camera that was used, the focal length, the F-stop, and so forth. There are several libraries to help you extract this information, and this article focuses on one that I found to be popular in my searches: Drew Noakes' Metadata Extractor, also known as "metadata-extractor".

Setup

You can download the JAR file directly from the Google Code repository or, if you are using Maven, simply add the dependency to your POM file:

<dependency>
   <groupId>com.drewnoakes</groupId>
   <artifactId>metadata-extractor</artifactId>
   <version>2.4.0-beta-1</version>
</dependency>

Additionally, the JavaDoc is hosted online on Google Code's website and can be accessed here.

Example

Nothing demonstrates the usage of a technology better than a "Hello, World" program, so in this example we're going to open an image file and display all of the information that we can obtain from it. Additionally, we will explicitly extract the date that a picture was taken so that we can organize pictures by year and month.

Listing 1 shows our sample, which opens an image file, obtains its metadata, and then displays all of the tags available for that image.

Listing 1. Test.java

package com.geekcap.photoorganizer;

import com.drew.imaging.ImageMetadataReader;
import com.drew.metadata.Directory;
import com.drew.metadata.Metadata;
import com.drew.metadata.Tag;
import com.drew.metadata.exif.ExifDirectory;

import java.io.File;
import java.text.DateFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.Iterator;

public class Test
{
    public static void main( String[] args )
    {
        if( args.length == 0 )
        {
            System.out.println( "Usage: Test <image-file>" );
            System.exit( 0 );
        }
        
        String filename = args[ 0 ];
        System.out.println( "Filename: " + filename );

        try
        {
            File jpgFile = new File( filename );
            Metadata metadata = ImageMetadataReader.readMetadata( jpgFile );

            // Read Exif Data
            Directory directory = metadata.getDirectory( ExifDirectory.class );
            if( directory != null )
            {
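                // Read the date
                Date date = directory.getDate( ExifDirectory.TAG_DATETIME );
                DateFormat df = DateFormat.getDateInstance();
                // The formatted string is discarded; this call is made because
                // format() sets the DateFormat's internal Calendar to the date
                df.format( date );
                int year = df.getCalendar().get( Calendar.YEAR );
                // Calendar.MONTH is zero-based, so add 1 to get 1 through 12
                int month = df.getCalendar().get( Calendar.MONTH ) + 1;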

                System.out.println( "Year: " + year + ", Month: " + month );

                System.out.println( "Date: " + date );

                System.out.println( "Tags" );
                for(Iterator i = directory.getTagIterator(); i.hasNext(); )
                {
                    Tag tag = ( Tag )i.next();
                    System.out.println( "\t" + tag.getTagName() + " = " + tag.getDescription() );

                }
            }
            else
            {
                System.out.println( "EXIF is null" );
            }

        }
        catch( Exception e )
        {
            e.printStackTrace();
        }
                
    }
}

This yields the following output for my sample image file:

Filename: /media/FC30-3DA9/DCIM/100CANON/IMG_9980.JPG
Year: 2012, Month: 3
Date: Sun Mar 25 00:51:33 EDT 2012
Tags
	Make = Canon
	Model = Canon EOS REBEL T1i
	Orientation = Top, left side (Horizontal / normal)
	X Resolution = 72 dots per inch
	Y Resolution = 72 dots per inch
	Resolution Unit = Inch
	Date/Time = 2012:03:25 00:51:33
	Artist = 
	YCbCr Positioning = Datum point
	Copyright = 
	Exposure Time = 0.02 sec
	F-Number = F4.5
	Exposure Program = Unknown program (0)
	ISO Speed Ratings = 100
	Exif Version = 2.21
	Date/Time Original = 2012:03:25 00:51:33
	Date/Time Digitized = 2012:03:25 00:51:33
	Components Configuration = YCbCr
	Shutter Speed Value = 1/49 sec
	Aperture Value = F4.6
	Exposure Bias Value = 0 EV
	Metering Mode = Multi-segment
	Flash = Flash did not fire, auto
	Focal Length = 18.0 mm
	User Comment = 
	Sub-Sec Time = 74
	Sub-Sec Time Original = 74
	Sub-Sec Time Digitized = 74
	FlashPix Version = 1.00
	Color Space = sRGB
	Exif Image Width = 4752 pixels
	Exif Image Height = 3168 pixels
	Focal Plane X Resolution = 447/2376000 inches
	Focal Plane Y Resolution = 593/3168000 inches
	Focal Plane Resolution Unit = Inches
	Custom Rendered = Normal process
	Exposure Mode = Auto exposure
	White Balance = Auto white balance
	Scene Capture Type = Standard
	Compression = JPEG (old-style)
	Thumbnail Offset = 10316 bytes
	Thumbnail Length = 16445 bytes
	Thumbnail Data = [16445 bytes of thumbnail data]

As you can surmise, we have a Canon EOS Rebel T1i and this picture was shot on 3/25/2012. It is pretty incredible how much information each image carries, and the metadata-extractor project gives us access to all of it.

Reading an image file using the metadata-extractor library involves the following steps:

  1. Create a java.io.File object that references the image file
  2. Obtain a com.drew.metadata.Metadata instance for that file, using the ImageMetadataReader
  3. Obtain one of the available Directories present in the Metadata object - in this example we asked for the ExifDirectory
  4. Obtain the specific value you want by executing one of the Directory methods (e.g. getDate()), passing it the tag that represents the date
  5. Alternatively, iterate over all of the Directory's Tags and obtain the Tag's name and description

If the code looks a little messy to you, don't worry, I felt the same way. The examples on the metadata-extractor website are cleaner and use generic collections but, at least for the latest version of the library in the public Maven repository (2.4.0-beta-1), those methods are not available. The alternative, however, is not too cumbersome: obtain an Iterator from getTagIterator(), as you would have done in pre-Java SE 5.0 days, and use it to iterate over all of the tags contained in the Directory.
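If the raw Iterator bothers you, one option is to drain it into a typed List and use a for-each loop from there. The following is only a sketch: TypedIteration and toList() are hypothetical names of my own, not part of metadata-extractor, and the main() method uses a list of strings as a stand-in for the Iterator returned by directory.getTagIterator().

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class TypedIteration
{
    // Hypothetical helper (not part of metadata-extractor): copies the
    // elements of an untyped Iterator into a typed List.
    public static <T> List<T> toList( Iterator<?> rawIterator, Class<T> type )
    {
        List<T> result = new ArrayList<T>();
        while( rawIterator.hasNext() )
        {
            // Class.cast throws ClassCastException if an element has the wrong type
            result.add( type.cast( rawIterator.next() ) );
        }
        return result;
    }

    public static void main( String[] args )
    {
        // Stand-in for directory.getTagIterator(); with the library you
        // would pass Tag.class instead of String.class
        Iterator rawIterator = Arrays.asList( "Make", "Model" ).iterator();
        for( String name : toList( rawIterator, String.class ) )
        {
            System.out.println( name );
        }
    }
}
```

With the library on the classpath, the same call would presumably look like toList( directory.getTagIterator(), Tag.class ), which keeps the cast in one place instead of inside every loop.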

My primary goal in using this library was to obtain the date that a picture was taken (so that I can programmatically sort pictures by month and year), so the example demonstrates how to extract the picture's date. From the ExifDirectory, I executed the getDate() method, passing it an ExifDirectory.TAG_DATETIME value, which returns a java.util.Date instance. I then used a DateFormat class to obtain a Calendar with which to extract the year (Calendar.YEAR) and month (Calendar.MONTH, which is zero-based, hence the + 1 in the listing) that the picture was taken. With these values I am free to create a year directory (e.g. 2012), create month subdirectories (e.g. 01, 02, ..., 12), and then copy pictures to those directories. This is an inexpensive solution if you want fine-grained control over programmatically sorting your pictures.
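To make the sorting idea concrete, here is a minimal sketch of the year/month directory step. It is not part of Listing 1: the PhotoSorter class and its method names are my own assumptions, it takes an already extracted java.util.Date rather than calling the library, and it uses java.nio.file, so it assumes Java 7 or later rather than the 1.6 target in the POM.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.Locale;

public class PhotoSorter
{
    // Builds <root>/<yyyy>/<MM> for the given date, e.g. photos/2012/03
    public static Path targetDirectory( Path root, Date taken )
    {
        Calendar calendar = new GregorianCalendar();
        calendar.setTime( taken );
        int year = calendar.get( Calendar.YEAR );
        // Calendar.MONTH is zero-based, so add 1 to get 1 through 12
        int month = calendar.get( Calendar.MONTH ) + 1;
        return root.resolve( String.format( Locale.ROOT, "%04d", year ) )
                   .resolve( String.format( Locale.ROOT, "%02d", month ) );
    }

    // Creates the year/month directory if needed and copies the picture
    // into it, overwriting any file with the same name
    public static Path copyInto( Path root, Path picture, Date taken ) throws IOException
    {
        Path directory = targetDirectory( root, taken );
        Files.createDirectories( directory );
        Path target = directory.resolve( picture.getFileName() );
        return Files.copy( picture, target, StandardCopyOption.REPLACE_EXISTING );
    }
}
```

In a real organizer you would pass the Date returned by getDate() as the taken argument. Overwriting on a name clash is a simplification; camera file names such as IMG_9980.JPG do repeat eventually, so you may prefer to rename instead.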

The code in Listing 1 is built with the POM file shown in Listing 2.

Listing 2. pom.xml

<project xmlns="http://maven.apache.org/POM/4.0.0" 
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 
         http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>com.geekcap</groupId>
    <artifactId>photoorganizer</artifactId>
    <version>1.0-SNAPSHOT</version>
    <packaging>jar</packaging>

    <name>photoorganizer</name>
    <url>http://maven.apache.org</url>

    <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    </properties>

    <build>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <version>2.0.2</version>
                <configuration>
                    <source>1.6</source>
                    <target>1.6</target>
                </configuration>
            </plugin>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-jar-plugin</artifactId>
                <configuration>
                    <archive>
                        <manifest>
                            <addClasspath>true</addClasspath>
                            <classpathPrefix>lib/</classpathPrefix>
                            <mainClass>com.geekcap.photoorganizer.Test</mainClass>
                        </manifest>
                    </archive>
                </configuration>
            </plugin>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-dependency-plugin</artifactId>
                <executions>
                    <execution>
                        <id>copy</id>
                        <phase>install</phase>
                        <goals>
                            <goal>copy-dependencies</goal>
                        </goals>
                        <configuration>
                            <outputDirectory>${project.build.directory}/lib</outputDirectory>
                        </configuration>
                    </execution>
                </executions>
            </plugin>

        </plugins>
    </build>

    <dependencies>

        <dependency>
            <groupId>com.drewnoakes</groupId>
            <artifactId>metadata-extractor</artifactId>
            <version>2.4.0-beta-1</version>
        </dependency>

        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <version>4.6</version>
            <scope>test</scope>
        </dependency>
    </dependencies>
</project>

Summary

As photography, and specifically digital photography, has evolved, digital images have been extended to contain more and more information. For example, one of the more popular standards, EXIF, lets you obtain simple information such as when a picture was taken, but also provides information such as the camera make and model, the F-stop that was used, whether or not a flash was used, and so forth. The metadata-extractor library allows you to easily extract this metadata from an image file.

In this article I demonstrated how to use the metadata-extractor library to extract the EXIF data from an image file, including both how to extract the data generically and how to explicitly extract the date when a picture was taken.