Attorneys dealing with electronic discovery (e-discovery) often produce electronic documents with “load files.” But what is a load file? In short, it is a file that helps load and organize information within e-discovery software so that the documents may be viewed, searched and filtered.
When ESI (electronically stored information) is produced with load files, the information for each document is spread across multiple files. The first is an image file, which is exactly what it sounds like: an image of the document. Image files are often produced in TIFF (Tagged Image File Format), but when files are converted to TIFF they lose information such as textual content and metadata, and with it the ability to search the document’s contents. So that the documents can be searched after being loaded into e-discovery software, additional files are created containing the metadata and the document text. The load file then ties everything together within the software by connecting each image file to the right text and metadata files.
There are multiple types of load files (Concordance, Relativity and Summation, to name a few), but load files are generally just delimited text or CSV files. Because a specific type of load file may be desired (i.e., one that is compatible with a particular type of e-discovery software), lawyers should consider specifying the preferred load file type directly in document requests served on a litigation opponent or subpoena recipient.
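Because a load file is just delimited text, it can be read with a few lines of code. The Python sketch below parses a Concordance-style DAT file, assuming the Concordance default delimiters (ASCII 20 between fields, ASCII 254 “þ” around each value, and ASCII 174 “®” standing in for line breaks inside a value). The function names, Bates numbers and paths are hypothetical, and a real production may use different delimiters or encodings, so treat this as an illustration rather than a definitive parser.

```python
# Concordance default delimiters (an assumption; confirm against the production).
FIELD_SEP = "\x14"   # ASCII 20, separates fields
QUOTE = "\xfe"       # ASCII 254 (thorn), wraps each field value
NEWLINE = "\xae"     # ASCII 174, stands in for a line break inside a value


def parse_dat_line(line):
    """Split one DAT line into a list of field values."""
    values = []
    for raw in line.rstrip("\r\n").split(FIELD_SEP):
        # Strip the wrapping qualifier characters when both are present.
        if len(raw) >= 2 and raw.startswith(QUOTE) and raw.endswith(QUOTE):
            raw = raw[1:-1]
        values.append(raw.replace(NEWLINE, "\n"))
    return values


def parse_dat(lines):
    """Treat the first line as the header; return one dict per document."""
    header = parse_dat_line(lines[0])
    return [
        dict(zip(header, parse_dat_line(line)))
        for line in lines[1:]
        if line.strip()
    ]


def make_line(values):
    """Build a DAT line for the demo (hypothetical helper)."""
    return FIELD_SEP.join(QUOTE + v + QUOTE for v in values)


# Hypothetical two-line load file: a header row and one document row.
sample = [
    make_line(["BegBates", "EndBates", "Custodian", "ExtractedText"]),
    make_line(["ABC0000001", "ABC0000003", "Doe, Jane", r"TEXT\ABC0000001.txt"]),
]
records = parse_dat(sample)
print(records[0]["BegBates"], "->", records[0]["ExtractedText"])
```

When reading an actual DAT file from disk, the character encoding has to match the one used at export; that detail is left out of the sketch.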
Note, however, that load files are often unnecessary because many document review platforms now ingest documents in their native form. This is often preferable because the files do not have to be converted into image files and no data is lost. It also makes document review and search much more efficient because, among other things, native files are easier to filter and sort.
What Metadata Fields Should I Request?
To figure out what metadata to request in document productions, you should first ask the requesting party. Below are common requirements for document productions.
IMAGES:
Produce documents in Single Page Group IV TIFF files
Image Resolution at least 300 DPI
Black and White unless color is necessary to understand the meaning
File Naming Convention: Match Bates Number
Insert Placeholder image for files produced in Native form (see Section 2)
Original document orientation shall be retained
FULL TEXT EXTRACTION / OCR:
Produce full extracted text for all file types of ESI (Redacted text will not be produced)
Production format: Single text file for each document, not one text file per page
File Naming Convention: Match Beg Bates Number
LOAD FILE SPECIFICATIONS:
Images Load File: Opticon OPT file
Metadata Load File: Concordance DAT file with field header information added as the first line of the file. Export using Concordance default delimiters.
Extracted TEXT: Reference File Path to TEXT file in DAT file
Native Files Produced: Reference File Path to Native file in DAT file
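To illustrate how an image load file connects single-page TIFFs back to documents, here is a Python sketch that groups pages by document. It assumes the commonly used seven-column Opticon layout (image key, volume, image path, document break, folder break, box break, page count), where a “Y” in the fourth column marks the first page of a document. The function name, Bates numbers and paths are hypothetical.

```python
import csv


def group_pages(opt_lines):
    """Group image paths by document, keyed by the first page's image key."""
    documents = {}
    current = None
    for row in csv.reader(opt_lines):
        if not row:
            continue
        # Pad short rows so the column positions below are always present.
        row += [""] * (7 - len(row))
        image_key, image_path, doc_break = row[0], row[2], row[3]
        # A "Y" in the fourth column starts a new document.
        if doc_break.strip().upper() == "Y" or current is None:
            current = image_key
            documents[current] = []
        documents[current].append(image_path)
    return documents


# Hypothetical OPT content: a two-page document followed by a one-page document.
sample_opt = [
    r"ABC0000001,VOL001,IMAGES\001\ABC0000001.tif,Y,,,2",
    r"ABC0000002,VOL001,IMAGES\001\ABC0000002.tif,,,,",
    r"ABC0000003,VOL001,IMAGES\001\ABC0000003.tif,Y,,,1",
]
docs = group_pages(sample_opt)
for beg_bates, pages in docs.items():
    print(beg_bates, len(pages), "page(s)")
```

Keying each document by its first image key lines up with the convention above of naming image files after their Bates numbers, so the same key can be matched against BegBates in the DAT file.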
ESI PRODUCTION METADATA FIELDS:
BegBates: Beginning Bates Number
EndBates: Ending Bates Number
BegAttach: Beginning Bates number of the first document in an attachment range
EndAttach: Ending Bates number of the last document in attachment range
Custodian: Name of the Custodian of the File(s) Produced – Last Name, First Name format
FileName: Name of the original digital file
NativeLink: Path and filename to produced Native file
EmailSubject: Subject line extracted from an email message
Title: Title field extracted from the metadata of a non-email document
Author: Author field extracted from the metadata of a non-email document
From: From field extracted from an email message
To: To or Recipient field extracted from an email message
Cc: CC or Carbon Copy field extracted from an email message
BCC: BCC or Blind Carbon Copy field extracted from an email message
DateRcvd: Received date of an email message (mm/dd/yyyy format)
DateSent: Sent date of an email message (mm/dd/yyyy format)
DateCreated: Date that a file was created (mm/dd/yyyy format)
DateModified: Modification date(s) of a non-email document
Fingerprint: MD5 or SHA-1 hash value generated by creating a binary stream of the file
ProdVolume: Identifies production media deliverable
ExtractedText: File path to Extracted Text/OCR File
Redacted: “Yes,” for redacted documents; otherwise, blank
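A field list like the one above can double as a checklist when a production arrives. The Python sketch below checks a load file header for missing fields and flags date values that do not follow the mm/dd/yyyy format. The field names come from the list above; the function names and sample values are hypothetical, and the sketch assumes the header and records have already been parsed into a list and a dictionary.

```python
import re
from datetime import datetime

# Field names taken from the ESI production metadata list above.
EXPECTED_FIELDS = [
    "BegBates", "EndBates", "BegAttach", "EndAttach", "Custodian",
    "FileName", "NativeLink", "EmailSubject", "Title", "Author",
    "From", "To", "Cc", "BCC", "DateRcvd", "DateSent", "DateCreated",
    "DateModified", "Fingerprint", "ProdVolume", "ExtractedText", "Redacted",
]
# Only the fields whose description specifies mm/dd/yyyy.
DATE_FIELDS = ["DateRcvd", "DateSent", "DateCreated"]
DATE_PATTERN = re.compile(r"\d{2}/\d{2}/\d{4}")


def missing_fields(header):
    """Return the expected fields that are absent from the header."""
    return [name for name in EXPECTED_FIELDS if name not in header]


def bad_dates(record):
    """Return the date fields whose non-blank value is not a valid mm/dd/yyyy date."""
    problems = []
    for name in DATE_FIELDS:
        value = record.get(name, "")
        if not value:
            continue  # blank dates are allowed, e.g. DateSent on a non-email
        if not DATE_PATTERN.fullmatch(value):
            problems.append(name)
            continue
        try:
            datetime.strptime(value, "%m/%d/%Y")
        except ValueError:
            problems.append(name)  # right shape but not a real date
    return problems


# Hypothetical header and record for demonstration.
header = ["BegBates", "EndBates", "Custodian", "DateSent"]
record = {"BegBates": "ABC0000001", "DateSent": "2020-01-05", "DateRcvd": "01/05/2020"}
print("Missing:", missing_fields(header)[:3])
print("Bad dates:", bad_dates(record))
```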
PAPER DOCUMENTS METADATA FIELDS:
BegBates: Beginning Bates Number
EndBates: Ending Bates Number
BegAttach: Beginning Bates number of the first document in an attachment range
EndAttach: Ending Bates number of the last document in attachment range
Custodian: Name of the Custodian of the File(s) Produced – Last Name, First Name format
ProdVolume: Identifies production media deliverable
Other resources are the Department of Justice’s Standard Specifications for Production of ESI and the Securities and Exchange Commission’s Data Delivery Standards. These specifications list the metadata fields requested by the DOJ and SEC in their cases. Both lists are thorough, and not every field will be necessary in every case. Still, they are good references, and many e-discovery software products offer the DOJ and SEC metadata lists as load file templates.