A document database full of records is data. A document database where every record is consistently classified, reliably described, and accurately coded according to a documented protocol is evidence infrastructure.

What Document Coding Is

Document coding is the process of classifying, describing, and tagging records in a document collection according to a defined protocol. It determines how each record is named, dated, categorized by document type, assigned to relevant issues, and linked to relevant people, places, events, and institutions.

Coding is what makes a collection searchable. Without consistent coding, a team cannot reliably find all records related to a specific issue, date range, person, or event. Searches return incomplete results, and the database cannot be relied on.

Why Coding Matters for Litigation

In litigation, coding determines whether the collection can support analysis and production. If coding is inconsistent (different coders applying different naming conventions, different date formats, different document-type classifications), the collection becomes unreliable as an evidence system.

Inconsistent coding means inconsistent search results. Inconsistent search results mean that records relevant to the file may be missed, misclassified, or misrepresented. The resulting risk comes not from what the records say, but from how they were organized.

The Coding Protocol

A coding protocol is the documented set of rules that governs how records in the collection are coded. It defines conventions for dates, names, document types, issue categories, metadata fields, and any collection-specific classifications.

The coding protocol should be documented in a coding manual: a reference that every coder follows, every QA reviewer checks against, and every team member can consult. Without a coding manual, coding decisions rest on individual judgment rather than on a shared standard, and individual judgment varies.
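One way to make a protocol easier to apply consistently is to express part of the coding manual in machine-readable form, so each record can be checked against it. The following is a minimal sketch under stated assumptions, not a prescribed implementation: the field names, the ISO-style date format, and the list of document types are hypothetical and would come from the team's own manual.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class CodingProtocol:
    """Hypothetical machine-readable excerpt of a coding manual."""
    date_format: str = "%Y-%m-%d"  # assumed convention, e.g. 1987-03-04
    document_types: frozenset = frozenset({"letter", "memo", "report", "minutes"})
    required_fields: tuple = ("title", "date", "document_type", "issues", "coded_by")


def validate_record(record: dict, protocol: CodingProtocol) -> list[str]:
    """Return the protocol violations found in one record (empty list if none)."""
    problems = []

    # Every required field must be present and non-empty.
    for name in protocol.required_fields:
        if not record.get(name):
            problems.append(f"missing field: {name}")

    # Dates (assumed to be stored as strings) must parse in the protocol's format.
    date_value = record.get("date")
    if date_value:
        try:
            datetime.strptime(date_value, protocol.date_format)
        except ValueError:
            problems.append(f"date not in protocol format: {date_value}")

    # Document types must come from the defined list.
    doc_type = record.get("document_type")
    if doc_type and doc_type not in protocol.document_types:
        problems.append(f"undefined document type: {doc_type}")

    return problems
```

A check like this does not replace the manual or the coder's judgment on issue coding; it only catches departures from conventions that can be stated mechanically.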

Common Coding Problems

Coding problems are often invisible until someone tries to rely on the database. Common problems include:

  • Date formats that vary between coders or across sections of the database
  • Name spellings that are inconsistent, making person searches unreliable
  • Document-type classifications that are vague, overlapping, or undefined
  • Issue coding that reflects one coder's interpretation rather than a defined standard
  • Missing metadata fields that make records unsearchable by key criteria
  • No audit trail showing who coded what and when
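Several of the problems listed above can be surfaced with a simple pass over the collection. The sketch below is illustrative only: it assumes records are available as dictionaries, and the field names (`id`, `date`, `people`, `coded_by`, `coded_at`), the two date shapes, and the crude name normalization are all hypothetical choices rather than a standard.

```python
import re
from collections import Counter, defaultdict

# Hypothetical date shapes; a real collection may contain others.
DATE_SHAPES = {
    "YYYY-MM-DD": re.compile(r"\d{4}-\d{2}-\d{2}"),
    "D/M/YYYY or M/D/YYYY": re.compile(r"\d{1,2}/\d{1,2}/\d{4}"),
}


def date_shape(value: str) -> str:
    """Label the format a date string appears to use."""
    for label, pattern in DATE_SHAPES.items():
        if pattern.fullmatch(value):
            return label
    return "other"


def name_key(name: str) -> str:
    """Crude normalization: lowercase, drop punctuation, sort the name parts."""
    parts = re.sub(r"[^\w\s]", " ", name.lower()).split()
    return " ".join(sorted(parts))


def audit_collection(records, required=("date", "document_type", "coded_by", "coded_at")):
    """Summarize mixed date formats, name variants, and missing fields."""
    shapes = Counter()
    variants = defaultdict(set)
    incomplete = []

    for rec in records:
        if rec.get("date"):
            shapes[date_shape(rec["date"])] += 1
        for person in rec.get("people", []):
            variants[name_key(person)].add(person)
        # coded_by / coded_at stand in for the audit trail.
        missing = [f for f in required if not rec.get(f)]
        if missing:
            incomplete.append((rec.get("id"), missing))

    return {
        "date_shapes": dict(shapes),
        "name_variants": {k: sorted(v) for k, v in variants.items() if len(v) > 1},
        "incomplete": incomplete,
    }
```

More than one entry under `date_shapes`, or any entry under `name_variants`, suggests that coders applied different conventions. Vague document types and interpretive issue coding are harder to detect mechanically and still call for human review.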

From Data to Evidence

The difference between data and evidence is reliability. Data is what exists in the database. Evidence is what the team can rely on: records that are consistently coded, reliably searchable, and documented well enough that the coding methodology can be explained under scrutiny.

Building that reliability requires a coding protocol, a coding manual, consistent application, and quality assurance review. It is not automatic. It is a process.
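As one illustration of what quality assurance review can look like in practice, a team might re-code a random sample of records and measure how often the reviewer's coding differs from the original. The sketch below assumes that approach; the function names, the fixed seed, and the dictionary layout (record ID mapped to coded fields) are hypothetical, and the appropriate sample size is a decision for the team.

```python
import random


def draw_qa_sample(record_ids, sample_size, seed=0):
    """Pick a reproducible random sample of record IDs for second review."""
    rng = random.Random(seed)  # fixed seed so the sample can be re-drawn later
    ids = sorted(record_ids)
    return rng.sample(ids, min(sample_size, len(ids)))


def disagreement_rates(original, reviewed, fields):
    """Per-field share of reviewed records where the QA coding differs.

    `original` and `reviewed` map record ID -> dict of coded fields;
    every ID in `reviewed` is assumed to exist in `original`.
    """
    rates = {}
    for field_name in fields:
        diffs = sum(
            1
            for rid in reviewed
            if original[rid].get(field_name) != reviewed[rid].get(field_name)
        )
        rates[field_name] = diffs / len(reviewed) if reviewed else 0.0
    return rates
```

A field with a high disagreement rate may point to a rule in the coding manual that is ambiguous or missing, rather than to an individual coder's error.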

A database full of records is not an evidence system. It becomes one when every record is coded consistently, according to a documented protocol, and verified through quality assurance.