11 May 2012 at 15:49 BST

Code breakers

Electronic document searching and review is often the bane of modern litigation. Nick Patience considers an emerging technique designed to make the process more efficient.

The amount of information law firms must process is growing rapidly. Increasing volumes of complex digital data are being created and stored – and much of it is in unstructured and difficult-to-search formats.
In hearings such as the ongoing and highly controversial Leveson Inquiry into press standards in the UK – in which the available evidence is largely contained in millions of individual digital documents such as contracts, letters, meeting notes, emails and instant messages – the traditional approach of employing an army of paralegals to review the documents is no longer practical. It would take for ever to conduct keyword searches, make an initial appraisal and present a manageable selection of information to senior counsel for review and subsequent presentation to the court.

Human error

The problems are obvious. The time and cost implications are dramatic, human error is inherent and the chance of missing critical documents is very high.
It is for all of those reasons that many in the legal sector view a recent New York court opinion from Judge Andrew Peck as a landmark. The judge ‘officially’ sanctioned the use of a predictive coding workflow when he said: ‘Computer-assisted review now can be considered judicially approved for use in appropriate cases.’ Predictive coding – a particularly advanced element of computer-assisted review – was already emerging as the preferred approach for document analysis and review, and this opinion will undoubtedly speed its adoption.
Contrary to some perceptions, a predictive coding workflow does not remove people from the review process. Usually a senior lawyer directly involved in the case trains the computer program by providing initial instructions at the start of the process to ‘seed’ the software. Rather than simply identifying specific keywords and phrases, concept search technology such as predictive coding understands the meaning of documents and recognises patterns within them.
For example, while a keyword-based system will always group together all documents containing the word ‘calypso’, a machine-learning system will instead automatically understand that calypso has multiple meanings (an environmental satellite, a style of music, a sea goddess and an ice cream) and will group the documents according to each one.
Documents are then ranked according to likely relevance. Those at the top of the pile can be directly reviewed by a senior lawyer – rather than a less knowledgeable paralegal. When lawyers are relying on technology to find all documents relevant to a particular topic – no matter what specific keyword is used – this difference becomes absolutely critical.
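To make the idea of ranking concrete, here is a deliberately simple sketch in Python. It is not how any commercial predictive coding product works: it merely scores each document by word overlap (cosine similarity) with a seed document chosen by the reviewer, whereas real systems use far richer models of meaning. The function names and the sample documents are invented for illustration.

```python
import math
import re
from collections import Counter


def vectorize(text):
    """Turn text into a bag-of-words count vector (lower-cased words)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_by_relevance(seed_docs, corpus):
    """Rank corpus documents by similarity to the combined seed set."""
    profile = Counter()
    for doc in seed_docs:
        profile.update(vectorize(doc))
    scored = [(cosine(profile, vectorize(doc)), doc) for doc in corpus]
    # Highest score first, so the likeliest documents reach the reviewer first.
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Hypothetical seed document and corpus, invented for this example.
seed_docs = ["termination of the supply contract and the penalty clause"]
corpus = [
    "Lunch menu for the staff canteen on Friday",
    "Email about the penalty clause in the supply contract",
    "Reminder about the office car park",
]

ranked = rank_by_relevance(seed_docs, corpus)
for score, doc in ranked:
    print(f"{score:.2f}  {doc}")
```

Nothing is thrown away in this sketch: every document keeps a score, and changing the seed set simply reorders the pile.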
In addition, none of the original documents is discarded – and the instructions can be amended to narrow, broaden or refine the search criteria according to the initial results.

Mountains of data

Of course, there are different workflows and types of technology that can be used to conduct predictive coding. Seed sets can be constructed in different ways, confidence intervals can be different, and sampling can be used at one or several stages of the process.
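As a rough illustration of how sampling and confidence intervals fit together, the following Python sketch estimates what share of a document collection is relevant from a random sample that a lawyer has reviewed by hand. It assumes a simple normal-approximation interval at roughly 95 per cent confidence; actual workflows may use other statistical methods, and the collection size and sample size here are hypothetical.

```python
import math
import random


def estimate_relevance_rate(labels, z=1.96):
    """Return (estimate, lower, upper) for the share of relevant documents.

    labels: one True/False review decision per sampled document.
    z: 1.96 corresponds to roughly 95 per cent confidence
       under the normal approximation.
    """
    n = len(labels)
    p = sum(labels) / n
    margin = z * math.sqrt(p * (1 - p) / n)
    return p, max(0.0, p - margin), min(1.0, p + margin)


# Hypothetical collection: 10,000 documents, 2,000 of them relevant.
random.seed(0)
population = [True] * 2_000 + [False] * 8_000

# Review a random sample of 400 rather than the whole collection.
sample = random.sample(population, 400)
estimate, low, high = estimate_relevance_rate(sample)
print(f"Estimated relevant share: {estimate:.1%} ({low:.1%} to {high:.1%})")
```

A larger sample narrows the interval, which is the trade-off between review effort and certainty that different workflows resolve in different ways.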
No approach to searching mountains of digital data will ever be perfect. However, predictive coding has already proved itself in operation. It is much faster and more cost effective than other approaches. Human error is reduced to the minimum. Perhaps most importantly, the accuracy of retrieval also exceeds that of manual, human review.
A typical, non-predictive coding review will miss approximately 80 per cent of relevant documents. Like any approach, predictive coding should be benchmarked against other available approaches, not perfection, especially when the default approach for many already misses at least one out of every four documents.

Fear of machines

It is widely accepted that some form of computer-assisted review is used in the analysis of legal documents, from rudimentary techniques such as date range filtering to more sophisticated methods, including clustering and phrase extraction. But an absence of real understanding, and perhaps a fear that machines will fully replace human review, have delayed the acceptance of the most advanced techniques, including predictive coding.
In fact, predictive coding provides a proportionate and pragmatic approach to document review. Whereas senior lawyers do not usually interest themselves in document searches until late in the process, predictive coding gets them involved both at the start and at the end, enabling them to get on with case work in the meantime, and providing a more transparent and open system for all.

Nick Patience is a senior market analyst at Recommind, a global designer of predictive information software
