The shift from document-based to data-based discovery
In the legal world, there was a simpler time: a time before the “e” in e-discovery. When the Federal Rules of Civil Procedure were established in 1938, handwriting and typewriting were the primary methods of creating content. If you wanted to duplicate content, mimeographs and offset printing were the main mechanical options.
New technology continued to develop, and with it the discovery landscape and its supporting processes matured. Word processing and laser printing are major examples of progress, but it did not stop there. Automated OCR, machine coding and deduplication have all been added to the toolbox, and automation can now even be used to categorize data. All of these have strengthened attorneys’ ability to move through the discovery process and into trial.
Today, discovery is filled with automation. We can extract metadata, visualize data trends and cull information. Additionally, many technology solutions have been designed to hand human tasks off to machines. Graphic images and audio files can be converted to searchable text without manually transcribing the content. The goal is to make the process of discovery more efficient and less expensive, and to allow attorneys to paint a more detailed picture of the facts.
What is the next evolution? What will the next technological breakthrough be?
Today, we are starting to see a shift that is less about needing the shiny new object and more about leveraging existing tools and data in novel ways. Achieving efficiencies and cost reductions remains important, but there’s a greater emphasis on being informed in ways that can drive the development of strong case strategies. In short, we believe that the shift from documents and content to data-based discovery and the search for context is underway.
A History of Shifts
In just the last 20 years, we have seen major shifts as the industry has migrated almost entirely from paper documents to electronically stored information (ESI). With paper documents, we considered only the content within the four corners of the page. With ESI and the tools that accompany it, we study metadata and the context within electronic documents, emails, text messages and more. We are thinking less about the number of gigabytes and more about the diversity and scope of “Big Data.”
Leading this shift to data-based discovery are attorneys who seek to understand what information sources exist, why these sources are created, how they are used, what value they contain and what they contribute to context. These attorneys understand that while users may generate files, the systems that interact with these files may automatically create information that is as useful as, and sometimes more useful than, the content itself. They also recognize that new systems people interact with as part of their jobs (e.g., sensor systems in automobiles) are creating both active and passive data. These data components are key tools that help counsel recognize and leverage context and build a stronger narrative.
This shift in focus from content to context requires legal professionals to take a more targeted approach to the discovery process, and a targeted approach is not necessarily a narrower one. The number of data sources has grown, and so has the variety of data. It’s important to complete a comprehensive review of all data systems and create a data map in order to know where data lives, how it was created, when it moves and why it exists. Only then can these sources be effectively evaluated for their potential contribution to context. With an understanding of contextual possibilities, a more detailed and informative data picture can be created.
To take advantage of this shift, it’s vital to know the potential types of data sources and where that data is located. For data-based discovery, we work with clients to focus on three major categories of data:
- Structured data
- Mobile data
- Internet of Things (IoT) data
Structured Data
Computer users interact with structured data and its repositories daily: logging into a computer, sending an email, clocking in and out of work, opening a file in a document management system or searching a SharePoint portal. Structured data systems also reach well beyond individual computers. Consider your daily financial activity. When you use a credit or debit card, a virtual wallet or a platform like PayPal, you are accessing and creating structured data.
These transactions are made possible by, and saved in, databases. A database is a context wonderland, full of data descriptions and relationships. When this context is overlaid with the emails or instant messages a person sent, a skilled legal professional can sometimes use it all to gauge that computer user’s intent, state of mind or disposition. For example, if you know that one of your employees parked near a competitor’s building, and you know that he or she sent text messages to that competitor, the combination of the two adds context.
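To make the parking-and-texting example concrete, here is a minimal Python sketch of overlaying two structured sources. It assumes location fixes (for example, from a phone or vehicle) and message records have already been extracted with timestamps; the `LocationFix` and `Message` types, the function names and any radius or time window you pass in are hypothetical choices for illustration, not part of any particular review tool.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from math import asin, cos, radians, sin, sqrt


@dataclass
class LocationFix:
    time: datetime
    lat: float
    lon: float


@dataclass
class Message:
    time: datetime
    recipient: str


def haversine_m(lat1, lon1, lat2, lon2):
    """Approximate great-circle distance in metres between two lat/lon points."""
    r = 6_371_000  # mean Earth radius in metres
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * r * asin(sqrt(a))


def messages_near_site(fixes, messages, site, radius_m, recipient, window):
    """Pair each message sent to `recipient` with any location fix that lies
    within `radius_m` of `site` and within `window` of the message time."""
    site_lat, site_lon = site
    nearby = [f for f in fixes if haversine_m(f.lat, f.lon, site_lat, site_lon) <= radius_m]
    return [
        (fix, msg)
        for msg in messages
        if msg.recipient == recipient
        for fix in nearby
        if abs(msg.time - fix.time) <= window
    ]
```

For instance, with a 200-metre radius and a 30-minute window, a text to the competitor sent ten minutes after a fix about 55 metres from the site is returned, while texts to anyone else, or fixes far from the site, are ignored. The haversine formula is a spherical approximation, which is usually adequate at parking-lot scale.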
Mobile Data
Mobile devices generate massive amounts of data and also contain multiple layers of information and detail. There are apps for messaging, phone calls, photos, games, directions, currency exchange, encrypted communications and more. Mobile devices interface with email and social media accounts while connecting to signal towers and generating geo-location data. The device in your hand is constantly interacting with structured data. Some of it is stored on the device, while other data resides in cloud silos. That data is rich with both content and context. The context offered might include location, timing and activity.
Internet of Things (IoT) Data
IoT data is now everywhere. Fitness trackers, home automation devices, environmental controls, automobiles, traffic monitors, artificial intelligence devices, media players and a host of other electronic items all contain useful, and contextual, data. Many of these connect to apps over networks, gathering data that may be retrieved and analyzed when needed. When you ask Alexa, Siri or Google a question, structured data is the engine driving the process. And don’t assume that the resulting data is all stored in the same place.
With the Amazon Echo, your recorded voice commands are converted to electronic queries. These lead to computer-generated responses or actions. This data is parsed and stored in multiple locations, including the device itself, your smartphone or tablet, and the cloud. When all is said and done, you have structured data, mobile data and IoT data stored in disparate locations – all generated from a single verbal command. These systems leverage structured data to answer your questions. Even the questions themselves are stored and may prove useful when a case arises.
These three major categories of data are needed to complete a Big Data picture. And more than ever, attorneys are able to use the information gained from these for greater contextual detail.
Building the Data Picture
The process of determining what data exists and where it is stored is integral to understanding the potential context. This information will inform collection and downstream analytic strategies. While physical bit-for-bit or logical collections of computer hard drives are not always required, data must be collected defensibly. In some matters, preserving and analyzing key databases or IoT sources may be preferable to imaging entire devices, since the data needed for context can be almost anything, including GPS coordinates, on/off commands, audit logs or time/date stamps.
To guide our clients through the identification of these information sources, we emphasize the need to map systems and data within each source. During the mapping process, we can identify all systems, learn data initiation and storage points, and understand how the systems interact. Properly using data often requires an evaluation of what happens to it during its lifecycle. We also want to understand how data is named, described and categorized within each structured data system.
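One way to picture a data map is as a directed graph in which each system points to the systems it passes data to. The sketch below uses a hypothetical map loosely patterned on the voice-assistant example above (the system names are invented and do not describe any vendor’s actual architecture) and walks it breadth-first to list every place that data originating in one system may end up.

```python
from collections import deque

# Hypothetical data map: each system lists the systems it sends data to.
DATA_MAP = {
    "smart_speaker": ["companion_app", "vendor_cloud"],
    "companion_app": ["vendor_cloud"],
    "vendor_cloud": ["analytics_store"],
    "analytics_store": [],
}


def downstream_locations(data_map, source):
    """Return every system reachable from `source`, in breadth-first order,
    i.e. every location where data created at `source` may need preserving."""
    seen = {source}
    queue = deque([source])
    order = []
    while queue:
        system = queue.popleft()
        order.append(system)
        for nxt in data_map.get(system, []):
            if nxt not in seen:  # avoid revisiting systems reached by two paths
                seen.add(nxt)
                queue.append(nxt)
    return order
```

Here `downstream_locations(DATA_MAP, "smart_speaker")` lists the speaker, the companion app, the vendor cloud and the analytics store, which mirrors the point that one command can leave traces in several places. A real map would also record attributes per system (who creates the data, retention rules, formats), which this sketch omits.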
As we have discussed, data is now collected from many different sources. There must be a process to normalize the metadata from each. For analysis purposes, metadata fields from different sources with different formats and field names must be moved into an environment with a common format. Aggregating the data this way allows for the integration and comparison of frequently found key data points, such as time, date and location. This common format must also accommodate other nuanced data elements, such as volume, action, change, repetition, device commands and environmental states.
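The normalization step can be sketched as a per-source field mapping plus a per-source timestamp parser. Everything here is an assumption for illustration: the three sources, their field names and their timestamp formats are hypothetical, and the email source is assumed to record times in UTC. The point is that after normalization every record shares the same fields and a comparable UTC timestamp.

```python
from datetime import datetime, timezone

# Hypothetical per-source mappings: source field name -> common field name.
FIELD_MAPS = {
    "email": {"SentDate": "timestamp", "From": "actor", "Subject": "action"},
    "badge": {"swipe_time": "timestamp", "employee_id": "actor", "door": "location"},
    "gps": {"ts": "timestamp", "device_owner": "actor", "coords": "location"},
}

# Hypothetical timestamp formats, each converted to a timezone-aware UTC datetime.
TIME_PARSERS = {
    # Assumed to be recorded in UTC, e.g. "03/02/2020 02:10 PM".
    "email": lambda s: datetime.strptime(s, "%m/%d/%Y %I:%M %p").replace(tzinfo=timezone.utc),
    # ISO 8601 with an explicit offset, e.g. "2020-03-02T09:15:00-05:00".
    "badge": lambda s: datetime.fromisoformat(s).astimezone(timezone.utc),
    # Unix epoch seconds stored as a string.
    "gps": lambda s: datetime.fromtimestamp(int(s), tz=timezone.utc),
}

COMMON_FIELDS = ("source", "timestamp", "actor", "action", "location")


def normalize(source, record):
    """Map one raw record into the common schema; missing fields stay None."""
    out = dict.fromkeys(COMMON_FIELDS)
    out["source"] = source
    for src_field, common_field in FIELD_MAPS[source].items():
        if src_field in record:
            out[common_field] = record[src_field]
    if out["timestamp"] is not None:
        out["timestamp"] = TIME_PARSERS[source](out["timestamp"])
    return out
```

Sorting the normalized records by their `timestamp` field then yields a single cross-source timeline in which an email, a badge swipe and a GPS fix can be compared directly.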
Once aggregation is complete, the complex data from different sources exists in a single framework, which helps us see the story it tells more clearly and concisely. With the data sources identified, mapped, consolidated and analyzed, reports and visualizations can be generated quickly.
One unique visualization that iDS can prepare to aid in adding context is the rendering of collected text messages in their native format, which can even be imported into kCura’s Relativity review software as a fully searchable document.
Additionally, techniques such as sentiment analysis (understanding the meaning of, and/or intent behind, communications) become possible when disparate data groups are combined. This allows patterns and inconsistencies in each individual’s activity to stand out.
In one scenario, you might seek to compare an employee’s actual activities to expected or purported activities. We used data extracted from our proprietary xIoT system to create the graph below, which overlays system login/boot-up time stamps, GPS coordinates, invoicing, order fulfillment activities and time cards to achieve this comparison. For an on-the-clock/off-the-clock dispute, visualizations of what employees actually did could help identify and support a class, or dissolve one completely.
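As a rough illustration only, the sketch below scores messages against a tiny hand-made word list and averages the scores per sender, so that a shift in one individual’s tone is easier to spot. This is a deliberately simple lexicon approach chosen for brevity; the word lists and function names are hypothetical, and practical sentiment analysis typically relies on trained models or curated lexicons rather than anything this small.

```python
import re

# Toy lexicons (hypothetical); real tools use trained models or curated lexicons.
POSITIVE = {"great", "thanks", "agree", "happy", "good"}
NEGATIVE = {"angry", "unfair", "quit", "hate", "bad", "worried"}


def sentiment_score(text):
    """(positive hits - negative hits) / token count, so the score lies in [-1, 1]."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    return (pos - neg) / len(tokens)


def per_person_sentiment(messages):
    """Average score per sender for (sender, text) pairs."""
    scores = {}
    for sender, text in messages:
        scores.setdefault(sender, []).append(sentiment_score(text))
    return {sender: sum(vals) / len(vals) for sender, vals in scores.items()}
```

Comparing each message’s score with that sender’s average is one simple way to surface the inconsistencies the paragraph above describes.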
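The core of an on-the-clock comparison can be sketched in a few lines: flag every activity timestamp (a login, an invoice, a fulfillment) that falls outside all recorded time-card shifts. This is not the xIoT system mentioned above; the event format and the `off_the_clock` function are hypothetical, and all timestamps are assumed to be in the same time zone.

```python
from collections import defaultdict
from datetime import datetime


def off_the_clock(events, shifts):
    """Return activity events that fall outside every (clock_in, clock_out) shift.

    events: list of (timestamp, activity_kind) tuples from logs, invoicing, etc.
    shifts: list of (clock_in, clock_out) tuples from time cards.
    The result is keyed by calendar date so daily patterns are easy to chart.
    """
    flagged = defaultdict(list)
    for ts, kind in sorted(events):
        if not any(start <= ts <= end for start, end in shifts):
            flagged[ts.date()].append((ts, kind))
    return dict(flagged)


shifts = [(datetime(2020, 3, 2, 9, 0), datetime(2020, 3, 2, 17, 0))]
events = [
    (datetime(2020, 3, 2, 8, 40), "system_login"),
    (datetime(2020, 3, 2, 12, 0), "invoice_created"),
    (datetime(2020, 3, 2, 18, 5), "order_fulfilled"),
]
flagged = off_the_clock(events, shifts)
```

With this sample data, the 8:40 login and the 18:05 fulfillment are flagged for March 2, while the noon invoice falls inside the shift. Aggregating such flags across many employees is what a class-level visualization would summarize.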
For another scenario using multiple data points, custom visualizations and sentiment analysis, start by building a timeline graph showing emails, text messages, stock trade requests and fulfillment activities, along with the timing of each (from data time stamps).
If you then analyze the sentiment in the emails and texts, you may be able to parse negative and positive intent. Layering that analysis with the timeline could allow you to focus your investigation on key actors and build effective case strategies for a more positive outcome.
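A minimal sketch of layering sentiment on a timeline: merge events from each source into one chronological list, then, for each trade request, pull the same actor’s negative communications from a preceding window. It assumes sentiment scores were computed earlier and attached to emails and texts; the `Event` type, the kind labels, the -0.2 threshold and the window length are hypothetical choices for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass(frozen=True)
class Event:
    time: datetime
    kind: str  # e.g. "email", "text", "trade_request", "fulfillment"
    actor: str
    sentiment: Optional[float] = None  # precomputed for emails/texts, else None


def build_timeline(*sources):
    """Merge event lists from several sources into one chronological timeline."""
    return sorted((e for src in sources for e in src), key=lambda e: e.time)


def negative_before_trades(timeline, window, threshold=-0.2):
    """Map each trade request to the same actor's negative communications
    that occurred within `window` before it."""
    results = {}
    for i, ev in enumerate(timeline):
        if ev.kind != "trade_request":
            continue
        prior = [
            p for p in timeline[:i]
            if p.actor == ev.actor
            and p.sentiment is not None
            and p.sentiment <= threshold
            and ev.time - p.time <= window
        ]
        if prior:
            results[ev] = prior
    return results
```

Calling `negative_before_trades(build_timeline(emails, texts, trades, fills), timedelta(hours=24))` returns only the trade requests preceded by negative messages from the same person, which is one way to narrow an investigation to key actors.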
The Bottom Line
What data should you collect? Which sources should be evaluated? What narrative does the data support? It’s almost impossible to answer these questions before going through the process of recognizing your sources. Taking a focused approach to locate and understand all the data at your disposal will help bring invaluable context front and center.
Once data is aggregated, an experienced data analytics team can help you identify the most appropriate case strategy, leverage the right data for that strategy and compile the necessary visualizations and reports. Prioritizing context over content can help your team decide to fight, or move to settle, before a document request is ever fulfilled.