Among its many benefits, the GDPR emphasises the importance of good data governance: how you create, edit, store, and manage data. You can reduce the risk of data breaches and other problems by minimising the data you save and making sure the data is correct. The right tools and practices embed good data governance across the entire document lifecycle.

The document lifecycle

The document lifecycle encompasses the journey every document takes, including the processes used to create, refine, deliver, and finally archive or destroy the document when a project or matter concludes.

Emails and documents (word processing, spreadsheet, and slide presentation files) constitute the most common data containers in most organisations. Including them in your data governance strategy improves productivity, reduces risk, and enhances effectiveness.

Data, Metadata, and the Document Lifecycle

When thinking about documents and data, consider both the data you can see in the body of a document and the mostly hidden data around it (metadata) that every information system and application routinely adds and maintains.

The visible content takes shape as authors, editors, and collaborators work on a document. Automation generates new documents and content. Integrations pull and update data from other sources like contact management systems to insert up-to-date details in the document body.

Additionally, a wide variety of metadata gets added, changed, and deleted throughout the document lifecycle, for example:

  • operating systems—Windows, iOS, Android, Linux—maintain metadata to manage and secure documents;
  • practice management, case management, and document management systems maintain an extensive variety of metadata: history (version number and who worked on them when), location, automation, permissions, etc;
  • MS Office applications add and update metadata such as the last edit or print times, who edited the document, date/time stamps on comments and tracked changes, and more;
  • cameras embed metadata in images that records when and how a picture is taken, and where it is taken if location tracking is turned on.
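
To see some of this hidden data for yourself, here is a minimal Python sketch that lists the built-in properties stored inside a Word, Excel, or PowerPoint file. It assumes the modern Office Open XML formats (.docx, .xlsx, .pptx), which are ZIP packages that keep these properties in a part named docProps/core.xml. The function name and the choice of fields are mine, for illustration only.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespaces used by docProps/core.xml in Office Open XML packages.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}

# Built-in properties to report, as label -> element path (an illustrative selection).
FIELDS = {
    "author": "dc:creator",
    "last_modified_by": "cp:lastModifiedBy",
    "created": "dcterms:created",
    "modified": "dcterms:modified",
    "last_printed": "cp:lastPrinted",
    "revision": "cp:revision",
}


def read_core_properties(source):
    """Return the non-empty built-in properties found in an Office file.

    `source` is a file path or a binary file-like object.
    """
    with zipfile.ZipFile(source) as package:
        if "docProps/core.xml" not in package.namelist():
            return {}
        root = ET.fromstring(package.read("docProps/core.xml"))
    found = {}
    for label, path in FIELDS.items():
        element = root.find(path, NS)
        if element is not None and element.text:
            found[label] = element.text
    return found
```

Calling `read_core_properties("contract.docx")` on one of your own files (the file name is a placeholder) returns a dictionary of whatever properties are present.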

Data doesn’t update itself

Applications manage some metadata automatically. Unfortunately, a significant amount of metadata stays as it is unless someone changes it manually.

That can lead to zombie data lurking within documents. For example, MS Word, Excel, and PowerPoint don’t update built-in properties like Author or Company when you save a document under a new name.

Zombie data is a universal problem—especially whenever someone repurposes an old document to create a new one.

Zombie data can come back to bite you

People often repurpose old documents because, rightly or wrongly, they perceive it as easier than starting from a template or model document. They open an old (possibly ancient) document, save it as a new document in a new matter, and start amending.

That causes problems. The author has to find and amend/remove every instance of obsolete content. Often, old document body content slips through the cracks. Inevitably, unchanged metadata remains hidden but active. That zombie data can come back to bite you.

An example: staff at one company routinely repurposed documents based on a set of precedents created by a consultant; the Word built-in metadata listed that consultant as the “Author”. Users never thought to change the Word documents’ built-in properties. Every production document based on those precedents included that consultant’s name in the built-in Word metadata as the “Author”.

During an investigation involving the company, a government regulator checked the metadata on the documents they considered evidence. The precedent consultant’s name kept cropping up as the “Author”, so the regulators roped the consultant into the investigation as a person of interest. That consultant had nothing to do with the matter under investigation. Nevertheless, the consultant, the company, and the investigators all wasted time and money before discovering that the lead went nowhere.
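
A simple audit could have caught that problem early. The sketch below assumes you have already collected the Author value for each document (for example, from your document management system) and have a list of current staff names. The function name and the sample data are hypothetical.

```python
def find_unexpected_authors(authors_by_file, current_staff):
    """Return {file: author} for documents whose Author is not a current staff member.

    Blank Author values are not reported, because a blank property cannot mislead anyone.
    """
    known = {name.strip().casefold() for name in current_staff}
    return {
        path: author
        for path, author in authors_by_file.items()
        if author and author.strip().casefold() not in known
    }


# Hypothetical sample data for illustration.
authors = {
    "lease-2023.docx": "Pat Consultant",
    "nda-2024.docx": "Sam Lee",
    "letter.docx": "",
}
print(find_unexpected_authors(authors, ["Sam Lee", "Alex Kim"]))
```

Anything the audit reports is a candidate for zombie data: check whether the named person really had anything to do with the document.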

Create cleanly

Repurposing documents is the slowest, most error-prone way to create a fresh new document. People do it because either they don’t have good templates or precedents, or those best-practice starting points are too hard to find or use.

Example: A lawyer who worked at a global firm told me the firm had a wide set of templates and precedents, and a fantastic collection of Word styles to use when formatting. But no one used those templates and styles because they didn’t integrate with how people worked.

Templates and precedents need to be available at the point of use, where an author expects to find them within the application. That holds whether you use a document automation solution or a manually navigated folder structure in a DMS or elsewhere. Somewhere on the Word, Excel, or PowerPoint ribbon is optimal.

That rule stands even for browser-based templating solutions: you can add a button to an MS Office ribbon that opens the web browser and navigates automatically to the template/precedent portal, so an author need not switch applications manually. Good data governance depends on compliance, and compliance increases when people encounter fewer clicks and screen hops.

Tips to create cleanly

Whether you enhance document creation with automation or not, some basic steps help enforce good data hygiene:

  • Make sure documents are well formatted and easy to use.
  • Enable MS Office integration and automation features that guide users through filling in any blanks, and ideally pull in data stored in other systems like a practice management or contact management system to avoid having to retype existing data. If you use a document automation system, even better.
  • Run templates and model documents through a dedicated metadata cleaning application before publishing them.
  • Minimise the metadata saved in any template or precedent. MS Office built-in properties should usually be left blank. Very few (if any) applications use those built-in properties, and hardly anyone relies on them in day-to-day work. Word, Excel, and PowerPoint don’t update every built-in property when you save a file as a new document, so those properties easily become out of date or inappropriate.
  • Include only the custom properties, document variables, bookmarks, and fields required by your document automation application, and leave them empty or with default values, ready for the automation to fill them in.
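
As a rough illustration of leaving built-in properties blank, the following Python sketch empties selected properties in the docProps/core.xml part of an Office Open XML file. It works on the XML content only; reading that part out of the package and writing it back is left out. The list of properties to blank is an assumed policy, not a standard, and a dedicated metadata cleaning application will cover far more than this. Try it on copies first.

```python
import xml.etree.ElementTree as ET

CP = "http://schemas.openxmlformats.org/package/2006/metadata/core-properties"
DC = "http://purl.org/dc/elements/1.1/"

# Keep the conventional prefixes when the XML is written back out.
ET.register_namespace("cp", CP)
ET.register_namespace("dc", DC)
ET.register_namespace("dcterms", "http://purl.org/dc/terms/")
ET.register_namespace("xsi", "http://www.w3.org/2001/XMLSchema-instance")

# Properties to blank in a template (an assumed policy; adjust to your needs).
TO_BLANK = [
    f"{{{DC}}}creator",
    f"{{{CP}}}lastModifiedBy",
    f"{{{DC}}}title",
    f"{{{DC}}}subject",
    f"{{{CP}}}keywords",
    f"{{{CP}}}category",
]


def blank_core_properties(core_xml):
    """Return core.xml content (bytes) with the TO_BLANK properties emptied."""
    root = ET.fromstring(core_xml)
    for tag in TO_BLANK:
        for element in root.iter(tag):
            element.text = None
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)
```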

Important note: the “Document Inspector” metadata cleaning feature in MS Office applications is very limited, ignores important metadata types like Document Variables, and doesn’t allow exclusions to keep Custom Properties, Document Variables, and Fields commonly used by document automation. Without exclusions, a metadata cleaner deletes those helpful automation metadata containers, killing the automation that relies on them.
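
To show what an exclusion list means in practice, here is a minimal Python sketch that removes custom properties from the docProps/custom.xml part of an Office Open XML file while keeping the ones named in a keep-list. It is an illustration only, not how any particular cleaning product works. It handles custom properties alone, leaves the remaining pid values untouched, and the property names in the usage note are hypothetical.

```python
import xml.etree.ElementTree as ET

CUSTOM = "http://schemas.openxmlformats.org/officeDocument/2006/custom-properties"
VT = "http://schemas.openxmlformats.org/officeDocument/2006/docPropsVTypes"

# Write the custom-properties namespace as the default namespace, as Office does.
ET.register_namespace("", CUSTOM)
ET.register_namespace("vt", VT)


def clean_custom_properties(custom_xml, keep):
    """Remove every custom property whose name is not in `keep`.

    Returns (cleaned_xml_bytes, removed_names).
    """
    root = ET.fromstring(custom_xml)
    removed = []
    # Copy the list first so that removing children does not disturb the loop.
    for prop in list(root.findall(f"{{{CUSTOM}}}property")):
        name = prop.get("name")
        if name not in keep:
            root.remove(prop)
            removed.append(name)
    return ET.tostring(root, encoding="utf-8", xml_declaration=True), removed
```

With `keep={"MatterNumber"}`, a property called MatterNumber that your automation depends on survives, while everything else is stripped and reported in the second return value.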

Proofread thoroughly

Spell checking helps but it’s not intelligent—it won’t flag mistakes like “their” vs “there” for example. Check documents regularly for accuracy and style.

In another article I review how automated proofreading tools reduce risk and boost productivity. Proofreading tools help users find missing or out-of-date information, and style, spelling, and punctuation problems. A checking tool needs to find and fix:

  • broken cross-references and citations
  • problems with names, dates, and terminology
  • formatting inconsistencies
  • numbering issues
  • editing mistakes
  • style and document structure problems
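
Even a very small script can catch a couple of these problems. The Python sketch below scans paragraphs of text for the message that English-language Word displays in place of a broken cross-reference, and for unfilled placeholders. The placeholder patterns are hypothetical house-style conventions, and a real proofreading tool checks far more than this.

```python
import re

# Text that English-language Word shows for a cross-reference whose target is gone.
BROKEN_REFERENCE = "Error! Reference source not found."

# Hypothetical placeholder conventions; adjust to your own house style.
PLACEHOLDER = re.compile(r"\[(?:●|•|TBC|TBD|insert[^\]]*)\]|X{3,}", re.IGNORECASE)


def find_drafting_leftovers(paragraphs):
    """Return (paragraph_number, problem, matched_text) for each finding."""
    findings = []
    for number, text in enumerate(paragraphs, start=1):
        if BROKEN_REFERENCE in text:
            findings.append((number, "broken cross-reference", BROKEN_REFERENCE))
        for match in PLACEHOLDER.finditer(text):
            findings.append((number, "unfilled placeholder", match.group(0)))
    return findings
```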

Share securely

Sharing documents safely and securely involves:

  • comparing them to discover and remedy unintended differences from version to version;
  • cleaning them to remove unnecessary metadata while keeping useful metadata;
  • sending them to the right people only.

Document comparison

Document comparison supports data governance by highlighting differences to confirm changes and make sure unexpected changes haven’t crept into the document. Beware that no comparison tool will flag changes that should have occurred but haven’t. It can’t detect inappropriate content that remains unchanged in a repurposed document because the author didn’t notice it.
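
That limitation is easy to demonstrate. The Python sketch below compares two versions of a document paragraph by paragraph, using the standard library's difflib module. The sample paragraphs are invented, and a professional comparison tool does far more than compare plain text.

```python
from difflib import SequenceMatcher


def changed_paragraphs(old, new):
    """Return (kind, old_paragraphs, new_paragraphs) for each difference."""
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    return [
        (kind, old[i1:i2], new[j1:j2])
        for kind, i1, i2, j1, j2 in matcher.get_opcodes()
        if kind != "equal"
    ]


# Invented sample: an old document repurposed for a new client.
previous = ["Client: Acme Ltd", "Fee: 100 per hour", "Governing law: England"]
revised = ["Client: Beta LLC", "Fee: 100 per hour", "Governing law: England"]

for kind, before, after in changed_paragraphs(previous, revised):
    print(kind, before, "->", after)
```

The only difference reported is the client name. If the fee should also have changed for the new client, nothing here will tell you, because the comparison sees only what changed, not what should have.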

Any professional document comparison tool needs to surface the complete range of content changes, not just text. Embedded objects such as images or Excel workbooks, comments, formulas, hidden rows and columns, sheets, and speaker notes are just a few of the document elements that need detailed comparison.

The comparison needs to be easy to read and understand. Awkward hard-to-follow mark-up doesn’t support good data governance; it undermines it, because people give up and hope for the best, instead of confirming the correctness of the changes.

Metadata cleaning

Metadata cleaning, whether applied to email attachments or in batches when saving to removable media, uploading to the cloud, or archiving, ensures that documents contain only necessary metadata at every stage of the lifecycle.

A good metadata solution allows users to choose different levels of cleaning and automated conversion to PDF. You can share document automation metadata with a client safely, but not when sending documents to the other side in a negotiation. Standard cleaning policies should wipe built-in properties. When archiving, it may be advisable to wipe all document metadata, depending on business and legal requirements.
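
One way to think about cleaning levels is as a small policy table. The Python sketch below is purely illustrative: the destination names, the settings, and the choice to convert documents to PDF for the other side are my assumptions, and your own policies should follow your business and legal requirements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CleaningPolicy:
    wipe_built_in_properties: bool
    wipe_automation_metadata: bool
    convert_to_pdf: bool


# Hypothetical policy table: destination names and settings are assumptions.
POLICIES = {
    "client": CleaningPolicy(
        wipe_built_in_properties=True,
        wipe_automation_metadata=False,
        convert_to_pdf=False,
    ),
    "other_side": CleaningPolicy(
        wipe_built_in_properties=True,
        wipe_automation_metadata=True,
        convert_to_pdf=True,
    ),
    "archive": CleaningPolicy(
        wipe_built_in_properties=True,
        wipe_automation_metadata=True,
        convert_to_pdf=False,
    ),
}


def policy_for(destination):
    """Return the policy for a destination, defaulting to the strictest one."""
    return POLICIES.get(destination, POLICIES["other_side"])
```

Defaulting to the strictest policy means an unrecognised destination is cleaned thoroughly, which is the safer mistake to make.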

Access control

Data governance includes access control: sharing documents only with people who need access to them. Access control must be enforced on documents at rest in the document management system or file folder and when collaborating via email or cloud shared folders.

A smart sending app serves as a failsafe against sending to the wrong people: it detects questionable addresses or domains and notifies the sender, who can then choose the right address before hitting Send. If you rely solely on Outlook’s email address autocomplete, sooner or later you will send an email to a close-but-wrong address. That mistake can amount to a reportable data breach, and can cause much more costly problems.
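
To illustrate the kind of check involved, here is a minimal Python sketch that flags recipient domains which look like misspellings of a trusted domain. It uses the standard library's difflib module; the similarity cutoff of 0.8 and the sample addresses are assumptions, and a real sending tool will apply many more rules.

```python
from difflib import get_close_matches


def near_miss_recipients(recipients, trusted_domains, cutoff=0.8):
    """Return {address: suggested_domain} for domains that look like typos."""
    trusted = sorted(domain.lower() for domain in trusted_domains)
    suspects = {}
    for address in recipients:
        domain = address.rpartition("@")[2].lower()
        if domain in trusted:
            continue  # exact match with a trusted domain
        close = get_close_matches(domain, trusted, n=1, cutoff=cutoff)
        if close:
            suspects[address] = close[0]
    return suspects


# Invented sample addresses for illustration.
print(near_miss_recipients(
    ["alice@example.com", "bob@exmaple.com", "carol@partner.org"],
    ["example.com"],
))
```

Only the misspelt address is flagged here. The unrelated domain passes through, so a real tool would also need a rule for external domains the sender has never used before.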

For especially sensitive matters, sharing securely can require encrypted emails and a way to send extra-large attachments. Mail systems typically restrict the size of attachments to 20MB or less, so people need a secure way to share larger files outside email.
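
A sending tool can make that routing decision automatically. The Python sketch below assumes the limit applies to the Base64-encoded size of the attachments, which is roughly a third larger than the files themselves; whether that holds depends on your mail system. The 20MB default comes from the figure above, and the route names are hypothetical.

```python
MB = 1024 * 1024


def encoded_size(raw_bytes):
    """Size after Base64 encoding: 4 output bytes for every 3 input bytes.

    Line breaks added by mail software are ignored in this estimate.
    """
    return 4 * ((raw_bytes + 2) // 3)


def delivery_route(attachment_sizes, limit_bytes=20 * MB):
    """Return "email" if the encoded attachments fit under the limit, else "secure_share"."""
    total = sum(encoded_size(size) for size in attachment_sizes)
    return "email" if total <= limit_bytes else "secure_share"
```

Under that assumption, a 16MB file already exceeds a 20MB limit once encoded, so it would be routed to the secure sharing service.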

Whatever secure file sharing solution you use should include end-to-end encryption, prevent unauthorised forwarding, and automatically expire download links. Ideally it should record a complete audit log of all activity on every file. Shared storage should include comparison and metadata cleaning so users can easily follow your organisation’s data governance guidelines.
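
Expiring download links are often built on signed tokens. The Python sketch below shows the general idea using the standard library's hmac module: the link carries an expiry time and a signature, and the server rejects anything expired or tampered with. It is a simplified illustration under my own assumptions (the token format and function names are invented), and it covers link expiry only, not end-to-end encryption or audit logging.

```python
import hashlib
import hmac
import time


def make_token(file_id, expires_at, secret):
    """Return "file_id:expires_at:signature" for a download link.

    `expires_at` is a Unix timestamp in whole seconds; `secret` is bytes.
    """
    message = f"{file_id}:{expires_at}".encode()
    signature = hmac.new(secret, message, hashlib.sha256).hexdigest()
    return f"{file_id}:{expires_at}:{signature}"


def check_token(token, secret, now=None):
    """Return the file_id if the token is authentic and unexpired, else None."""
    try:
        file_id, expires_text, signature = token.rsplit(":", 2)
        expires_at = int(expires_text)
    except ValueError:
        return None  # malformed token
    message = f"{file_id}:{expires_at}".encode()
    expected = hmac.new(secret, message, hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how much of the signature matched.
    if not hmac.compare_digest(signature.encode(), expected.encode()):
        return None
    current = time.time() if now is None else now
    return file_id if current < expires_at else None
```

Because the expiry time is part of the signed message, a recipient cannot extend a link by editing the timestamp.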

Hosting a shared cloud data store internally ensures greatest control and security. But if that’s not an option, free services can provide adequate security. Mozilla offered Firefox Send for files up to 2.5GB in size, but discontinued the service in September 2020, so confirm that any free service you choose is still actively maintained. Firefox Send used the same encryption method as Firefox Sync (read this article about Firefox Sync’s encryption for an easy-to-understand primer on how that approach works).

Solutions that integrate data governance in the document lifecycle

The document lifecycle covers a wide range of processes but boils down to just a few phases: create, refine, deliver, archive. Finding the right solutions for each phase depends on a variety of factors: who needs to do what, and when, for whom.

We at Octantus Associates can help find the most effective solutions from the most reliable vendors. We start by evaluating the technology you already have to see if it can serve your purposes, perhaps integrated differently. If other tech would deliver greater benefits, we leverage our experience and knowledge of the relevant markets to recommend suitable solutions for your situation. We can assist with the business case, evaluation and user acceptance testing, and implementation.

Make It Easy to Do the Right Thing

Prevention is the best cure. Well-thought-out data practices and good document lifecycle hygiene reduce problems arising from inaccurate metadata in your documents. How you create, change, consume, save, and share documents affects your ability to respond to any document demand—government investigation, litigation discovery, anyone’s “right to be forgotten”, the GDPR, a data breach accusation, or a normal business need to retrieve documents.

Good data governance embedded in the document lifecycle makes it easy to do the right thing and reduces risk.

A clean document lifecycle with good data governance embedded within it eliminates the data you need not save and ensures the data you save is correct, while improving search performance and responsiveness.

Tools work best when they integrate at the point of use. Adoption decreases with every extra click or screen hop. An integrated approach that minimises steps, clicks, and screen changes makes it easy for people to do the right thing: save and share only what they should, efficiently.

What do you think?

What procedures and tools do you use to embed good data governance in your business workflows?
