Metadata – a guide for non-techies

About Charles Drayson

Charles is a UK lawyer who has used document automation for 20 years. He has worked for large law firms, corporate legal teams, and has automated legal and non-legal documents. He writes for Legito to share his passion for using automation to get work done. “I get a kick out of creating good content and seeing it used repeatedly and reliably by colleagues without fuss and bother”.

Charles Drayson
Aug 30 · 5 min read

Solutions like Legito provide enhanced features by using metadata. You will see metadata mentioned in articles and white papers. This is a guide to what metadata is and what it can do for you.

Metadata is commonly defined as ‘data about data’.

The definition is concise and clever for someone with a technical understanding of metadata. It conveys no useful understanding to anyone else. Persevere, dear reader – the learning is worth the effort, even if it won’t make you more fun at parties.

Imagine you have a SharePoint library with a PDF copy of the signed employment contracts for all your employees, past and present. Imagine you have another library with a copy of all your customer contracts. Maybe you don’t need to imagine – this is the reality for many organisations. Suppose your organisation is seeking investment and you need to provide information about those contracts. The investors might ask for a list of employees who have contracts that include restrictive covenants. They might ask for a list of customer contracts that include a clause dealing with ‘change of control’ of the company. You have the documents – now you need to extract the relevant ones. How do you do it?

Typical file stores are ineffective when you need to retrieve information from your organisation’s records.

When you stored the documents, did you store them in a file structure that allows you to identify the documents in each category? Even if you stored the documents in categories, you probably didn’t anticipate the particular categories you need now. And, when you store documents using sub-categories, you can only do that for one way of working – if you store employment contracts according to the department, you cannot also store the contracts by reference to role types. If you store customer contracts according to customer name, you cannot also store them by reference to product type.

If you need to store any documents using more than one filing system, you have to duplicate the contract for each filing system. If one of those documents changes, will you be able to update all the other copies that exist in the other filing systems?

If you have only a few documents, you can read them and find the information you need. It probably doesn’t matter if you store them using a simple filing system. As the number of documents increases, that task becomes error-prone and tedious. The difficulty: you have the data, but you cannot readily get to it. This is the type of problem that metadata can solve.

Conceptually, you could make it easier to locate data if you store documents in a more machine-readable format instead of a scanned copy of a signed document. You can search SharePoint for keywords within Word documents, for example. You would need a library where you store the .docx files alongside the .pdf file. Most document archives don’t have that. You would also need to construct a keyword search that extracts the files you want. It’s not easy.

Documents are designed to be read by humans, one at a time, and in limited quantity. Metadata is information about documents (or spreadsheets or any other type of data) that describes their contents in a form usable by a system that wants to interrogate the data. It’s like an index but more comprehensive. Metadata acts like a pointer to the information locked up in your data.
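For readers who like to see the idea in practice, here is a minimal sketch in Python of what "metadata as an index" could look like. It is an illustration only, not how Legito or SharePoint stores anything: the field names, file names and the `find` helper are all invented for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContractRecord:
    """Metadata describing one stored contract (all field names are hypothetical)."""
    file_name: str
    contract_type: str               # e.g. "employment" or "customer"
    department: str
    has_restrictive_covenants: bool
    has_change_of_control: bool


# Made-up records standing in for the metadata kept alongside each PDF.
records = [
    ContractRecord("smith.pdf", "employment", "Sales", True, False),
    ContractRecord("jones.pdf", "employment", "Finance", False, False),
    ContractRecord("acme.pdf", "customer", "Sales", False, True),
]


def find(records, **criteria):
    """Return the file names of records whose metadata matches every criterion."""
    return [
        r.file_name
        for r in records
        if all(getattr(r, field) == value for field, value in criteria.items())
    ]


# The investor questions become simple queries; nobody has to open a PDF.
covenant_contracts = find(records, contract_type="employment", has_restrictive_covenants=True)
change_of_control_contracts = find(records, contract_type="customer", has_change_of_control=True)

# The same records answer a question nobody planned for, with no duplicate copies.
sales_contracts = find(records, department="Sales")
```

Each document is stored once, yet it can be found by contract type, by department, or by the clauses it contains. That is the practical difference between a folder structure and metadata.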

Metadata preserves the digital record of transactions to future-proof information retrieval.

It’s ironic and wasteful to have systems that power an organisation’s back-office processes and then to use documents to create a record of events. Using documents as the system-of-record for contracts, processes and transactions is inherently limiting. The net effect is a one-way conversion of machine-readable data into a human-readable medium. It’s the one-way nature of the conversion that causes the waste. It’s hard for systems to reverse engineer the information locked up in a document in a way that is reliable and flexible. Some software promises to read legacy documents (think e-discovery systems and some AI-powered software), but it’s unlikely you will get results that have the same integrity as the original data.

Documents are here to stay (we still have humans), but metadata can be created in real time when the document is created, and then retained for future use. You might not know the future requirements, but you can retain the digital inputs to have the best chance of meeting those unknown requirements when they arrive.

Metadata can power dashboard and reporting features. In Legito, we extract metadata that can be used in customised workspaces so that teams get an intuitive view of current processes.

When deploying any new system, someone should consider what happens if and when you need to move your records to another system. Metadata makes it easier to migrate data between systems.
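As a rough illustration of why structured metadata travels well, the sketch below writes a metadata record out as JSON (a plain-text format most systems can read) and reads it back unchanged. The field names and values are invented for the example, and a real migration would involve much more than this.

```python
import json

# Made-up metadata for one contract; the field names are hypothetical.
metadata = [
    {
        "file_name": "acme.pdf",
        "customer": "Acme Ltd",
        "signed_on": "2020-01-15",
        "has_change_of_control": True,
    },
]

# Export: the old system writes its metadata as plain text.
exported = json.dumps(metadata, indent=2, sort_keys=True)

# Import: the new system reads the same text back into structured data.
imported = json.loads(exported)
```

Nothing is lost on the way through, which is not something you can say about re-reading a scanned PDF.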

Metadata is not just for documents.

Metadata linked to documents is useful, but organisations usually create documents when performing a process. The process probably captures data that is not recorded in the document (audit trails for internal approvals, for example). All related data should be retained for re-use.

Use metadata to re-perform automation.

Process automation using a tool like Legito is not limited to one-way sequential steps. Processes divert according to prevailing conditions, and sometimes a process has to be repeated with altered inputs. Customers change their minds. Approvers require changes. New information emerges. Nobody wants to re-input data when re-performing a process or regenerating a document. By retaining data through the process, it’s necessary only to enter information that needs to be updated. The previous data can be re-used.
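The idea of re-using retained inputs can be sketched in a few lines of Python. This is a toy example under my own assumptions, not a description of how Legito works: the template wording, the field names and the `generate` function are all hypothetical.

```python
from string import Template

# A one-sentence stand-in for a document template (wording is invented).
OFFER_TEMPLATE = Template(
    "$employee will join $department on $start_date at a salary of $salary."
)


def generate(inputs):
    """Produce the document text from a dictionary of retained inputs."""
    return OFFER_TEMPLATE.substitute(inputs)


# Inputs captured the first time the process ran, and kept as metadata.
original_inputs = {
    "employee": "A. Smith",
    "department": "Sales",
    "start_date": "1 June",
    "salary": "GBP 40,000",
}
first_draft = generate(original_inputs)

# An approver requires a change: only the salary is re-entered.
revised_inputs = {**original_inputs, "salary": "GBP 42,000"}
second_draft = generate(revised_inputs)
```

The second draft is regenerated from the retained inputs plus the one changed value, and the original inputs remain on record for the audit trail.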

Capturing metadata.

The benefits of metadata are magnified if you collect data all the way through a process. It’s easier to capture and re-use metadata if the end-to-end process is performed using one solution, like Legito.
