How to Avoid Structured Data Pitfalls

Documoto

The technical publishing world has increasingly gravitated toward structured data concepts and tools over the past decade. From XML to DITA to component content management systems (CCMSs), the evolution continues thanks to savings in labor, improved accuracy, and optimized workflows afforded by structured data methodologies.

While using structured data to publish technical documentation offers significant efficiencies, it is quite common to find publishers who impose unnecessary challenges upon themselves. This article addresses the top three traps publishers should recognize and avoid.

Use Unique Identifiers for File Names

What is the purpose of a file name?

From a software perspective, the only real requirement for a file name is often that it be a unique string of permitted characters. In practice, however, people saving files very commonly pack information about the file, or “intelligence,” into the name itself.

While it is important to capture metadata, the file name is not the place to store attributes of the content. Identifiers for parts, pages, and media files should simply be unique. At the same time, you would never want someone to create 50 books with file names like:

Book1.docx
Book2.docx
Etc.

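A name like Book1.docx gives a publisher, let alone an end user, nothing meaningful to identify the content by, and it is unlikely to stay unique across an organization. In fact, it makes the content harder to locate. Instead, it makes more sense to use controlled identifiers such as: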

pc_11223344.docx
pc_55667788.docx

Serial numbers are also a great way to identify your parts books. If you do not use serial numbers, we often advise our clients to use ‘increments,’ sequential numbers that essentially mimic serializing or part numbering for the book creation process.
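The increment approach can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a Documoto implementation: the `IdentifierRegistry` class, the `pc_` prefix, the eight-digit width, and the sample attributes (`model`, `language`, `X100`, `X200`) are all hypothetical. What it shows is that the name carries only a unique, incrementing number, while descriptive attributes are kept as metadata alongside it.

```python
from dataclasses import dataclass, field


@dataclass
class IdentifierRegistry:
    """Hypothetical helper that issues unintelligent, incrementing file names.

    The "pc_" prefix and 8-digit width mirror the example names above;
    both are assumptions, not a required convention.
    """

    prefix: str = "pc_"
    width: int = 8
    next_number: int = 1
    # Descriptive attributes live here, keyed by file name, never in the name.
    metadata: dict[str, dict[str, str]] = field(default_factory=dict)

    def new_file_name(self, extension: str, **attributes: str) -> str:
        # Zero-pad the counter to a fixed width, e.g. 11223344 -> "11223344".
        name = f"{self.prefix}{self.next_number:0{self.width}d}.{extension}"
        if name in self.metadata:
            # Guard against a counter that was reset or seeded incorrectly.
            raise ValueError(f"identifier {name} already issued")
        self.next_number += 1
        self.metadata[name] = dict(attributes)
        return name


# Usage with hypothetical models and languages.
registry = IdentifierRegistry(next_number=11223344)
first = registry.new_file_name("docx", model="X100", language="en")
second = registry.new_file_name("docx", model="X200", language="de")
print(first, second)  # pc_11223344.docx pc_11223345.docx
print(registry.metadata[second]["model"])  # X200
```

Because the attributes live outside the name, correcting a model assignment later means editing a metadata record rather than renaming a file and everything that refers to it.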

Choose Your Storage Wisely!

Most companies have a huge assortment of software tools, databases, server locations, and so on, which offer multiple places to store important information. Depending on the type of information, storing it in the right location offers clear advantages.

Let’s consider a scenario in which text-based data is stored within an engineering drawing. Maintaining this information in CAD is significantly more expensive than managing the same data in a text-based database. Revising data in CAD requires skilled, highly compensated staff (usually in short supply) and more demanding change workflows.

High-quality models and drawings require CAD; however, text-based information is better suited for storage in other applications.

One example occurs when OEMs place vendor information on a drawing. This practice is not recommended, even if a part or assembly is sole-sourced. Instead, vendor data can be captured in an ERP system and then applied programmatically to the related purchase order. If a new vendor is eventually contracted to supply the part in question, the maintenance savings are profound: one ERP record changes instead of every drawing that names the old vendor.
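A minimal Python sketch of that separation follows. It assumes a plain in-memory dictionary stands in for the ERP vendor table; the part number, vendor names, `PurchaseOrderLine` class, and `build_po_line` helper are hypothetical, and a real ERP would expose its own interfaces. The drawing is referenced only by part number, and the vendor is looked up when the purchase order line is created.

```python
from dataclasses import dataclass

# Hypothetical stand-in for the ERP vendor table: part number -> current vendor.
# The drawing carries only the part number, never the vendor.
vendor_by_part: dict[str, str] = {"100-2345": "Acme Castings"}


@dataclass(frozen=True)
class PurchaseOrderLine:
    part_number: str
    quantity: int
    vendor: str


def build_po_line(part_number: str, quantity: int) -> PurchaseOrderLine:
    """Apply the current vendor programmatically when the PO line is built."""
    return PurchaseOrderLine(part_number, quantity, vendor_by_part[part_number])


before = build_po_line("100-2345", 10)
# Re-sourcing the part is a single record update; no drawing is revised.
vendor_by_part["100-2345"] = "Beta Foundry"
after = build_po_line("100-2345", 10)
print(before.vendor, "->", after.vendor)  # Acme Castings -> Beta Foundry
```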

With regard to electronic parts catalogs, including text-based information in an illustration is similarly detrimental. Use software that provides appropriate places to capture text-based data. It is always easier to update a text field than an illustration, and the advantage is especially apparent when the same text is reused in many places.

Furthermore, combining data elements such as an illustration and its text attributes undermines relational database behavior whenever a change must be made to one element rather than both. Combining elements like this negates one of the major benefits of implementing a database publishing system.
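To illustrate both points, here is a small SQLite schema built with Python's standard `sqlite3` module. It is a sketch, not the data model of any particular publishing system: the `part`, `illustration`, and `hotspot` tables and all sample values are hypothetical. The illustration file and the part text are stored separately and linked through hotspots, so either can change without touching the other.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE part (
        part_number TEXT PRIMARY KEY,
        description TEXT NOT NULL
    );
    CREATE TABLE illustration (
        illustration_id TEXT PRIMARY KEY,
        image_file TEXT NOT NULL
    );
    -- Each hotspot links a callout on an illustration to a part record.
    CREATE TABLE hotspot (
        illustration_id TEXT REFERENCES illustration(illustration_id),
        callout INTEGER,
        part_number TEXT REFERENCES part(part_number),
        PRIMARY KEY (illustration_id, callout)
    );
    """
)
conn.execute("INSERT INTO part VALUES (?, ?)", ("100-2345", "Hex bolt"))
conn.executemany(
    "INSERT INTO illustration VALUES (?, ?)",
    [("ill_0001", "ill_0001.svg"), ("ill_0002", "ill_0002.svg")],
)
# The same part is reused on two different illustrations.
conn.executemany(
    "INSERT INTO hotspot VALUES (?, ?, ?)",
    [("ill_0001", 3, "100-2345"), ("ill_0002", 7, "100-2345")],
)

# One text update reaches every illustration that reuses the part...
conn.execute(
    "UPDATE part SET description = ? WHERE part_number = ?",
    ("Hex head bolt", "100-2345"),
)
# ...and an illustration can be replaced without retyping any text.
conn.execute(
    "UPDATE illustration SET image_file = ? WHERE illustration_id = ?",
    ("ill_0001_rev_b.svg", "ill_0001"),
)

rows = conn.execute(
    """
    SELECT h.illustration_id, i.image_file, h.callout, p.description
    FROM hotspot h
    JOIN illustration i ON i.illustration_id = h.illustration_id
    JOIN part p ON p.part_number = h.part_number
    ORDER BY h.illustration_id
    """
).fetchall()
for row in rows:
    print(row)
```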

Don’t Combine Data Elements!

Manufacturers employ many strategies to increase publishing efficiency. Some of these strategies endure, even though they were born in an unstructured world.

For instance, consider a parts list matrix within a parts book that shows the corresponding part for each model. In other words, a single page details all of the parts in the machine assembly for every model. While this approach was extremely effective for standalone PDF-type documents, it is incredibly constraining in a structured publishing environment.

With structured database publishing software, tying multiple data elements into a rigid form severely limits the ability of the relational database to manage the data. Each data element needs to be distinct.
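One way to picture the alternative is to break the matrix into discrete records, one per model and item. The sketch below is hypothetical throughout (model names, item numbers, and part numbers are invented for illustration) and shows only the shape of the data, not how any specific CCMS stores it.

```python
from collections import defaultdict

# Legacy matrix: one row per item, one column per model (hypothetical data).
# An empty cell means the item does not apply to that model.
MODELS = ["M100", "M200", "M300"]
matrix = {
    "1": ["100-2345", "100-2345", "100-9999"],
    "2": ["200-0001", "", "200-0002"],
}

# Normalized form: one distinct (model, item, part_number) record per fact.
records = [
    (model, item, part)
    for item, parts in matrix.items()
    for model, part in zip(MODELS, parts)
    if part
]

# A model-specific parts list is now a simple grouping over the records.
parts_by_model: dict[str, list[str]] = defaultdict(list)
for model, _, part in records:
    parts_by_model[model].append(part)

print(parts_by_model["M200"])  # ['100-2345']
```

Once each fact is its own record, adding a model means appending records rather than adding a column to every matrix page, and a single-model parts list becomes a filter instead of a manual rework.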

This rule of thumb also applies to our earlier example of placing text-based information within an illustration. Another common error of this type is placing metadata (or information that can be captured in metadata) within a description field.
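As a hypothetical illustration of discrete fields, the sketch below moves attributes that might otherwise be typed into a description (thread size, finish, serial-number applicability) into their own fields. The `PartRecord` class, its field names, the `applies_to` helper, and the sample data are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Optional

# Anti-pattern: attributes buried in free text, for example
#   "Hex bolt, M8 x 20, zinc plated, serial 5000 and up"
# The same information as hypothetical discrete fields:


@dataclass(frozen=True)
class PartRecord:
    part_number: str
    description: str
    thread: str
    finish: str
    serial_from: Optional[int] = None  # None means it applies to every serial


parts = [
    PartRecord("100-2345", "Hex bolt", "M8 x 20", "zinc", serial_from=5000),
    PartRecord("100-2346", "Hex bolt", "M8 x 20", "plain"),
]


def applies_to(record: PartRecord, serial: int) -> bool:
    """Check serial applicability from a field, with no description parsing."""
    return record.serial_from is None or serial >= record.serial_from


for_serial_4000 = [p.part_number for p in parts if applies_to(p, 4000)]
print(for_serial_4000)  # ['100-2346']
```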

It is incredible how many unique strategies creative publishers have invented to save time. While these approaches may have been advantageous in the past, and may still offer short-term benefits, these “benefits” eventually add up to an opportunity cost that manufacturers cannot afford. Continuing legacy practices that are incompatible with new technologies is guaranteed to create havoc when a modern system is inevitably adopted.

It may be time to pause and sharpen your ax before swinging at the trees. Are the “keys” to your relational database unintelligent? Are you storing each type of data in the most efficient location? Are all of your data elements discrete? If you answer “No” to any of these questions, you have great opportunities to maximize efficiency in your current and future publishing tools.

Have additional suggestions? Leave a note in the comment section.
