Learn about Data Governance and how using a Centralized Cloud Archive improves it
"There were 5 exabytes of information created between the dawn of civilization through 2003, but that much information is now created every two days." - Eric Schmidt ( former Technical Advisor - Alphabet, Google)
Data is the new currency; it is everywhere and continues to grow exponentially in its various formats: structured, semi-structured, and unstructured. Whatever the format, businesses cannot afford to slack on the proper accumulation and categorization of data, otherwise known as Data Governance, if optimum value is to be obtained from these data sources. As Jay Baer, a marketing and customer experience expert, remarked, "We are surrounded by data but starved for insights."
So, what exactly is Data Governance, and what are its key elements?
Imagine you want to rebrand and relaunch a failing product and need insight to guide the effort, perhaps YTD sales analyses for the previous five years, or customer feedback from those same years. But the data you need is fragmented across different storage units, or lost to accidental local deletion. What a loss of revenue, time, insights, and progress.
Here is where the need arises for a proper Data Governance strategy to ensure such irreparable damage does not happen. A proper Data Governance strategy lays out the procedures to maintain, classify, retain, access, and secure the data related to a business. As data grows exponentially daily, fueled mainly by Big Data and digital transformation, with global data volumes estimated to reach 181 zettabytes by 2025, a proper Data Governance strategy to ensure proper data usage becomes imperative.
Below are four elements that are key to a proper Data Governance strategy:
Data Lifecycle Management:
To prepare a proper Data Governance strategy, one must first understand the circle of life that data goes through. Data is created, used, shared, maintained, stored, archived, and finally deleted. Understanding these stages forms the core of Data Lifecycle Management.
For example, John Doe applied for the position of QA Manager. He applied online on the company’s website (creation), and his resume was reviewed by the hiring team (used) and sent to HR to offer him a job (shared). John accepted the job and started work with the company. His details were kept with HR, updated annually, and retained for tax and legal purposes (maintained and stored).
Finally, John retired, and his file was handed to the Data Steward (archived), where it may or may not be kept (deleted) depending on legal retention policies.
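To make these stages concrete, here is a minimal sketch in Python (the names are illustrative, not taken from any particular tool) that models the lifecycle as a set of states and traces John's personnel file through them:

```python
from enum import Enum, auto

class DataLifecycleStage(Enum):
    """The stages a record passes through under Data Lifecycle Management."""
    CREATED = auto()
    USED = auto()
    SHARED = auto()
    MAINTAINED = auto()
    STORED = auto()
    ARCHIVED = auto()
    DELETED = auto()

# John Doe's personnel file, traced through its lifecycle:
employee_record_history = [
    DataLifecycleStage.CREATED,     # applied online on the company website
    DataLifecycleStage.USED,        # resume reviewed by the hiring team
    DataLifecycleStage.SHARED,      # sent to HR to make an offer
    DataLifecycleStage.MAINTAINED,  # records updated annually
    DataLifecycleStage.STORED,      # kept for tax and legal purposes
    DataLifecycleStage.ARCHIVED,    # handed to the Data Steward at retirement
]
```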
Now compare that to the data lifecycle of a draft sales PowerPoint presentation. The presentation will be created, used, shared, and probably deleted in favor of the final version, which will go through the entire Data Lifecycle Management process. Understanding the data is critical; that is where Data Quality Management comes to the fore.
Data Quality Management:
Let’s go back to the example of relaunching a failing brand. You have finally found all the pertinent files from the past five years that you have been looking for. But interspersed with the sales and promotion figures are files for a final presentation and the numerous draft versions leading up to it.
What do you keep? What is needed and what is not, and how do you know the difference? This is where Data Quality Management (DQM) comes in. Essential questions to ask about data when observing DQM are:
- Is it unique? Are there multiple versions of one file, or a single final version I must keep? Do the draft copies have important handwritten notes that point to the final version and provide greater insight?
- Is it valid? Do I need to keep this data? Is there a possible future use for it?
- Is it accurate? Are the files being saved for future use accurate?
- Is it complete? Have the files that need to be saved been kept in their entirety?
- Is the quality good? Is the quality of the files suitable to provide insightful context in the years to come?
- Is it accessible? Are these records properly archived, or are they fragmented? How can we get access to them?
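As an illustration, these questions could be encoded as a simple quality checklist that a record must pass before archival. This is a minimal Python sketch with hypothetical field names, not a prescribed schema:

```python
from dataclasses import dataclass

@dataclass
class QualityReport:
    """One flag per DQM question; a record passes only if all are True."""
    unique: bool        # no duplicate or superseded versions
    valid: bool         # has a current or plausible future use
    accurate: bool      # contents verified against the source
    complete: bool      # saved in its entirety
    good_quality: bool  # usable and insightful years from now
    accessible: bool    # properly archived, not fragmented

    def passes(self) -> bool:
        return all(vars(self).values())

report = QualityReport(unique=True, valid=True, accurate=True,
                       complete=True, good_quality=True, accessible=False)
print(report.passes())  # False: the record is fragmented, fix archiving first
```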
Data Stewardship:
Now that we have answered all those questions, imagine for a moment all this data, structured, unstructured, and semi-structured, sitting in data silos or data lakes as one giant beast. Which raises the proverbial question: who will bell the cat?
Who will take on this humongous task of Data Quality Management, i.e., classifying, archiving, storing, creating best practice guidelines, and ensuring data security and integrity?
This is where Data Stewardship comes into play. Appointing a single person or, better, a committee to create and oversee all the tasks of data management is the optimum choice in the eventual buildup of a good Data Governance strategy.
The main job of a stewarding committee is to ensure that data is properly collected, managed, accessed when needed, and disposed of at the end of the retention period.
Some essential functions of a data stewarding committee are:
- Publishing policies on the collection and management of data (something that can be achieved faster if DQM practices are already in place)
- Educating employees on DQM best practices and providing training on the Record Information Management (RIM) policies established by the company, with refresher training every three years to stay compliant with existing and new regulations.
- Revising retention policies to meet new regulations.
- Creating a hierarchical chain of command within the committee based on the classification of records.
Data Security:
Should you keep the data? Classify it or not? Share it or not? There are many questions regarding the usefulness and usability of data. But one thing stands out: whatever the answers, data must be kept secure throughout the entire Data Lifecycle Management process, right up to the point of deletion.
Data security tools, be they for encryption, resiliency, masking, or ultimate erasure, have to be deployed along with policies to ensure that the company’s data is safe, secure, and used only by the proper personnel.
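For example, masking can be as simple as hiding most of a sensitive value before it is displayed or shared, and pseudonymization replaces identifiers for analytics. The sketch below is a minimal Python illustration with hypothetical helpers; production systems would use vetted libraries and salted or keyed hashing:

```python
import hashlib

def mask_field(value: str, visible_suffix: int = 4) -> str:
    """Replace all but the last few characters with '*' for display/sharing."""
    return "*" * max(len(value) - visible_suffix, 0) + value[-visible_suffix:]

def pseudonymize(value: str) -> str:
    """One-way hash for analytics where the raw identifier isn't needed.
    Real deployments would use a salted or keyed hash, not bare SHA-256."""
    return hashlib.sha256(value.encode()).hexdigest()[:12]

print(mask_field("123-45-6789"))             # *******6789
print(pseudonymize("john.doe@example.com"))  # stable, non-reversible token
```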
The recurring question behind all these essential elements is: after identifying and classifying data, where should we keep it all? While it occurs to most businesses to keep data stored physically or in-house, the case for a centralized cloud archival system grows stronger daily. So, should one go for a centralized cloud archival system? Here are a few arguments in its favor.
Advantages of a Centralized Cloud Archival System:
- With internet access now predominant worldwide and company intranets available to employees, accessing data on a centralized cloud archive has never been easier.
- Organizations, and departments within them, can share data and resources more efficiently. Specific data can be discovered quickly with tools such as e-discovery.
- As data grows daily, a centralized cloud archive system can meet scalability demands while remaining flexible.
- A cloud archive system is more cost-effective than a local in-house storage unit at keeping data, meeting the demands of growing data, and staying flexible, not to mention the office space it saves and the cost of a dedicated in-house IT infrastructure room.
- Data stored in a secure, centralized cloud archive system is guarded against unauthorized access. Moreover, the system receives timely backups and updates and is far less likely to be damaged or lost in a local disaster.
As we let the advantages of a centralized cloud archive sink in, here are ten ways it can help businesses in their Data Governance strategies:
10 Ways a Centralized Cloud Archive can improve your Data Governance:
Focus on global policies:
While the classification and maintenance of data are crucial factors in governance, time is as important an element as any other. The question of how long to retain data is central to the archival process, and the answer is not black and white.
With local and global policies changing frequently in response to new and sometimes demanding requirements, data retention periods vary from year to year. Those responsible for maintaining or classifying data need to store it immediately, pending proper relevance tagging. While in-house storage units can house data temporarily, they are vulnerable to security breaches, accidental deletion, and data fragmentation.
The obvious choice is to store it in a centralized cloud archive with protective features such as encryption, secure access controls, flexibility, and scalability.
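To illustrate how retention rules might be applied per record, here is a minimal Python sketch; the categories and retention periods are invented for illustration, and real values must come from counsel and regulation:

```python
from datetime import date, timedelta

# Illustrative retention periods only; real values depend on jurisdiction.
RETENTION_YEARS = {"tax": 7, "hr": 10, "sales": 5, "draft": 1}

def retention_expiry(created: date, category: str) -> date:
    """Compute when a record becomes eligible for review and deletion."""
    return created + timedelta(days=365 * RETENTION_YEARS[category])

record_created = date(2019, 3, 1)
print(retention_expiry(record_created, "sales"))  # 2024-02-28
```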
Data Access and ABAC:
Because data is located in a secure, centralized cloud archive, employees distributed across geographical locations can access it at any time, in any time zone, and concurrently with other employees.
But should all employees have access to everything? A centralized cloud archive can enforce Attribute-Based Access Control (ABAC), granting employees rights based on the attributes assigned to them. These rights are usually defined when creating DQM strategies, or by Data Stewards, in line with changing company, local, and global policies.
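As a rough illustration of the idea, an ABAC check compares a user's attributes against a resource's policy rather than looking up individual identities. This minimal Python sketch uses invented attribute names:

```python
def can_access(user_attrs: dict, resource_attrs: dict) -> bool:
    """Grant access only when department matches and clearance is sufficient."""
    return (user_attrs["department"] == resource_attrs["department"]
            and user_attrs["clearance"] >= resource_attrs["min_clearance"])

user = {"department": "HR", "clearance": 3}
payroll_archive = {"department": "HR", "min_clearance": 2}
sales_archive = {"department": "Sales", "min_clearance": 1}

print(can_access(user, payroll_archive))  # True: attributes satisfy the policy
print(can_access(user, sales_archive))    # False: wrong department
```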
Deduplication:
A centralized cloud archive system has built-in deduplication technology, ensuring only one final copy of each file is kept. This is in stark contrast to in-house data silos, which promote data fragmentation and unnecessary duplication of files.
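One common way deduplication works is content hashing: identical bytes produce identical fingerprints, so a second copy is never stored. Here is a minimal Python sketch of the idea (not Vaultastic's actual implementation):

```python
import hashlib

def file_fingerprint(data: bytes) -> str:
    """Content hash: identical bytes always produce the same fingerprint."""
    return hashlib.sha256(data).hexdigest()

archive: dict[str, bytes] = {}  # fingerprint -> single stored copy

def archive_file(data: bytes) -> str:
    """Store a file only if its content isn't already in the archive."""
    key = file_fingerprint(data)
    if key not in archive:  # duplicate content is never stored twice
        archive[key] = data
    return key

archive_file(b"Q3 sales deck, final")
archive_file(b"Q3 sales deck, final")  # duplicate: no new copy stored
print(len(archive))  # 1
```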
Self-service for data consumers:
All the data is stored on a single searchable platform, making it easier for consumers to source or explore it independently. Self-service access lets consumers reach any data they have permission for without manually requesting access from the data owners.
Removing the need for Physical Infrastructure:
A centralized cloud archive system is much more cost-effective than traditional storage methods, as it eliminates the need for businesses to purchase and maintain their own data storage infrastructure.
Maintaining transparency and Automated Reporting:
With a centralized cloud archive system, all the data stored on it is available in a searchable format, making it easier for stakeholders to understand the information and use it for decision-making. This improves transparency and accountability within the organization.
Moreover, with automated reporting and a data monitoring system in place, there is full visibility into who accesses data, when, and from where.
Removing the redundant data:
When a file is no longer needed, has served its retention period, and has been approved for deletion by the data stewarding committee, it is easy to locate the redundant file in the centralized cloud archive and permanently delete it.
Data Durability:
SaaS cloud data archiving platforms that offer high reliability and availability with built-in disaster recovery sites (as Vaultastic does) drastically reduce the RPO (Recovery Point Objective) and RTO (Recovery Time Objective) anxieties of CXO teams, and also eliminate the effort of performing separate data backups.
Data Security:
A centralized cloud archive is more secure than other data warehouses or in-house storage. Cloud-based data archiving platforms like Vaultastic leverage the cloud’s shared responsibility model to provide multi-layered protection against cyber attacks.
Up-to-date patches, two-factor authentication, encryption, and other relevant security controls ensure that a business’s data is kept in a tight vault.
Cost-Optimization:
Favor SaaS data archiving platforms that can optimize costs along multiple dimensions even as data grows daily. Data will grow continuously, and you want your costs kept in check.
Conclusion
More and more businesses are migrating to the cloud for their solutions, particularly their archiving solutions. The reasons are many: cost-effectiveness, data security, deduplication, better infrastructure, IT support, user-friendliness; the list is endless.
Once a business has established its Data Governance strategy and implemented it, the next step is to ensure this data is secured in a proper location.
What better place than the cloud, which proves its practicality day by day?
If you have your Data Governance strategy in place, Vaultastic, an elastic cloud-based data archiving service powered by AWS, can help you quickly implement your strategy.
Vaultastic excels at archiving unstructured data in the form of emails, files, and SaaS data from a wide range of sources. Its secure, robust platform with on-demand data services significantly eases data governance while optimizing data management costs by up to 60%.
To learn more about how Vaultastic can help you, look here.