Why Data Quality is the most important thing for all applications
There are millions of applications running across the globe, and they all have one thing in common: they depend on good data to be considered successful, trusted and reliable.
While most architects try to ensure that the data specification is clearly defined and tested in development, the scenario changes when the system moves into production: everything that was working with surgical precision suddenly starts failing. And in many cases the cut-over window for moving to production is very narrow.
Business owners and Project Managers should consider Data Quality before embarking on an Integration, Data Migration or a new application development endeavour.
In this blog I introduce some options to reduce these last-minute failures.
As a project moves closer to go-live, environments like UAT and Pre-Prod should increasingly resemble the production environment. This surfaces errors and performance issues early and gives developers reasonable time to address them properly instead of resorting to last-minute quick fixes and patches. This activity should be given due importance in the project plan.
Now that we have planned time for go-live issues, I would like to introduce some of the common reasons why things fail.
Data is imported from other internal and third-party systems and may have issues with quality, data type, data length, format, nulls where they are not allowed, and so on:
- Date – usually the most common culprit when data moves between systems. It could be entered incorrectly, formatted differently, or sent back differently by the front end depending on geographic settings.
- Time included with date or not
- Special Characters
- Spatial Data
- Data Length
- Currency and Numeric data between different countries
- ETL tool truncates or transforms the source data, causing quality issues
- Range of values acceptable for a particular data element
- Windows (ANSI) vs Unicode encoding
- Column count and order in source data such as CSV and other file types
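Two of these pitfalls are easy to show in a few lines. The sketch below (Python, illustrative only; `load_into_varchar` is a made-up name standing in for what many ETL tools do by default) shows how the same date string parses to different dates under UK and US conventions, and how silent truncation to a fixed column length corrupts string data without raising any error:

```python
from datetime import datetime

# The same string means 3 April in the UK but 4 March in the US.
raw = "03/04/2025"
uk_date = datetime.strptime(raw, "%d/%m/%Y")   # day-first parsing
us_date = datetime.strptime(raw, "%m/%d/%Y")   # month-first parsing
assert uk_date != us_date                      # both parse fine, no error raised

# Truncating to a fixed length corrupts data without failing the load.
def load_into_varchar(value: str, max_len: int) -> str:
    """Hypothetical load step: many tools silently truncate like this."""
    return value[:max_len]

assert load_into_varchar("Buckinghamshire", 10) == "Buckingham"
```

Neither snippet throws an exception, which is exactly why these issues surface only when the data is inspected downstream.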
Plan to reduce things that fail
- Try to work with data as close to the source data as possible
- Analyse the source data and make sure that the low, average and high extremes of the data are incorporated in the test data.
- For string data this could mean including all types of values possible in the source: nulls, zero-length strings, the largest possible string length, special characters and multi-lingual data, depending on application requirements.
- For date and time, make sure the minimum and maximum values exist in the test data. Also try to fix or eliminate data with date issues.
- Some countries use “.” as the decimal separator whereas others use “,”. Make sure this is handled consistently across the system.
- Different platforms store and deal with Boolean data differently; make sure this is consistent. Also remember that some platforms allow nulls here whereas others don’t.
- Different platforms handle spatial data differently; this should be made consistent across the system and tested early on.
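The points above can be sketched as a small helper module. This is an illustrative Python sketch, not part of any specific tool; the function names `boundary_strings` and `normalize_decimal` are my own. It generates the edge-case test values suggested for string columns, and normalizes both European-style and UK/US-style decimal strings to one consistent value:

```python
def boundary_strings(max_len: int) -> list:
    """Edge-case values worth including in test data for a string column."""
    return [
        None,                 # null, if the column allows it
        "",                   # zero-length string
        "x" * max_len,        # largest permitted length
        "O'Brien & Co. <#>",  # special characters
        "München 東京",        # multi-lingual / Unicode data
    ]

def normalize_decimal(text: str) -> float:
    """Convert '1.234,56' (European) or '1,234.56' (UK/US) to a float."""
    if text.rfind(",") > text.rfind("."):
        # Comma is the decimal mark: drop grouping dots, swap comma for dot.
        text = text.replace(".", "").replace(",", ".")
    else:
        # Dot is the decimal mark: drop grouping commas.
        text = text.replace(",", "")
    return float(text)

assert normalize_decimal("1.234,56") == 1234.56
assert normalize_decimal("1,234.56") == 1234.56
```

Feeding the output of `boundary_strings` through each stage of the pipeline in a pre-production environment is a cheap way to catch truncation, encoding and null-handling problems before go-live.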
If you need a smoother transition or commissioning of an application or database platform, you can contact the author at email@example.com
About the author
Mukesh is an experienced IT professional from the United Kingdom with a focus on providing excellent database solutions for diverse industry applications and platforms. He is passionate about simplifying and streamlining business processes using the best available technology.
Mukesh is adept at providing reliable database, data warehouse, migration, integration and reporting solutions using the Microsoft BI stack (SSIS, SSAS, SSRS) on SQL Server, AWS Redshift, MS APS/PDW, Oracle and MySQL, as well as ETL and reporting platforms such as SSRS, Power BI, Business Objects and Crystal Reports.