Bigeye, the data observability company, announced the results of its 2023 State of Data Quality survey. The report sheds light on the most pervasive problems in data quality today.
The report, researched and authored by Bigeye, draws on answers from 100 survey respondents. At least 63 came from mid-to-large cloud data warehouse customers (with a spend of more than $500k per annum) who have some form of data monitoring in place, whether third-party or built in-house.
First line of defense against data issues
Bigeye’s survey found that data engineers are the first line of defense in managing data issues, followed closely by software engineers. The role of data engineer is now on par with that of software engineer. Like software engineers, data engineers are in charge of a product – the data product – that increasingly demands software-like levels of process, maintenance, and code review.
Desire for automation
Respondents who used third-party data monitoring solutions reported roughly 2x to 3x the ROI of in-house solutions. They also noted that at full utilization, third-party data monitoring addressed two issues: fractured infrastructure and anomalous data. They further reported that third-party data monitoring solutions had better test libraries and a broader perspective on data problems.
Data incident frequency
The research revealed that companies experience a median of five to ten data incidents over a period of three months. These incidents range from those severe enough to impact the company’s bottom line to those that reduce engineer productivity, and they take an average of 48 hours to troubleshoot.
Organizations with more than five data incidents a month are essentially lurching from incident to incident, with little ability to trust data or invest in larger data infrastructure projects. Their data quality work is largely reactive rather than proactive.
Other important findings from the survey
The survey results revealed other interesting insights, including:
- Respondents estimated that it takes 37,500 man-hours to build an in-house data quality monitoring solution
- That equates to roughly one year of work for 20 engineers
- 70% of respondents reported at least two data incidents that diminished the productivity of their teams
- Data issues most commonly take ~1-2 days to spot and fix, but with a long tail lasting up to weeks and months
- Respondents reported at least two “severe” data incidents in the last six months, which created damage to the business/bottom line and were visible at the C-level
“Coming from a data team before starting Bigeye, I knew anecdotally how much of a burden data quality and pipeline reliability issues were. These survey results confirmed my experience: data quality issues are the biggest blockers preventing data teams from being successful,” said Kyle Kirwan, Bigeye’s CEO and co-founder. “We’ve heard that around 250-500 hours are lost every quarter, just dealing with data pipeline issues.”
To read the full report, click HERE.