At its Google Cloud Next conference in San Francisco back in March, Google unveiled Cloud Dataprep, a service that helps companies clean structured and unstructured data sets for analysis in, for example, Google’s BigQuery, or for training machine learning models.
Cloud Dataprep has spent the past six months in private beta, but Google is now officially moving the service into public beta, open to anyone.
Some reports suggest that analysts and data scientists spend up to 80 percent of their time cleaning and preparing raw data for analysis. This is where Dataprep comes into play: it can automatically detect data types and schemas, and flag where data is mismatched or missing.
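To make the idea of automatic type detection more concrete, here is a minimal Python sketch of how a tool might profile a single column of raw strings: it guesses each cell's type, takes the most common non-missing type as the column's type, and counts the cells that are empty or don't match that type. This is a hypothetical illustration only, not how Dataprep or Trifacta actually works; the function names `infer_type` and `profile_column` are invented for this example.

```python
from collections import Counter


def infer_type(value: str) -> str:
    """Classify one raw cell as 'missing', 'int', 'float', or 'string' (hypothetical helper)."""
    v = value.strip()
    if v == "":
        return "missing"
    try:
        int(v)
        return "int"
    except ValueError:
        pass
    try:
        float(v)
        return "float"
    except ValueError:
        return "string"


def profile_column(values):
    """Pick the majority non-missing type and count mismatched and missing cells (hypothetical helper)."""
    counts = Counter(infer_type(v) for v in values)
    missing = counts.pop("missing", 0)
    if not counts:
        # Every cell was empty, so there is no type to infer.
        return {"type": "unknown", "mismatched": 0, "missing": missing}
    dominant, dominant_count = counts.most_common(1)[0]
    # Any non-missing cell whose type differs from the dominant type is a mismatch.
    mismatched = sum(counts.values()) - dominant_count
    return {"type": dominant, "mismatched": mismatched, "missing": missing}
```

For instance, a column holding `"1"`, `"2"`, `""`, `"abc"`, `"4"` would be profiled as an `int` column with one mismatched cell and one missing cell. A real data-preparation service would use far richer heuristics (dates, currencies, categorical values, and so on).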
A key facet of Dataprep is its visual interface, which makes it easier for people who aren't data engineers to modify or add to their data sets.
The software is actually an embedded version of the Wrangler enterprise app from Trifacta, a well-funded startup that offers software for cleaning up messy data. Indeed, Dataprep was built in collaboration with Trifacta.
“Cloud Dataprep also has intelligence built-in for understanding and automatically operationalizing your particular usage patterns, making data preparation even faster and less prone to user error,” noted Google product manager Eric Anderson. “The overall result is more productive, efficient, and powerful data analytics pipelines, leading to faster time-to-insight.”
Beyond BigQuery, Cloud Dataprep integrates with other Google Cloud Platform services, including Cloud Storage, Cloud Dataflow, and Cloud Machine Learning Engine.
The Cloud Dataprep public beta is available to anyone starting today.