Data Platforms
Data Platform Development
A data platform is a complete solution for ingesting, processing, analyzing and presenting the data generated by a modern digital organization.
Using a modern data tool stack is essential for speed, efficiency and throughput. Gone are the days when a data project could take 6 to 12 months to deliver. Business users need their data today, presented in the terms they already know.
At Data Engineers, we implement and develop the technologies a modern data platform requires, applying standards and automation through well-developed frameworks and strategies.
Data Acquisition
A data platform needs the ability to consume data, whether that data is pushed directly from a source system or an orchestration tool has to go and fetch it. Depending on where the data comes from and how it can be accessed, different extraction techniques are required.
We have partnered with Fivetran to enable seamless connectivity with multiple source systems, such as SQL Server, Azure SQL, Xero, Workday, LinkedIn and Google Analytics, without writing a single line of code. For systems that Fivetran doesn't connect to natively, we use an Azure Function connector to bring in additional HTTP endpoints.
Fivetran isn't the only tool in our toolbox, though. We have built Azure Function Apps that listen to events such as Cosmos DB data changes, Azure Logic Apps that orchestrate complex scenarios involving third-party API access, BimlFlex and Azure Data Factory pipelines that move data out of SQL Server databases, and Azure Storage accounts for their data lake capabilities.
When it comes to acquiring data, we can review your situation and advise on the most appropriate option for your scenario.
Data Ingestion
Once data has been acquired, it needs to land within the data platform of choice so that it can be used in downstream processes. The actual ingestion process depends on the use case for the data and how the data was acquired.
Files stored in a data lake, for example, could either be extracted and ingested into tables or simply referenced through external tables. Streamed data, or data received via Fivetran or Data Factory, would be stored directly in a table. We have experience helping customers ingest their data in the most appropriate and cost-effective way.
Data Modelling
Once data has landed within the data platform, we tend to call it "source" data. As Data Engineers, we don't want anyone to access this data directly, because it is likely to contain records that have been marked as deleted, or columns with unhelpful names such as custom_1234.
The data needs to pass through intermediary views and be transformed into something end users can discover for themselves. We typically call data in that format "domain" data. This is fine for simple scenarios, but when data is complex it first needs to be modelled in business terms. This is called data modelling, and there are two well-known techniques: Data Vault and Kimball.
Data Engineers has some of the most experienced certified Data Vault modellers, ready to help you and your company achieve data warehouse greatness. By applying our experience and up-to-date best practices and standards, we can model and develop your data warehouse for you, or coach you to deliver a data warehouse solution that creates real value.
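To make the step from "source" to "domain" data concrete, here is a minimal Python sketch of what an intermediary view typically does: drop records flagged as deleted and rename cryptic columns such as custom_1234 to business names. On a real platform this logic would usually live in SQL views; the Python form is only for illustration. The column names cust_id, customer_id and customer_segment, and the is_deleted flag, are hypothetical assumptions rather than any specific source system's schema.

```python
from typing import Any

# Hypothetical mapping from cryptic source column names to business-friendly
# domain names; in practice this would come from source system metadata.
COLUMN_RENAMES = {
    "cust_id": "customer_id",
    "custom_1234": "customer_segment",
}

# Hypothetical name of the soft-delete flag in the source data.
DELETED_FLAG = "is_deleted"


def to_domain(source_rows: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Drop soft-deleted records and rename columns, the way an
    intermediary view would before exposing "domain" data."""
    domain_rows = []
    for row in source_rows:
        # Skip records the source system has marked as deleted.
        if row.get(DELETED_FLAG):
            continue
        domain_rows.append(
            {
                COLUMN_RENAMES.get(col, col): value
                for col, value in row.items()
                if col != DELETED_FLAG  # the flag itself is not useful to end users
            }
        )
    return domain_rows


source = [
    {"cust_id": 1, "custom_1234": "Retail", "is_deleted": False},
    {"cust_id": 2, "custom_1234": "Wholesale", "is_deleted": True},
]
print(to_domain(source))
# [{'customer_id': 1, 'customer_segment': 'Retail'}]
```

Columns without an entry in the rename mapping pass through unchanged, so the mapping only needs to cover the names that are actually unclear to end users.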
We also have the only Genesee Academy certified trainer in the Australasian region to train and certify your staff.
Data Transformation
Once you know how you would like to present your data, either through data modelling or directly as domain models, we need to create the transformations so that the data flows through seamlessly. As in software engineering, we want to define repeatable patterns and apply principles such as DRY (Don't Repeat Yourself), KISS (Keep It Simple, Stupid) and YAGNI (You Aren't Gonna Need It) when writing them. We have adopted dbt as our tool of choice within our DataOps framework, enabling Data Engineers to write the code once and deploy it to multiple environments when it's ready.
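The "write once, deploy to multiple environments" idea can be sketched in a few lines. In dbt itself this is handled by SQL models that use ref() and by targets configured in profiles.yml (selected with `dbt run --target ...`); the Python below does not use dbt's API and only mimics the principle. A single model definition is rendered for each environment by substituting the schema. The environment names, schema names and the stg_customers table are hypothetical.

```python
from string import Template

# One model definition, written once. $schema is resolved per environment,
# loosely mimicking how dbt resolves references against the active target.
CUSTOMER_MODEL = Template(
    "SELECT customer_id, customer_segment\n"
    "FROM $schema.stg_customers"
)

# Hypothetical environments; in dbt these would be targets in profiles.yml.
ENVIRONMENTS = {
    "dev": {"schema": "analytics_dev"},
    "test": {"schema": "analytics_test"},
    "prod": {"schema": "analytics"},
}


def render_model(model: Template, env: str) -> str:
    """Render the single model definition for the requested environment."""
    if env not in ENVIRONMENTS:
        raise ValueError(f"unknown environment: {env}")
    return model.substitute(ENVIRONMENTS[env])


for env in ENVIRONMENTS:
    print(f"-- {env}\n{render_model(CUSTOMER_MODEL, env)}\n")
```

Because the model text exists in exactly one place, a change to the transformation is made once and promoted through dev, test and prod unchanged. That is the DRY benefit the paragraph describes.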
Data Governance
Data governance is often forgotten, but we believe it should be integrated into your development efforts. That means capturing the basic information for your data catalog, along with data classifications, as you build. When you release a new feature, your users then know what is available, and only those authorized to access the data can see the real values.
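As a rough illustration of capturing catalog information and classifications alongside development, here is a minimal Python sketch in which each column carries a description and a classification, and values are masked for users without sufficient clearance. Real platforms would enforce this with built-in features such as dynamic data masking or masking policies. The classification levels, catalog entries and the "****" mask are hypothetical assumptions for this example.

```python
from dataclasses import dataclass

# Hypothetical classification levels, ordered from least to most sensitive.
LEVELS = ["public", "internal", "confidential"]


@dataclass(frozen=True)
class ColumnInfo:
    """A minimal data catalog entry: a description plus a classification."""
    description: str
    classification: str


# Hypothetical catalog entries captured during development.
CATALOG = {
    "customer_id": ColumnInfo("Unique customer identifier", "internal"),
    "customer_segment": ColumnInfo("Business segment of the customer", "internal"),
    "email": ColumnInfo("Customer contact email", "confidential"),
}


def can_see(clearance: str, classification: str) -> bool:
    """A user may see a column if their clearance is at least its classification."""
    return LEVELS.index(clearance) >= LEVELS.index(classification)


def apply_masking(row: dict, clearance: str) -> dict:
    """Return the row with values the user is not cleared for replaced by a mask."""
    return {
        col: value if can_see(clearance, CATALOG[col].classification) else "****"
        for col, value in row.items()
    }


row = {"customer_id": 1, "customer_segment": "Retail", "email": "jo@example.com"}
print(apply_masking(row, "internal"))
# {'customer_id': 1, 'customer_segment': 'Retail', 'email': '****'}
```

Keeping the description and the classification in the same catalog entry means the information users browse and the rule that protects the data come from one source. That is why it pays to record both while the feature is being built rather than afterwards.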