Data Architect, Data Warehouse Architect and BI Solution Architect
What is the
difference between a data warehouse architect and a data architect? How about
BI solution architect?
Both data architect
and data warehouse architect do data modelling, as in ERWin stuff. Or
Embarcadero ER Studio. But a data warehouse architect is more specialised on
designing the data model for a data warehouse, whereas a data architect is more
specialised on designing the data model for databases used by transactional
systems.
A data warehouse
architect does a lot more than just data modelling. They also does the ETL and
the infrastructure. These are areas that a data architect doesn’t do normally.
For a data architect
to be able to call themselves a data warehouse architect, they don’t only need
to learn how to create a data model for a data warehouse (as in dimensional
modelling). But they need to also understand the ETL architecture. And they
need to understand the servers. For example, they need to be able to specify
the specification for the production data warehouse servers, i.e. CPU, memory,
disks. And other server stuff like clustering, mirroring and DR. And they need
to understand physical database stuff too, like table partitioning, file groups
and materialised views.
In my book I specify
there are 2 sides of data warehouse architecture. The first one is the logical
architecture and the second one is physical architecture. A warehouse architect
needs to understand both.
A data warehouse
architect in my opinion demands 4 separate skills: data architecture, ETL,
database platform and physical infrastructure. By “database platform” I mean
SQL Server knowledge, Oracle knowledge, Teradata knowledge, Netezza knowledge,
etc. For example, “SQL Server 2008 Data Warehousing Features” is a “must know”
for a DW architect in SQL Server. Whereas UPI & USI is a “must know” for DW
architect in Teradata. If we design a DW on Oracle 11g R2, we need to know
Oracle specific DW features, such as Initialisation Parameter Settings,
Partitionwise Joins, Compression, Parallel Execution, etc.
A BI architect is
more on the application side, as in SharePoint architecture, Hyperion
architecture, Reporting Services architecture, and Analysis Services
architecture. For example: a BI system where we have reporting services running
on SharePoint, plus excel services and PPS services on SharePoint, and SSAS
cubes too reading from a warehouse or mart. And on top of that some custom .NET
coding for authentication or customised security. Plus they allow some
self-service BI using Qlikview or PowerPivot.
Back to the data
warehouse architect, the ETL aspect of the job is sometimes quite demanding.
There is something called ETL architecture in warehousing, which is basically
a) the architecture
of the overnight batch, i.e. the structure of the tasks and workflows, the
execution order of the workflows, the backup, the reconciliation, the checking
and alert, and the data quality. The overnight batch is not only about data
loading / ETL. It also has: data serving elements, i.e. processing reports
(stored as PDFs, ready to be served to achieve split second response time),
refreshing OLAP cubes.
b) the architecture
of the continuous feed throughout the day for real time warehousing
c) the physical
infrastructure of the ETL, i.e. the servers, the databases, the data
connections
d) the methods to
extract and load the data i.e. sliding window, swap partition, flipping twin
tables, identifying incremental extraction, changed data capture, change tracking
mechanism, (filtered) replication between warehouse and mart (or mirroring),
how to extract data from cubes.
If a data warehouse
architect only understands dimensional modelling, the company will have
problems in the ETL and infrastructure. The servers might not be ‘suit for
purpose’, for example disk configuration is not optimised for warehousing.
How about “BI
solution architect”? A data warehouse is the back end. Business Intelligence is
the front end. Data warehousing is about the data model, the ETL and the
databases. BI is about the reports, the OLAP cubes, the analytical
applications, the data mining, the KPIs, the dashboards, the score cards, and
the performance management. It is a common perception that a BI solution
architect is a “front end” person. An application person. This perception is
incorrect.
A “solution
architect” is responsible for the whole solution. Not only the front end, but
also the back end. It is impossible for him to be responsible for the whole
solution without looking after the back end. In the case of a BI solution
architect, he or she needs to look after both the BI front end (reports, cubes,
performance management, etc.), and the back end (data warehouse, ETL). It is
impossible for a BI solution architect to be responsible for the BI without
looking after the data warehouse. In some companies, instead of calling the
role “BI solution architect”, it is called “data warehouse solution architect”.
It doesn’t mean that the role is only responsible for the back end data warehouse,
but he is also responsible for the BI front end. In some companies, they have
not only one but several BI solution architects, each responsible for a certain
area.
There are several
different types of architects in IT, for example: system architect, data
architect, solution architect, information architect and enterprise architect.
This segregation of duties only happens in very large group of companies
(enterprises). For example, a banking group with 3000 IT staff. If the IT staff
is only 100 usually the architect does multiple functions. A system architect
is responsible for infrastructure, including networks and servers. I’ve
mentioned about data architect and solution architect. An information architect
is responsible for the flow of information throughout the enterprise, including
databases and data quality. An enterprise architect is responsible for all the
applications in the enterprise, making sure they run in sync and adhere to the
standards.
Comments
Post a Comment