At the end, general information is listed, such as the version number, type, price indication, etc. The criteria are explained in more detail in the full ETL Tools & Data Integration Survey 2014.
# Company & Market
- Entry into the market (year): In which year was the product released for the first time?
- Customers worldwide: The number of customers worldwide.
- Installations worldwide: The number of installations worldwide. For code generators, the number of developer seats.
- Symmetric Multiprocessing (SMP): Is Symmetric Multiprocessing supported? This is standard in Windows NT and UNIX. The processors in an SMP system share their internal and external memory.
- Massively parallel processing (MPP): Every processor in an MPP system has its own internal and external memory and database, allowing high performance to be achieved. These databases should be kept synchronized.
- Cluster Aware: Is the ETL server ‘cluster aware’ and does it support load balancing, fail-over and other cluster facilities?
- Grid: Can an ETL process run on a ‘grid’ of computers or servers?
- Job distribution: Is it possible to run ETL processes on different machines or processors?
- Data pipelining: Can the different steps of an ETL process be run on different machines or processors?
- Partitioning: Is it possible to partition based on, for example, product codes, to determine on which machine or processor the data has to be processed?
- Basic architecture: h = hub & spoke (all data runs through one point), d = distributed (multiple lines between sources and targets) and m = multi hub/spoke
- End-to-end BI infrastructure: Does the ETL tool exchange metadata, for example star schemas, with OLAP or reporting tools, either from its own range of products or from third-party products?
- Supports CWM: Is the ETL tool CWM-compliant, in other words does it support the Common Warehouse Metamodel?
- Version control system: Does the product contain a version control system with check-in and check-out features?
- Splitting data streams/multiple targets: Is it possible to read a data source once and load the results into two or more tables?
- Conditional splitting: The same, but conditionally; for example, if revenue is higher than 1000, put the row in table 1, otherwise in table 2 (see the sketch after this list).
- Union: Put rows of different tables with the same structure into one table or dataset
- Pivoting: Is it possible to transform denormalised data, where the data is held in the column names, into rows? (see the sketch after this list)
- Depivoting: The other way around: transform (highly) normalised data into de-normalised data, putting the data back into the column names.
- Key lookups in memory: Can you load a table completely into internal memory and search it, without having to make joins? (see the sketch after this list)
- Key lookups reusable across processes: Are these tables reusable across different loading processes, in such a way that the key lookup table is loaded into memory only once?
- Read non-structured data: Can the product read non-structured data such as text, email and video, and if so, which file types?
- Slowly changing dimensions: Does the tool support slowly changing dimensions (hm = manually; wizard = wizard; auto = out-of-the-box)? See the sketch after this list.
- Scheduler: Is there a mature scheduler that supports dependencies?
- Error handling within job: Can errors be detected within the job, and is it possible to change the route within the process flow when a specific error arises?
- Impact analysis: Is it possible to make an impact analysis of proposed changes (when an attribute or table must change)
- Data lineage: Can one easily track down the source of an attribute/information element (reverse impact analysis)?
- Automatic documentation: Can you publish and document a process/transformation automatically and navigate through it with a browser?
- Support for data mining models: Is it possible during the loading process to make use of the results of a data mining process?
- Support for analytical functions: During the loading process, is it possible to invoke different kinds of analytical functions like forecasting, basket analysis, regression?
- Ease-of-use: The ease-of-use of the product. Is it easy to learn and easy to use on a daily basis?
- WYSIWYG: Is the principle ‘What You See Is What You Get’ applied to the DATA?
- Screen design: Is the screen design ergonomic and well-balanced?
- Task compatibility ETL / EAI: Does the tool support the tasks, in the same sequence, that the ETL developer performs?
- Training requirement: What is the recommended training, both for the professional developer and the end/business user (where appropriate)?
- Completeness of the GUI: What percentage of the functionality can be generated directly from the GUI?
- Reusability of components: Are components reusable and can they handle parameters (this is not the same as copy-paste)?
- Decomposition: Can processes be divided into small, named building blocks (modular programming)?
- User-defined functions: Is it possible to define user-defined functions and to use them in the process flow?
- Comments on selection of objects: Can one make comments on a selection of objects in such a way that these comments stay tightly connected to those objects?
- Step-by-step running: Can you run the process flow step-by-step?
- Row-by-row running: Can you run the process flow row-by-row?
- Breakpoints: Can you set a breakpoint on a particular process step or a row of data?
- Watch Points: Can you define watch points, so that the system pauses execution when a certain condition is met?
- Compiler/validator: Is it possible to validate the process flow (and/or code) with one click of the mouse, and are errors reported and marked?
- Integration batch – real-time: Is it possible to define, within the ETL tool, process flows that move and transform data both in real time and in batch?
- Mechanism: How are changes in the source systems detected and put through (mq = message queuing; logging = database logs or journals; trigger = database triggers)?
- On-demand data integration: Can you publish an ETL process and/or target attribute as a Web Service?
- Native connections: How many and which native connections does the ETL tool support? (ODBC, OLE DB and flat files excluded)
- Packages / enterprise applications: How many packages / enterprise applications can the tool read metadata from with one click of the mouse (for example SAP, Siebel, PeopleSoft, JD Edwards, Baan)?
- Real-time connections: How many and which type of message queuing products can the tool connect to?
- Support for joined tables as source: Can you join two tables in a graphical manner, letting the database execute the join rather than having the ETL tool join the tables?
- Changed data capture: Does the ETL tool support the principle of Changed Data Capture, i.e. selecting only the changes (deltas) from the database? (see the sketch after this list)
- Built-in functions for data quality: Are there functions available to control the quality of the data, for example a matching transformation or an address cleanser (fuzzy logic)?
- Built-in functions for data validation: Does the tool have functions for data validation such as 'delete duplicates', 'missing values' and 'incorrect data' (100% match)?
- Data profiling: Options for data profiling: uniqueness of values, maximum, minimum, distribution of values, and so on (see the sketch after this list).
- Platforms: The number of platforms the ETL tool runs on
- Version: The version number of the product that has been evaluated
- Engine-based / code generator: Is the tool engine-based or a code generator (eg = engine based; cg = code generator)?
- Type: 'Process' when the user can define an unlimited number of tasks/steps between source and target, otherwise 'Map'.
...
Source: ETLTool.Com
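
To make a few of the criteria above more concrete, here are some minimal Python sketches. They are illustrative only: the table, column and function names are hypothetical and not taken from any particular ETL tool. First, conditional splitting, using the revenue threshold from the example in the list:

```python
# Hypothetical illustration of conditional splitting: rows are read once
# and routed to one of two targets based on a revenue threshold.
rows = [
    {"order_id": 1, "revenue": 1500},
    {"order_id": 2, "revenue": 800},
    {"order_id": 3, "revenue": 2300},
]

table_1, table_2 = [], []          # stand-ins for the two target tables
for row in rows:
    if row["revenue"] > 1000:      # the condition from the example above
        table_1.append(row)
    else:
        table_2.append(row)

print(table_1)  # orders 1 and 3
print(table_2)  # order 2
```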
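
Pivoting and depivoting, as defined above (data held in column names moved into rows, and back again); the product/quarter columns are hypothetical:

```python
# Hypothetical illustration of pivoting/depivoting.
denormalised = {"product": "A", "q1": 10, "q2": 12, "q3": 9, "q4": 15}

# Pivoting: move the data held in the column names into rows.
rows = [
    {"product": denormalised["product"], "quarter": q, "sales": v}
    for q, v in denormalised.items() if q != "product"
]

# Depivoting: fold the rows back into one record with a column per quarter.
record = {"product": rows[0]["product"]}
for r in rows:
    record[r["quarter"]] = r["sales"]

print(rows)    # four rows: (A, q1, 10), (A, q2, 12), ...
print(record)  # {'product': 'A', 'q1': 10, 'q2': 12, 'q3': 9, 'q4': 15}
```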
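
An in-memory key lookup: a hypothetical dimension table loaded once into a dictionary and probed per row, instead of joining in the database:

```python
# Hypothetical illustration of a key lookup held in internal memory.
customer_dim = {"C001": 17, "C002": 42}   # business key -> surrogate key

def load_lookup_once(dim):
    # In a real tool this table would be read from the warehouse a single
    # time and, ideally, shared across different loading processes.
    return dict(dim)

lookup = load_lookup_once(customer_dim)

facts = [{"customer": "C001", "amount": 99.0},
         {"customer": "C003", "amount": 12.5}]

for fact in facts:
    fact["customer_sk"] = lookup.get(fact["customer"], -1)  # -1 = unknown member

print(facts)
```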
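
A type-2 slowly changing dimension, sketched by hand (the "hm = manually" case above); the validity columns are assumptions, not a specific tool's layout:

```python
import datetime

# Hypothetical sketch of a type-2 slowly changing dimension: when an attribute
# changes, the current row is closed and a new row with a new validity period
# is added.
dimension = [
    {"customer": "C001", "city": "Utrecht", "valid_from": "2013-01-01",
     "valid_to": "9999-12-31", "current": True},
]

def apply_scd2(dimension, customer, new_city, today=None):
    today = today or datetime.date.today().isoformat()
    for row in dimension:
        if row["customer"] == customer and row["current"] and row["city"] != new_city:
            row["valid_to"] = today        # close the old version
            row["current"] = False
            dimension.append({"customer": customer, "city": new_city,
                              "valid_from": today, "valid_to": "9999-12-31",
                              "current": True})
            break
    return dimension

apply_scd2(dimension, "C001", "Amsterdam", today="2014-06-01")
print(dimension)  # the closed Utrecht version plus the current Amsterdam one
```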
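
Changed data capture in its simplest form: selecting only the deltas, here assumed to be detected via a last-modified timestamp rather than logs, triggers or message queues:

```python
# Hypothetical sketch of changed data capture: only rows changed since the
# last successful load (the delta) are selected.
source = [
    {"id": 1, "name": "alpha", "last_modified": "2014-05-30"},
    {"id": 2, "name": "beta",  "last_modified": "2014-06-02"},
    {"id": 3, "name": "gamma", "last_modified": "2014-06-03"},
]

last_load = "2014-06-01"   # watermark stored by the previous run

delta = [row for row in source if row["last_modified"] > last_load]
print(delta)   # only ids 2 and 3 are extracted and loaded
```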
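
Basic data profiling of a single column: row count, nulls, distinct values, minimum, maximum and the value distribution:

```python
from collections import Counter

# Hypothetical sketch of data profiling for one column of sample data.
values = [10, 12, 12, 9, 15, 12, None, 9]

non_null = [v for v in values if v is not None]
profile = {
    "rows": len(values),
    "nulls": values.count(None),
    "distinct": len(set(non_null)),
    "min": min(non_null),
    "max": max(non_null),
    "distribution": Counter(non_null),
}
print(profile)
```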