Apache Hadoop is a collection of open-source frameworks used to efficiently store and process big datasets in a distributed computing environment, ranging in size from gigabytes to …
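Hadoop's processing side is built on the MapReduce model: map each input record to key/value pairs, shuffle by key, then reduce each group. A minimal in-memory sketch of that model (a hypothetical word count, not Hadoop's actual API) might look like:

```python
from collections import defaultdict

def map_phase(lines):
    """Map: emit a (word, 1) pair for every word in every input line."""
    for line in lines:
        for word in line.split():
            yield (word.lower(), 1)

def reduce_phase(pairs):
    """Shuffle pairs by key, then sum the counts per word."""
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

lines = ["big data big ideas", "data lakes and data warehouses"]
print(reduce_phase(map_phase(lines)))
```

In real Hadoop the map and reduce functions run in parallel across the cluster, and the framework handles the shuffle between them; the logic per record is the same shape as above.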
Data Warehouse, Data Lake & Data Vault
Data Lakes & Data Warehouses
Data Lakes and Data Warehouses both act as repositories, but they are designed for very different purposes. Data Warehouses work best for specific projects with …
Data Warehouse Implementation – Guidelines and Principles
Focus on Business Goals: Make sure the DW serves organizational priorities and solves business problems. Start with the End in Mind: Let the business priority and scope of end-data delivery in the …
Data Warehousing and Business Intelligence (Momentary Look)
- Requirements Understanding
- Define and Maintain the DW/BI Architecture
- Define DW/BI Technical Architecture
- Define DW/BI Management Processes
- Develop the Data Warehouse and Data Marts
- Map Sources to Targets
- Remediate and Transform …
Document and Content Management (A Glance)
- Plan for Lifecycle Management
- Plan for Records Management
- Develop a Content Strategy
- Create Content Handling Policies
- Social Media Policies
- Device Access Policies
- Handling Sensitive Data
- Responding to Litigation
- Define Content …
Data Management – Records/Document Retention and Disposal
Effective document and records management requires clear policies and procedures, especially regarding the retention and disposal of records. A retention and disposition policy defines the timeframes during which documents for …
Data Management – DII (A Momentary Look)
- Plan and Analyze
- Define Data Integration and Lifecycle Requirements
- Perform Data Discovery
- Document Data Lineage
- Profile Data
- Collect Business Rules
- Design Data Integration Solutions
- Design Data Integration Architecture
- Select Interaction …
Data Management – Data Profiling
Understanding data content and structure is essential for Data Governance, Data Architecture, Data Modeling and Design, Data Storage and Operations, Data Security, Data Quality, and Data Integration and Interoperability. Data …
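Data profiling collects simple measurements about each column: row counts, null counts, distinct values, and frequent values. A minimal sketch, using a small made-up table (all names here are hypothetical), could be:

```python
from collections import Counter

# Hypothetical sample data for illustration only.
rows = [
    {"id": "1", "email": "a@example.com", "country": "US"},
    {"id": "2", "email": None,            "country": "US"},
    {"id": "3", "email": "c@example.com", "country": "DE"},
]

def profile(rows, column):
    """Return basic profiling statistics for one column."""
    values = [r[column] for r in rows]
    non_null = [v for v in values if v is not None]
    return {
        "rows": len(values),
        "nulls": values.count(None),
        "distinct": len(set(non_null)),
        "top_value": Counter(non_null).most_common(1)[0][0] if non_null else None,
    }

for col in ("id", "email", "country"):
    print(col, profile(rows, col))
```

Results like a high null count or an unexpectedly low distinct count are exactly the signals that feed back into data quality and integration design.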
DII – Data Interaction Models – P2P, Canonical and Publish/Subscribe
Canonical Model (Hub-and-Spoke)
A Canonical Data Model is a common model used by an organization or data-exchange group that standardizes the format in which data will be shared. In …
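The hub-and-spoke idea can be sketched in a few lines: each source translates its native record into one shared canonical shape, and each target translates from that shape, so n sources and m targets need n + m adapters instead of n × m point-to-point mappings. All system and field names below are hypothetical:

```python
# Hypothetical canonical customer record: {"customer_id": ..., "full_name": ...}

def crm_to_canonical(rec):
    """Adapter for a hypothetical CRM source system."""
    return {"customer_id": rec["CustID"], "full_name": rec["Name"]}

def billing_to_canonical(rec):
    """Adapter for a hypothetical billing source system."""
    return {"customer_id": rec["account"], "full_name": rec["holder"]}

def canonical_to_warehouse(rec):
    """Adapter from the canonical model to a hypothetical warehouse row."""
    return (rec["customer_id"], rec["full_name"].upper())

sources = [
    ({"CustID": 7, "Name": "Ada"}, crm_to_canonical),
    ({"account": 9, "holder": "Lin"}, billing_to_canonical),
]
for record, translate in sources:
    print(canonical_to_warehouse(translate(record)))
```

Adding a new source only requires one new `*_to_canonical` adapter; every existing consumer keeps working unchanged, which is the core benefit the canonical model is meant to deliver.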