
Introduction
The focus of this course is on the programming and basic inference techniques that are commonly applied in data science. We start by reviewing and reinforcing programming skills. Then we will use a database of entomological data to practice and to build the foundations required for more structured tools such as the bootstrap or the jackknife.
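As a rough illustration of the two resampling tools just named, the following sketch estimates the standard error of a sample mean with both the bootstrap and the jackknife. The data vector `x`, the seed, and the number of replicates are hypothetical choices made only for this example; they do not come from the course data.

```r
# Hypothetical toy measurements (not taken from the course data set)
set.seed(2023)
x <- c(4.1, 5.3, 3.8, 6.0, 5.5, 4.7, 5.1, 4.9)
n <- length(x)

# Bootstrap: resample with replacement and recompute the mean each time
boot_means <- replicate(1000, mean(sample(x, size = n, replace = TRUE)))
boot_se <- sd(boot_means)

# Jackknife: leave one observation out at a time and recompute the mean
jack_means <- sapply(seq_len(n), function(i) mean(x[-i]))
jack_se <- sqrt((n - 1) / n * sum((jack_means - mean(jack_means))^2))
```

For the sample mean, the jackknife standard error coincides with the usual `sd(x) / sqrt(n)`, which makes it a convenient sanity check before moving to statistics without a closed-form standard error.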
Figure 1 further explores the impact of temperature on ozone level.
Rubrics for a Data Analyst (Associate and Professional)
We also want to cover the fundamentals that a Data Analyst has to know according to DataCamp’s rubrics as of March 1, 2023.
| Topics | Competency | Sufficient | Insufficient |
|---|---|---|---|
| Data Validation | Assess data quality and perform validation tasks | Has validated all variables against provided criteria and where necessary has performed cleaning tasks to result in analysis-ready data | Has not conducted all the required checks and/or has not cleaned the data. May have removed data rather than performed cleaning tasks |
| Data Visualization | Create data visualizations to demonstrate the characteristics of data and represent relationships between features | Has created at least two different visualizations of single variables (e.g. histogram, bar chart, single boxplot); has created at least one visualization including two or more variables (e.g. scatterplot, filled bar chart, multiple boxplots); has used visualizations that support the findings being presented | Has used the same visualization throughout; has not included graphics to represent single variables and relationships; has not used visualizations that support the findings being presented |
| Communication | Presents data concepts to small, diverse audiences | For each analysis step, has explained their findings and/or the reasoning for selecting approaches | Has not provided a summary for each step (data validation, exploratory analysis) |
| Topics | Competency | Sufficient | Insufficient |
|---|---|---|---|
| Data Validation | Assess data quality and perform validation tasks | Has validated all variables and where necessary has performed cleaning tasks to result in analysis-ready data | Has not conducted all the required checks and/or has not cleaned the data. May have removed data rather than performed cleaning tasks |
| Data Visualization | Create data visualizations to demonstrate the characteristics of data and represent relationships between features | Has created at least two different visualizations of single variables (e.g. histogram, bar chart, single boxplot); has created at least one visualization including two or more variables (e.g. scatterplot, filled bar chart, multiple boxplots); has used visualizations that support the findings being presented | Has used the same visualization throughout; has not included graphics to represent single variables and relationships; has not used visualizations that support the findings being presented |
| Business Focus | Collects relevant information, detects patterns, observes and interprets data | Has described at least one of the business goals of the project; has explained how their work has addressed the business problem; has provided at least one recommendation for future action to be taken based on the outcome of the work done | Has not identified any business goals; has not explained how their work has addressed the business problem; has not provided any recommendations for future actions |
| Business Metrics | Benchmarks, monitors, and evaluates business processes | Has defined a metric that can be used by the business in the future to measure success in solving the problem; has evaluated the metric using the existing data to provide a baseline measure for the problem | Has not identified a metric to compare the model performance to the business problem or has not shown the metric with the current data |
| Communication | Employs multiple tactics (written and verbal) to communicate to business leaders | For each analysis step, has provided a written explanation of their findings and/or reasoning for selecting approaches; has delivered a verbal presentation addressing the business goals, outcomes and recommendations | Has not provided a written summary for each step; has not delivered a verbal presentation |
We also include material to prepare for the Skill Assessments of the Data Analyst and Data Scientist certifications.
DataCamp recommends that you complete the following tracks for the associate and professional Data Analyst certifications:
- Data Analyst with SQL career track, for the associate certification
- Data Analyst with R or Data Analyst with Python career track, for the professional certification (in R or Python accordingly)
| SKILL ASSESSMENT | ASSOCIATE | PROFESSIONAL |
|---|---|---|
| Data Management in SQL (PostgreSQL) | 134 | 134 |
| Data Analysis in SQL (PostgreSQL) | 115 | 115 |
| Importing & Cleaning Data in R/Python | NA | 111 |
| Data Manipulation in R/Python | NA | 111 |
| Statistical Fundamentals in R/Python | NA | 125 |
For the Data Scientist certification, DataCamp recommends that you complete the following tracks:
- SQL Fundamentals (skill)
- Data Scientist with R (career)
- Data Scientist with Python (career)
You may also want to enroll in the Data Communication Concepts course to prepare for the practical exam.
| SKILL ASSESSMENT | ASSOCIATE | PROFESSIONAL |
|---|---|---|
| Data Manipulation in R/Python | 131 | 131 |
| Statistical Fundamentals in R/Python | 125 | 125 |
| Importing & Cleaning Data in R/Python | 130 | 149 |
| Data Management in SQL (PostgreSQL) | NA | 134 |
| Machine Learning Fundamentals in R/Python | 119 | 119 |
| R / Python Programming | 120 | 140 |
The tidyverse
We need to install an R package. Most of the packages that we will use are part of the tidyverse, a collection of packages that share a common philosophy of data and R programming and are designed to work together naturally.
You can install the complete tidyverse with a single line of code, `install.packages("tidyverse")`.
Then we can use it by loading it in the preamble section with `library(tidyverse)`, which prints a startup message such as:
── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
✔ tibble 3.1.8 ✔ dplyr 1.0.10
✔ tidyr 1.2.0 ✔ stringr 1.5.0
✔ readr 2.1.2 ✔ forcats 0.5.2
✔ purrr 0.3.4
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::group_rows() masks kableExtra::group_rows()
✖ dplyr::lag() masks stats::lag()
See the documentation at https://www.tidyverse.org/ for details.
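The conflict messages above mean that, once the tidyverse is attached, a bare call to `filter()` or `lag()` resolves to the dplyr version. The following is a minimal sketch of how the `package::function` prefix selects either version explicitly; the vector `x` and the use of the built-in `mtcars` data are illustrative assumptions, not part of the course material.

```r
# A short numeric series used only for illustration
x <- 1:10

# stats::filter() applies a linear filter; here a centered 3-point moving average.
# The `stats::` prefix picks this version even when dplyr is attached.
moving_avg <- stats::filter(x, rep(1 / 3, 3))

# dplyr::filter() keeps the data frame rows that satisfy a condition.
# Guarded so the sketch still runs where dplyr is not installed.
if (requireNamespace("dplyr", quietly = TRUE)) {
  eight_cyl <- dplyr::filter(mtcars, cyl == 8)
}
```

Writing the prefix is optional when only one of the packages is attached, but it removes any ambiguity about which function is called and makes scripts robust to the order in which packages are loaded.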