Department of Environmental Engineering
University of Genoa

- FLUBIO -

Marie Curie Early Stage Training Site at the University of Genova, Italy



Data analytics and exploratory data analysis


For those new to this world, here are a few definitions:


I currently work a lot in the fields of data science, data analytics, exploratory data analysis and interactive data visualization. My intention is not only to use data for predictive analytics and statistical learning, and to communicate data through interactive visualization systems, but also to apply the same concepts to analyze the results obtained from design optimization and design space exploration studies. Needless to say, this is the starting point for data-driven simulation. I am especially interested in working with data coming from multiphysics simulations and in sports analytics.


If you are interested in collaborating in any of these areas just drop me an email.

Below I list a few tools that are very useful to get you started:

•    R (https://www.r-project.org/)
•    RStudio (https://www.rstudio.com/)
•    Python (https://www.python.org/)
•    Anaconda Python (http://www.continuum.io/)
•    pandas (http://pandas.pydata.org/)
•    matplotlib (http://matplotlib.org/)
•    statsmodels (http://statsmodels.sourceforge.net/)
•    bokeh (http://bokeh.pydata.org/en/latest/)
•    scipy (http://www.scipy.org/)
•    numpy (http://www.numpy.org/)
•    scikit-learn (http://scikit-learn.org/stable/)
•    IPython (http://ipython.org/)
•    d3.js (http://d3js.org/)


On this page I hope to share a few Python scripts and IPython notebooks that you can use to do data science.

But remember, the most important part is the data. Without data there is no data analytics (DA) and no exploratory data analysis (EDA).

At the moment I have a few datasets obtained from optimization cases; if you are interested in using them, just let me know. Keep in mind that you can find countless datasets on the internet.
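To give a flavor of what a first look at such a dataset can involve, here is a minimal sketch using pandas. The table, its column names (design_id, chord, angle, drag) and all the values are hypothetical and invented only for illustration; they do not come from the datasets mentioned above.

```python
import pandas as pd

# Hypothetical results of a small design space exploration:
# each row is one design, with two design variables and one objective.
records = [
    {"design_id": 1, "chord": 0.8, "angle": 2.0, "drag": 0.031},
    {"design_id": 2, "chord": 1.0, "angle": 2.0, "drag": 0.028},
    {"design_id": 3, "chord": 1.2, "angle": 4.0, "drag": 0.035},
    {"design_id": 4, "chord": 1.0, "angle": 4.0, "drag": 0.033},
    {"design_id": 5, "chord": 0.8, "angle": 6.0, "drag": 0.042},
]
df = pd.DataFrame(records)

columns = ["chord", "angle", "drag"]

# Summary statistics (count, mean, std, min, quartiles, max) per column.
summary = df[columns].describe()

# Pairwise Pearson correlations between design variables and objective.
corr = df[columns].corr()

# Row of the design with the lowest value of the (hypothetical) objective.
best = df.loc[df["drag"].idxmin()]

print(summary)
print(corr)
print(best)
```

With a real dataset you would typically replace the inline records with something like pd.read_csv on your own file, and follow up with plots of the objective against each design variable.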




Links to tutorials:

Data scraping, data wrangling, data analytics, exploratory data analysis, advanced plotting and clustering with love.
This is a sports analytics tutorial in which I address many interesting topics (just read the title).


A word cloud generated using Python.
In this link I share a script to generate a word cloud, plus a few very interesting links related to machine learning.
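A word cloud is essentially a picture of word frequencies, so the counting step can be sketched with the Python standard library alone. This is not the script from the link, only an illustrative sketch; the stop-word list, the function name word_frequencies and the sample text are assumptions made for the example, and the rendering of the cloud itself is left out.

```python
import re
from collections import Counter

# Small, hypothetical stop-word list; a real one would be much longer.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "for", "with", "no"}


def word_frequencies(text, top_n=10):
    """Return the top_n most frequent words in text, ignoring stop words."""
    # Lowercase the text and keep only runs of ASCII letters as words.
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS and len(w) > 1)
    return counts.most_common(top_n)


sample = (
    "Data analytics and exploratory data analysis start with data. "
    "With no data, no analytics."
)
top_words = word_frequencies(sample, top_n=3)
print(top_words)
```

The resulting (word, count) pairs are what a word cloud generator would then map to font sizes.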




Ongoing projects:

Web-based interactive data visualization and analysis toolkit.
The goal of this project is to enhance people's ability to understand and communicate data through the design of interactive systems for data visualization and analysis.





Links to online datasets:

http://www.data.gov/
The home of the U.S. Government’s open data

http://www.data.go.jp/?lang=english
The Japanese government is promoting the Open Data initiative

https://www.census.gov/topics.html
US census datasets by topic

https://open-data.europa.eu/en/data/
European Union open data portal

https://www.kaggle.com/
A platform for predictive modelling and analytics competitions, on which companies and researchers post their data and statisticians and data miners from all over the world compete to produce the best models

http://jaberg.github.io/skdata/
Data sets for machine learning in Python


http://archive.ics.uci.edu/ml/datasets.html
UC Irvine Machine Learning Repository

https://vincentarelbundock.github.io/Rdatasets/datasets.html
R datasets






Joel GUERRERO
joel.guerrero@unige.it
Personal Web Page

DICCA, University of Genova
1, Via Montallegro
16145 Genova, Italy

Last update: 21/FEB/2018