IT services blog - HSL at University of Virginia-Claude Moore Health Sciences Library

Exploratory data analysis (EDA)

EDA allows developers and programmers to give stakeholders a clearer understanding of what questions can reasonably be asked of a dataset, with very little programming effort. "How much data is actually present in every row?" and "What are the unique, or most common, values in this column?" are the kind of basic questions that, according to one unsourced claim I found on the internet, can shave up to 30% off the data science workflow. From my perspective, EDA is just an essential first step, period.
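To make those two questions concrete, here is a minimal pandas sketch. The toy DataFrame and its column names (`city`, `visits`) are made up for illustration; swap in your own CSV with `pd.read_csv`.

```python
import pandas as pd

# Hypothetical toy data standing in for a real CSV.
df = pd.DataFrame(
    {
        "city": ["Charlottesville", "Richmond", "Charlottesville", None],
        "visits": [12, None, 7, 3],
    }
)

# How much data is actually present in every row?
# Counts the non-missing cells in each row.
filled_per_row = df.notna().sum(axis=1)

# What are the unique values in this column? (missing values dropped)
unique_cities = df["city"].dropna().unique()

# What are the most common values in this column?
# value_counts() sorts from most to least frequent and skips missing values.
city_counts = df["city"].value_counts()

print(filled_per_row)
print(unique_cities)
print(city_counts)
```

The profiling tools below automate exactly this sort of check across every column at once.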

Carnegie Mellon has a deep-dive chapter on the subject:

https://www.stat.cmu.edu/~hseltman/309/Book/chapter4.pdf

And here's a reasonably concise overview: https://www.svds.com/value-exploratory-data-analysis/

EDA in Python

Pandas profiling and Sweetviz are simple installs that work well with Streamlit.

To test, you can set up an app on Streamlit share and then install:

https://pypi.org/project/streamlit-pandas-profiling/ - a Streamlit-ready library for pandas profiling - univariate and some multivariate (2 variables max) graphical analysis
https://pypi.org/project/sweetviz/ - Sweetviz itself, with some "tutorial" info at https://discuss.streamlit.io/t/this-is-how-to-use-sweetviz-with-streamlit/10897

Here's some Python code wrapped in Streamlit that provides both, for you to test with a CSV of your choosing:

https://github.com/alibama/code-for-cville/blob/master/divides.py
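For orientation, here is a minimal sketch of what an app along these lines can look like. It is not the contents of divides.py; the `load_csv` helper, the widget labels, and the report file name are my own placeholders. It assumes `streamlit`, `sweetviz`, `pandas-profiling`, and `streamlit-pandas-profiling` are installed (newer releases of pandas-profiling are published as `ydata-profiling`, so the import name may differ on your setup).

```python
import pandas as pd


def load_csv(uploaded_file) -> pd.DataFrame:
    """Read an uploaded CSV (any file-like object) into a DataFrame."""
    return pd.read_csv(uploaded_file)


def main() -> None:
    # Streamlit is imported here so load_csv stays usable on its own.
    import streamlit as st
    import streamlit.components.v1 as components

    st.title("Quick EDA")
    uploaded = st.file_uploader("Upload a CSV", type=["csv"])
    if uploaded is None:
        return  # nothing to analyze yet

    df = load_csv(uploaded)
    st.dataframe(df.head())

    tool = st.sidebar.radio("Report", ["Pandas profiling", "Sweetviz"])

    if tool == "Pandas profiling":
        # Both report libraries are slow to import, so load only the one needed.
        from pandas_profiling import ProfileReport
        from streamlit_pandas_profiling import st_profile_report

        st_profile_report(ProfileReport(df))
    else:
        import sweetviz as sv

        # Sweetviz writes its report to an HTML file; the file name is arbitrary.
        report_path = "sweetviz_report.html"
        sv.analyze(df).show_html(filepath=report_path, open_browser=False)
        with open(report_path, encoding="utf-8") as f:
            components.html(f.read(), height=1000, scrolling=True)


if __name__ == "__main__":
    main()
```

Save it as, say, `app.py` and start it with `streamlit run app.py`.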

I pulled most of this from the video here:

https://www.youtube.com/watch?v=zWiliqjyPlQ

The video goes in depth. I skipped to about minute 30 to get into the Sweetviz material, then headed over to the GitHub repo

https://github.com/Jcharis/Streamlit_DataScience_Apps/blob/master/EDA_app_with_Streamlit_Components/app.py

and used that file as the basis for the even more stripped-down version linked above.

Exploratory data analysis with streamlit, sweetviz and pandas profiling