IT Services Blog

Exploratory data analysis with Streamlit, Sweetviz and Pandas profiling

by Anson Parker on 2021-06-01T15:50:44-04:00

Exploratory data analysis (EDA) allows developers and programmers to give stakeholders a clearer understanding of what questions may reasonably be asked of a dataset, with very little programming effort. Basic questions such as "how much data is actually present in every row?" or "what are the unique, or most common, values in this column?" can help shave up to 30% off the data science workflow, according to some random source on the internet. From my perspective, EDA is just an essential first step, period.
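As a rough illustration, the two basic questions above can be answered with a few lines of plain pandas. This is only a sketch: the toy DataFrame and its column names (`city`, `visits`) are made up for the example and stand in for whatever CSV you load.

```python
import pandas as pd

# Hypothetical toy data standing in for a CSV of your choosing.
df = pd.DataFrame(
    {
        "city": ["Charlottesville", "Richmond", "Charlottesville", None],
        "visits": [3, None, 5, 2],
    }
)

# How much data is actually present in every row?
# Counts the non-missing cells in each row.
present_per_row = df.notna().sum(axis=1)

# What are the unique values in this column?
unique_cities = df["city"].dropna().unique()

# What are the most common values in this column?
# value_counts() skips missing values and sorts by frequency.
common_cities = df["city"].value_counts().head(3)

print(present_per_row.tolist())
print(list(unique_cities))
print(common_cities.to_dict())
```

Tools like Pandas profiling and Sweetviz automate this kind of summary across every column at once.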

Carnegie Mellon has a deep-dive chapter on the subject:

https://www.stat.cmu.edu/~hseltman/309/Book/chapter4.pdf

And here's a brief overview: https://www.svds.com/value-exploratory-data-analysis/

EDA in Python

Pandas profiling and Sweetviz are simple installs that work well with Streamlit.

To test, you can set up a Streamlit share and then install them there.

Here's some Python code wrapped in Streamlit that provides both, for you to test with a CSV of your choosing:

https://github.com/alibama/code-for-cville/blob/master/divides.py

I pulled most of this from the following video:

https://www.youtube.com/watch?v=zWiliqjyPlQ

The video goes in depth. I skipped to about minute 30 to get into the Sweetviz material, then headed over to the GitHub repo

https://github.com/Jcharis/Streamlit_DataScience_Apps/blob/master/EDA_app_with_Streamlit_Components/app.py

and used that file as the basis for the even more stripped-down version linked above.
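For a sense of what such a wrapper might look like, here is a minimal sketch. It is not the code from either repository linked above. The function names (`load_csv`, `main`), the report file name, and the widget labels are all assumptions made for illustration. It assumes `streamlit`, `pandas_profiling` (since renamed `ydata-profiling`) and `sweetviz` are installed, and that the file is launched with `streamlit run`.

```python
import pandas as pd


def load_csv(uploaded_file) -> pd.DataFrame:
    """Read an uploaded CSV (any file-like object) into a DataFrame."""
    return pd.read_csv(uploaded_file)


def main() -> None:
    # Imported here so the app-only packages load when the app actually runs.
    import streamlit as st
    import streamlit.components.v1 as components

    st.title("Quick EDA")
    uploaded = st.file_uploader("Upload a CSV", type=["csv"])
    if uploaded is None:
        return  # nothing to profile until a file is uploaded

    df = load_csv(uploaded)
    st.dataframe(df.head())

    choice = st.sidebar.selectbox("Report", ["Pandas Profiling", "Sweetviz"])
    if choice == "Pandas Profiling":
        from pandas_profiling import ProfileReport

        # minimal=True skips the more expensive computations.
        html = ProfileReport(df, minimal=True).to_html()
    else:
        import sweetviz as sv

        report_path = "sweetviz_report.html"  # hypothetical output file name
        sv.analyze(df).show_html(filepath=report_path, open_browser=False)
        with open(report_path, encoding="utf-8") as fh:
            html = fh.read()

    # Embed the generated HTML report inside the Streamlit page.
    components.html(html, height=800, scrolling=True)


if __name__ == "__main__":
    main()
```

Both libraries produce a standalone HTML report, so the sketch simply embeds that HTML in the page. Keeping the CSV loading in its own small function makes it easy to check outside of Streamlit.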


