overviewpy

overviewpy aims to make it easy to get an overview of a data set by displaying relevant sample information.

Installation

$ pip install overviewpy

Usage

Implemented Functions

At the moment, overviewpy implements the following functions:

  • overview_tab generates a tabular overview of the sample and returns it as a data frame. The result is a two-column table with the id in the left column and the time frame in the right column.
  • overview_na plots an overview of missing values by variable (both by row and by column).

overview_tab

Use overview_tab to generate a general overview of the data set based on its time and scope conditions. The resulting data frame collapses the time condition for each id, taking potential gaps in the time frame into account.

import pandas as pd
from overviewpy.overviewpy import overview_tab

# Example panel: one row per id-year observation
data = {
    'id': ['RWA', 'RWA', 'RWA', 'GAB', 'GAB',
           'FRA', 'FRA', 'BEL', 'BEL', 'ARG'],
    'year': [2022, 2023, 2021, 2023, 2020,
             2019, 2015, 2014, 2013, 2002],
}

df = pd.DataFrame(data)

# id is the scope (unit) column, time is the time column
df_overview = overview_tab(df=df, id='id', time='year')
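
To illustrate what collapsing the time condition per id with gaps can look like, here is a minimal pandas-only sketch. It is not overviewpy's implementation: the helper collapse_years, the column name time_frame, and the "start-end" string format are assumptions made for this example only, and the actual output of overview_tab may be formatted differently.

```python
import pandas as pd


def collapse_years(years):
    """Collapse years into comma-separated runs of consecutive years.

    Hypothetical helper for illustration; not part of overviewpy.
    """
    ordered = sorted({int(y) for y in years})
    if not ordered:
        return ""
    runs = []
    start = prev = ordered[0]
    for year in ordered[1:]:
        if year == prev + 1:
            # Still inside a consecutive run
            prev = year
            continue
        # Gap found: close the current run and start a new one
        runs.append((start, prev))
        start = prev = year
    runs.append((start, prev))
    return ", ".join(str(a) if a == b else f"{a}-{b}" for a, b in runs)


data = {
    'id': ['RWA', 'RWA', 'RWA', 'GAB', 'GAB',
           'FRA', 'FRA', 'BEL', 'BEL', 'ARG'],
    'year': [2022, 2023, 2021, 2023, 2020,
             2019, 2015, 2014, 2013, 2002],
}
df = pd.DataFrame(data)

# One row per id, with its years collapsed into ranges
sketch = (
    df.groupby('id')['year']
    .apply(collapse_years)
    .reset_index(name='time_frame')
)
print(sketch)
```

In this sketch, RWA's three consecutive years collapse into a single range, while GAB's two non-adjacent years stay separate because of the gap between them.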

overview_na

overview_na is a simple function that provides information about the content of all variables in your data, not only the time and scope conditions. It returns a horizontal ggplot bar plot that indicates the amount of missing data (NAs) for each variable (on the y-axis). You can choose whether to display the share of NAs for each variable as a percentage (the default) or the total number of NAs.

import numpy as np
import pandas as pd
from overviewpy.overviewpy import overview_na

# Example data with missing values in both columns
data_na = {
    'id': ['RWA', 'RWA', 'RWA', np.nan, 'GAB', 'GAB',
           'FRA', 'FRA', 'BEL', 'BEL', 'ARG', np.nan, np.nan],
    'year': [2022, 2001, 2000, 2023, 2021, 2023, 2020,
             2019, np.nan, 2015, 2014, 2013, 2002],
}

df_na = pd.DataFrame(data_na)

# Plot the amount of missing data per variable
overview_na(df_na)
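
The numbers behind such a plot can be checked with plain pandas. The following is a minimal sketch, not overviewpy's own code; the column names na_total and na_percent are assumptions chosen for this example.

```python
import numpy as np
import pandas as pd

data_na = {
    'id': ['RWA', 'RWA', 'RWA', np.nan, 'GAB', 'GAB',
           'FRA', 'FRA', 'BEL', 'BEL', 'ARG', np.nan, np.nan],
    'year': [2022, 2001, 2000, 2023, 2021, 2023, 2020,
             2019, np.nan, 2015, 2014, 2013, 2002],
}
df_na = pd.DataFrame(data_na)

# Boolean mask: True wherever a value is missing
missing = df_na.isna()

na_summary = pd.DataFrame({
    'na_total': missing.sum(),                       # absolute number of NAs per variable
    'na_percent': (missing.mean() * 100).round(1),   # share of NAs per variable, in percent
})
print(na_summary)
```

Here id has 3 missing values out of 13 rows and year has 1, which corresponds to the two ways of reporting missingness described above (total count versus percentage).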

Roadmap

overviewpy seeks to mirror the functionality of overviewR and will add the following functions in the future:

  • overview_crosstab generates a cross table. The conditional columns make it possible to disaggregate the overview table by specifying two conditions, resulting in a 2x2 table. This way, it is easy to visualize the time and scope conditions as well as theoretical assumptions with examples from the data set.
  • overview_latex converts the output of both overview_tab and overview_crosstab into LaTeX code and/or directly into a .tex file.
  • overview_plot is an alternative way to visualize the sample (a way to present results from overview_tab).
  • overview_crossplot is an alternative way to visualize a cross table (a way to present results from overview_crosstab).
  • overview_heat plots a heat map of your timeline.
  • overview_overlap plots comparison plots (bar graph and Venn diagram) to compare two data frames.

Contributing

Interested in contributing? Check out the contributing guidelines. Please note that this project is released with a Code of Conduct. By contributing to this project, you agree to abide by its terms.

License

overviewpy is licensed under the terms of the BSD 3-Clause license.

Credits

overviewpy was created with cookiecutter and the py-pkgs-cookiecutter template.