IO Utils#

class pyqrlew.io.PostgreSQL(name: str = 'pyqrlew-db', user: str = 'postgres', password: str = 'pyqrlew-db', port: int = 5433)[source]#

A class representing a PostgreSQL database with PyQrlew integration.

This class provides methods for interacting with an PostgreSQL database using the PyQrlew library.

Example

>>> import pyqrlew as qrl
>>> from pyqrlew.io import PostgreSQL
>>> import pandas as pd
>>> DB = PostgreSQL() # Setup a default database
>>> DB.load_extract() # Insert a demo dataset
>>> dataset = Dataset.from_database(name='extract', engine=DB.engine(), schema_name='extract')
>>> dataset.relations()
[(['extract', 'extract', 'beacon'], <Relation at 0x784b78176fd0>),
(['extract', 'extract', 'census'], <Relation at 0x784b78177490>)]
>>> dataset.extract.census.relation().schema()
{age: int, workclass: str, fnlwgt: str, education: str, education_num: int, marital_status: str, occupation: str, relationship: str, race: str, sex: str, capital_gain: int, capital_loss: int, hours_per_week: int, native_country: str, income: str}
>>> query = "SELECT AVG(age) AS avg_age FROM extract.census"
>>> pd.read_sql(query, DB.engine())
        avg_age
0       48.005155
Parameters:
  • name (str) –

  • user (str) –

  • password (str) –

  • port (int) –

load_extract() PostgreSQL[source]#

It loads the ‘extract’ dataset in the ‘extract’ schema. It internally calls the load_resource method

Returns:

The instance of the PostgreSQL class, allowing for method chaining.

Return type:

PostgreSQL

load_financial() PostgreSQL[source]#

It loads the ‘financial’ dataset in the ‘financial’ schema. It internally calls the load_resource method

Returns:

The instance of the PostgreSQL class, allowing for method chaining.

Return type:

PostgreSQL

load_hepatitis() PostgreSQL[source]#

It loads the ‘hepatitis’ dataset in the ‘hepatitis_std’ schema. It internally calls the load_resource method

Returns:

The instance of the PostgreSQL class, allowing for method chaining.

Return type:

PostgreSQL

load_imdb() PostgreSQL[source]#

It loads the ‘imdb’ dataset in the ‘imdb_ijs’ schema. It internally calls the load_resource method

Returns:

The instance of the PostgreSQL class, allowing for method chaining.

Return type:

PostgreSQL

load_retail() PostgreSQL[source]#

It loads the ‘retail’ dataset in the ‘retail’ schema. It internally calls the load_resource method

Returns:

The instance of the PostgreSQL class, allowing for method chaining.

Return type:

PostgreSQL

extract() Dataset[source]#

It loads the extract dataset in the database and it returns a qrlew Dataset with data ranges and categorical possible values if the unique column values are less than POSSIBLE_VALUES_THRESHOLD.

Return type:

Dataset

financial() Dataset[source]#

It loads the financial dataset in the database and it returns a qrlew Dataset with data ranges and categorical possible values if the unique column values are less than POSSIBLE_VALUES_THRESHOLD.

Return type:

Dataset

hepatitis() Dataset[source]#

It loads the hepatitis dataset in the database and it returns a qrlew Dataset with data ranges and categorical possible values if the unique column values are less than POSSIBLE_VALUES_THRESHOLD.

Return type:

Dataset

imdb() Dataset[source]#

It loads the imdb dataset in the database and it returns a qrlew Dataset with data ranges and categorical possible values if the unique column values are less than POSSIBLE_VALUES_THRESHOLD.

Return type:

Dataset

retail() Dataset[source]#

It loads the retail dataset in the database and it returns a qrlew Dataset with data ranges and categorical possible values if the unique column values are less than POSSIBLE_VALUES_THRESHOLD.

Return type:

Dataset

class pyqrlew.io.SQLite(db_file: str, ranges: bool)[source]#

A class representing an SQLite database with PyQrlew integration.

This class provides methods for interacting with an SQLite database using the PyQrlew library.

Example

>>> import pandas as pd
>>> qdb = SQLite("my_database.db")
>>> df = pd.DataFrame([[1, "a", True], [2, "b", False]], columns=["col1", "col2", "col3"])
>>> qdb.load_pandas("table1", df)
>>> qdb.print_infos_metadata()
Table: table1
- col1, Type: BIGINT
- col2, Type: TEXT
- col3, Type: BOOLEAN
>>> qdb.dataset().relations()
[(['table1'], <Relation at 0x120f47120>)]
>>> query = "SELECT SUM(col1) FROM table1"
>>> qdb.execute(query)
[(3,)]
>>> relation = qdb.dataset().sql(query)
>>> qdb.eval(relation)
[(3,)]
Parameters:
  • db_file (str) –

  • ranges (bool) –

load_csv(table_name: str, csv_file: str) None[source]#

Create a table in the SQLite database from data stored in a csv file.

Parameters:
  • table_name (str) – Name of the created table

  • csv_file (str) – Path or URL to the CSV file containing the data.

Return type:

None

load_pandas(table_name: str, df: DataFrame) None[source]#

Create a table in the SQLite database from data stored in a Pandas DataFrame.

Parameters:
  • table_name (str) – Name of the created table

  • df (pandas.DataFrame) – Data represented as a pandas DataFrame

Return type:

None

url() str[source]#

Get the URL string of the SQLite database.

Return type:

str

remove() None[source]#

Remove the SQLite database file from the filesystem.

Return type:

None

engine() Engine[source]#

Create and return an SQLAlchemy Engine instance for the SQLite database.

Return type:

Engine

eval(relation: Relation) Sequence[list][source]#

Convert a PyQrlew relation into a sql query string, send it to the database and return the result.

Parameters:

relation (Relation) –

Return type:

Sequence[list]

execute(query: str) Sequence[list][source]#

Execute a raw SQL query and return the result.

Parameters:

query (str) –

Return type:

Sequence[list]

dataset() Dataset[source]#

Create and return a PyQrlew Dataset linked to the SQLite database.

Return type:

Dataset

print_infos_metadata() None[source]#

Print metadata information about tables and columns in the database.

Return type:

None

pyqrlew.io.dataset.dataset_from_database(name: str, engine: Engine, schema_name: str | None = None, ranges: bool = False, possible_values_threshold: int | None = None) Dataset[source]#

Builds a Dataset from a sqlalchemy Engine

Parameters:
  • name (str) – Name of the Dataset

  • engine (Engine) – The sqlalchemy Engine to use

  • schema_name (Optional[str], optional) – The DB schema to use. Defaults to None.

  • ranges (bool, optional) – Use the actual min and max of the data as ranges. This is unsafe from a privacy perspective. Defaults to False.

  • possible_values_threshold (Optional[int], optional) – Use the actual observed values as range. This is unsafe from a privacy perspective. Defaults to None.

Return type:

Dataset