IO Utils#

class pyqrlew.io.PostgreSQL(name: str = 'pyqrlew-db', user: str = 'postgres', password: str = 'pyqrlew-db', port: int = 5433)[source]#

A class representing a PostgreSQL database with PyQrlew integration.

This class provides methods for interacting with an PostgreSQL database using the PyQrlew library.

Example

>>> import pyqrlew as qrl
>>> from pyqrlew.io import PostgreSQL
>>> import pandas as pd
>>> DB = PostgreSQL() # Setup a default database
>>> DB.load_extract() # Insert a demo dataset
>>> dataset = Dataset.from_database(name='extract', engine=DB.engine(), schema_name='extract')
>>> dataset.relations()
[(['extract', 'extract', 'beacon'], <Relation at 0x784b78176fd0>),
(['extract', 'extract', 'census'], <Relation at 0x784b78177490>)]
>>> dataset.extract.census.relation().schema()
{age: int, workclass: str, fnlwgt: str, education: str, education_num: int, marital_status: str, occupation: str, relationship: str, race: str, sex: str, capital_gain: int, capital_loss: int, hours_per_week: int, native_country: str, income: str}
>>> query = "SELECT AVG(age) AS avg_age FROM extract.census"
>>> pd.read_sql(query, DB.engine())
        avg_age
0       48.005155

Parameters:

name (str) –
user (str) –
password (str) –
port (int) –

load_extract() → PostgreSQL[source]#

It loads the ‘extract’ dataset in the ‘extract’ schema. It internally calls the load_resource method

Returns:: The instance of the PostgreSQL class, allowing for method chaining.
Return type:: PostgreSQL

load_financial() → PostgreSQL[source]#

It loads the ‘financial’ dataset in the ‘financial’ schema. It internally calls the load_resource method

Returns:: The instance of the PostgreSQL class, allowing for method chaining.
Return type:: PostgreSQL

load_hepatitis() → PostgreSQL[source]#

It loads the ‘hepatitis’ dataset in the ‘hepatitis_std’ schema. It internally calls the load_resource method

Returns:: The instance of the PostgreSQL class, allowing for method chaining.
Return type:: PostgreSQL

load_imdb() → PostgreSQL[source]#

It loads the ‘imdb’ dataset in the ‘imdb_ijs’ schema. It internally calls the load_resource method

Returns:: The instance of the PostgreSQL class, allowing for method chaining.
Return type:: PostgreSQL

load_retail() → PostgreSQL[source]#

It loads the ‘retail’ dataset in the ‘retail’ schema. It internally calls the load_resource method

Returns:: The instance of the PostgreSQL class, allowing for method chaining.
Return type:: PostgreSQL

extract() → Dataset[source]#

It loads the extract dataset in the database and it returns a qrlew Dataset with data ranges and categorical possible values if the unique column values are less than POSSIBLE_VALUES_THRESHOLD.

Return type:: Dataset

financial() → Dataset[source]#

It loads the financial dataset in the database and it returns a qrlew Dataset with data ranges and categorical possible values if the unique column values are less than POSSIBLE_VALUES_THRESHOLD.

Return type:: Dataset

hepatitis() → Dataset[source]#

It loads the hepatitis dataset in the database and it returns a qrlew Dataset with data ranges and categorical possible values if the unique column values are less than POSSIBLE_VALUES_THRESHOLD.

Return type:: Dataset

imdb() → Dataset[source]#

It loads the imdb dataset in the database and it returns a qrlew Dataset with data ranges and categorical possible values if the unique column values are less than POSSIBLE_VALUES_THRESHOLD.

Return type:: Dataset

retail() → Dataset[source]#

It loads the retail dataset in the database and it returns a qrlew Dataset with data ranges and categorical possible values if the unique column values are less than POSSIBLE_VALUES_THRESHOLD.

Return type:: Dataset

class pyqrlew.io.SQLite(db_file: str, ranges: bool)[source]#

A class representing an SQLite database with PyQrlew integration.

This class provides methods for interacting with an SQLite database using the PyQrlew library.

Example

>>> import pandas as pd
>>> qdb = SQLite("my_database.db")
>>> df = pd.DataFrame([[1, "a", True], [2, "b", False]], columns=["col1", "col2", "col3"])
>>> qdb.load_pandas("table1", df)
>>> qdb.print_infos_metadata()
Table: table1
- col1, Type: BIGINT
- col2, Type: TEXT
- col3, Type: BOOLEAN
>>> qdb.dataset().relations()
[(['table1'], <Relation at 0x120f47120>)]
>>> query = "SELECT SUM(col1) FROM table1"
>>> qdb.execute(query)
[(3,)]
>>> relation = qdb.dataset().sql(query)
>>> qdb.eval(relation)
[(3,)]

Parameters:

db_file (str) –
ranges (bool) –

load_csv(table_name: str, csv_file: str) → None[source]#

Create a table in the SQLite database from data stored in a csv file.

Parameters:

table_name (str) – Name of the created table
csv_file (str) – Path or URL to the CSV file containing the data.

Return type:

None

load_pandas(table_name: str, df: DataFrame) → None[source]#

Create a table in the SQLite database from data stored in a Pandas DataFrame.

Parameters:

table_name (str) – Name of the created table
df (pandas.DataFrame) – Data represented as a pandas DataFrame

Return type:

None

url() → str[source]#

Get the URL string of the SQLite database.

Return type:: str

remove() → None[source]#

Remove the SQLite database file from the filesystem.

Return type:: None

engine() → Engine[source]#

Create and return an SQLAlchemy Engine instance for the SQLite database.

Return type:: Engine

eval(relation: Relation) → Sequence[list][source]#

Convert a PyQrlew relation into a sql query string, send it to the database and return the result.

Parameters:: relation (Relation) –
Return type:: Sequence[list]

execute(query: str) → Sequence[list][source]#

Execute a raw SQL query and return the result.

Parameters:: query (str) –
Return type:: Sequence[list]

dataset() → Dataset[source]#

Create and return a PyQrlew Dataset linked to the SQLite database.

Return type:: Dataset

print_infos_metadata() → None[source]#

Print metadata information about tables and columns in the database.

Return type:: None

pyqrlew.io.dataset.dataset_from_database(name: str, engine: Engine, schema_name: str | None = None, ranges: bool = False, possible_values_threshold: int | None = None) → Dataset[source]#

Builds a Dataset from a sqlalchemy Engine

Parameters:

name (str) – Name of the Dataset
engine (Engine) – The sqlalchemy Engine to use
schema_name (Optional[str], optional) – The DB schema to use. Defaults to None.
ranges (bool, optional) – Use the actual min and max of the data as ranges. This is unsafe from a privacy perspective. Defaults to False.
possible_values_threshold (Optional[int], optional) – Use the actual observed values as range. This is unsafe from a privacy perspective. Defaults to None.

Return type:

Dataset