IO Utils#
- class pyqrlew.io.PostgreSQL(name: str = 'pyqrlew-db', user: str = 'postgres', password: str = 'pyqrlew-db', port: int = 5433)[source]#
A class representing a PostgreSQL database with PyQrlew integration.
This class provides methods for interacting with an PostgreSQL database using the PyQrlew library.
Example
>>> import pyqrlew as qrl >>> from pyqrlew.io import PostgreSQL >>> import pandas as pd >>> DB = PostgreSQL() # Setup a default database >>> DB.load_extract() # Insert a demo dataset >>> dataset = Dataset.from_database(name='extract', engine=DB.engine(), schema_name='extract') >>> dataset.relations() [(['extract', 'extract', 'beacon'], <Relation at 0x784b78176fd0>), (['extract', 'extract', 'census'], <Relation at 0x784b78177490>)] >>> dataset.extract.census.relation().schema() {age: int, workclass: str, fnlwgt: str, education: str, education_num: int, marital_status: str, occupation: str, relationship: str, race: str, sex: str, capital_gain: int, capital_loss: int, hours_per_week: int, native_country: str, income: str} >>> query = "SELECT AVG(age) AS avg_age FROM extract.census" >>> pd.read_sql(query, DB.engine()) avg_age 0 48.005155
- load_extract() PostgreSQL[source]#
It loads the ‘extract’ dataset in the ‘extract’ schema. It internally calls the load_resource method
- Returns:
The instance of the PostgreSQL class, allowing for method chaining.
- Return type:
- load_financial() PostgreSQL[source]#
It loads the ‘financial’ dataset in the ‘financial’ schema. It internally calls the load_resource method
- Returns:
The instance of the PostgreSQL class, allowing for method chaining.
- Return type:
- load_hepatitis() PostgreSQL[source]#
It loads the ‘hepatitis’ dataset in the ‘hepatitis_std’ schema. It internally calls the load_resource method
- Returns:
The instance of the PostgreSQL class, allowing for method chaining.
- Return type:
- load_imdb() PostgreSQL[source]#
It loads the ‘imdb’ dataset in the ‘imdb_ijs’ schema. It internally calls the load_resource method
- Returns:
The instance of the PostgreSQL class, allowing for method chaining.
- Return type:
- load_retail() PostgreSQL[source]#
It loads the ‘retail’ dataset in the ‘retail’ schema. It internally calls the load_resource method
- Returns:
The instance of the PostgreSQL class, allowing for method chaining.
- Return type:
- extract() Dataset[source]#
It loads the extract dataset in the database and it returns a qrlew Dataset with data ranges and categorical possible values if the unique column values are less than POSSIBLE_VALUES_THRESHOLD.
- Return type:
- financial() Dataset[source]#
It loads the financial dataset in the database and it returns a qrlew Dataset with data ranges and categorical possible values if the unique column values are less than POSSIBLE_VALUES_THRESHOLD.
- Return type:
- hepatitis() Dataset[source]#
It loads the hepatitis dataset in the database and it returns a qrlew Dataset with data ranges and categorical possible values if the unique column values are less than POSSIBLE_VALUES_THRESHOLD.
- Return type:
- class pyqrlew.io.SQLite(db_file: str, ranges: bool)[source]#
A class representing an SQLite database with PyQrlew integration.
This class provides methods for interacting with an SQLite database using the PyQrlew library.
Example
>>> import pandas as pd >>> qdb = SQLite("my_database.db") >>> df = pd.DataFrame([[1, "a", True], [2, "b", False]], columns=["col1", "col2", "col3"]) >>> qdb.load_pandas("table1", df) >>> qdb.print_infos_metadata() Table: table1 - col1, Type: BIGINT - col2, Type: TEXT - col3, Type: BOOLEAN >>> qdb.dataset().relations() [(['table1'], <Relation at 0x120f47120>)] >>> query = "SELECT SUM(col1) FROM table1" >>> qdb.execute(query) [(3,)] >>> relation = qdb.dataset().sql(query) >>> qdb.eval(relation) [(3,)]
- load_csv(table_name: str, csv_file: str) None[source]#
Create a table in the SQLite database from data stored in a csv file.
- load_pandas(table_name: str, df: DataFrame) None[source]#
Create a table in the SQLite database from data stored in a Pandas DataFrame.
- Parameters:
table_name (str) – Name of the created table
df (
pandas.DataFrame) – Data represented as a pandas DataFrame
- Return type:
None
- engine() Engine[source]#
Create and return an SQLAlchemy Engine instance for the SQLite database.
- Return type:
Engine
- eval(relation: Relation) Sequence[list][source]#
Convert a PyQrlew relation into a sql query string, send it to the database and return the result.
- pyqrlew.io.dataset.dataset_from_database(name: str, engine: Engine, schema_name: str | None = None, ranges: bool = False, possible_values_threshold: int | None = None) Dataset[source]#
Builds a Dataset from a sqlalchemy Engine
- Parameters:
name (str) – Name of the Dataset
engine (Engine) – The sqlalchemy Engine to use
schema_name (Optional[str], optional) – The DB schema to use. Defaults to None.
ranges (bool, optional) – Use the actual min and max of the data as ranges. This is unsafe from a privacy perspective. Defaults to False.
possible_values_threshold (Optional[int], optional) – Use the actual observed values as range. This is unsafe from a privacy perspective. Defaults to None.
- Return type: