DataAnalyzr.get_data(
    db_type: Literal["files", "redshift", "postgres", "sqlite"],
    db_config: dict,
    vector_store_config: dict = {},
) -> None
This method retrieves data from a database or from files based on the provided configuration, and creates a vector store over the data. It must be called before performing any analysis.
The required keys in db_config depend on the specified db_type. The vector_store_config dictionary is optional and can be used to configure the vector store.
The method sets the df_dict, database_connector and vector_store attributes of the DataAnalyzr object; it does not return any value.
Parameters
db_type
Literal['files', 'redshift', 'postgres', 'sqlite']
required
The type of database to connect to.
db_config
dict
required
Configuration dictionary for the database connection. The expected keys depend on db_type. When db_type is "files":
db_config = {
"datasets" : [
{
"name" : "dataset1" ,
"value" : "path/to/dataset1.csv" ,
# files can be in .csv, .xlsx, .xls, and .json formats
},
{
"name" : "dataset2" ,
"value" : "path/to/dataset2.xlsx" ,
"kwargs" : { "sheet_name" : "Sheet1" },
# pass optional keyword arguments for reading the file
},
{
"name" : "dataset3" ,
"value" : pd.read_csv( "path/to/dataset3.csv" ),
# you can also pass pandas DataFrame objects
},
],
"db_path" : "path/to/construct/sqlite.db" , # optional
}
datasets
List of dictionaries containing the name and value of the datasets to load.
db_path
Location where a SQLite database should be created. Only relevant when analysis_type is "sql". Defaults to sqlite/<random-path>.db.
When db_type is "redshift" or "postgres":
db_config = {
    "host": "localhost",
    "port": 5432,
    "user": "username",
    "password": "password",
    "database": "dbname",
    "schema": ["schema_name1", "schema_name2"],  # optional
    "tables": ["table_name1", "table_name2"],  # optional
}
host
Hostname of the database server.
port
Port number of the database server.
user
Username for the database connection.
password
Password for the database connection.
database
Name of the database to connect to.
schema
List of schema names to load. Defaults to all schemas except information_schema and pg_catalog.
tables
List of table names to load. Defaults to all tables in the specified schemas.
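As a hedged sketch of how such a connection might be configured (the hostname, credentials, and table names below are placeholders, and data_analyzr is assumed to be an already-instantiated DataAnalyzr object):

```python
# Sketch only: placeholder connection details for a local Postgres instance.
db_config = {
    "host": "localhost",
    "port": 5432,
    "user": "username",
    "password": "password",
    "database": "dbname",
    "tables": ["table_name1"],  # optional; omit to load all tables
}

# With a reachable database and an instantiated DataAnalyzr object:
# data_analyzr.get_data(db_type="postgres", db_config=db_config)
```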
When db_type is "sqlite":
db_config = {
    "db_path": "path/to/sqlite.db",
}
db_path
Path to the SQLite database file.
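As a sketch of the sqlite case, the snippet below builds a throwaway SQLite file with the standard library and points db_config at it. The table name and data are purely illustrative, and the final get_data call (commented out) assumes an instantiated DataAnalyzr object:

```python
import os
import sqlite3
import tempfile

# Build a small throwaway SQLite database to point get_data at.
db_path = os.path.join(tempfile.mkdtemp(), "demo.db")
conn = sqlite3.connect(db_path)
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("north", 120.0), ("south", 95.5)],
)
conn.commit()
conn.close()

db_config = {"db_path": db_path}

# With an instantiated DataAnalyzr object:
# data_analyzr.get_data(db_type="sqlite", db_config=db_config)
```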
vector_store_config
dict
optional
Configuration dictionary for the vector store. For details on vector store usage and configuration, refer to the Vector Store guide.
vector_store_config = {
    "path": "path/to/vector_store",  # optional
    "remake_store": False  # optional
}
path
Path to the vector store. If no vector store is found at the specified path, a new one will be created. Defaults to vector_store/<random-path>.
remake_store
Whether to recreate the vector store. If set to True, the vector store will be recreated. Defaults to False.
Example usage
import pandas as pd

db_config = {
    "datasets": [
        {
            "name": "dataset1",
            "value": "path/to/dataset1.csv",
            # files can be in .csv, .xlsx, .xls, and .json formats
        },
        {
            "name": "dataset2",
            "value": "path/to/dataset2.xlsx",
            "kwargs": {"sheet_name": "Sheet1"},
            # pass optional keyword arguments for reading the file
        },
        {
            "name": "dataset3",
            "value": pd.read_csv("path/to/dataset3.csv"),
            # you can also pass pandas DataFrame objects
        },
    ],
    "db_path": "path/to/construct/sqlite.db",  # optional
}
vector_store_config = {
    "path": "path/to/vector_store",
    "remake_store": False,
}
data_analyzr.get_data(
    db_type="files",
    db_config=db_config,
    vector_store_config=vector_store_config,  # optional
)