> ## Documentation Index
> Fetch the complete documentation index at: https://docs.lyzr.ai/llms.txt
> Use this file to discover all available pages before exploring further.

# Pythonic Analysis Agent

> LLM-powered pythonic conversational analytics

DataAnalyzr's pythonic analysis provides users with a comprehensive toolkit for conducting machine learning-based analysis using Python's extensive array of libraries such as NumPy, pandas, scikit-learn, and statsmodels.
Users can tap into Python's vast ecosystem of machine learning tools and techniques to derive actionable insights from their data and solve complex analytical problems.

<img width="100%" src="https://mintcdn.com/lyzrinc/bGxeLdNqyZVWUWU3/assets/images/pre-built-agents/data_analyzr_flow.svg?fit=max&auto=format&n=bGxeLdNqyZVWUWU3&q=85&s=6d06adc81cb7099b195d7ddc8ba565c7" data-path="assets/images/pre-built-agents/data_analyzr_flow.svg" />

## Getting started

Data analysis with DataAnalyzr includes four simple steps:

<Steps>
  <Step title="Install lyzr">
    Install the `lyzr` package with the `data-analyzr` variant using pip.
  </Step>

  <Step title="Create an instance">
    Create an instance of the `DataAnalyzr` class with the desired analysis type and API key.
  </Step>

  <Step title="Load data">
    Load data from files, Redshift, PostgreSQL, or SQLite databases using the `get_data` method.
  </Step>

  <Step title="Ask a question">
    Ask a question using the `ask` method to generate visualizations, insights, recommendations, and tasks.
  </Step>
</Steps>

Let's go over these steps one by one for pythonic data analysis.

## Installation

The first step is to install using pip. In order to use lyzr's data analysis capabilities, install it with the `data-analyzr` variant.

```bash theme={null}
pip install lyzr[data-analyzr]
```

This will install the `lyzr` package alongside all dependencies required for data analysis.

## Creating an instance

The next step is to create a class instance.

```python theme={null}
from lyzr import DataAnalyzr

analyzr = DataAnalyzr(analysis_type="ml", api_key="openai_key")
```

The `analysis_type` parameter can take three options:

1. `ml` - for analysis with Python code, using Pandas, Scikit-learn and other similar packages.
2. `sql` - for SQL analysis.
3. `skip` - if you want to skip the analysis altogether, and get insights directly from the uploaded data.

<Note>Note that you can also provide an `api_key` parameter. This parameter is optional and given as an alternative to setting the environment variable.</Note>
<Info>For documentation on the SQL-based analysis, please refer to the [SQL Analysis Agent](/pre-built-agents/data-analyzr/text-sql-agent) documentation.</Info>

For details on all the options available in instantiating the `DataAnalyzr` class, [visit the API reference](/pre-built-agents/data-analyzr/api-reference.mdx).

## Loading data

DataAnalyzr provides multiple options for connecting with your data.
Whether you are working with data files in CSV, Excel, JSON, etc.
formats, or you want to connect to an online database in Redshift, or perhaps you have a local SQLite database, there is an option for you.

The class method used to connect with data is `get_data`.
It takes three parameters - `db_type`, `db_config`, and `vector_store_config` - the values of which depend on the format of the input data.
Let's look at a couple of examples:

### Loading data from files

Collect all your data files in a list of dictionaries, with their names, paths and keyword arguments.
Then pass this dictionary when calling the `get_data` method.

```python theme={null}
db_config = {
    "datasets": [
        {
            "name": "Name of dataset",
            "value": "path/to/dataset.csv",
            # files can be in .csv, .xlsx, .xls, and .json formats
        },
        {
            "name": "Name of dataset",
            "value": "path/to/dataset.xlsx",
            "kwargs": {"sheet_name": "Sheet1"},
            # pass additional keyword arguments for reading the file
        },
        {
            "name": "Name of dataset",
            "value": pd.read_csv("path/to/dataset.csv"),
            # you can also pass pandas DataFrame objects
        },
    ]
}
# Call the get_data method
analyzr.get_data(
    db_type = "files",
    db_config = db_config,
)
```

The value of `db_type` tells the system:

* which type of data it will need to explore
* what to expect in the db\_config parameter

### Loading data from Redshift

As a first step, you will need to collect all the Redshift details.

```python theme={null}
redshift_config = {
    "host": "",
    "port": 3000, # e.g. 5439
    "database": "",
    "user": "",
    "password": "",
    "schema": ["schema1", "public"], # optional
    "tables": ["table1", "table2"], # optional
}
analyzr.get_data(
    db_type = "redshift",
    db_config = redshift_config,
    vector_store_config = { # optional
        "path": "path/to/vector/store",
        "remake": True, # overwrite the existing vector store, if any
    }
)
```

Once again, note that the value of `db_type` tells the system:

* which type of data it will need to explore
* what to expect in the db\_config parameter

While you may pass only one `database` in `db_config`, the number of tables and schemas is not limited, they are passed as lists.
If no `schema` and `tables` are passed, all the tables from the `public` schema are taken.

### Loading data from PostgreSQL

The implementation for PostgreSQL is very similar to that for Redshift.
Start by collecting all the DB details.

```python theme={null}
postgres_config = {
    "host": "",
    "port": 3001, # e.g. 5439
    "database": "",
    "user": "",
    "password": "",
    "schema": ["schema1", "public"], # optional
    "tables": ["table1", "table2"], # optional
}
analyzr.get_data(
    db_type = "postgres",
    db_config = postgres_config,
)
```

<Note>
  Note that the value of `db_type` is now `postgres`, while everything else is the same.
</Note>

### Loading data from SQLite

A local SQLite database can also be used for analysis with DataAnalyzr.
You only need to pass the path to this database in the `db_config` parameter.

```python theme={null}
analyzr.get_data(
    db_type = "sqlite",
    db_config = {"db_path": "path/to/sqlite.db"},
)
```

Alternatively, if you have a URL which holds the SQLite database, you can pass this URL as the value of `db_path`.

## Getting results

You can use the `DataAnalyzr` object to perform an analysis on the DataFrame by passing an analysis query to the method `ask`.
This function enables you to ask questions directly related to the data at hand, allowing DataAnalyzr to process the inquiry and provide the corresponding visualisation, insights, recommendations, and tasks.

A most simple such implementation looks like this:

```python theme={null}
result = analyzr.ask("Your question here") # returns dict
```

Here, `result` has keys `visualisation`, `insights`, `recommendations` and `tasks`.

You can control the outputs received from `ask`:

```python theme={null}
result = analyzr.ask(
    user_input = "Your question here",
    outputs = ["insights"],
)
```

Here, result still has the keys `"visualisation"`, `“insights”`, `“recommendations”` and `“tasks”` but their values are changed.

```python theme={null}
>>> print(result["insights"])
This is the insights generated by the LLM analysis...
>>> print(result["recommendations"])
	
>>> repr(result["recommendations"])
"''"
```

You can also specify the context for the analysis and the generation of outputs.
For example, you could let the system know that this analysis is for a specific user profile, or that the outputs should be presented in a specific way.

```python theme={null}
# A humorous example
result = analyzr.ask(
    "How many columns in my dataset?",
    context = {"insights": "Give your response as a pirate."},
    outputs = ["insights"],
)
print(result['insights'])
- Yer dataset be havin' 11 columns in total, matey.
- Thar be 147 rows o' data, with ranks spannin' from 1 to 147.
- The mean rank be 74, which be the midpoint o' the data, arrr.
```

## Getting visualisations

To retrieve plots from your analysis, use the `ask` method and pass `"visualisation"` in your `outputs` parameter.
The dictionary returned has a key `"visualisation"` which contains the path to the PNG image. By default, visualization images are saved to `./generated_plots/plot.png`, but this can be controlled using the `plot_path` parameter. Here's an example:

```python theme={null}
result = analyzr.ask(
    "Your query here",
    outputs = ["visualisation"],
    plot_path = "my_image.png",
)
```

The result then has a `"visualisation"` key which is simply the value of `plot_path`:

```python theme={null}
>>>> print(result["visualisation"])
my_image.png
```

It is also possible to pass a directory to `plot_path`.
The generated image is then saved in the directory.
Additionally, you may pass a context for the image generation.

```python theme={null}
result = analyzr.ask(
    "Your query here",
    outputs = ["visualisation"],
    context = {"visualisation": "Your visualisation context here"},
    plot_path = "my_directory",
)
print(result["visualisation"])
> my_directory/plot.png
```

You can then view the image using `pillow` or `matplotlib`:

```python theme={null}
from PIL import Image

image = Image.open(result["visualisation"])
image.show()

# OR ----

import matplotlib.pyplot as plt
import matplotlib.image as mpimg

img = mpimg.imread(result["visualisation"])
imgplot = plt.imshow(img)
plt.axis('off')
plt.show()
```
