
Commit 5bcb1e2

ReadMe outline (#86)
* ReadMe outline
* google form section
* some text
* more README
* some README stuff
* Added GIF and Google Form Link
* few chanhges
* updated readme
* feedback
* fix form link
* forgot period
* update readme
* more text for blob store
* update link

---------

Co-authored-by: Daniel Kang <[email protected]>
1 parent b1938cd commit 5bcb1e2


3 files changed (+133 -0 lines)

README.md (new file, +104 lines)

<h1 style="text-align: center;">AIDB</h1>

<p align="center"> Analyze unstructured data blazingly fast with machine learning. Connect your own ML models to your own data sources and query away! </p>

<p align="center">
<img src="assets/aidbuse.gif" style="width:550px;"/>
</p>

## Quick Start

In order to start using AIDB, all you need to do is install the requirements, specify a configuration, and query!
Setting up the environment is as simple as:
```bash
git clone https://github.com/ddkang/aidb.git
cd aidb
pip install -r requirements.txt

# Optional: only needed to run the examples below
gdown https://drive.google.com/uc?id=1SyHRaJNvVa7V08mw-4_Vqj7tCynRRA3x
unzip data.zip -d tests/
```

### Text Example (in CSV)

We've set up an example that analyzes product reviews with HuggingFace. First, set your HuggingFace API key; then all you need to do is run:
```bash
python launch.py --config=config.sentiment --setup-blob-table --setup-output-table
```

As an example query, you can run:
```sql
SELECT AVG(score)
FROM sentiment
WHERE label = '5 stars'
ERROR_TARGET 10%
CONFIDENCE 95%;
```

You can see the mappings [here](https://github.com/ddkang/aidb/blob/main/config/sentiment.py#L15). We use the HuggingFace API to generate sentiment labels from the reviews.

### Image Example (local directory)

We've also set up an example that detects whether user-generated images contain adult content, which is useful for content filtering.
To try it, run:
```bash
python launch.py --config=config.nsfw_detect --setup-blob-table --setup-output-table
```

As an example query, you can run:
```sql
SELECT *
FROM nsfw
WHERE racy LIKE 'POSSIBLE';
```

You can see the mappings [here](https://github.com/ddkang/aidb/blob/main/config/nsfw_detect.py#L10). We use the Google Vision API to generate the safety labels.

## Key Features

AIDB focuses on keeping cost down and interoperability high.

We reduce costs with our optimizations:
- First-class support for approximate queries, reducing the cost of aggregations by up to **350x**.
- Caching, which speeds up multiple queries over the same data.

We keep interoperability high by letting you bring your own data sources, ML models, and vector databases!

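The caching bullet can be illustrated with a minimal sketch of memoizing per-blob model outputs, so a second query over the same blobs reuses stored predictions instead of calling the model again. `InferenceCache` and the toy lambda model are hypothetical names for illustration; AIDB's actual caching layer may work differently.

```python
from typing import Any, Callable, Dict, Hashable


class InferenceCache:
    """Hypothetical memoizing wrapper around an ML model (not AIDB's API)."""

    def __init__(self, model: Callable[[Any], Any]):
        self._model = model
        self._results: Dict[Hashable, Any] = {}  # blob_id -> model output
        self.model_calls = 0  # how many times the (expensive) model actually ran

    def predict(self, blob_id: Hashable, blob: Any) -> Any:
        # Only run the model the first time a blob is seen.
        if blob_id not in self._results:
            self._results[blob_id] = self._model(blob)
            self.model_calls += 1
        return self._results[blob_id]


# Two "queries" over the same three reviews.
reviews = {"r1": "great product", "r2": "broke after a day", "r3": "great value"}
cache = InferenceCache(lambda text: "5 stars" if "great" in text else "1 star")

five_star_count = sum(cache.predict(rid, text) == "5 stars" for rid, text in reviews.items())
labels = [cache.predict(rid, text) for rid, text in reviews.items()]
print(five_star_count, cache.model_calls)  # 2 3
```

The second query costs no model calls: `model_calls` stays at 3, one per distinct review.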
## Approximate Querying

One key feature of AIDB is first-class support for approximate queries.
Currently, we support approximate `AVG`, `COUNT`, and `SUM`.
Approximate aggregations with `GROUP BY` or `JOIN` are not yet supported, but they are on our roadmap.
Please reach out if you'd like us to support your queries!

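One common way to answer all three supported aggregates from a single uniform random sample: `AVG` is the mean of the matching sampled values, while `COUNT` and `SUM` scale the sample's matching fraction and per-row sum up to the full table size. The sketch below shows only that scaling step. `estimate_aggregates` is a hypothetical helper, and AIDB's own estimators may differ.

```python
from typing import Callable, Dict, Optional, Sequence


def estimate_aggregates(
    sample: Sequence[float],
    table_size: int,
    where: Optional[Callable[[float], bool]] = None,
) -> Dict[str, Optional[float]]:
    """Hypothetical estimator from a uniform random sample of a table's rows.

    sample:     column values for the sampled rows (e.g. model outputs)
    table_size: total number of rows in the table
    where:      optional filter, playing the role of a WHERE clause
    """
    if not sample:
        raise ValueError("need at least one sampled row")
    matched = [v for v in sample if where is None or where(v)]
    n = len(sample)
    return {
        # Fraction of sampled rows that match, scaled to the whole table.
        "COUNT": len(matched) / n * table_size,
        # Average per-row contribution (non-matching rows contribute 0), scaled.
        "SUM": sum(matched) / n * table_size,
        # AVG is a ratio, so it needs no scaling; undefined if nothing matched.
        "AVG": sum(matched) / len(matched) if matched else None,
    }


# 4 sampled rows out of a 1,000-row table, filtered with "score > 2".
print(estimate_aggregates([1, 2, 3, 4], table_size=1000, where=lambda s: s > 2))
# {'COUNT': 500.0, 'SUM': 1750.0, 'AVG': 3.5}
```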

To execute an approximate aggregation query, simply append `ERROR_TARGET <error percent>% CONFIDENCE <confidence>%` to your normal aggregation.
As a full example, you can compute an approximate count with:
```sql
SELECT COUNT(xmin)
FROM objects
ERROR_TARGET 5%
CONFIDENCE 95%;
```

`ERROR_TARGET` specifies the percent error _compared to running the query exactly_, and `CONFIDENCE` specifies how often the answer must fall within that error.
For example, with the query above, if the exact count is 100, you will get an answer between 95 and 105 (95% of the time).

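To make the `ERROR_TARGET`/`CONFIDENCE` semantics concrete, here is a minimal sketch that assumes a simple sampling scheme rather than AIDB's actual algorithm. `acceptable_range` computes the 95-to-105 window from the example. The hypothetical `estimate_mean` keeps sampling rows until a normal-approximation (central limit theorem) confidence interval is narrower than the error target, with `CONFIDENCE 95%` passed as `0.95`. It ignores subtleties a real system must handle, such as the effect of re-checking the interval after every batch.

```python
import math
import random
import statistics
from typing import List, Sequence, Tuple


def acceptable_range(exact: float, error_target: float) -> Tuple[float, float]:
    """Answers within error_target (a fraction, e.g. 0.05) of the exact answer."""
    margin = abs(exact) * error_target
    return exact - margin, exact + margin


def estimate_mean(
    values: Sequence[float],
    error_target: float,
    confidence: float,
    batch: int = 50,
    seed: int = 0,
) -> Tuple[float, float, int]:
    """Hypothetical sampler (not AIDB's algorithm).

    Draws rows without replacement in batches until a normal-approximation
    confidence interval is narrower than error_target * |estimate|.
    Returns (estimate, interval half-width, rows sampled).
    """
    order: List[float] = list(values)
    random.Random(seed).shuffle(order)  # random sampling order
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)  # two-sided z-score
    population = len(order)
    step = max(batch, 2)  # stdev needs at least two sampled rows
    n = min(step, population)
    while True:
        sample = order[:n]
        mean = statistics.fmean(sample)
        if n == population:
            return mean, 0.0, n  # every row was sampled: the answer is exact
        # Standard error of the mean with the finite-population correction.
        fpc = math.sqrt((population - n) / (population - 1))
        half_width = z * statistics.stdev(sample) / math.sqrt(n) * fpc
        if half_width <= error_target * abs(mean):
            return mean, half_width, n
        n = min(n + step, population)


scores = [(i % 5) + 1 for i in range(10_000)]  # toy 1-5 star scores; exact AVG is 3.0
estimate, half_width, rows = estimate_mean(scores, error_target=0.05, confidence=0.95)
low, high = acceptable_range(statistics.fmean(scores), 0.05)  # roughly (2.85, 3.15)
print(f"estimate={estimate:.3f} from {rows} rows; target range [{low:.2f}, {high:.2f}]")
```

Sampling stops as soon as the interval is narrow enough. That is where the savings come from: fewer sampled rows means fewer ML model calls.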
## Useful Links
- [How to connect ML APIs](https://github.com/ddkang/aidb/blob/main/aidb/inference/examples/README.md)
- [How to define a configuration file](https://github.com/ddkang/aidb/tree/main/config)
- [Connecting to data stores](https://github.com/ddkang/aidb/tree/main/aidb_utilities/blob_store)

## Contribute

We have many improvements we'd like to implement. Please help us! For the time being, please [email](mailto:[email protected]) us if you'd like to contribute.

## Contact Us

Need help setting up AIDB for your specific dataset, or want a new feature? Please fill out [this form](https://forms.gle/YyAXWxqzZPVBrvBR7).

aidb_utilities/blob_store/README.md (new file, +29 lines)

# Connecting to Data Stores

We provide utilities to connect to different kinds of data stores.
You can also implement your own.

## Images in local storage

Our first example shows how to access images stored on the local file system:

```python
# Assumes LocalImageBlobStore and BaseTablesSetup are already imported, and that
# data_dir (the image directory) and DB_URL (your database URL) are defined.
local_image_store = LocalImageBlobStore(data_dir)
image_blobs = local_image_store.get_blobs()
base_table_setup = BaseTablesSetup(DB_URL)
# Insert the blob metadata into the 'blob00' table; ['blob_id'] names its key column(s).
base_table_setup.insert_blob_meta_data('blob00', image_blobs, ['blob_id'])
```

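The README notes that you can implement your own store but does not spell out the interface here. As a hypothetical sketch that mirrors the `get_blobs()` call above, a store can walk a directory and return one metadata record per file, keyed by `blob_id`. `LocalTextBlobStore` and `FileBlob` are illustrative names, and the record type that `insert_blob_meta_data` actually expects may differ.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import List, Union


@dataclass
class FileBlob:
    blob_id: str      # key value, like the ['blob_id'] column in the examples
    path: str         # absolute path to the file
    size_bytes: int


class LocalTextBlobStore:
    """Hypothetical store for text files in a directory (not AIDB's base class)."""

    def __init__(self, data_dir: Union[str, Path], suffix: str = ".txt"):
        self.data_dir = Path(data_dir)
        self.suffix = suffix

    def get_blobs(self) -> List[FileBlob]:
        # Walk the directory tree and emit one metadata record per matching file.
        files = sorted(p for p in self.data_dir.rglob(f"*{self.suffix}") if p.is_file())
        return [
            FileBlob(
                blob_id=p.relative_to(self.data_dir).as_posix(),
                path=str(p.resolve()),
                size_bytes=p.stat().st_size,
            )
            for p in files
        ]
```

Deriving `blob_id` from the path relative to `data_dir` keeps IDs stable across runs, which matters if they serve as table keys.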

## Documents in AWS S3

We also show how to access documents stored in S3:

```python
# Replace the placeholders with your bucket name and AWS credentials.
aws_doc_store = AwsS3DocumentBlobStore('<your-bucket-name>', '<your-aws-access-key>', '<your-aws-secret-key>')
doc_blobs = aws_doc_store.get_blobs()
base_table_setup = BaseTablesSetup(DB_URL)
base_table_setup.insert_blob_meta_data('blob00', doc_blobs, ['blob_id'])
```
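The snippet above passes credentials as string literals. If you would rather not hard-code them, here is a minimal sketch that assumes you keep them in the standard `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` environment variables used by the AWS CLI and SDKs. `aws_credentials_from_env` is a hypothetical helper, not part of AIDB.

```python
import os
from typing import Mapping, Optional, Tuple


def aws_credentials_from_env(env: Optional[Mapping[str, str]] = None) -> Tuple[str, str]:
    """Hypothetical helper: read AWS keys from the standard environment variables."""
    env = os.environ if env is None else env
    names = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError(f"missing environment variables: {', '.join(missing)}")
    return env["AWS_ACCESS_KEY_ID"], env["AWS_SECRET_ACCESS_KEY"]
```

You could then unpack `access_key, secret_key = aws_credentials_from_env()` and pass them to `AwsS3DocumentBlobStore` in place of the placeholder strings.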

assets/aidbuse.gif (binary file, 3.65 MB)
