<h1 style="text-align: center;">AIDB</h1>

<p align="center"> Analyze unstructured data blazingly fast with machine learning. Connect your own ML models to your own data sources and query away! </p>

<p align="center">
 <img src="assets/aidbuse.gif" style="width:550px;"/>
</p>

## Quick Start

To start using AIDB, all you need to do is install the requirements, specify a configuration, and query!
Setting up the environment is as simple as:
```bash
git clone https://github.com/ddkang/aidb.git
cd aidb
pip install -r requirements.txt

# Optional: download the data used by the examples below
# (requires gdown; run `pip install gdown` if it is not already installed)
gdown "https://drive.google.com/uc?id=1SyHRaJNvVa7V08mw-4_Vqj7tCynRRA3x"
unzip data.zip -d tests/
```

### Text Example (in CSV)

We've set up an example that analyzes product reviews with HuggingFace. First, set your HuggingFace API key. Then all you need to do is run:
```bash
python launch.py --config=config.sentiment --setup-blob-table --setup-output-table
```

As an example query, you can run:
```sql
SELECT AVG(score)
FROM sentiment
WHERE label = '5 stars'
ERROR_TARGET 10%
CONFIDENCE 95%;
```

Because of the `ERROR_TARGET` and `CONFIDENCE` clauses, this is an approximate query (see [Approximate Querying](#approximate-querying)).
You can see the mappings for this example [here](https://github.com/ddkang/aidb/blob/main/config/sentiment.py#L15). We use the HuggingFace API to generate sentiment labels and scores from the reviews.

### Image Example (local directory)

We've also set up an example that detects whether user-generated images contain adult content, which can be used for content filtering.
To run this example, run:
```bash
python launch.py --config=config.nsfw_detect --setup-blob-table --setup-output-table
```

As an example query, you can run:
```sql
SELECT *
FROM nsfw
WHERE racy LIKE 'POSSIBLE';
```

You can see the mappings for this example [here](https://github.com/ddkang/aidb/blob/main/config/nsfw_detect.py#L10). We use the Google Vision API's SafeSearch detection to generate the safety labels (such as `racy`).
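
Each SafeSearch field holds one of a fixed, ordered set of likelihood names, so the example query above matches exactly one bucket. If you would rather flag everything at or above a threshold, the idea is sketched below in plain Python over rows you have already fetched. The ordering follows the Vision API's `Likelihood` enum; the `rows` data and the `at_least` helper are hypothetical and not part of AIDB.

```python
from typing import Dict, List

# Ordering of the Google Vision API's Likelihood enum (UNKNOWN = 0 ... VERY_LIKELY = 5).
# UNKNOWN is treated as the lowest rank here, which is an assumption of this sketch.
LIKELIHOOD_ORDER = ["UNKNOWN", "VERY_UNLIKELY", "UNLIKELY", "POSSIBLE", "LIKELY", "VERY_LIKELY"]
RANK: Dict[str, int] = {name: i for i, name in enumerate(LIKELIHOOD_ORDER)}


def at_least(rows: List[Dict[str, str]], field: str, threshold: str) -> List[Dict[str, str]]:
    """Keep rows whose likelihood in `field` is at or above `threshold` (hypothetical helper)."""
    return [row for row in rows if RANK[row[field]] >= RANK[threshold]]


# Hypothetical rows shaped like the `nsfw` table's image/label pairs.
rows = [
    {"image": "a.jpg", "racy": "VERY_UNLIKELY"},
    {"image": "b.jpg", "racy": "POSSIBLE"},
    {"image": "c.jpg", "racy": "VERY_LIKELY"},
]
flagged = at_least(rows, "racy", "POSSIBLE")
print([row["image"] for row in flagged])
```

Here `b.jpg` and `c.jpg` are kept, while the single-bucket query would only match `b.jpg`.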

## Key Features

AIDB focuses on keeping costs down and interoperability high.

We reduce costs with two optimizations:
- First-class support for approximate queries, reducing the cost of aggregations by up to **350x**.
- Caching, which speeds up repeated queries over the same data.
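
Caching helps because the expensive step in these queries is the ML inference call. As a minimal illustration of the idea (not AIDB's cache, which may key and store results differently), memoizing a hypothetical `classify` function with Python's `functools.lru_cache` means a second query over the same rows makes no new model calls:

```python
from functools import lru_cache


@lru_cache(maxsize=None)  # unbounded memo table: one entry per distinct input
def classify(review: str) -> str:
    # Hypothetical stand-in for an expensive ML inference call.
    return "5 stars" if "great" in review else "1 star"


reviews = ["great product", "broke in a day", "great value"]

# First query: every review is a cache miss, so the "model" runs 3 times.
labels = [classify(r) for r in reviews]

# Second query over the same data: every lookup is a cache hit.
five_star = [r for r in reviews if classify(r) == "5 stars"]

info = classify.cache_info()
print(labels, five_star, info.hits, info.misses)
```

`cache_info()` reports 3 misses (from the first query) and 3 hits (from the second).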

We keep interoperability high by letting you bring your own data sources, ML models, and vector databases!

## Approximate Querying

One key feature of AIDB is first-class support for approximate queries.
Currently, we support approximate `AVG`, `COUNT`, and `SUM`.
We don't currently support `GROUP BY` or `JOIN` for approximate aggregations, but both are on our roadmap.
Please reach out if you'd like us to support your queries!
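
The idea behind approximate aggregation is that the ML model only needs to run on a sample of the data. The sketch below assumes uniform random sampling without replacement (AIDB's actual sampling and estimation may differ) and shows how each supported aggregate can be estimated from `n` sampled rows out of `N`: `AVG` is the average over matching sampled rows, while `COUNT` and `SUM` are scaled up by `N / n`. The row format and the `estimate` function are hypothetical.

```python
import random
from statistics import mean
from typing import Dict, List, Tuple

# Hypothetical row format: (label, score), as a sentiment model might produce.
Row = Tuple[str, float]


def estimate(rows: List[Row], n: int, label: str, seed: int = 0) -> Dict[str, float]:
    """Estimate COUNT(*), SUM(score), and AVG(score) WHERE label = <label> from a uniform sample."""
    rng = random.Random(seed)
    sample = rng.sample(rows, n)  # only these n rows would need ML inference
    scale = len(rows) / n         # each sampled row stands in for N / n rows
    matching = [score for row_label, score in sample if row_label == label]
    return {
        "COUNT": scale * len(matching),
        "SUM": scale * sum(matching),
        "AVG": mean(matching) if matching else float("nan"),
    }


# Usage: 10,000 synthetic rows, half of them labeled '5 stars'.
# Exact answers for comparison: COUNT = 5000, SUM = 2000, AVG = 0.4.
rows = [("5 stars" if i % 2 == 0 else "1 star", (i % 10) / 10) for i in range(10_000)]
print(estimate(rows, n=1_000, label="5 stars"))
```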

To execute an approximate aggregation query, append `ERROR_TARGET <error percent>% CONFIDENCE <confidence>%` to an ordinary aggregation query.
For example, you can compute an approximate count with:
```sql
SELECT COUNT(xmin)
FROM objects
ERROR_TARGET 5%
CONFIDENCE 95%;
```

`ERROR_TARGET` specifies the allowed percent error _compared to running the query exactly_, and `CONFIDENCE` specifies how often the answer should fall within that error.
For example, with `ERROR_TARGET 5%` and `CONFIDENCE 95%`, if the exact answer is 100, you will get an answer between 95 and 105 with 95% probability.
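
One common way to meet an error target at a given confidence is to keep sampling until a normal-approximation (central limit theorem) confidence interval is narrow enough. The sketch below assumes that approach, which may differ from the estimator AIDB actually uses: for a sample mean it computes the half-width z · s / √n, where z comes from `statistics.NormalDist` (about 1.96 at 95% confidence), and checks it against `ERROR_TARGET` relative to the current estimate. The function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev
from typing import List, Tuple


def confidence_interval(sample: List[float], confidence: float) -> Tuple[float, float]:
    """CLT-based interval for the population mean (needs at least 2 values)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # two-sided critical value, ~1.96 at 95%
    half_width = z * stdev(sample) / sqrt(len(sample))
    estimate = mean(sample)
    return estimate - half_width, estimate + half_width


def meets_error_target(sample: List[float], error_target: float, confidence: float) -> bool:
    """True if the interval's half-width is within error_target of the estimate (relative)."""
    lo, hi = confidence_interval(sample, confidence)
    estimate = (lo + hi) / 2
    return (hi - lo) / 2 <= error_target * abs(estimate)


# Usage: ERROR_TARGET 5% CONFIDENCE 95% on 100 hypothetical per-row values.
sample = [95.0, 105.0] * 50
print(confidence_interval(sample, 0.95), meets_error_target(sample, 0.05, 0.95))
```

The relative error is measured against the sample estimate here, since the exact answer is unknown while the query runs.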

## Useful Links
- [How to connect ML APIs](https://github.com/ddkang/aidb/blob/main/aidb/inference/examples/README.md)
- [How to define a configuration file](https://github.com/ddkang/aidb/tree/main/config)
- [How to connect to a data store](https://github.com/ddkang/aidb/tree/main/aidb_utilities/blob_store)

## Contribute

We have many improvements we'd like to implement. Please help us! For now, if you'd like to contribute, please [email](mailto:[email protected]) us.

## Contact Us

Need help setting up AIDB for your dataset, or want a new feature? Please fill out [this form](https://forms.gle/YyAXWxqzZPVBrvBR7).