`docs/backend/dsl.md`

Bullet DSL is a configuration-based DSL that allows users to plug their data into the Bullet Backend.
To support this, Bullet DSL provides two major components. The first is for reading data from a pluggable data source (the *connectors* for talking to various data sources), and the second is for converting data (the *converters* for understanding your data formats) into [BulletRecords](ingestion.md).
By enabling Bullet DSL in the Backend and configuring Bullet DSL, your backend will use the two components to read from the configured data source and convert the data into BulletRecords, without you having to write any code.

There is also an optional minor component that acts as the glue between the connectors and the converters. These are the *deserializers*. They exist for cases where the data coming out of a connector is in a format that a converter cannot understand. Typically, this happens with serialized data that needs to be deserialized before a converter can make sense of it.
The four interfaces that the DSL uses are:

1. The **BulletConnector** : Bullet DSL's reading component
2. The **BulletDeserializer** : Bullet DSL's optional deserializing component
3. The **BulletRecordConverter** : Bullet DSL's converting component
4. The **Bullet Backend** : The implementation of Bullet on a Stream Processor
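Wiring the first three pieces together is purely configuration. A sketch of what that might look like (the setting names and package paths here are assumptions modeled on Bullet DSL's configuration style; check your bullet-dsl defaults file for the exact names):

```yaml
# Which connector reads from your data source (assumed setting name and package)
bullet.dsl.connector.class.name: "com.yahoo.bullet.dsl.connector.KafkaConnector"
# Which converter turns the read data into BulletRecords
bullet.dsl.converter.class.name: "com.yahoo.bullet.dsl.converter.MapBulletRecordConverter"
# The optional deserializer that sits between the two
bullet.dsl.deserializer.class.name: "com.yahoo.bullet.dsl.deserializer.IdentityDeserializer"
```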
### MapBulletRecordConverter

The MapBulletRecordConverter is used to convert Java Maps of Objects into BulletRecords. Without a schema, it simply inserts every entry in the Map into a BulletRecord without any type-checking. If the Map contains objects of types not supported by the BulletRecord, you might have issues when serializing the record.
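As a behavioral sketch (illustrative Python, since Bullet DSL itself is Java, and with the BulletRecord simplified to a plain dict), the schema-less conversion is just a field-by-field copy:

```python
def convert_map(data):
    """Schema-less MapBulletRecordConverter behavior: copy every entry, no type-checking."""
    record = {}
    for name, value in data.items():
        # No validation happens here: an unsupported value type only causes
        # trouble later, when the record is serialized.
        record[name] = value
    return record

record = convert_map({"heartbeat": True, "count": 5, "tags": {"app": "bullet"}})
```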
### JSONBulletRecordConverter

The JSONBulletRecordConverter is used to convert String JSON representations of records into BulletRecords. Without a schema, it simply inserts every entry in the JSON object into a BulletRecord without any type-checking, and it uses only the Double type for all numeric values (since it is unable to guess whether records might need a wider type). You should use a schema and specify the appropriate types if you want more specific numeric types for the fields in your record. If the JSON contains objects of types not supported by the BulletRecord, you might have issues when serializing the record.
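The Double-only numeric behavior can be mimicked with Python's standard json module by forcing integers to parse as floats (an illustrative sketch, not Bullet code):

```python
import json

# Without a schema, every JSON number becomes a Double; parse_int=float mimics that.
record = json.loads('{"count": 5, "score": 1.5}', parse_int=float)
# record["count"] is 5.0 (a float), even though the JSON contained an integer
```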
### AvroBulletRecordConverter
The AvroBulletRecordConverter is used to convert Avro records into BulletRecords. Without a schema, it inserts every field into a BulletRecord without any type-checking. With a schema, you get type-checking, and you can also specify a RECORD field, and the converter will accept Avro Records in addition to Maps, flattening them into the BulletRecord.
The schema consists of a list of fields, each described by a name, reference, and type:

1. `name` : The name of the field in the BulletRecord
2. `reference` : The field to extract from the to-be-converted object
3. `type` : The type of the field

When using the schema:

1. The `name` of the field in the schema will be the name of the field in the BulletRecord.
2. The `reference` of the field in the schema is the field/value to be extracted from an object when it is converted to a BulletRecord.
3. If the `reference` is null, it is assumed that the `name` and the `reference` are the same.
4. The `type` must be specified and can be used for type-checking. If you provide a schema and set the `bullet.dsl.converter.schema.type.check.enable` setting, the converter will validate that the types in the source data match the types given here. Otherwise, the provided types are assumed to be correct. Disabling the check is useful when you are first using the DSL and are not yet sure of the types.
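For example, enabling type-checking might look like this in your DSL configuration (only the type-check setting is named in the text above; the schema-file setting name and path are assumptions):

```yaml
# Point the converter at your schema (assumed setting name; hypothetical path)
bullet.dsl.converter.schema.file: "my-schema.json"
# Validate that source data actually matches the types declared in the schema
bullet.dsl.converter.schema.type.check.enable: true
```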
#### Types
1. BOOLEAN
2. INTEGER
3. LONG
4. FLOAT
5. DOUBLE
6. STRING
7. BOOLEAN_MAP
8. INTEGER_MAP
9. LONG_MAP
10. FLOAT_MAP
11. DOUBLE_MAP
12. STRING_MAP
13. BOOLEAN_MAP_MAP
14. INTEGER_MAP_MAP
15. LONG_MAP_MAP
16. FLOAT_MAP_MAP
17. DOUBLE_MAP_MAP
18. STRING_MAP_MAP
19. BOOLEAN_LIST
20. INTEGER_LIST
21. LONG_LIST
22. FLOAT_LIST
23. DOUBLE_LIST
24. STRING_LIST
25. BOOLEAN_MAP_LIST
26. INTEGER_MAP_LIST
27. LONG_MAP_LIST
28. FLOAT_MAP_LIST
29. DOUBLE_MAP_LIST
30. STRING_MAP_LIST

!!!note "Special Type for a RECORD"

    There is a special case: if you omit both the `type` and the `name` for an entry in the schema, the reference is assumed to be a map containing arbitrary fields whose types are in the list above. You can use this if you have a map field that contains objects with one or more of the types above and want to flatten that map out into the target record using the respective type of each field in the map. The names of the fields in the map become the top-level names in the resulting record.
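A sketch of the flattening described above (illustrative Python; the field names are made up):

```python
def flatten_into(record, referenced_map):
    """A schema entry with no name and no type: flatten the referenced map's
    entries into the top level of the record, keeping each value's own type."""
    for name, value in referenced_map.items():
        record[name] = value
    return record

target = flatten_into({"existing": 1}, {"latency_ms": 3.2, "host": "web-01"})
# target now has the top-level fields "existing", "latency_ms", and "host"
```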
#### Example Schema
```json
{
  "fields": [
    ...
    {
      "name": "myBoolMap",
      "type": "BOOLEAN_MAP"
    },
    {
      "name": "myLongMapMap",
      "type": "LONG_MAP_MAP"
    },
    {
      "name": "myIntFromSomeMap",
      ...
    },
    {
      "name": "myIntFromSomeNestedMapsAndLists",
      "reference": "someMap.nestedMap.nestedList.0",
      "type": "INTEGER"
    },
    {
      "reference" : "someMap"
    }
  ]
}
```
## BulletDeserializer
BulletDeserializer is an abstract Java class that can be implemented to deserialize/transform output from BulletConnector to input for BulletRecordConverter. It is an *optional* component and whether it's necessary or not depends on the output of your data sources. If one is not needed, the `IdentityDeserializer` can be used. For example, if your KafkaConnector outputs byte arrays that are actually Java-serialized Maps, and you're using a MapBulletRecordConverter, you would use the JavaDeserializer, which would deserialize byte arrays into Java Maps for the converter.
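As an analogy for the JavaDeserializer example above (illustrative Python, with pickle standing in for Java serialization):

```python
import pickle

# The connector hands over opaque bytes (here, a pickled dict)...
payload = pickle.dumps({"field": "value", "count": 1})

def deserialize(raw):
    """The deserializer's job: turn raw bytes into something the converter understands."""
    return pickle.loads(raw)

data = deserialize(payload)
# data is {"field": "value", "count": 1}, ready for a Map-style converter
```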
Currently, we support two BulletDeserializer implementations:
`docs/backend/ingestion.md`

Data placed into a Bullet Record is strongly typed. We support these types currently:
1. Map of Strings to any of the [Primitives](#primitives)
2. Map of Strings to any Map in 1
3. List of any of the [Primitives](#primitives)
4. List of any Map in 1
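The allowed shapes, illustrated as Python literals (the real container is a Java BulletRecord; this only shows the nesting, and the field names are made up):

```python
# Each value below is a shape a Bullet Record field may take:
record = {
    "aPrimitive": 42,                            # a primitive
    "aMap": {"cpu": 0.5, "mem": 0.7},            # 1. Map of Strings to primitives
    "aMapOfMap": {"host1": {"cpu": 0.5}},        # 2. Map of Strings to Maps in 1
    "aList": [1, 2, 3],                          # 3. List of primitives
    "aListOfMap": [{"cpu": 0.5}, {"cpu": 0.9}],  # 4. List of Maps in 1
}
```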
With these types, it is unlikely you would have data that cannot be represented as a Bullet Record, but if you do, please let us know and we will be more than willing to accommodate it.
`docs/backend/spark-setup.md`
If you are using Bullet Kafka as the pluggable PubSub, you can download the fat jar from [JCenter](http://jcenter.bintray.com/com/yahoo/bullet/bullet-kafka/). Otherwise, you need to plug in your own PubSub jar or use the RESTPubSub, which is built into bullet-core and can be turned on in the API.
To use Bullet Spark, you need to implement your own [Data Producer Trait](https://github.com/bullet-db/bullet-spark/blob/master/src/main/scala/com/yahoo/bullet/spark/DataProducer.scala) in a JVM-based project, or you can use Bullet DSL (see below). If you choose to implement your own, there are two ways to do so, as described in the [Spark Architecture](spark-architecture.md#data-processing) section. Include the Bullet artifact and the Spark dependencies in your pom.xml or the equivalent in other build tools. The artifacts are available through JCenter. Here is an example if you use Scala and Maven:
```xml
<repositories>
  ...
</repositories>
...
```
You can also add ```<classifier>sources</classifier>``` or ```<classifier>javadoc</classifier>``` if you want the sources or javadoc.
### Using Bullet DSL

Instead of implementing your own Data Producer, you can also use the provided DSL receiver with [Bullet DSL](dsl.md). To do so, add the following settings to your YAML configuration:

```yaml
# If true, enables the Bullet DSL data producer, which can be configured to read from a custom
# data source. If enabled, the DSL data producer is used instead of your own producer.
bullet.spark.dsl.data.producer.enable: true

# If true, enables the deserializer between the Bullet DSL connector and converter components. Otherwise, this step is skipped.
bullet.spark.dsl.deserializer.enable: false
```
You may then use the appropriate DSL settings to point to the class names of the Connector and Converter you wish to use to read from your data source and convert it to BulletRecord instances.
There is also a setting to enable [BulletDeserializer](dsl.md#bulletdeserializer), which is an optional component of Bullet DSL for deserializing data between reading and converting.
## Launch
After you have implemented your own data producer or used Bullet DSL, and built a jar, you can launch your Bullet Spark application. Here is an example command for a [YARN cluster](https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/YARN.html).