
Commit c59dc38

1.0 (#39)
1 parent 26b3f28 commit c59dc38


48 files changed (+680, -628 lines)
@@ -0,0 +1 @@
+Replace me with the real documentation.

@@ -0,0 +1 @@
+Replace me with the real documentation.

@@ -0,0 +1 @@
+Replace me with the real documentation.

@@ -0,0 +1 @@
+Replace me with the real documentation.

@@ -0,0 +1 @@
+Replace me with the real documentation.

@@ -0,0 +1 @@
+Replace me with the real documentation.

@@ -0,0 +1 @@
+Replace me with the real documentation.

@@ -0,0 +1 @@
+Replace me with the real documentation.

@@ -0,0 +1 @@
+Replace me with the real documentation.

@@ -0,0 +1 @@
+Replace me with the real documentation.

@@ -0,0 +1 @@
+Replace me with the real documentation.

@@ -0,0 +1 @@
+Replace me with the real documentation.

@@ -0,0 +1 @@
+Replace me with the real documentation.

@@ -0,0 +1 @@
+Replace me with the real documentation.

@@ -0,0 +1 @@
+Replace me with the real documentation.

@@ -0,0 +1 @@
+Replace me with the real documentation.

docs/backend/dsl.md (+44, -38)

@@ -5,17 +5,14 @@ Bullet DSL is a configuration-based DSL that allows users to plug their data int
 To support this, Bullet DSL provides two major components. The first is for reading data from a pluggable data source (the *connectors* for talking to various data sources), and the second is for converting data (the *converters* for understanding your data formats) into [BulletRecords](ingestion.md).
 By enabling Bullet DSL in the Backend and configuring Bullet DSL, your backend will use the two components to read from the configured data source and convert the data into BulletRecords, without you having to write any code.
 
-The three interfaces that the DSL uses are:
+There is also an optional minor component that acts as the glue between the connectors and the converters. These are the *deserializers*. They exist for when the data coming out of a connector is in a format that cannot be understood by a converter. Typically, this happens with serialized data that needs to be deserialized before a converter can understand it.
 
-1. The **BulletConnector** : Bullet DSL's reading component
-2. The **BulletRecordConverter** : Bullet DSL's converting component
-3. The **Bullet Backend** : The implementation of Bullet on a Stream Processor
-
-There is also an optional BulletDeserializer component that sits between the Connector and the Converter to deserialize the data.
+The four interfaces that the DSL uses are:
 
-!!!note
-
-    For the Backend, please refer to the DSL-specific Bullet Storm setup [here](storm-setup.md#using-bullet-dsl). (Currently, only Bullet Storm supports Bullet DSL.)
+1. The **BulletConnector** : Bullet DSL's reading component
+2. The **BulletDeserializer** : Bullet DSL's optional deserializing component
+3. The **BulletRecordConverter** : Bullet DSL's converting component
+4. The **Bullet Backend** : The implementation of Bullet on a Stream Processor
 
 ## BulletConnector
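To make the flow above concrete, here is a minimal sketch of how the pluggable pieces compose. The class names are real Bullet DSL types, but the `from` factories, the method names, and the read loop are illustrative assumptions rather than the Backend's actual wiring:

```java
import com.yahoo.bullet.dsl.BulletDSLConfig;
import com.yahoo.bullet.dsl.connector.BulletConnector;
import com.yahoo.bullet.dsl.converter.BulletRecordConverter;
import com.yahoo.bullet.dsl.deserializer.BulletDeserializer;
import com.yahoo.bullet.record.BulletRecord;

public class DSLPipelineSketch {
    public static void main(String[] args) throws Exception {
        BulletDSLConfig config = new BulletDSLConfig();
        // Assumed factory calls: the Backend instantiates these components
        // from the class names given in your configuration.
        BulletConnector connector = BulletConnector.from(config);
        BulletDeserializer deserializer = BulletDeserializer.from(config);
        BulletRecordConverter converter = BulletRecordConverter.from(config);
        connector.initialize();
        for (Object data : connector.read()) {
            // The optional glue step, then the conversion to a BulletRecord.
            Object deserialized = deserializer.deserialize(data);
            BulletRecord record = converter.convert(deserialized);
            // ... hand the record off to the stream processor here.
        }
        connector.close();
    }
}
```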

@@ -135,6 +132,10 @@ bullet.dsl.converter.pojo.class.name: "com.your.package.YourPOJO"
 
 The MapBulletRecordConverter is used to convert Java Maps of Objects into BulletRecords. Without a schema, it simply inserts every entry in the Map into a BulletRecord without any type-checking. If the Map contains objects that are not types supported by the BulletRecord, you might have issues when serializing the record.
 
+### JSONBulletRecordConverter
+
+The JSONBulletRecordConverter is used to convert String JSON representations of records into BulletRecords. Without a schema, it simply inserts every entry in the JSON object into a BulletRecord without any type-checking, and it uses the Double type for all numeric values (since it cannot guess whether records might need a wider type). You should use a schema and specify the appropriate types if you want more specific numeric types for the fields in your record. If the JSON contains objects that are not types supported by the BulletRecord, you might have issues when serializing the record.
+
 ### AvroBulletRecordConverter
 
 The AvroBulletRecordConverter is used to convert Avro records into BulletRecords. Without a schema, it inserts every field into a BulletRecord without any type-checking. With a schema, you get type-checking, and you can also specify a RECORD field, and the converter will accept Avro Records in addition to Maps, flattening them into the BulletRecord.
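As a quick illustration of the schemaless numeric behavior described above, the following sketch assumes a no-argument JSONBulletRecordConverter constructor (the real class may require a BulletDSLConfig; check the bullet-dsl source):

```java
import com.yahoo.bullet.dsl.converter.JSONBulletRecordConverter;
import com.yahoo.bullet.record.BulletRecord;

public class JSONNumericSketch {
    public static void main(String[] args) throws Exception {
        // Assumed constructor; the real class may need a BulletDSLConfig.
        JSONBulletRecordConverter converter = new JSONBulletRecordConverter();
        // Without a schema, "count" is stored as the Double 42.0, not an
        // Integer, since the converter cannot guess the width you need.
        BulletRecord record = converter.convert("{\"count\": 42}");
    }
}
```
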
@@ -146,16 +147,14 @@ The schema consists of a list of fields each described by a name, reference, typ
 1. `name` : The name of the field in the BulletRecord
 2. `reference` : The field to extract from the to-be-converted object
 3. `type` : The type of the field
-4. `subtype` : The subtype of any nested fields in this field (if any)
 
 
 When using the schema:
 
 1. The `name` of the field in the schema will be the name of the field in the BulletRecord.
 2. The `reference` of the field in the schema is the field/value to be extracted from an object when it is converted to a BulletRecord.
 3. If the `reference` is null, it is assumed that the `name` and the `reference` are the same.
-4. The `type` must be specified and will be used for type-checking.
-5. The `subtype` must be specified for certain `type` values (`LIST`, `LISTOFMAP`, `MAP`, or `MAPOFMAP`). Otherwise, it must be null.
+4. The `type` must be specified and can be used for type-checking. If you provide a schema and enable the `bullet.dsl.converter.schema.type.check.enable` setting, the converter will validate that the types in the source data match the types given here. Otherwise, the given types are assumed to be correct. Enabling the check is useful when you are first using the DSL and are not yet sure of your types.
 
 #### Types
@@ -165,24 +164,34 @@ When using the schema:
 4. FLOAT
 5. DOUBLE
 6. STRING
-7. LIST
-8. LISTOFMAP
-9. MAP
-10. MAPOFMAP
-11. RECORD
-
-#### Subtypes
-
-1. BOOLEAN
-2. INTEGER
-3. LONG
-4. FLOAT
-5. DOUBLE
-6. STRING
-
-!!!note "RECORD"
-
-    For RECORD type, you should normally reference a Map. For each key-value pair in the Map, a field will be inserted into the BulletRecord. Hence, the name in a RECORD field is left empty.
+7. BOOLEAN_MAP
+8. INTEGER_MAP
+9. LONG_MAP
+10. FLOAT_MAP
+11. DOUBLE_MAP
+12. STRING_MAP
+13. BOOLEAN_MAP_MAP
+14. INTEGER_MAP_MAP
+15. LONG_MAP_MAP
+16. FLOAT_MAP_MAP
+17. DOUBLE_MAP_MAP
+18. STRING_MAP_MAP
+19. BOOLEAN_LIST
+20. INTEGER_LIST
+21. LONG_LIST
+22. FLOAT_LIST
+23. DOUBLE_LIST
+24. STRING_LIST
+25. BOOLEAN_MAP_LIST
+26. INTEGER_MAP_LIST
+27. LONG_MAP_LIST
+28. FLOAT_MAP_LIST
+29. DOUBLE_MAP_LIST
+30. STRING_MAP_LIST
+
+!!!note "Special Type for a RECORD"
+
+    There is a special case: if you omit both the `type` and the `name` for an entry in the schema, the reference is assumed to be a map containing arbitrary fields whose types are in the list above. Use this if you have a map field containing objects of one or more of those types and want to flatten that map into the target record, with each field keeping its respective type. The names of the fields in the map become the top-level names in the resulting record.
 
 #### Example Schema

@@ -195,13 +204,11 @@ When using the schema:
   },
   {
     "name": "myBoolMap",
-    "type": "MAP",
-    "subtype": "BOOLEAN"
+    "type": "BOOLEAN_MAP"
   },
   {
     "name": "myLongMapMap",
-    "type": "MAPOFMAP",
-    "subtype": "LONG"
+    "type": "LONG_MAP_MAP"
  },
   {
     "name": "myIntFromSomeMap",
@@ -217,18 +224,17 @@ When using the schema:
     "name": "myIntFromSomeNestedMapsAndLists",
     "reference": "someMap.nestedMap.nestedList.0",
     "type": "INTEGER"
-  },
+  },
   {
-    "reference" : "someMap",
-    "type": "RECORD"
+    "reference" : "someMap"
   }
   ]
 }
 ```
 
 ## BulletDeserializer
 
-BulletDeserializer is an abstract Java class that can be implemented to deserialize/transform output from BulletConnector to input for BulletRecordConverter. It is an *optional* component and whether it's necessary or not depends on the output of your data sources. For example, if your KafkaConnector outputs byte arrays that are actually Java-serialized Maps, and you're using a MapBulletRecordConverter, you would use the JavaDeserializer, which would deserialize byte arrays into Java Maps for the converter.
+BulletDeserializer is an abstract Java class that can be implemented to deserialize/transform output from BulletConnector into input for BulletRecordConverter. It is an *optional* component; whether it is necessary depends on the output of your data sources. If one is not needed, the `IdentityDeserializer` can be used. For example, if your KafkaConnector outputs byte arrays that are actually Java-serialized Maps, and you're using a MapBulletRecordConverter, you would use the JavaDeserializer, which deserializes the byte arrays into Java Maps for the converter.
 
 Currently, we support two BulletDeserializer implementations:
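For a sense of what implementing the abstract class involves, here is a sketch of a trivial custom deserializer. The constructor-with-config shape and the `deserialize` signature follow the description above but are assumptions to check against the actual abstract class:

```java
import com.yahoo.bullet.common.BulletConfig;
import com.yahoo.bullet.dsl.deserializer.BulletDeserializer;
import org.apache.commons.lang3.SerializationUtils;

// Assumes the connector emits byte arrays of Java-serialized objects --
// essentially what the built-in JavaDeserializer already covers.
public class JavaMapDeserializer extends BulletDeserializer {
    public JavaMapDeserializer(BulletConfig config) {
        super(config);
    }

    @Override
    public Object deserialize(Object object) {
        // Turn the raw bytes back into the Java object the converter expects.
        return SerializationUtils.deserialize((byte[]) object);
    }
}
```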

docs/backend/ingestion.md (+1)

@@ -33,6 +33,7 @@ Data placed into a Bullet Record is strongly typed. We support these types curre
 
 1. Map of Strings to any of the [Primitives](#primitives)
 2. Map of Strings to any Map in 1
+3. List of any of the [Primitives](#primitives)
 3. List of any Map in 1
 
 With these types, it is unlikely you would have data that cannot be represented as a Bullet Record, but if you do, please let us know and we are more than willing to accommodate.
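To make these composite shapes concrete, here is a small sketch of plain Java values that fit each one (the variable names are illustrative only):

```java
import java.util.List;
import java.util.Map;

public class SupportedShapes {
    // 1. Map of Strings to a primitive
    Map<String, Long> counts = Map.of("clicks", 10L);
    // 2. Map of Strings to any Map in 1
    Map<String, Map<String, Long>> nested = Map.of("byPage", counts);
    // 3. List of primitives (the case added by this commit)
    List<String> tags = List.of("mobile", "web");
    // 4. List of any Map in 1
    List<Map<String, Long>> perEvent = List.of(counts);
}
```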

docs/backend/spark-setup.md (+19, -2)

@@ -12,7 +12,7 @@ Download the Bullet Spark standalone jar from [JCenter](http://jcenter.bintray.c
 
 If you are using Bullet Kafka as the pluggable PubSub, you can download the fat jar from [JCenter](http://jcenter.bintray.com/com/yahoo/bullet/bullet-kafka/). Otherwise, you need to plug in your own PubSub jar or use the RESTPubSub built into bullet-core and turned on in the API.
 
-To use Bullet Spark, you need to implement your own [Data Producer Trait](https://github.com/bullet-db/bullet-spark/blob/master/src/main/scala/com/yahoo/bullet/spark/DataProducer.scala) with a JVM based project. You have two ways to implement it as described in the [Spark Architecture](spark-architecture.md#data-processing) section. You include the Bullet artifact and Spark dependencies in your pom.xml or other equivalent build tools. The artifacts are available through JCenter. Here is an example if you use Scala and Maven:
+To use Bullet Spark, you need to implement your own [Data Producer Trait](https://github.com/bullet-db/bullet-spark/blob/master/src/main/scala/com/yahoo/bullet/spark/DataProducer.scala) in a JVM-based project, or you can use Bullet DSL (see below). If you choose to implement your own, there are two ways to do so, as described in the [Spark Architecture](spark-architecture.md#data-processing) section. Include the Bullet artifact and Spark dependencies in your pom.xml or equivalent build tool. The artifacts are available through JCenter. Here is an example if you use Scala and Maven:
 
 ```xml
 <repositories>
@@ -65,9 +65,26 @@ To use Bullet Spark, you need to implement your own [Data Producer Trait](https:
 
 You can also add ```<classifier>sources</classifier>``` or ```<classifier>javadoc</classifier>``` if you want the sources or javadoc.
 
+### Using Bullet DSL
+
+Instead of implementing your own Data Producer, you can also use the provided DSL receiver with [Bullet DSL](dsl.md). To do so, add the following settings to your YAML configuration:
+
+```yaml
+# If true, enables the Bullet DSL data producer, which can be configured to read from a custom data source. If enabled,
+# the DSL data producer is used instead of your own producer.
+bullet.spark.dsl.data.producer.enable: true
+
+# If true, enables the deserializer between the Bullet DSL connector and converter components. Otherwise, this step is skipped.
+bullet.spark.dsl.deserializer.enable: false
+```
+
+You may then use the appropriate DSL settings to point to the class names of the Connector and Converter you wish to use to read from your data source and convert the data into BulletRecord instances.
+
+There is also a setting to enable [BulletDeserializer](dsl.md#bulletdeserializer), an optional component of Bullet DSL that deserializes data between the reading and converting steps.
+
 ## Launch
 
-After you have implemented your own data producer and built a jar, you could launch your Bullet Spark application. Here is an example command for a [YARN cluster](https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/YARN.html).
+After you have implemented your own data producer or set up Bullet DSL, and have built a jar, you can launch your Bullet Spark application. Here is an example command for a [YARN cluster](https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/YARN.html).
 
 ```bash
 ./bin/spark-submit \