|
1 | 1 | <div align="center">
|
2 | 2 |
|
3 |
| -# [DQL](https://deno.land/x/dql) |
| 3 | +# [🦕 DQL](https://deno.land/x/dql) |
4 | 4 |
|
5 | 5 | ### _**Web Scraping with Deno – DOM + GraphQL**_
|
6 | 6 |
|
7 | 7 | </div>
|
8 | 8 |
|
9 | 9 | ---
|
10 | 10 |
|
11 |
| -**`DQL`** lets you use GraphQL queries to extract data from the DOM of a web page or HTML fragment (for sandboxing or use cases without network access). It accepts [**GraphQL Queries**](https://graphql.org/learn/queries) as input, and returns formatted JSON data as output. |
12 |
| - |
13 |
| -> - [**Try out a real-world example of `useQuery` in the `Deno Playground`**](https://dash.deno.com/playground/dql) |
14 |
| -> - [**View the example's JSON endpoint at `dql.deno.dev`**](https://dql.deno.dev) |
15 |
| -
|
16 |
| -## Summary |
17 |
| - |
18 |
| -This is a fork of [**DenoQL**](https://deno.land/x/denoql) with some heavy refactoring and some additional features: |
| 11 | +**`DQL`** is a web scraping module for Deno and Deno Deploy that uses [**GraphQL Queries**](https://graphql.org/learn/queries) to extract data from the DOM tree of a remote webpage or HTML document fragment, returning the result as JSON. It is a fork of [**DenoQL**](https://deno.land/x/denoql) with heavy refactoring and some additional features: |
19 | 12 |
|
20 | 13 | - [x] Compatibility with the [**Deno Deploy**](https://deno.com/deploy) architecture
|
21 | 14 | - [x] Ability to pass variables alongside all queries
|
22 | 15 | - [x] New state-management class with additional methods
|
23 | 16 | - [x] Modular project structure (as opposed to a mostly single-file design)
|
24 | 17 | - [x] Improved types and schema structure
|
25 |
| -- [ ] **This is a work-in-progress and there is still much to be done.** * |
26 | 18 |
|
27 |
| -## Usage |
| 19 | +> **Note**: _This is a work-in-progress and there is still a lot to be done._ |
28 | 20 |
|
29 |
| -The primary function exported by the module is the workhorse named `useQuery`: |
| 21 | +### 🛝 [**`GraphQL Playground`**](https://dql.deno.dev) |
30 | 22 |
|
31 |
| -```ts |
32 |
| -import { useQuery } from "https://deno.land/x/dql/mod.ts"; |
| 23 | +### 📝 [**`HackerNews Scraper`**](https://dash.deno.com/playground/dql-hn) |
33 | 24 |
|
34 |
| -const data = await useQuery(`query { ... }`); |
35 |
| -``` |
| 25 | +### 🚛 [**`Junkyard Scraper`**](https://dash.deno.com/playground/dirty-sparrow-69) |
36 | 26 |
|
37 |
| -### Query Options |
| 27 | +--- |
38 | 28 |
|
39 |
| -You can also provide an options object for the second argument of `useQuery`: |
| 29 | +## `useQuery` |
| 30 | + |
| 31 | +The primary function exported by the module is the workhorse named `useQuery`: |
40 | 32 |
|
41 | 33 | ```ts
|
42 |
| -const data = await useQuery(`query { ... }`, { |
43 |
| - concurrency: 8, |
44 |
| - fetch_options: { |
45 |
| - // passed as the second param to fetch() |
46 |
| - }, |
47 |
| - variables: { |
48 |
| - // any variables used in your queries go here |
49 |
| - }, |
50 |
| -}); |
| 34 | +import { useQuery } from "https://deno.land/x/dql/mod.ts"; |
| 35 | + |
| 36 | +const data = await useQuery(`query { ... }`); |
51 | 37 | ```
|
52 | 38 |
|
53 |
| -### Authenticated Requests |
| 39 | +### `QueryOptions` |
54 | 40 |
|
55 |
| -To authenticate your requests, you can add an `Authorization` header like so: |
| 41 | +You can also provide a `QueryOptions` object as the second argument of `useQuery`, to further control the behavior of your query requests. All properties are optional. |
56 | 42 |
|
57 | 43 | ```ts
|
58 | 44 | const data = await useQuery(`query { ... }`, {
|
59 |
| - fetch_options: { |
| 45 | + concurrency: 8, // passed directly to PQueue initializer |
| 46 | + fetch_options: { // passed directly to Fetch API requests |
60 | 47 | headers: {
|
61 | 48 | "Authorization": "Bearer ghp_a5025a80a24defd0a7d06b4fc215bb5635a167c6",
|
62 | 49 | },
|
63 | 50 | },
|
| 51 | + variables: {}, // variables defined in your queries |
| 52 | + operationName: "", // when using multiple queries |
64 | 53 | });
|
65 | 54 | ```
|
66 | 55 |
|
67 |
| -## GraphQL Playground |
68 |
| - |
69 |
| -### Deno Deploy |
| 56 | +## `createServer` |
70 | 57 |
|
71 |
| -With [**Deno Deploy**](https://dash.deno.com/new), you can deploy **`DQL`** with a GraphQL Playground in **only 2 LOC**: |
| 58 | +With [**Deno Deploy**](https://dash.deno.com/new), you can deploy **`DQL`** with a GraphQL Playground in **only 2 lines of code**: |
72 | 59 |
|
73 | 60 | ```ts
|
74 | 61 | import { createServer } from "https://deno.land/x/dql/mod.ts";
|
75 | 62 |
|
76 |
| -// change the endpoint to your unique URL ([...].deno.dev) |
77 |
| -createServer(80, { endpoint: "https://dirty-sparrow-69.deno.dev" }); |
| 63 | +createServer(80, { endpoint: "https://dql.deno.dev" }); |
78 | 64 | ```
|
79 | 65 |
|
80 |
| -> - [**Try it out at `dirty-sparrow-69.deno.dev`**](https://dirty-sparrow-69.deno.dev) |
81 |
| -> - [**View the public code in the `Deno Playground`**](https://dash.deno.com/playground/dirty-sparrow-69) |
| 66 | +`🛝` [Try the **GraphQL Playground** at **`dql.deno.dev`**](https://dql.deno.dev)\ |
| 67 | +`🦕` [View the source code in the **`Deno Playground`**](https://dash.deno.com/playground/dql) |
82 | 68 |
|
83 |
| -### Command Line Usage (CLI) |
| 69 | +## Command Line Usage (CLI) |
84 | 70 |
|
85 | 71 | ```bash
|
86 |
| -# spin up a playground on port 8080 |
87 | 72 | deno run -A --unstable https://deno.land/x/dql/serve.ts
|
88 | 73 | ```
|
89 | 74 |
|
| 75 | +#### Custom port (default is `8080`) |
| 76 | + |
90 | 77 | ```bash
|
91 |
| -# ... or using a custom port |
92 |
| -deno run -A --unstable https://deno.land/x/dql/serve.ts --port 3000 |
| 78 | +deno run -A https://deno.land/x/dql/serve.ts --port 3000 |
93 | 79 | ```
|
94 | 80 |
|
95 |
| -> **Note**: you need to have the [**Deno CLI**](https://deno.land) installed for CLI usage. |
| 81 | +> **Warning**: you need to have the [**Deno CLI**](https://deno.land) installed first. |
96 | 82 |
|
97 |
| -### Programmatic Usage |
| 83 | +--- |
98 | 84 |
|
99 |
| -```ts |
100 |
| -import { createServer } from "https://deno.land/x/dql/mod.ts"; |
| 85 | +## 💻 Examples |
101 | 86 |
|
102 |
| -// start a playground on port 8080 |
103 |
| -createServer(); |
| 87 | +### `🚛` Junkyard Scraper · [**`Deno Playground 🦕`**](https://dash.deno.com/playground/dirty-sparrow-69) |
104 | 88 |
|
105 |
| -// or using a custom port |
106 |
| -createServer(3000); |
| 89 | +```ts |
| 90 | +import { useQuery } from "https://deno.land/x/dql/mod.ts"; |
| 91 | +import { serve } from "https://deno.land/std@0.147.0/http/server.ts"; |
| 92 | + |
| 93 | +serve(async (_req: Request) => |
| 94 | + await useQuery( |
| 95 | + ` |
| 96 | + query Junkyard ( |
| 97 | + $url: String |
| 98 | + $itemSelector: String = "table > tbody > tr" |
| 99 | + ) { |
| 100 | + vehicles: page(url: $url) { |
| 101 | + totalCount: count(selector: $itemSelector) |
| 102 | + nodes: queryAll(selector: $itemSelector) { |
| 103 | + id: index |
| 104 | + vin: text(selector: "td:nth-child(7)", trim: true) |
| 105 | + sku: text(selector: "td:nth-child(6)", trim: true) |
| 106 | + year: text(selector: "td:nth-child(1)", trim: true) |
| 107 | + model: text(selector: "td:nth-child(2) > .notranslate", trim: true) |
| 108 | + aisle: text(selector: "td:nth-child(3)", trim: true) |
| 109 | + store: text(selector: "td:nth-child(4)", trim: true) |
| 110 | + color: text(selector: "td:nth-child(5)", trim: true) |
| 111 | + date: attr(selector: "td:nth-child(8)", name: "data-value") |
| 112 | + image: src(selector: "td > a > img") |
| 113 | + } |
| 114 | + } |
| 115 | + }`, |
| 116 | + { |
| 117 | + variables: { |
| 118 | + "url": "http://nvpap.deno.dev/action=getVehicles&makes=BMW", |
| 119 | + }, |
| 120 | + }, |
| 121 | + ) |
| 122 | + .then((data) => JSON.stringify(data, null, 2)) |
| 123 | + .then((json) => |
| 124 | + new Response(json, { |
| 125 | + headers: { "content-type": "application/json;charset=utf-8" }, |
| 126 | + }) |
| 127 | + ) |
| 128 | +); |
107 | 129 | ```
|
108 | 130 |
|
109 |
| -## Examples |
110 |
| - |
111 |
| -### Junkyard Inventory Scraper |
112 |
| - |
113 |
| -> - [**Try it for yourself in the `Deno Playground`**](https://dash.deno.com/playground/dql) |
114 |
| -> - [**View the JSON endpoint at `dql.deno.dev`**](https://dql.deno.dev) |
| 131 | +### 📝 HackerNews Scraper · [**`Deno Playground 🦕`**](https://dash.deno.com/playground/dql-hn) |
115 | 132 |
|
116 | 133 | ```ts
|
117 | 134 | import { useQuery } from "https://deno.land/x/dql/mod.ts";
|
118 |
| - |
119 |
| -const query = `query Junkyard ($url: String, $itemSelector: String) { |
120 |
| - vehicles: page(url: $url) { |
121 |
| - totalCount: count(selector: $itemSelector) |
122 |
| - items: queryAll(selector: $itemSelector) { |
123 |
| - id: index |
124 |
| - vin: text(selector: "td:nth-child(7)", trim: true) |
125 |
| - sku: text(selector: "td:nth-child(6)", trim: true) |
126 |
| - year: text(selector: "td:nth-child(1)", trim: true) |
127 |
| - model: text(selector: "td:nth-child(2) > .notranslate", trim: true) |
128 |
| - aisle: text(selector: "td:nth-child(3)", trim: true) |
129 |
| - store: text(selector: "td:nth-child(4)", trim: true) |
130 |
| - color: text(selector: "td:nth-child(5)", trim: true) |
131 |
| - date: attr(selector: "td:nth-child(8)", name: "data-value") |
132 |
| - image: src(selector: "td > a > img") |
| 135 | +import { serve } from "https://deno.land/std@0.147.0/http/server.ts"; |
| 136 | + |
| 137 | +serve(async (_req: Request) => |
| 138 | + await useQuery(` |
| 139 | + query HackerNews ( |
| 140 | + $url: String = "http://news.ycombinator.com" |
| 141 | + $rowSelector: String = "tr.athing" |
| 142 | + ) { |
| 143 | + page(url: $url) { |
| 144 | + title |
| 145 | + totalCount: count(selector: $rowSelector) |
| 146 | + nodes: queryAll(selector: $rowSelector) { |
| 147 | + rank: text(selector: "td span.rank", trim: true) |
| 148 | + title: text(selector: "td.title a", trim: true) |
| 149 | + site: text(selector: "span.sitestr", trim: true) |
| 150 | + url: href(selector: "td.title a") |
| 151 | + attrs: next { |
| 152 | + score: text(selector: "span.score", trim: true) |
| 153 | + user: text(selector: "a.hnuser", trim: true) |
| 154 | + date: attr(selector: "span.age", name: "title") |
| 155 | + } |
| 156 | + } |
133 | 157 | }
|
134 |
| - } |
135 |
| -}`; |
136 |
| - |
137 |
| -// pass any variables using the 'variables' key |
138 |
| -const response = await useQuery(query, { |
139 |
| - variables: { |
140 |
| - "url": "http://nvpap.deno.dev/action=getVehicles&makes=BMW", |
141 |
| - "itemSelector": "table > tbody > tr", |
142 |
| - }, |
143 |
| -}); |
144 |
| - |
145 |
| -// do something with response (Object) |
146 |
| -console.log(response); |
| 158 | + }`) |
| 159 | + .then((data) => JSON.stringify(data, null, 2)) |
| 160 | + .then((json) => |
| 161 | + new Response(json, { |
| 162 | + headers: { "content-type": "application/json;charset=utf-8" }, |
| 163 | + }) |
| 164 | + ) |
| 165 | +); |
147 | 166 | ```
|
148 | 167 |
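Every field returned by `text(...)` in the HackerNews query is a string, so a consumer will likely want to convert some of them before use. The sketch below is one possible post-processing step, not part of DQL. It assumes each entry in `nodes` mirrors the field names in the query, and that the scraped rank and score look like `"1."` and `"123 points"`, which depends on the page's markup at the time of scraping. The names `RawStory`, `Story`, `firstInt`, and `normalizeStory` are all hypothetical.

```ts
// Assumed shape of one entry in `nodes`, mirroring the query's field names.
interface RawStory {
  rank: string | null;
  title: string | null;
  site: string | null;
  url: string | null;
  attrs: {
    score: string | null;
    user: string | null;
    date: string | null;
  } | null;
}

// Cleaned-up record with numeric rank and points.
interface Story {
  rank: number | null;
  title: string;
  site: string | null;
  url: string | null;
  points: number | null;
  user: string | null;
  date: string | null;
}

// Extracts the first run of digits from a string, or null if there is none.
function firstInt(text: string | null | undefined): number | null {
  const match = text?.match(/\d+/);
  return match ? Number.parseInt(match[0], 10) : null;
}

// Flattens the nested `attrs` object and converts numeric-looking strings.
function normalizeStory(raw: RawStory): Story {
  return {
    rank: firstInt(raw.rank),
    title: raw.title ?? "",
    site: raw.site,
    url: raw.url,
    points: firstInt(raw.attrs?.score),
    user: raw.attrs?.user ?? null,
    date: raw.attrs?.date ?? null,
  };
}
```

Applied to the query result, this might look like `nodes.map(normalizeStory)`, wherever `nodes` sits in the object that `useQuery` returns.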
|
149 |
| ---- |
| 168 | +## License |
150 | 169 |
|
151 |
| -<div align="center"> |
152 |
| - |
153 |
| -MIT © [Nicholas Berlette](https://github.com/nberlette) • based on [DenoQL](https://deno.land/x/denoql) by [nyancodeid](https://github.com/nyancodeid) |
154 |
| - |
155 |
| -</div> |
| 170 | +MIT © [**Nicholas Berlette**](https://github.com/nberlette), based on [DenoQL](https://deno.land/x/denoql). |