job-crawler API
On-demand job search across Craigslist, Indeed & Monster. Results are ranked by a weighted blend of five signals — relevance (OpenAI embeddings), occupation match, pay, freshness, and distance. All endpoints return JSON; a failing source degrades to a diagnostic, never an error.
Query & listing titles are mapped to canonical
O*NET occupations so synonyms collapse to one node ("SWE" = "Software Engineer" =
"Application Developer" → Software Developers) — beating fuzzy keyword/embedding
matching. The response returns the matched node as query_occupation and per-result
occupation; confident matches also expand the crawl to alias terms for recall.
Pay is normalized to an hourly equivalent assuming a
40-hour work week (8 h/day, 2080 h/year), returned as salary.hourly and used for
ranking so jobs are comparable regardless of how each source quotes pay. Every result carries a
score (0–1) and a score_breakdown of the per-signal contributions
(semantic · pay · recency · proximity).
Search jobs
Crawls the selected sources live, normalizes + dedupes, then ranks. Returns 200 even
if every source fails (see diagnostics).
| Field | Type | Description |
|---|---|---|
| queryrequired | string | Keyword search; also the default semantic-match text. |
| description | string | Richer text for semantic ranking (overrides query for embeddings). |
| location.postal | string | ZIP. Geocoded server-side for the radius search. |
| location.city | string | "City, ST" (alternative to postal). |
| location.radius_mi | int 1–500 | Search radius. Default 30. |
| sources | string[] | craigslist·indeed·monster. Default ["craigslist"]. |
| limit | int 1–100 | Max results returned. Default 25. |
"sources": [] · remove the field → all enabled
Request body
Response
Pick a preset, then Send (⌘/Ctrl+Enter).
Liveness
DB connectivity, enabled sources, embedding backend.
loading…
Source health
Per-source circuit state, last success, rolling ok/err counts.
loading…
Webhook ingest
Schema-less receiver for inbound provider events. Accepts any method,
content-type, and body — JSON, form-encoded, XML, plain text, or empty — and stores it
verbatim, so unknown payloads can be inspected and reverse-engineered. Acks
200 immediately (500 only on a DB-write failure, so the sender
retries); 1 MB cap. Unauthenticated by design for now — signature headers are captured so
HMAC verification can be added once the provider's signing scheme is known.
| Field | Type | Description |
|---|---|---|
| source | path | Logical sender, e.g. workstream. Each source is stored & queried independently. |
Live endpoint: POST https://workstream.gatherat.ai/webhooks/workstream.
Each delivery is persisted with the raw body, best-effort body_json,
extracted event_type, all headers (incl. signatures),
query, method, content_type, and the client IP. A
GET carrying a challenge/hub.challenge param is echoed back
for verification handshakes.
Most-recent deliveries, newest first (id, time, method, content_type, event_type, body length).
One full stored delivery — headers, query, raw body, and parsed JSON.
loading recent…