Signal Database: Bulk Data via Google Cloud Storage

Access 350+ signal types in bulk via GCS buckets. Covers authentication, bucket structure, file formats, and refresh cadence.

Written by Kyle Schuster
Updated this week

Overview

The Autobound Signal Database delivers 350+ signal types in bulk through Google Cloud Storage (GCS) buckets. Autobound provisions read access to dedicated buckets, and you authenticate with a service account to pull data on your own schedule.

Getting Started

  1. Contact sales@autobound.ai to receive your GCP service account JSON key file

  2. Set the environment variable: export GOOGLE_APPLICATION_CREDENTIALS="/path/to/key.json"

  3. Use gsutil commands to access your provisioned buckets
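Before running any gsutil commands, it can help to confirm that the variable from step 2 is set and points at a readable file. The following is a minimal sketch, assuming a POSIX-compatible shell. The function name check_credentials is hypothetical and is not part of any Autobound or Google tooling.

```shell
# Hypothetical helper (not an Autobound or Google tool): verify that
# GOOGLE_APPLICATION_CREDENTIALS is set and points at a readable file.
check_credentials() {
  if [ -z "${GOOGLE_APPLICATION_CREDENTIALS:-}" ]; then
    echo "GOOGLE_APPLICATION_CREDENTIALS is not set" >&2
    return 1
  fi
  if [ ! -r "$GOOGLE_APPLICATION_CREDENTIALS" ]; then
    echo "Key file not readable: $GOOGLE_APPLICATION_CREDENTIALS" >&2
    return 1
  fi
  echo "Using key file: $GOOGLE_APPLICATION_CREDENTIALS"
}
```

Calling check_credentials at the top of a sync script lets it fail early with a clear message instead of a later authentication error.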

Example Commands

gsutil ls gs://autobound-news-v2/
gsutil cp gs://autobound-news-v2/2026-01-31-17-30-00/output.jsonl ./

Available Buckets

SEC Filings

  • gs://autobound-10k-v1/ — Annual filings (10-K)

  • gs://autobound-10q-v1/ — Quarterly filings (10-Q)

  • gs://autobound-20f-v1/ — Foreign company filings (20-F)

  • gs://autobound-6k-v1/ — Foreign company reports (6-K)

  • gs://autobound-8k/ — Current reports (8-K)

  • gs://autobound-earnings-transcripts/ — Earnings call transcripts

Social and Web Signals

Dedicated buckets for LinkedIn posts (company and contact-level), LinkedIn comments, Glassdoor reviews, Reddit mentions, Twitter/X posts, and YouTube activity.

Company Intelligence

Buckets for news, hiring trends, hiring velocity, employee growth, GitHub activity, product reviews, patents, SEO traffic, website intelligence, work milestones, financials, tech stack, and intent signals.

Reference Data

  • gs://autobound-company-database/

  • gs://autobound-contact-database/

  • gs://autobound-manifests/

Bucket Structure

Each bucket contains timestamped folders, such as 2026-01-31-17-30-00 in the example commands. Inside each folder you will find two files: output.jsonl and output.parquet. Always pull from the most recent folder to get the latest data.
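Picking the most recent folder can be scripted. The sketch below assumes that every folder name follows the same YYYY-MM-DD-HH-MM-SS pattern as the folder in the example commands, so that a plain text sort is also a chronological sort. The function name latest_folder is hypothetical.

```shell
# Hypothetical helper: read folder paths (one per line) on stdin and
# print the one with the latest timestamp.
# Assumption: folder names look like YYYY-MM-DD-HH-MM-SS, so sorting
# the paths as text orders them chronologically within one bucket.
latest_folder() {
  grep -E '/[0-9]{4}(-[0-9]{2}){5}/?$' | LC_ALL=C sort | tail -n 1
}

# Usage (requires gsutil and provisioned access):
#   latest=$(gsutil ls gs://autobound-news-v2/ | latest_folder)
#   gsutil cp "${latest}output.jsonl" ./
# gsutil ls prints folder paths with a trailing slash, so the file
# name is appended directly to the folder path.
```

Lines that do not match the timestamp pattern are ignored, so stray objects at the top level of a bucket do not affect the result.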

File Formats

  • JSONL: Best for streaming ingestion and simple parsing. One signal per line.

  • Parquet: Optimized for data warehouses and large-scale analytics. Includes fields like signal_id, signal_type, detected_at, and nested structs for company and contact data.
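Because JSONL stores one signal per line, standard line-oriented tools are enough for a quick sanity check after a download. The sketch below counts the signals in a file by counting its non-empty lines. The function name count_signals is hypothetical, and the sketch does not validate that each line is well-formed JSON.

```shell
# Hypothetical helper: count signals in a JSONL file.
# Relies on the one-signal-per-line layout; blank lines are skipped.
# Does not check that each line is valid JSON.
count_signals() {
  grep -c -v '^[[:space:]]*$' "$1"
}

# Usage:
#   count_signals output.jsonl
#   head -n 1 output.jsonl    # inspect the first signal
```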

Refresh Cadence

As of February 2026, several data types are delivered weekly rather than monthly. The current cadence for each signal type is:

  • Weekly: SEC filings, earnings transcripts, news, hiring trends, hiring velocity, work milestones, patents

  • Bi-weekly: LinkedIn posts, financial fundamentals

  • Monthly: GitHub, product reviews, SEO traffic, Reddit mentions, Glassdoor reviews, employee growth, tech stack

  • Quarterly: Employee growth (some categories)

Need Help?

For service account credentials or to select specific signal types, contact sales@autobound.ai.
