Sample Redash SQL Queries
Each sample query below includes the SQL along with a link to run the query on our Redash front-end. In each query, the snapshot= filter in the WHERE clause references the date of the GCD data dump.
If you would like to run your own queries, email a......@gcdata.org to request a Redash account.
More information about the gcdissuesnapshot schema can be found on the About Redash page.
If you notice a problem, please open a Github issue in gcd-etl.
🆕 Non-variant issue count in GCD reaches TWO MILLION!
In August 2023, after years of diligent work by all the GCD volunteers, non-variant comic issue count reached 2 million. Well done!
SELECT snapshot,
count(distinct(issue_id)) as issues
FROM gcd.gcdissuesnapshot
WHERE snapshot > 20230300 AND snapshot < 20230900
AND variant_of_issue_id = 0
AND series_is_comics_publication = true
GROUP BY snapshot
ORDER BY snapshot DESCCounts of series, issues, and stories, by country code
Did you know that 59% of series in GCD are from outside the United States?
SELECT series_country_code,
count(distinct(series_id)) AS series,
count(distinct(issue_id)) AS issues,
count(distinct(story_id)) AS stories
FROM gcd.gcdissuesnapshot
WHERE snapshot = 20251115
GROUP BY series_country_code
ORDER BY series DESCTop English-language writers, by issue count, with series and publisher counts
Stan is the man, but do you know Jack?
SELECT story.writer,
count(distinct(issue_id)) AS issues,
count(distinct(series_id)) AS series,
count(distinct(publisher_id)) AS publishers
FROM gcd.gcdissuesnapshot
CROSS JOIN UNNEST(story_script) AS story(writer)
WHERE snapshot = 20251115 AND
story.writer NOT LIKE '%?%' AND story.writer != '' AND
series_language_code = 'en' AND variant_of_issue_id = 0
GROUP BY story.writer
ORDER BY issues DESC
LIMIT 1000
``🆕 Top English-language writer each decade since 1900, by issue count
Stan Lee owns a few decades, but other great writers have taken their turn!
You can change the count parameter to see more than one top writer per decade. Adjust first_decade and last_decade to zoom in and out. Change the series language using the drop-down.
SELECT decade, rank, writer, issues FROM (
SELECT story.writer,
FLOOR(publication_date/100000)*10 as decade,
count(distinct(issue_id)) AS issues,
row_number() over (partition by FLOOR(publication_date/100000)*10 order by count(distinct(issue_id)) desc) as rank
FROM "gcd"."gcdissuesnapshot"
CROSS JOIN UNNEST(story_script) AS story(writer)
WHERE snapshot = 20251115 AND
story.writer NOT LIKE '%?%' AND story.writer != '' AND
series_language_code = 'en' AND variant_of_issue_id = 0
GROUP BY story.writer, FLOOR(publication_date/100000)*10
ORDER BY decade DESC, issues DESC
)
WHERE rank <= 1 AND decade >= 1900 AND decade <= 2020
ORDER BY decade DESC, rank
``Series by Stan Lee, with issue counts
Stan did write a LOT of comics!
SELECT series_name,
count(distinct(issue_id)) AS issues
FROM gcd.gcdissuesnapshot
WHERE snapshot = 20251115 AND
cardinality(filter(story_script, x -> regexp_like(x, 'Stan Lee( [\[\(].*)?'))) > 0 AND
series_language_code = 'en' AND variant_of_issue_id = 0
GROUP BY series_name
ORDER BY issues DESCTop artists, by issue count, with series and publisher counts
Jack Kirby went through a lot of pencils in his lifetime!
SELECT story.artist,
count(distinct(issue_id)) as issues,
count(distinct(series_id)) as series,
count(distinct(publisher_id)) as publishers
FROM gcd.gcdissuesnapshot
CROSS JOIN UNNEST(story_pencils) AS story(artist)
WHERE snapshot = 20251115 AND
story.artist not like '%?%' AND
story.artist NOT IN ('', 'various') AND
variant_of_issue_id=0
GROUP BY story.artist
ORDER BY issues DESC
LIMIT 1000Series drawn by Curt Swan, with issue counts
Curt drew that iconic "S" more than maybe anyone.
SELECT series_name,
count(distinct(issue_id)) AS issues
FROM gcd.gcdissuesnapshot
WHERE snapshot = 20251115 AND
contains(story_pencils, 'Curt Swan') AND
variant_of_issue_id = 0
GROUP BY series_name
ORDER BY issues DESCMost common page counts for English-language comics
The ubiquitous 36-pager takes the top spot. Number 3 may surprise you!
You can change the series language using the drop-down, and adjust the time period with start_year and end_year.
SELECT page_count,
count(distinct(issue_id)) as issues
FROM gcd.gcdissuesnapshot
WHERE snapshot = 20251115 AND
series_language_code = 'en' AND
page_count != 0 AND
publication_date >= -1 AND
publication_date <= 21009999
GROUP BY page_count
ORDER BY issues DESC
LIMIT 1000Most common US dollar prices for comics published since 2019
Remember when comics were a nickel? Me neither.
You can select a difference currency code in the dropdown, and you can change the date range with start_year and end_year.
SELECT i.price,
count(distinct(issue_id)) as issues
FROM gcd.gcdissuesnapshot
CROSS JOIN UNNEST(price) AS i(price)
WHERE snapshot = 20251115 AND
i.price LIKE '%USD%' AND
publication_date >= 20190000 AND
publication_date <= 21009999
GROUP BY i.price
ORDER BY issues DESC
LIMIT 10Average US dollar issue prices for 36-page comics published since 1970
Production of 36-page comics peaked in 1994, when the average price was $2.12 USD.
You can change language, country, page_count, currency, start_year, and end_year parameters.
SELECT CAST(FLOOR(publication_date/10000) as varchar) as year,
ROUND(avg(cast(regexp_extract(i.price, '[0-9]+\.[0-9]+') as double)), 2) as average_price,
count(1) as unique_issue_count
FROM gcd.gcdissuesnapshot
CROSS JOIN UNNEST(price) AS i(price)
WHERE snapshot = 20251115 AND
series_language_code = 'en' AND
series_country_code = 'us' AND
page_count = 36 AND
variant_of_issue_id = 0 AND
regexp_like(i.price, '^[0-9]+\.[0-9]+ ?USD') AND
publication_date >= 19700000 AND
publication_date <= 20249999
GROUP BY FLOOR(publication_date/10000)
ORDER BY year DESCIssues with greatest number of variants published
Spider-Man, Batman, and Godzilla take the top spots.
SELECT variant_of_issue_id,
series_name,
issue_number_raw,
publication_date,
count(distinct(issue_id)) as issues
FROM gcd.gcdissuesnapshot
WHERE snapshot = 20251115 AND variant_of_issue_id != 0
GROUP BY variant_of_issue_id,
series_name,
issue_number_raw,
publication_date
ORDER BY issues DESC
LIMIT 100Variants of The Amazing Spider-Man 666
Face it, Tiger... you just hit the jackpot!
SELECT series_name, issue_number_raw, variant_name
FROM gcd.gcdissuesnapshot
WHERE snapshot = 20251115 AND variant_of_issue_id = 863623
GROUP BY series_name, issue_number_raw, variant_nameGCD issue entries created in October 2019, with series, issue, story counts
Check out 10/29! It was a busy day.
SELECT created,
count(distinct(series_id)) as series,
count(distinct(issue_id)) as issues,
count(distinct(story_id)) as stories
FROM gcd.gcdissuesnapshot
WHERE snapshot = 20251115 AND created < 20191101 AND created > 20191000
GROUP BY createdGCD issue entries modified in October 2019, with series, issue, story counts
Over 50,000 issues were modified in this month alone!
SELECT modified,
count(distinct(series_id)) as series,
count(distinct(issue_id)) as issues,
count(distinct(story_id)) as stories
FROM gcd.gcdissuesnapshot
WHERE snapshot = 20251115 AND modified < 20191101 AND modified > 20191000
GROUP BY modifiedStories tracked in new credit system, since 2020-01-01
A new credit tracking system rolled out in 2020; how's that going?
A note about the last part of the WHERE clause -- since the snapshot is a date integer in YYYYMMDD format, the % 10000 modulo is a convenient way to look just at the month and day, limiting the number of snapshots considered. Presto would need a large amount of memory to load in all the snapshots and do the distinct story_id count. Using a subset of snapshots allows the query to stay within memory limits.
SELECT snapshot, count(distinct(story_id)) as stories
FROM gcd.gcdissuesnapshot
WHERE snapshot >= 20200101 AND story_credit_source='gcd_story_credit'
AND snapshot % 10000 IN (215, 515, 815, 1115)
GROUP BY snapshot
ORDER BY snapshot DESCComic issue counts by GCD data dump since 2015
It is great to see how the database has grown through the years!
A note about the first part of the WHERE clause -- since the snapshot is a date integer in YYYYMMDD format, the % 10000 modulo is a convenient way to look just at the month and day. 201 is February 1, and 801 is August 1. Presto would need a large amount of memory to load in 24 snapshots a year for 9+ years. Using a subset of snapshots allows the query to stay within memory limits.
SELECT snapshot,
count(distinct(issue_id)) as issues
FROM gcd.gcdissuesnapshot
WHERE snapshot % 10000 IN (215, 515, 815, 1115)
AND variant_of_issue_id = 0
AND series_is_comics_publication = true
GROUP BY snapshot
ORDER BY snapshot DESC