Sample Redash SQL Queries
Each sample query below includes the SQL along with a link to run the query on our Redash front-end. In each query, the snapshot=
filter in the WHERE clause references the date of the GCD data dump.
If you would like to run your own queries, email a......@gcdata.org to request a Redash account.
More information about the gcdissuesnapshot
schema can be found on the About Redash page.
If you notice a problem, please open a Github issue in gcd-etl
.
🆕 Non-variant issue count in GCD reaches TWO MILLION!
In August 2023, after years of diligent work by all the GCD volunteers, non-variant comic issue count reached 2 million. Well done!
SELECT snapshot,
count(distinct(issue_id)) as issues
FROM gcd.gcdissuesnapshot
WHERE snapshot > 20230300 AND snapshot < 20230900
AND variant_of_issue_id = 0
AND series_is_comics_publication = true
GROUP BY snapshot
ORDER BY snapshot DESC
Counts of series, issues, and stories, by country code
Did you know that 59% of series in GCD are from outside the United States?
SELECT series_country_code,
count(distinct(series_id)) AS series,
count(distinct(issue_id)) AS issues,
count(distinct(story_id)) AS stories
FROM gcd.gcdissuesnapshot
WHERE snapshot = 20250401
GROUP BY series_country_code
ORDER BY series DESC
Top English-language writers, by issue count, with series and publisher counts
Stan is the man, but do you know Jack?
SELECT story.writer,
count(distinct(issue_id)) AS issues,
count(distinct(series_id)) AS series,
count(distinct(publisher_id)) AS publishers
FROM gcd.gcdissuesnapshot
CROSS JOIN UNNEST(story_script) AS story(writer)
WHERE snapshot = 20250401 AND
story.writer NOT LIKE '%?%' AND story.writer != '' AND
series_language_code = 'en' AND variant_of_issue_id = 0
GROUP BY story.writer
ORDER BY issues DESC
LIMIT 1000
``
🆕 Top English-language writer each decade since 1900, by issue count
Stan Lee owns a few decades, but other great writers have taken their turn!
You can change the count parameter to see more than one top writer per decade. Adjust first_decade and last_decade to zoom in and out. Change the series language using the drop-down.
SELECT decade, rank, writer, issues FROM (
SELECT story.writer,
FLOOR(publication_date/100000)*10 as decade,
count(distinct(issue_id)) AS issues,
row_number() over (partition by FLOOR(publication_date/100000)*10 order by count(distinct(issue_id)) desc) as rank
FROM "gcd"."gcdissuesnapshot"
CROSS JOIN UNNEST(story_script) AS story(writer)
WHERE snapshot = 20250401 AND
story.writer NOT LIKE '%?%' AND story.writer != '' AND
series_language_code = 'en' AND variant_of_issue_id = 0
GROUP BY story.writer, FLOOR(publication_date/100000)*10
ORDER BY decade DESC, issues DESC
)
WHERE rank <= 1 AND decade >= 1900 AND decade <= 2020
ORDER BY decade DESC, rank
``
Series by Stan Lee, with issue counts
Stan did write a LOT of comics!
SELECT series_name,
count(distinct(issue_id)) AS issues
FROM gcd.gcdissuesnapshot
WHERE snapshot = 20250401 AND
cardinality(filter(story_script, x -> regexp_like(x, 'Stan Lee( [\[\(].*)?'))) > 0 AND
series_language_code = 'en' AND variant_of_issue_id = 0
GROUP BY series_name
ORDER BY issues DESC
Top artists, by issue count, with series and publisher counts
Jack Kirby went through a lot of pencils in his lifetime!
SELECT story.artist,
count(distinct(issue_id)) as issues,
count(distinct(series_id)) as series,
count(distinct(publisher_id)) as publishers
FROM gcd.gcdissuesnapshot
CROSS JOIN UNNEST(story_pencils) AS story(artist)
WHERE snapshot = 20250401 AND
story.artist not like '%?%' AND
story.artist NOT IN ('', 'various') AND
variant_of_issue_id=0
GROUP BY story.artist
ORDER BY issues DESC
LIMIT 1000
Series drawn by Curt Swan, with issue counts
Curt drew that iconic "S" more than maybe anyone.
SELECT series_name,
count(distinct(issue_id)) AS issues
FROM gcd.gcdissuesnapshot
WHERE snapshot = 20250401 AND
contains(story_pencils, 'Curt Swan') AND
variant_of_issue_id = 0
GROUP BY series_name
ORDER BY issues DESC
Most common page counts for English-language comics
The ubiquitous 36-pager takes the top spot. Number 3 may surprise you!
You can change the series language using the drop-down, and adjust the time period with start_year and end_year.
SELECT page_count,
count(distinct(issue_id)) as issues
FROM gcd.gcdissuesnapshot
WHERE snapshot = 20250401 AND
series_language_code = 'en' AND
page_count != 0 AND
publication_date >= -1 AND
publication_date <= 21009999
GROUP BY page_count
ORDER BY issues DESC
LIMIT 1000
Most common US dollar prices for comics published since 2019
Remember when comics were a nickel? Me neither.
You can select a difference currency code in the dropdown, and you can change the date range with start_year and end_year.
SELECT i.price,
count(distinct(issue_id)) as issues
FROM gcd.gcdissuesnapshot
CROSS JOIN UNNEST(price) AS i(price)
WHERE snapshot = 20250401 AND
i.price LIKE '%USD%' AND
publication_date >= 20190000 AND
publication_date <= 21009999
GROUP BY i.price
ORDER BY issues DESC
LIMIT 10
Average US dollar issue prices for 36-page comics published since 1970
Production of 36-page comics peaked in 1994, when the average price was $2.12 USD.
You can change language, country, page_count, currency, start_year, and end_year parameters.
SELECT CAST(FLOOR(publication_date/10000) as varchar) as year,
ROUND(avg(cast(regexp_extract(i.price, '[0-9]+\.[0-9]+') as double)), 2) as average_price,
count(1) as unique_issue_count
FROM gcd.gcdissuesnapshot
CROSS JOIN UNNEST(price) AS i(price)
WHERE snapshot = 20250401 AND
series_language_code = 'en' AND
series_country_code = 'us' AND
page_count = 36 AND
variant_of_issue_id = 0 AND
regexp_like(i.price, '^[0-9]+\.[0-9]+ ?USD') AND
publication_date >= 19700000 AND
publication_date <= 20239999
GROUP BY FLOOR(publication_date/10000)
ORDER BY year DESC
Issues with greatest number of variants published
Spider-Man, Batman, and Godzilla take the top spots.
SELECT variant_of_issue_id,
series_name,
issue_number_raw,
publication_date,
count(distinct(issue_id)) as issues
FROM gcd.gcdissuesnapshot
WHERE snapshot = 20250401 AND variant_of_issue_id != 0
GROUP BY variant_of_issue_id,
series_name,
issue_number_raw,
publication_date
ORDER BY issues DESC
LIMIT 100
Variants of The Amazing Spider-Man 666
Face it, Tiger... you just hit the jackpot!
SELECT series_name, issue_number_raw, variant_name
FROM gcd.gcdissuesnapshot
WHERE snapshot = 20250401 AND variant_of_issue_id = 863623
GROUP BY series_name, issue_number_raw, variant_name
GCD issue entries created in October 2019, with series, issue, story counts
Check out 10/29! It was a busy day.
SELECT created,
count(distinct(series_id)) as series,
count(distinct(issue_id)) as issues,
count(distinct(story_id)) as stories
FROM gcd.gcdissuesnapshot
WHERE snapshot = 20250401 AND created < 20191101 AND created > 20191000
GROUP BY created
GCD issue entries modified in October 2019, with series, issue, story counts
Over 50,000 issues were modified in this month alone!
SELECT modified,
count(distinct(series_id)) as series,
count(distinct(issue_id)) as issues,
count(distinct(story_id)) as stories
FROM gcd.gcdissuesnapshot
WHERE snapshot = 20250401 AND modified < 20191101 AND modified > 20191000
GROUP BY modified
Stories tracked in new credit system, since 2020-01-01
A new credit tracking system rolled out in 2020; how's that going?
A note about the last part of the WHERE clause -- since the snapshot is a date integer in YYYYMMDD format, the % 100
modulo is a convenient way to look just at the day of the month. Half of the snapshots are day 1 and half are day 15. Presto would need a large amount of memory to load in all the snapshots and do the distinct story_id count. Using half of snapshots allows the query to stay within memory limits.
SELECT snapshot, count(distinct(story_id)) as stories
FROM gcd.gcdissuesnapshot
WHERE snapshot >= 20200101 AND story_credit_source='gcd_story_credit'
AND snapshot % 100 = 1
GROUP BY snapshot
ORDER BY snapshot DESC
Comic issue counts by GCD data dump since 2015
It is great to see how the database has grown through the years!
A note about the first part of the WHERE clause -- since the snapshot is a date integer in YYYYMMDD format, the % 10000
modulo is a convenient way to look just at the month and day. 201
is February 1, and 801
is August 1. Presto would need a large amount of memory to load in 24 snapshots a year for 9+ years. Using a subset of snapshots allows the query to stay within memory limits.
SELECT snapshot,
count(distinct(issue_id)) as issues
FROM gcd.gcdissuesnapshot
WHERE snapshot % 10000 IN (101, 401, 701, 1001)
AND variant_of_issue_id = 0
AND series_is_comics_publication = true
GROUP BY snapshot
ORDER BY snapshot DESC