Grand Comics Data Analytics

Sample Redash SQL Queries

Each sample query below includes the SQL along with a link to run the query on our Redash front-end. In each query, the snapshot= filter in the WHERE clause references the date of the GCD data dump.

If you would like to run your own queries, email a......@gcdata.org to request a Redash account.

More information about the gcdissuesnapshot schema can be found on the About Redash page.

If you notice a problem, please open a Github issue in gcd-etl.

🆕 Non-variant issue count in GCD reaches TWO MILLION!

In August 2023, after years of diligent work by all the GCD volunteers, non-variant comic issue count reached 2 million. Well done!

SELECT snapshot,
        count(distinct(issue_id)) as issues
FROM gcd.gcdissuesnapshot
WHERE snapshot > 20230300 AND snapshot < 20230900
      AND variant_of_issue_id = 0 
      AND series_is_comics_publication = true
GROUP BY snapshot
ORDER BY snapshot DESC

▶️ Run Query▶️ Show Graph

Counts of series, issues, and stories, by country code

Did you know that 59% of series in GCD are from outside the United States?

SELECT series_country_code,
         count(distinct(series_id)) AS series,
         count(distinct(issue_id)) AS issues,
         count(distinct(story_id)) AS stories
FROM gcd.gcdissuesnapshot
WHERE snapshot = 20250401
GROUP BY  series_country_code
ORDER BY  series DESC

▶️ Run Query

Top English-language writers, by issue count, with series and publisher counts

Stan is the man, but do you know Jack?

SELECT story.writer,
         count(distinct(issue_id)) AS issues,
         count(distinct(series_id)) AS series,
         count(distinct(publisher_id)) AS publishers
FROM gcd.gcdissuesnapshot
CROSS JOIN UNNEST(story_script) AS story(writer)
WHERE snapshot = 20250401 AND
        story.writer NOT LIKE '%?%' AND story.writer != '' AND
        series_language_code = 'en' AND variant_of_issue_id = 0
GROUP BY  story.writer
ORDER BY  issues DESC
LIMIT 1000
``

▶️ Run Query

🆕 Top English-language writer each decade since 1900, by issue count

Stan Lee owns a few decades, but other great writers have taken their turn!

You can change the count parameter to see more than one top writer per decade. Adjust first_decade and last_decade to zoom in and out. Change the series language using the drop-down.

SELECT decade, rank, writer, issues FROM (
    SELECT story.writer,
         FLOOR(publication_date/100000)*10 as decade,
         count(distinct(issue_id)) AS issues,
         row_number() over (partition by FLOOR(publication_date/100000)*10 order by count(distinct(issue_id)) desc) as rank
    FROM "gcd"."gcdissuesnapshot"
    CROSS JOIN UNNEST(story_script) AS story(writer)
    WHERE snapshot = 20250401 AND
            story.writer NOT LIKE '%?%' AND story.writer != '' AND
            series_language_code = 'en' AND variant_of_issue_id = 0
    GROUP BY  story.writer, FLOOR(publication_date/100000)*10
    ORDER BY  decade DESC, issues DESC
)
WHERE rank <= 1 AND decade >= 1900 AND decade <= 2020
ORDER BY decade DESC, rank
``

▶️ Run Query▶️ Show Graph

Series by Stan Lee, with issue counts

Stan did write a LOT of comics!

SELECT series_name,
         count(distinct(issue_id)) AS issues
FROM gcd.gcdissuesnapshot
WHERE snapshot = 20250401 AND
        cardinality(filter(story_script, x -> regexp_like(x, 'Stan Lee( [\[\(].*)?'))) > 0 AND
        series_language_code = 'en' AND variant_of_issue_id = 0
GROUP BY  series_name
ORDER BY  issues DESC

▶️ Run Query

Top artists, by issue count, with series and publisher counts

Jack Kirby went through a lot of pencils in his lifetime!

SELECT story.artist,
        count(distinct(issue_id)) as issues,
        count(distinct(series_id)) as series,
        count(distinct(publisher_id)) as publishers
FROM gcd.gcdissuesnapshot
CROSS JOIN UNNEST(story_pencils) AS story(artist)
WHERE snapshot = 20250401 AND
        story.artist not like '%?%' AND
        story.artist NOT IN ('', 'various') AND
        variant_of_issue_id=0
GROUP BY story.artist
ORDER BY issues DESC
LIMIT 1000

▶️ Run Query

Series drawn by Curt Swan, with issue counts

Curt drew that iconic "S" more than maybe anyone.

SELECT series_name,
         count(distinct(issue_id)) AS issues
FROM gcd.gcdissuesnapshot
WHERE snapshot = 20250401 AND
        contains(story_pencils, 'Curt Swan') AND
        variant_of_issue_id = 0
GROUP BY  series_name
ORDER BY  issues DESC

▶️ Run Query

Most common page counts for English-language comics

The ubiquitous 36-pager takes the top spot. Number 3 may surprise you!

You can change the series language using the drop-down, and adjust the time period with start_year and end_year.

SELECT page_count,
        count(distinct(issue_id)) as issues
FROM gcd.gcdissuesnapshot
WHERE snapshot = 20250401 AND
        series_language_code = 'en' AND
        page_count != 0 AND
        publication_date >= -1 AND
        publication_date <= 21009999
GROUP BY page_count
ORDER BY issues DESC
LIMIT 1000

▶️ Run Query

Most common US dollar prices for comics published since 2019

Remember when comics were a nickel? Me neither.

You can select a difference currency code in the dropdown, and you can change the date range with start_year and end_year.

SELECT i.price,
        count(distinct(issue_id)) as issues
FROM gcd.gcdissuesnapshot
CROSS JOIN UNNEST(price) AS i(price)
WHERE snapshot = 20250401 AND
        i.price LIKE '%USD%' AND
        publication_date >= 20190000 AND
        publication_date <= 21009999
GROUP BY i.price
ORDER BY issues DESC
LIMIT 10

▶️ Run Query

Average US dollar issue prices for 36-page comics published since 1970

Production of 36-page comics peaked in 1994, when the average price was $2.12 USD.

You can change language, country, page_count, currency, start_year, and end_year parameters.

SELECT CAST(FLOOR(publication_date/10000) as varchar) as year,
        ROUND(avg(cast(regexp_extract(i.price, '[0-9]+\.[0-9]+') as double)), 2) as average_price,
        count(1) as unique_issue_count
FROM gcd.gcdissuesnapshot
CROSS JOIN UNNEST(price) AS i(price)
WHERE snapshot = 20250401 AND
        series_language_code = 'en' AND
        series_country_code = 'us' AND
        page_count = 36 AND
        variant_of_issue_id = 0 AND
        regexp_like(i.price, '^[0-9]+\.[0-9]+ ?USD') AND
        publication_date >= 19700000 AND
        publication_date <= 20239999
GROUP BY FLOOR(publication_date/10000)
ORDER BY year DESC

▶️ Run Query▶️ Show Graph

Issues with greatest number of variants published

Spider-Man, Batman, and Godzilla take the top spots.

SELECT variant_of_issue_id,
         series_name,
         issue_number_raw,
         publication_date,
         count(distinct(issue_id)) as issues
FROM gcd.gcdissuesnapshot
WHERE snapshot = 20250401 AND variant_of_issue_id != 0
GROUP BY variant_of_issue_id,
         series_name,
         issue_number_raw,
         publication_date
ORDER BY issues DESC
LIMIT 100

▶️ Run Query

Variants of The Amazing Spider-Man 666

Face it, Tiger... you just hit the jackpot!

SELECT series_name, issue_number_raw, variant_name
FROM gcd.gcdissuesnapshot
WHERE snapshot = 20250401 AND variant_of_issue_id = 863623
GROUP BY series_name, issue_number_raw, variant_name

▶️ Run Query

GCD issue entries created in October 2019, with series, issue, story counts

Check out 10/29! It was a busy day.

SELECT created,
        count(distinct(series_id)) as series,
        count(distinct(issue_id)) as issues,
        count(distinct(story_id)) as stories
FROM gcd.gcdissuesnapshot
WHERE snapshot = 20250401 AND created < 20191101 AND created > 20191000
GROUP BY created

▶️ Run Query

GCD issue entries modified in October 2019, with series, issue, story counts

Over 50,000 issues were modified in this month alone!

SELECT modified,
        count(distinct(series_id)) as series,
        count(distinct(issue_id)) as issues,
        count(distinct(story_id)) as stories
FROM gcd.gcdissuesnapshot
WHERE snapshot = 20250401 AND modified < 20191101 AND modified > 20191000
GROUP BY modified

▶️ Run Query

Stories tracked in new credit system, since 2020-01-01

A new credit tracking system rolled out in 2020; how's that going?

A note about the last part of the WHERE clause -- since the snapshot is a date integer in YYYYMMDD format, the % 100 modulo is a convenient way to look just at the day of the month. Half of the snapshots are day 1 and half are day 15. Presto would need a large amount of memory to load in all the snapshots and do the distinct story_id count. Using half of snapshots allows the query to stay within memory limits.

SELECT snapshot, count(distinct(story_id)) as stories
FROM gcd.gcdissuesnapshot
WHERE snapshot >= 20200101 AND story_credit_source='gcd_story_credit'
      AND snapshot % 100 = 1
GROUP BY snapshot
ORDER BY snapshot DESC

▶️ Run Query▶️ Show Graph

Comic issue counts by GCD data dump since 2015

It is great to see how the database has grown through the years!

A note about the first part of the WHERE clause -- since the snapshot is a date integer in YYYYMMDD format, the % 10000 modulo is a convenient way to look just at the month and day. 201 is February 1, and 801 is August 1. Presto would need a large amount of memory to load in 24 snapshots a year for 9+ years. Using a subset of snapshots allows the query to stay within memory limits.

SELECT snapshot,
        count(distinct(issue_id)) as issues
FROM gcd.gcdissuesnapshot
WHERE snapshot % 10000 IN (101, 401, 701, 1001)
      AND variant_of_issue_id = 0 
      AND series_is_comics_publication = true
GROUP BY snapshot
ORDER BY snapshot DESC

▶️ Run Query▶️ Show Graph