Grand Comics Data Analytics

Sample Imhotep Queries

The GCD Imhotep cluster is not currently available. The queries below illustrate the IQL versions of the Presto SQL queries served via Redash. In each query, <start-date> and<end-date> reference the date of the GCD data dump. Each link uses dates for the most recent data dump that would be uploaded to and available via Imhotep.

Counts of series, issues, and stories, by country code

Did you know that 59% of series in GCD are from outside the United States?

FROM gcdissuesnapshot <start-date> <end-date>
GROUP BY series_country_code
SELECT distinct(series_id) /* series */,
       distinct(issue_id) /* issues */,
       distinct(story_id) /* stories */

Top English-language writers, by issue count, with series and publisher counts

Stan is the man, but do you know Jack?

FROM gcdissuesnapshot <start-date> <end-date>
WHERE story_script !=~'.*\?.*' story_script!='' 
      series_language_code='en' 
      variant_of_issue_id=0
GROUP BY story_script /* writer */
SELECT distinct(issue_id), distinct(series_id), distinct(publisher_id)

Series by Stan Lee, with issue counts

Stan did write a LOT of comics!

FROM gcdissuesnapshot <start-date> <end-date>
WHERE story_script=~'Stan Lee( [\[\(].*)?'
      series_language_code='en' 
      variant_of_issue_id=0
GROUP BY series_name
SELECT distinct(issue_id)

Top artists, by issue count, with series and publisher counts

Jack Kirby went through a lot of pencils in his lifetime!

FROM gcdissuesnapshot <start-date> <end-date>
WHERE story_pencils !=~'.*\?.*' story_pencils!='' 
      variant_of_issue_id=0
GROUP BY story_pencils /* artist */
SELECT distinct(issue_id), distinct(series_id), distinct(publisher_id)

Series drawn by Curt Swan, with issue counts

Curt drew that iconic "S" more than maybe anyone.

FROM gcdissuesnapshot <start-date> <end-date>
WHERE story_pencils='Curt Swan' 
      variant_of_issue_id=0
GROUP BY series_name
SELECT distinct(issue_id)

Most common page counts for English-language comics

The ubiquitous 36-pager takes the top spot. Number 3 may surprise you!

FROM gcdissuesnapshot <start-date> <end-date>
WHERE series_language_code='en'
      page_count!=0
GROUP BY page_count[1000]
SELECT distinct(issue_id)

Most common US dollar prices for comics published since 2019

Remember when comics were a nickel? Me neither.

FROM gcdissuesnapshot <start-date> <end-date>
WHERE price=~'.*USD.*'
      publication_date > 20190000
GROUP BY price
SELECT distinct(issue_id)

Issues with greatest number of variants published

Spider-Man, Batman, and Godzilla take the top spots.

FROM gcdissuesnapshot <start-date> <end-date>
WHERE variant_of_issue_id!=0
GROUP BY variant_of_issue_id[100],
         series_name,
         issue_number_raw,
         publication_date
SELECT distinct(issue_id)

Variants of The Amazing Spider-Man 666

Face it, Tiger... you just hit the jackpot!

FROM gcdissuesnapshot <start-date> <end-date>
WHERE variant_of_issue_id = 863623
GROUP BY series_name,
         issue_number_raw,
         variant_name
SELECT distinct(issue_id)

GCD issue entries created in October 2019, with series, issue, story counts

Check out 10/29! It was a busy day.

FROM gcdissuesnapshot <start-date> <end-date>
WHERE created < 20191101 created > 20191000
GROUP BY created
SELECT distinct(series_id) /* series */,
       distinct(issue_id) /* issues */,
       distinct(story_id) /* stories */

GCD issue entries modified in October 2019, with series, issue, story counts

Over 50,000 issues were modified in this month alone!

FROM gcdissuesnapshot <start-date> <end-date>
WHERE modified < 20191101 modified > 20191000
GROUP BY modified
SELECT distinct(series_id) /* series */,
       distinct(issue_id) /* issues */,
       distinct(story_id) /* stories */

Stories tracked in new credit system, since 2020-01-01

A new credit tracking system rolled out in 2020; how's that going?

FROM gcdissuesnapshot 2020-01-01
WHERE story_credit_source='gcd_story_credit'
GROUP BY time(1d), story_credit_source
SELECT distinct(story_id)

Issue counts by GCD data dump, since 2019-11-01

Curious how fast the database is growing?

FROM gcdissuesnapshot 2019-11-01 today
GROUP BY time(1d)
SELECT distinct(issue_id)