- OCR 'lower third'
- chyrons
- overlaid text on broadcasts
- not captions or descriptive text
- editorial / summarizing in nature
- 4 TV channels, 24/7, ~1 min behind real time
- CNN
- MSNBC
- Fox News
- BBC News
---
AFTER WH MEETING, SCHUMER DISHES WHEN HE THOUGHT NIC WAS OFF
---
# API
- Tab Separated Values
- https://archive.org/services/third-eye.php
- nice for command-line
- import into Google Sheets and Excel
- filtered
- raw (~25MB / day)
- more errors
- 3rd-party filtering possible
- TSV files uploaded to https://archive.org/details/third-eye
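
A minimal sketch of command-line style processing of the TSV feed. The column layout (`timestamp`, `channel`, `text`) and the sample rows are assumptions made for illustration, not the documented format, so check a real file before relying on them. Fetching from the service is left out because the query parameters are not given here.

```python
import csv
import io
from collections import Counter

# Hypothetical column layout; the real feed may use different or extra columns.
COLUMNS = ["timestamp", "channel", "text"]


def parse_chyron_tsv(tsv_text, columns=COLUMNS):
    """Parse tab-separated chyron rows into dicts keyed by `columns`."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t",
                        quoting=csv.QUOTE_NONE)
    rows = []
    for fields in reader:
        if len(fields) != len(columns):
            continue  # skip blank or malformed lines
        rows.append(dict(zip(columns, fields)))
    return rows


def count_by_channel(rows):
    """Count how many chyron rows each channel contributed."""
    return Counter(row["channel"] for row in rows)


# Made-up sample rows, only to show the shape of the data.
SAMPLE = (
    "2017-10-02 14:01:00\tCNN\tEXAMPLE CHYRON TEXT ONE\n"
    "2017-10-02 14:01:00\tMSNBC\tEXAMPLE CHYRON TEXT TWO\n"
    "2017-10-02 14:02:00\tCNN\tEXAMPLE CHYRON TEXT THREE\n"
)

if __name__ == "__main__":
    for channel, n in count_by_channel(parse_chyron_tsv(SAMPLE)).items():
        print(channel, n)
```

The same parser works on the raw or the filtered feed; third-party filtering would slot in between parsing and counting.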
---
# Chyron filtering
- tesseract OCR
- free; errors
- simhash
- groups 'nearly the same'
- character flips
- word off in time
- look for vowels
- pick 'most seen' group every minute
- and tweet
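
The filtering steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the production pipeline: the character 3-grams, the MD5-based 64-bit hash, the Hamming threshold of 12, the 20% vowel ratio, and the greedy grouping are all hypothetical choices, and the readings are made up.

```python
import hashlib
from collections import Counter


def simhash(text, bits=64, ngram=3):
    """Simhash over character n-grams of the whitespace-normalized text."""
    text = " ".join(text.upper().split())
    grams = [text[i:i + ngram] for i in range(max(1, len(text) - ngram + 1))]
    weights = [0] * bits
    for gram in grams:
        digest = hashlib.md5(gram.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        for bit in range(bits):
            weights[bit] += 1 if (h >> bit) & 1 else -1
    return sum(1 << bit for bit in range(bits) if weights[bit] > 0)


def hamming(a, b):
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


def looks_like_text(text, min_vowel_ratio=0.2):
    """Vowel check: OCR garbage tends to contain few vowels (assumed cutoff)."""
    letters = [c for c in text.lower() if c.isalpha()]
    if not letters:
        return False
    vowels = sum(c in "aeiou" for c in letters)
    return vowels / len(letters) >= min_vowel_ratio


def most_seen(readings, max_distance=12):
    """Group 'nearly the same' readings; return the top text of the largest group."""
    groups = []  # each entry: (hash of first member, list of member texts)
    for text in readings:
        if not looks_like_text(text):
            continue
        h = simhash(text)
        for rep, members in groups:
            if hamming(rep, h) <= max_distance:
                members.append(text)
                break
        else:
            groups.append((h, [text]))
    if not groups:
        return None
    _, members = max(groups, key=lambda g: len(g[1]))
    return Counter(members).most_common(1)[0][0]


if __name__ == "__main__":
    # One minute of made-up OCR readings for a single channel.
    minute = [
        "SENATE VOTES ON TAX BILL",
        "SENATE VOTES ON TAX BILL",
        "SENATE V0TES ON TAX BILL",  # character flip
        "XJ#K QRTV ZZ",              # no vowels, dropped
    ]
    print(most_seen(minute))
```

Comparing each reading only against a group's first member keeps the sketch short; a real system would likely need a sturdier clustering step and tuned thresholds.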
---
# TV AI Examples
- Vox found that Fox News paid little attention to Puerto Rico
- https://vox.com/2017/10/2/16401614/fox-news-puerto-rico-charts
- audio fingerprints
- presented keynote paper on C-SPAN floor speeches and vocal pitch (UIowa)
- discovered 375K political ads
- find sound bites of speeches
- Faceomatic - find appearances of US government officials on TV (facial recognition)
---
# Where We're Going
- https://archive.org/details/TVNewsKitchen
- want to serve journalists, researchers, librarians & more
- responsible behavior and access to data
- non-consumptive use
---
# The End