we'll be needing parsers for each of the following formats. 20061109-18:01:58 mengwong@alexandre:~% psql -U karma_live karma_live -c 'select distinct(magic) from source' magic ------------------------- apwg.phish cloudmark.phish isipp.iadb isipp.iddb isipp.wadb lashback.untrustedlinks rbl.domain rbl.mixed rbl.simpleip rbl.url score.ip simple.axfr simple.domainlist simple.emaillist simple.iplist simple.urllist spamcop.bl spamhaus.xbl truste.seals verisign.merchants (20 rows) the legacy java parsers followed this directory structure: 20061109-18:10:07 mengwong@newyears-wired:~/src/karma/src/java/com/karmasphere/pusher% tree -I .svn . |-- Parser.java |-- ParserException.java |-- ParserMap.java |-- ParserRequest.java |-- ParserWrapper.java |-- README |-- apwg | `-- ApwgPhishParser.java |-- block | |-- AxfrParser.java | |-- BlockParser.java | |-- LineParser.java | `-- SimpleAxfrParser.java |-- cloudmark | |-- CloudmarkPhishingParser.java | `-- CloudmarkRatingParser.java |-- isipp | |-- IsippIadbParser.java | |-- IsippIddbParser.java | `-- IsippWadbParser.java |-- lashback | `-- LashbackUntrustedLinksParser.java |-- njabl | `-- NjablDnsblParser.java |-- rbl | |-- DomainRblParser.java | |-- IpRblParser.java | |-- MixedRblParser.java | |-- RblParser.java | |-- SimpleDomainRblParser.java | |-- SimpleIpRblParser.java | `-- UrlRblParser.java |-- score | `-- IpScoreParser.java |-- simple | |-- DomainListParser.java | |-- EmailListParser.java | |-- IpListParser.java | |-- ListParser.java | |-- SimpleListParser.java | `-- UrlListParser.java |-- spamcop | `-- SpamcopBlacklistParser.java |-- spamhaus | `-- SpamhausXblParser.java |-- surfcontrol | `-- SurfControlCategoryParser.java |-- truste | `-- TrustESealParser.java |-- util | |-- DomainGlob.java | |-- IpRange.java | |-- ParseUtil.java | `-- UrlGlob.java `-- verisign `-- VerisignMerchantsParser.java the corresponding Perl directory structure is: . |-- APWG | `-- Phish.pm |-- Base.pm |-- Cloudmark | `-- Phish.pm |-- ISIPP | |-- IADB.pm | |-- IDDB.pm | `-- WADB.pm |-- Lashback | `-- UntrustedLinks.pm |-- Njabl |-- RBL | |-- Domain.pm | |-- Mixed.pm | |-- SimpleIP.pm | `-- URL.pm |-- README |-- Record.pm |-- Score | `-- IP.pm |-- Simple | |-- AXFR.pm | |-- DomainList.pm | |-- EmailList.pm | |-- IPList.pm | |-- List.pm | `-- URLList.pm |-- Spamcop | `-- BL.pm |-- Spamhaus | `-- XBL.pm |-- Surfcontrol |-- Truste | `-- Seals.pm `-- Verisign `-- Merchants.pm