NAME
    ip2cn-server - IP to country name (IP geolocation) server

SYNOPSIS
    ip2cn-server [-c /path/to/conf] [-a LocalAddr] [-p LocalPort] \
                 [-d] [-h] [-j] [-t] [-v]

DESCRIPTION
    Memory resident IP to country name (IP geolocation) server.

    Copyright (C) 2008 Grant Coady GPLv2

    This program is free software; you can redistribute it and/or modify
    it under the terms of version 2 of the GNU General Public License as
    published by the Free Software Foundation.

    This program is distributed in the hope that it will be useful, but
    WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
    General Public License for more details:
    http://www.gnu.org/licenses/old-licenses/gpl-2.0.html

  Why?
    The size of the database files required for IP geolocation imposes a
    significant load time penalty on applications using that data. To
    speed up these applications, this server keeps one copy of the
    database files in memory and responds quickly to client requests for
    IP to country name lookups.

  How it works
    On startup the server loads the database into memory and starts
    listening for socket connections. Clients connect to the server and
    send a query; the server responds with a reply. A connection may
    carry one or many queries: casual clients do a
    connect-query-disconnect, while a tail logging client might keep a
    connection open for the life of the process. An optional reply format
    field may be appended to each query (see Reply formatting).

    The server communicates with the client applications via inet domain
    sockets and can have simultaneous open connections to several
    applications. There's no need for exclusive access locking as was the
    case with the previous ip2c-server.

    The server also writes a logfile and catches signals for database
    reload and graceful shutdown. Clients disconnect cleanly when the
    server is shut down. (Clients written in gawk terminate if the server
    disconnects.)
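
    The two connection styles described above (connect-query-disconnect,
    and one long-lived connection carrying many queries) can be sketched
    in Python. This is an illustration only, not one of the shipped
    clients: the function names are hypothetical, and the line-based
    exchange (one query line out, one reply line back) is inferred from
    the gawk client in this manual rather than from a protocol
    specification.

```python
import socket


def query_once(ip, fmt="", host="localhost", port=4743, timeout=5.0):
    """Casual client: connect, send one query line, read one reply line."""
    return query_many([ip], fmt, host, port, timeout)[0]


def query_many(ips, fmt="", host="localhost", port=4743, timeout=5.0):
    """Long-lived client: send several queries over a single connection."""
    replies = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with sock.makefile("rwb") as stream:
            for ip in ips:
                # The optional format field follows the address, separated
                # by a space (assumption based on the manual's examples).
                line = f"{ip} {fmt}".strip() + "\n"
                stream.write(line.encode("ascii"))
                stream.flush()
                reply = stream.readline()
                if not reply:
                    # Empty read: the server closed the connection.
                    raise ConnectionError("server closed the connection")
                replies.append(reply.decode("utf-8", "replace").rstrip("\r\n"))
    return replies
```

    With a server listening on the default port, query_once("64.233.167.99")
    would return the same reply line the gawk client prints.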

  Advantages
    The main advantage of using this server is that ad-hoc queries no
    longer have to wait for the database to load into memory. This suits
    shell and CGI applications that need to perform single lookups on
    demand. Applications may also share the IP to country database
    efficiently.

  Geolocation accuracy
    The database used with this server is derived from the top level
    Internet registries only, so IP blocks are resolved to the country
    where the registry is located. In Europe there is some cross-border
    uncertainty as well as use of the EU location. This server is only as
    accurate as the data collected from the public registries: apnic,
    afrinic, arin, lacnic, ripe, iana, and iso for the country codes and
    names.

  Old ip2c-server clients
    Existing users of the old server, junkview and sf4sf, have been
    converted to use this server. They also fall back to loading the
    database files directly if the server is not running. New casual
    client scripts are detailed below; 'ccfind' is rewritten to suit the
    server.

OPTIONS
    -c Configuration file
        Default: '/etc/junkview.conf'

        Location of the configuration file the server reads to discover
        which directory to read the database files from.

    -a LocalAddr
        Default: 'localhost'

        Address of the server machine. If you use the machine's name
        rather than 'localhost', the server may receive queries from the
        network.

    -p LocalPort
        Default: '4743'

        Define which port the server listens on for new connections.

    -d Daemon mode
        Default: off

        Set this to have the server switch to daemon mode. Specify this
        when starting the server from your initscripts.

    -h Help
        Default: off

        Display help text, a reminder to use perldoc ip2cn-server.

    -j Junkview mode
        Default: off

        When set, the server uses the database file 'ip2c-data' for
        junkview operation; otherwise the server uses the smaller, lower
        resolution 'ip2c-index' database file. The default database,
        ip2c-index, has adjacent IP blocks for the same country merged for
        faster load time and operation.

    -t Time database loading
        Default: off

        When set, the server exits after loading the database files.

    -v Verbose
        Default: it depends

        When set, the server reports its operation. It is switched on for
        non-daemon mode and switched off when the server switches to
        daemon mode. Use -v to watch the server load files before it
        enters daemon mode.

NOTES
  Testing as user
    For testing:

        ip2cn-server -v

    For server use:

        ip2cn-server -d [-j] [-v]

    The server turns off the -v option when switching to daemon mode.

  Activity logging
    The server appends to a logfile while running, either:

        /var/log/ip2cn-server.log   (run as root), or
        /tmp/ip2cn-server.log       (run as user)

    An exclusive write lock is held on the logfile.

  PID file
    The server writes a pidfile for process management. This file is
    written to:

        /var/log/ip2cn-server.pid   (run as root), or
        /tmp/ip2cn-server.pid       (run as user)

  Database tables
    A tarball of data files for IP to country name lookup, suited to this
    server, is available from (usually updated daily):

        ftp://bugsplatter.id.au/junkview/ip2c-database.tar.lzma  172k
        ftp://bugsplatter.id.au/junkview/ip2c-database.tar.bz2   272k
        ftp://bugsplatter.id.au/junkview/ip2c-database.tar.gz    280k

    Note that the database tarball above is suitable only for IP to
    country lookup. It cannot be used to discover individual IP block
    allocation, as adjacent blocks with the same country have been merged
    to reduce the database size from 93k records to 37k records.

    If you want the full resolution database files, see the database
    update scripts at http://bugsplatter.id.au/junkview/ which grab and
    process the IP to country database files available from
    http://software77.net/ and supply the '-j' command line option to the
    ip2cn-server.

    A shared read lock (flock 1) is held while reading each database file
    to prevent it being written while open for reading. Update scripts
    should also use flock when rewriting any database files.
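
    As a rough illustration of the locking advice above, an update script
    could take an exclusive lock before rewriting a database file, while
    readers take a shared lock. The sketch below is Python and uses
    fcntl.flock (Unix only); the function names are hypothetical and the
    file content is treated as opaque text, since the database file
    format is not described in this manual.

```python
import fcntl
import os


def rewrite_locked(path, text):
    """Replace the content of `path` while holding an exclusive lock."""
    # Open without truncating: the file must not be emptied before the
    # lock is granted, or a reader holding a shared lock could see it.
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    with os.fdopen(fd, "r+") as handle:
        fcntl.flock(handle, fcntl.LOCK_EX)  # blocks until readers finish
        handle.truncate(0)
        handle.write(text)
        handle.flush()
        os.fsync(handle.fileno())
        # The lock is released when the file is closed.


def read_locked(path):
    """Read `path` under a shared lock, as a reader of the file would."""
    with open(path) as handle:
        fcntl.flock(handle, fcntl.LOCK_SH)  # several readers may share this
        return handle.read()
```

    flock locks are advisory: they only protect the files if every reader
    and writer takes them.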

  Database country names
    The database uses the two-letter ISO codes for countries and, for the
    most part, the ISO country names, except where they've been trimmed
    for readability. The following non-ISO names are added to the names
    lookup:

        AP  Asia Pacific
        CS  Serbia and Montenegro
        EU  European Union
        ZZ  IETF Reserved or Private

    ISO country codes not found in the registry IP block allocations are
    removed as they serve no purpose here.

  Database ER diagram
        +----------------+
        |   ip2c-index   |
        +----------------+
        |*Record number* |   +--------------+
        | IP block start |   |  ip2c-names  |
        | IP block end   |   +--------------+
        | Country code   |---|*Country code*|
        +----------------+   | Country name |
                             +--------------+

  Database reload
    The server will reload the database files on receipt of the SIGUSR1
    signal. Send it like this:

        # /etc/rc.d/rc.ip2cn-server reload

    which executes this command:

        # kill -SIGUSR1 $(cat /var/run/ip2cn-server.pid)

  Signal handling
    During startup or database load, SIGINT or SIGHUP will trigger an
    immediate shutdown. While the server is running the sockets
    interface, SIGINT or SIGHUP will trigger a graceful shutdown of the
    sockets interface, ignoring new connections and queries. This action
    suspends the server's clients while the server is busy reloading the
    database.

    This graceful shutdown also prevents the clients getting a broken
    connection on a shutdown request. They exit cleanly when the server
    exits, as they are waiting for the reply from the server.

        SIGINT, SIGHUP     trigger graceful shutdown.
        SIGUSR1, SIGUSR2   trigger a database reload.
        SIGTERM            triggers an ungraceful exit as the system is
                           going down.

    See: http://bugsplatter.id.au/ip2cn/rc.ip2cn-server for an example of
    ip2cn-server graceful stop using signals.

  Query handling
    A query is a numeric or dotquad IPv4 address. The reply takes various
    forms depending on an optional query format field, detailed below.
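
    The two accepted query forms name the same 32-bit value. The Python
    sketch below shows one way to normalise either form to a number and
    back; the function names are hypothetical, and this is not the
    server's own parsing code.

```python
def parse_query_address(text):
    """Return the 32-bit value of a numeric or dotquad IPv4 address."""
    text = text.strip()
    if text.isascii() and text.isdigit():
        value = int(text)  # already numeric, e.g. "1089054563"
    else:
        parts = text.split(".")
        if len(parts) != 4 or not all(p.isascii() and p.isdigit() for p in parts):
            raise ValueError(f"not an IPv4 address: {text!r}")
        octets = [int(p) for p in parts]
        if any(octet > 255 for octet in octets):
            raise ValueError(f"octet out of range: {text!r}")
        value = (octets[0] << 24) | (octets[1] << 16) | (octets[2] << 8) | octets[3]
    if value > 0xFFFFFFFF:
        raise ValueError(f"address out of range: {text!r}")
    return value


def to_dotquad(value):
    """Return the dotquad form of a 32-bit address value."""
    return ".".join(str((value >> shift) & 0xFF) for shift in (24, 16, 8, 0))
```

    For example, parse_query_address("64.233.167.99") gives 1089054563,
    which lies inside the block 1089052672 to 1089060863 quoted in the
    junkview example of this manual.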

    For casual queries try:

    gawk client (3.1.5 or later):

        #!/usr/bin/gawk -f
        BEGIN {
            # gawk special file: TCP connection to the server
            service = "/inet/tcp/0/localhost/4743"
        }
        {
            print $0 |& service     # send the query line
            service |& getline      # read the reply line into $0
            print
        }

        $ echo 64.233.167.99 |./client
        64.233.167.99 US:United States

    shell wrapper:

        #!/bin/bash
        # Query the address given as the first argument.
        echo "$1" | gawk '
        BEGIN {
            service = "/inet/tcp/0/localhost/4743"
        }
        {
            print $0 |& service
            service |& getline
            print
        }' 2>/dev/null

        $ ./client1 64.233.167.99
        64.233.167.99 US:United States

  Reply formatting
    At the moment there are two different users of this server, each
    expecting its own reply format, so they now append a single character
    format specifier. For casual use there's a default format that
    returns the query, country code and country name.

    The reply format may be specified by appending a format field to the
    query, built from this list:

        't'  tab character
        'c'  country code, two letters
        'n'  country name, text - length varies
        'd'  decimal (numeric) query address
        'f'  decimal query address formatted: "%10u " (fixed width)
        'p'  query dotquad address formatted "%-16s" (fixed width)
        'q'  query dotquad address

    Space or punctuation characters are inserted into the reply as field
    separators.

    The predefined reply formats are:

             'pc:n'         default
        '-'  'c:n'          for sf4sf
        '+'  'a:c :n:s:e'   for junkview

    When using ip2cn-server with the -j option, more information is
    available:

        'a'  network block address containing query IP
        'b'  network block address formatted: "%-19s" (fixed width)
        's'  numeric block start address
        'e'  numeric block end address
        'z'  network block address formatted with zero fill for sorting

  Examples
        $ echo 64.233.167.99 |./client                # default
        64.233.167.99 US:United States

        $ echo 64.233.167.99 -|./client               # sf4sf
        US:United States

        $ echo 64.233.167.99 +|./client               # junkview
        64.233.160.0/19:US :United States:1089052672:1089060863

        $ echo 64.233.167.99 'pw c -> n'|./client     # freeform
        64.233.167.99 64.233.160.0/19 US -> United States

    Note the use of single quotes to stop the shell interpreting the
    format field.
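
    To make the format field concrete, the Python sketch below expands a
    format string character by character using the basic field letters
    listed under Reply formatting. It is a guess at the behaviour, not
    the server's code: the function name is hypothetical, the -j fields
    are left out, and rejecting unknown letters is an assumption, since
    this manual does not say what the server does with them.

```python
# Shorthand specifiers from the manual: an empty field selects the
# default format and '-' selects the sf4sf format.
PREDEFINED = {"": "pc:n", "-": "c:n"}


def render_reply(fmt, dotquad, number, code, name):
    """Expand a reply format field for one resolved query."""
    fmt = PREDEFINED.get(fmt, fmt)
    fields = {
        "t": "\t",
        "c": code,
        "n": name,
        "d": str(number),
        "f": "%10d " % number,   # mirrors the documented "%10u "
        "p": "%-16s" % dotquad,
        "q": dotquad,
    }
    out = []
    for ch in fmt:
        if ch in fields:
            out.append(fields[ch])
        elif ch.isalnum():
            # Assumption: letters outside the basic list are not handled here.
            raise ValueError(f"unsupported format letter: {ch!r}")
        else:
            out.append(ch)       # space or punctuation is a separator
    return "".join(out)
```

    Here render_reply("c:n", "64.233.167.99", 1089054563, "US",
    "United States") returns "US:United States", matching the sf4sf
    example.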

SEE ALSO
  Related projects
    junkview http://bugsplatter.id.au/junkview/
        awk and bash scripts behind the junkshow page

    junkshow http://bugsplatter.id.au/junkshow/
        display last 24 hours activity from the firewall

    sf4sf http://bugsplatter.id.au/firewall/
        firewall log monitor, a pretty printer with geolocation option.

    cc2ip http://bugsplatter.id.au/cc2ip/
        country code or name to IP blocks converter to make blacklist or
        whitelist firewall rules.

AUTHOR
    Copyright (C) 2008 Grant Coady GPLv2

    This program is free software; you can redistribute it and/or modify
    it under the terms of version 2 of the GNU General Public License as
    published by the Free Software Foundation.

    This program is distributed in the hope that it will be useful, but
    WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
    General Public License for more details:
    http://www.gnu.org/licenses/old-licenses/gpl-2.0.html

    Home site:

  Credits
    Sockets code by Michael Chapman with minor modifications to suit this
    application: to handle empty queries, and to wrap into a subroutine
    with graceful shutdown option.

    Binary search algorithm from Tim Bray's site, more information here:
    http://www.tbray.org/ongoing/When/200x/2003/03/22/Binary
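
    For readers unfamiliar with the technique credited above, a binary
    search over sorted, non-overlapping IP blocks can be sketched in
    Python as follows. This is neither the server's code nor Tim Bray's.
    The table is a two-row illustration: the first row is the private
    10.0.0.0/8 range, and the second is the block quoted in the junkview
    example of this manual.

```python
# (block start, block end, country code), sorted by start address.
SAMPLE_BLOCKS = [
    (167772160, 184549375, "ZZ"),    # 10.0.0.0/8, private address space
    (1089052672, 1089060863, "US"),  # 64.233.160.0/19, from the manual
]


def find_block(blocks, address):
    """Return the block containing `address`, or None if there is none."""
    low, high = 0, len(blocks) - 1
    while low <= high:
        mid = (low + high) // 2
        start, end, _code = blocks[mid]
        if address < start:
            high = mid - 1       # the address lies in a lower block
        elif address > end:
            low = mid + 1        # the address lies in a higher block
        else:
            return blocks[mid]   # start <= address <= end
    return None
```

    Each lookup halves the remaining range, so the 37k-record table needs
    about 16 comparisons per query.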