[RFC|Accepted] IP banning to use 'longest prefix matching'

Note: We are moving the topics of this forum and it will be deleted at some point

Publish your own request for comments/change or patches for the next version of phpBB. Discuss the contributions and proposals of others. Upcoming releases are 3.2/Rhea and 3.3.
User avatar
bantu
3.0 Release Manager
3.0 Release Manager
Posts: 557
Joined: Thu Sep 07, 2006 11:22 am
Location: Karlsruhe, Germany
Contact:

[RFC|Accepted] IP banning to use 'longest prefix matching'

Post by bantu »

The implementation of IP banning in 3.0.x is somewhat suboptimal.

Especially banning IP ranges is a problem. When an IP range is banned, every IP in that range is individually inserted into the ban table. While this makes ban lookups for a specific IP in the database almost trivial, it often generates memory problems on boards with limited resources (e.g. memory_limit = 16M) when fetching the whole ban list. It also doesn't work (doesn't scale) with IPv6 due to the bigger address space.

Proposal
I propose changing IP banning to use longest prefix matching, storing subnets and IP addresses as subnets instead of single IP addresses.

Implementation
To be done by someone else and generally open for discussion. ;-) Any Volunteers? :-P

Tracker Ticket
http://tracker.phpbb.com/browse/PHPBB3-9687

References
http://en.wikipedia.org/wiki/Longest_prefix_match


Nelsaidi
Registered User
Posts: 122
Joined: Tue Nov 11, 2008 5:44 pm

Re: [RFC] Changing IP banning to use 'longest prefix matchin

Post by Nelsaidi »

Out of curiosity for why does it need to store one for each range? Can it not store the IP's as an integer with a start and a finish IP (indicating the range) - and then just do a lookup to see if the current ip is a match between the start and finish?

This is how local db-based geo-ip lookups are done.

shentino
Registered User
Posts: 10
Joined: Fri Jul 09, 2010 10:39 am

Re: [RFC] Changing IP banning to use 'longest prefix matchin

Post by shentino »

I'd propose full on CIDR mask lengths. Store a zero-bit-padded IP address along with a byte specifying the number of bits to mask out during testing.

User avatar
bantu
3.0 Release Manager
3.0 Release Manager
Posts: 557
Joined: Thu Sep 07, 2006 11:22 am
Location: Karlsruhe, Germany
Contact:

Re: [RFC|Accepted] IP banning to use 'longest prefix matchin

Post by bantu »

Hmm, PostgreSQL has some functions and data types that would be useful for such things[0]. Unfortunately, as far as I can see MySQL still lacks those.

[0] http://www.postgresql.org/docs/8.4/stat ... s-net.html

User avatar
Jaftica
Registered User
Posts: 23
Joined: Sat Nov 13, 2010 12:28 am

Re: [RFC|Accepted] IP banning to use 'longest prefix matchin

Post by Jaftica »

Especially banning IP ranges is a problem. When an IP range is banned
What a great idea. I am on dial-up out here in the sticks...and I mean the sticks. Banning an IP takes forever as I have so many in the database due to ranges. Hopefully they listen.

Oleg
Posts: 1150
Joined: Tue Feb 23, 2010 2:38 am
Contact:

Re: [RFC|Accepted] IP banning to use 'longest prefix matchin

Post by Oleg »

What is the proposed storage format for IP addresses? How will these columns be indexed? Storing the addresses as ranges, with endpoints converted to integers, would make them trivially indexable even by primitive database engines and will work for all use cases. If desired, range may be converted to network/mask in UI.

IPv6 addresses, being 128 bits long, are too large for integer fields but may be serialized into zero-padded hex representation and then compared as strings, which would still require one query. A field in config table may indicate whether there are any ipv6 IP addresses banned; checking it will not require a query and it would save the separate ipv6 query for boards using only ipv4. Similar check may of course be implemented for ipv4.

User avatar
bantu
3.0 Release Manager
3.0 Release Manager
Posts: 557
Joined: Thu Sep 07, 2006 11:22 am
Location: Karlsruhe, Germany
Contact:

Re: [RFC|Accepted] IP banning to use 'longest prefix matchin

Post by bantu »

nn- wrote:What is the proposed storage format for IP addresses? How will these columns be indexed? Storing the addresses as ranges, with endpoints converted to integers, would make them trivially indexable even by primitive database engines and will work for all use cases. If desired, range may be converted to network/mask in UI.
I'm pretty much open for suggestions. Right now I am thinking of using ranges as well, but I'm also trying to get some real-life data to test different data types to see how they perform (also on different DBMSes).

User avatar
bantu
3.0 Release Manager
3.0 Release Manager
Posts: 557
Joined: Thu Sep 07, 2006 11:22 am
Location: Karlsruhe, Germany
Contact:

Re: [RFC|Accepted] IP banning to use 'longest prefix matchin

Post by bantu »

We can store IPv4 addresses and IPv6 addresses in the same table by mapping IPv4 addresses into the IPv6 space or we can have one table for each address type. Since the majority of addresses will still be IPv4 for the next few years I'd suggest to use separate tables so we can save IPv4 addresses as 32-bit integers (as has been suggested), since that's what they are.

The next question is whether to store real ranges (e.g. 192.168.0.1 - 192.168.0.3) or to use CIDR notation and only store a network address and a mask (e.g. 192.168.0.0/30). Using CIDR notation implies that ranges can only contain a number of IP addresses that is a power of 2. This might be confusing to end users because they could not ban ranges such as 192.168.0.1 - 192.168.0.3 (or it would require additional entries in the table). It would however match how IP addresses are assigned and would probably allow the use of bitwise operations.

User avatar
AmigoJack
Registered User
Posts: 110
Joined: Wed May 04, 2011 7:47 pm
Location: グリーン ヒル ゾーン
Contact:

Re: [RFC|Accepted] IP banning to use 'longest prefix matchin

Post by AmigoJack »

I discourage storing CIDR notation - that'd be human readable, but comparing IPs would always end up in string operations for the DBMS.

Why not implementing a logic where we operate with 4 blocks of 32bit? Advantages are that it can be done with any DBMS plus the most simple indices can be used, resulting in great performance.

My idea requires the following:
  • Every IPv4 will be converted to IPv6.
  • IPv6 addresses must be fully qualified (eliminating shortcuts).
  • To finally operate with an IPv6 address, their letters must be in lowercase and the colons need to be stripped, e.g.:

    Code: Select all

    $sIp6= strtolower( str_replace( ':', '', $sIp6 ) );
The database table for storing IP ranges would cut addresses into 4 blocks and store them separately as 32bit Integers:

Code: Select all

CREATE TABLE phpbb_ban_ip_table
( from1 INT( 10 ) UNSIGNED NOT NULL
, from2 INT( 10 ) UNSIGNED NOT NULL
, from3 INT( 10 ) UNSIGNED NOT NULL
, from4 INT( 10 ) UNSIGNED NOT NULL
, to1 INT( 10 ) UNSIGNED NOT NULL
, to2 INT( 10 ) UNSIGNED NOT NULL
, to3 INT( 10 ) UNSIGNED NOT NULL
, to4 INT( 10 ) UNSIGNED NOT NULL
, PRIMARY KEY( from1, from2, from3, from4, to1, to2, to3, to4 )
);
In PHP we always end up formulating an SQL query, so we need a function to cut the long IPv6 address into 4 blocks:

Code: Select all

function GetIp6Blocks( $sIp6 ) {
    $aResult= array();

    for( $iBitBlock= 24; $iBitBlock>= 0; $iBitBlock-= 8 )  // Cut IPv6 into 4 blocks of 32bit
    $aResult[]= (string)base_convert( substr( $sIp6, $iBitBlock, 8 ), 16, 10 );

    return $aResult;
} 
Inserting a new IP range would be as easy as this:

Code: Select all

$sql= 'INSERT
    INTO '. BAN_IP_TABLE. '( from1, from2, from3, from4, to1, to2, to3, to4 )
    VALUES( '. implode( ', ', GetIp6Blocks( $sIp6From ) ). ', '. implode( ', ', GetIp6Blocks( $sIp6To ) ). ' )';
$db-> sql_query( $sql ); 
Checking an IP if it is within a range would also be very easy:

Code: Select all

$aTarget= GetIp6Blocks( $sIp6ToCheck );
$sql= 'SELECT *
    FROM '. BAN_IP_TABLE. '
    WHERE '. $aTarget[0]. ' BETWEEN from1 AND to1
    AND '. $aTarget[1]. ' BETWEEN from2 AND to2
    AND '. $aTarget[2]. ' BETWEEN from3 AND to3
    AND '. $aTarget[3]. ' BETWEEN from4 AND to4';
$hResult= $db-> sql_query( $sql );
$bFound= $db-> sql_fetchrow( $hResult );
$db-> sql_freeresult( $hResult );

if( $bFound ) { ... } 
That's it already. CIDR support can be achieved indirectly:
  • As for user input, we simple convert a CIDR notation into an IP range (starting IP and ending IP).
  • As for user display (output) we could check each IP range we have and extract the common prefix of both IPs, then check if the remaining parts can be written as a subnet - and only then, we're able to display it in CIDR notation.
Advantage of this is that the user who manages the ban list can decide himself if he wants a mixed display with IP ranges and possible CIDR notations, or IP ranges only.

Post Reply