[RFC|Accepted] IP banning to use 'longest prefix matching'
- bantu
- 3.0 Release Manager
- Posts: 557
- Joined: Thu Sep 07, 2006 11:22 am
- Location: Karlsruhe, Germany
- Contact:
[RFC|Accepted] IP banning to use 'longest prefix matching'
The implementation of IP banning in 3.0.x is somewhat suboptimal.
Especially banning IP ranges is a problem. When an IP range is banned, every IP in that range is individually inserted into the ban table. While this makes ban lookups for a specific IP in the database almost trivial, it often generates memory problems on boards with limited resources (e.g. memory_limit = 16M) when fetching the whole ban list. It also doesn't work (doesn't scale) with IPv6 due to the bigger address space.
Proposal
I propose changing IP banning to use longest prefix matching, storing subnets and IP addresses as subnets instead of single IP addresses.
Implementation
To be done by someone else and generally open for discussion. Any Volunteers?
Tracker Ticket
http://tracker.phpbb.com/browse/PHPBB3-9687
References
http://en.wikipedia.org/wiki/Longest_prefix_match
Especially banning IP ranges is a problem. When an IP range is banned, every IP in that range is individually inserted into the ban table. While this makes ban lookups for a specific IP in the database almost trivial, it often generates memory problems on boards with limited resources (e.g. memory_limit = 16M) when fetching the whole ban list. It also doesn't work (doesn't scale) with IPv6 due to the bigger address space.
Proposal
I propose changing IP banning to use longest prefix matching, storing subnets and IP addresses as subnets instead of single IP addresses.
Implementation
To be done by someone else and generally open for discussion. Any Volunteers?
Tracker Ticket
http://tracker.phpbb.com/browse/PHPBB3-9687
References
http://en.wikipedia.org/wiki/Longest_prefix_match
-
- Registered User
- Posts: 523
- Joined: Sat Apr 22, 2006 10:29 pm
- Contact:
Re: [RFC] Changing IP banning to use 'longest prefix matchin
Out of curiosity for why does it need to store one for each range? Can it not store the IP's as an integer with a start and a finish IP (indicating the range) - and then just do a lookup to see if the current ip is a match between the start and finish?
This is how local db-based geo-ip lookups are done.
This is how local db-based geo-ip lookups are done.
Re: [RFC] Changing IP banning to use 'longest prefix matchin
I'd propose full on CIDR mask lengths. Store a zero-bit-padded IP address along with a byte specifying the number of bits to mask out during testing.
- bantu
- 3.0 Release Manager
- Posts: 557
- Joined: Thu Sep 07, 2006 11:22 am
- Location: Karlsruhe, Germany
- Contact:
Re: [RFC|Accepted] IP banning to use 'longest prefix matchin
Hmm, PostgreSQL has some functions and data types that would be useful for such things[0]. Unfortunately, as far as I can see MySQL still lacks those.
[0] http://www.postgresql.org/docs/8.4/stat ... s-net.html
[0] http://www.postgresql.org/docs/8.4/stat ... s-net.html
Re: [RFC|Accepted] IP banning to use 'longest prefix matchin
What a great idea. I am on dial-up out here in the sticks...and I mean the sticks. Banning an IP takes forever as I have so many in the database due to ranges. Hopefully they listen.Especially banning IP ranges is a problem. When an IP range is banned
Re: [RFC|Accepted] IP banning to use 'longest prefix matchin
What is the proposed storage format for IP addresses? How will these columns be indexed? Storing the addresses as ranges, with endpoints converted to integers, would make them trivially indexable even by primitive database engines and will work for all use cases. If desired, range may be converted to network/mask in UI.
IPv6 addresses, being 128 bits long, are too large for integer fields but may be serialized into zero-padded hex representation and then compared as strings, which would still require one query. A field in config table may indicate whether there are any ipv6 IP addresses banned; checking it will not require a query and it would save the separate ipv6 query for boards using only ipv4. Similar check may of course be implemented for ipv4.
IPv6 addresses, being 128 bits long, are too large for integer fields but may be serialized into zero-padded hex representation and then compared as strings, which would still require one query. A field in config table may indicate whether there are any ipv6 IP addresses banned; checking it will not require a query and it would save the separate ipv6 query for boards using only ipv4. Similar check may of course be implemented for ipv4.
- bantu
- 3.0 Release Manager
- Posts: 557
- Joined: Thu Sep 07, 2006 11:22 am
- Location: Karlsruhe, Germany
- Contact:
Re: [RFC|Accepted] IP banning to use 'longest prefix matchin
I'm pretty much open for suggestions. Right now I am thinking of using ranges as well, but I'm also trying to get some real-life data to test different data types to see how they perform (also on different DBMSes).nn- wrote:What is the proposed storage format for IP addresses? How will these columns be indexed? Storing the addresses as ranges, with endpoints converted to integers, would make them trivially indexable even by primitive database engines and will work for all use cases. If desired, range may be converted to network/mask in UI.
- bantu
- 3.0 Release Manager
- Posts: 557
- Joined: Thu Sep 07, 2006 11:22 am
- Location: Karlsruhe, Germany
- Contact:
Re: [RFC|Accepted] IP banning to use 'longest prefix matchin
We can store IPv4 addresses and IPv6 addresses in the same table by mapping IPv4 addresses into the IPv6 space or we can have one table for each address type. Since the majority of addresses will still be IPv4 for the next few years I'd suggest to use separate tables so we can save IPv4 addresses as 32-bit integers (as has been suggested), since that's what they are.
The next question is whether to store real ranges (e.g. 192.168.0.1 - 192.168.0.3) or to use CIDR notation and only store a network address and a mask (e.g. 192.168.0.0/30). Using CIDR notation implies that ranges can only contain a number of IP addresses that is a power of 2. This might be confusing to end users because they could not ban ranges such as 192.168.0.1 - 192.168.0.3 (or it would require additional entries in the table). It would however match how IP addresses are assigned and would probably allow the use of bitwise operations.
The next question is whether to store real ranges (e.g. 192.168.0.1 - 192.168.0.3) or to use CIDR notation and only store a network address and a mask (e.g. 192.168.0.0/30). Using CIDR notation implies that ranges can only contain a number of IP addresses that is a power of 2. This might be confusing to end users because they could not ban ranges such as 192.168.0.1 - 192.168.0.3 (or it would require additional entries in the table). It would however match how IP addresses are assigned and would probably allow the use of bitwise operations.
- AmigoJack
- Registered User
- Posts: 110
- Joined: Wed May 04, 2011 7:47 pm
- Location: グリーン ヒル ゾーン
- Contact:
Re: [RFC|Accepted] IP banning to use 'longest prefix matchin
I discourage storing CIDR notation - that'd be human readable, but comparing IPs would always end up in string operations for the DBMS.
Why not implementing a logic where we operate with 4 blocks of 32bit? Advantages are that it can be done with any DBMS plus the most simple indices can be used, resulting in great performance.
My idea requires the following:
In PHP we always end up formulating an SQL query, so we need a function to cut the long IPv6 address into 4 blocks:
Inserting a new IP range would be as easy as this:
Checking an IP if it is within a range would also be very easy:
That's it already. CIDR support can be achieved indirectly:
Why not implementing a logic where we operate with 4 blocks of 32bit? Advantages are that it can be done with any DBMS plus the most simple indices can be used, resulting in great performance.
My idea requires the following:
- Every IPv4 will be converted to IPv6.
- IPv6 addresses must be fully qualified (eliminating shortcuts).
- To finally operate with an IPv6 address, their letters must be in lowercase and the colons need to be stripped, e.g.:
Code: Select all
$sIp6= strtolower( str_replace( ':', '', $sIp6 ) );
Code: Select all
CREATE TABLE phpbb_ban_ip_table
( from1 INT( 10 ) UNSIGNED NOT NULL
, from2 INT( 10 ) UNSIGNED NOT NULL
, from3 INT( 10 ) UNSIGNED NOT NULL
, from4 INT( 10 ) UNSIGNED NOT NULL
, to1 INT( 10 ) UNSIGNED NOT NULL
, to2 INT( 10 ) UNSIGNED NOT NULL
, to3 INT( 10 ) UNSIGNED NOT NULL
, to4 INT( 10 ) UNSIGNED NOT NULL
, PRIMARY KEY( from1, from2, from3, from4, to1, to2, to3, to4 )
);
Code: Select all
function GetIp6Blocks( $sIp6 ) {
$aResult= array();
for( $iBitBlock= 24; $iBitBlock>= 0; $iBitBlock-= 8 ) // Cut IPv6 into 4 blocks of 32bit
$aResult[]= (string)base_convert( substr( $sIp6, $iBitBlock, 8 ), 16, 10 );
return $aResult;
}
Code: Select all
$sql= 'INSERT
INTO '. BAN_IP_TABLE. '( from1, from2, from3, from4, to1, to2, to3, to4 )
VALUES( '. implode( ', ', GetIp6Blocks( $sIp6From ) ). ', '. implode( ', ', GetIp6Blocks( $sIp6To ) ). ' )';
$db-> sql_query( $sql );
Code: Select all
$aTarget= GetIp6Blocks( $sIp6ToCheck );
$sql= 'SELECT *
FROM '. BAN_IP_TABLE. '
WHERE '. $aTarget[0]. ' BETWEEN from1 AND to1
AND '. $aTarget[1]. ' BETWEEN from2 AND to2
AND '. $aTarget[2]. ' BETWEEN from3 AND to3
AND '. $aTarget[3]. ' BETWEEN from4 AND to4';
$hResult= $db-> sql_query( $sql );
$bFound= $db-> sql_fetchrow( $hResult );
$db-> sql_freeresult( $hResult );
if( $bFound ) { ... }
- As for user input, we simple convert a CIDR notation into an IP range (starting IP and ending IP).
- As for user display (output) we could check each IP range we have and extract the common prefix of both IPs, then check if the remaining parts can be written as a subnet - and only then, we're able to display it in CIDR notation.