Page tree
Skip to end of metadata
Go to start of metadata

This is an example to explore how users can take IP addresses from their weblogs and identify the GeoLocation from where those IPs originate.

Table of Contents


Sample data download

  1. Visit the website MaxMind to download the file GeoLite City (CSV / zip)
  2. Extract the .zip which will contain two .csv files.
    • GeoCityBlocks.csv - These are blocks of IP addresses which included location ID.
    • GeoCityLocations.csv - These matched latitude and longitude with the location ID. 
  3. Next, download the Weblogs.log file here.

This product includes GeoLite data created by MaxMind, available from

Load the example file into Datameer

Perform file uploads into your Datameer instance so you can work with the demo data.

  • Upload the GeoCityBlocks and GeoCityLocations using a CSV file type.
  • Upload the Weblogs.log using Apache Log as the file type.
    • The parse pattern is: %h %l %u %t "%r" %>s %b

Open a new Workbook after the three files have been uploaded. Select all three files and click Add Data.

If you want to learn more about uploading data into Datameer, check out our documentation on Uploading Files.

If you want to lean more about how to open data into workbooks, check out our documentation on Workbooks.


Start the example

Join GeoCityBlocks and GeoCityLocations based on locId

Duplicate the joined sheet in order to clean up some of the data.

Click on the next available empty column to bring up the formula builder. Use the TRUNC formula on GeoCityBlocks.startIpNum.

As viewed in Fx bar



The next step is to convert the IP to a numerical representation. Click in the Fbar to edit the column that was just created and divide it by 256*256.

As viewed in Fx bar


Repeat the process of using the TRUNC function and modifying the formula with GeoCityBlocks.endIpNum

As viewed in Fx bar


How and why to convert an IP to a numerical representation?

 For example, using the following IP:

There are 4 numbers or buckets divided by the decimal place. Each number can range from 0-255. This means that matching an IP to the current data will yield very large data sets. The method to simplify this is to create buckets by dividing the Start and End IP with 256*256. This creates a smaller range for the IPs to match the data based on the first 2 numbers in an IP.


Now find all possible IP addresses within the start to end range. Click on the next column to bring up the formula builder and select the function EXPLODE_RANGE.

Use the two columns that were just created for the start (GeoCityBlocks_startIpNum) and end (GeoCityBlocks_endIpNum) arguments.

As viewed in Fx bar


Right click the column just created and rename it to IPBucket

Right click sheet1 at the bottom of the page and rename it to GeoJoinEdit


Create a new sheet and click in the first empty column to bring up the formula builder. Select the function IP_TO_NUMBER. The argument used will come from the Weblogs page in the workbook using the remoteHost column. This converts the remote host to a number that will be able to be joined with the GeoCity data.

As viewed in Fx bar


Name this column IPNumber

Click on the next empty column to bring up the formula builder. Select the TRUNC function and the column IPNumber that was just created as the argument. 

As viewed in Fx bar



Now modify the function in the same was as the previous TRUNC functions:


Name this column IPBucket

Name this sheet WeblogsEdit


Create a joined sheet between WeblogsEdit and GeoJoinEdit based on IPBucket.


The new joined sheet contains IP number, start and end of bucket, and location.

Name this sheet GeoWeblogJoin


The last step is to find exact matches for all the locations.

Ensure the page name being worked on is GeoWeblogJoin. Click the filter tool from the tool bar (or select it from the menu bar).

Check the box at the top to have the filtered results displayed on a new sheet.

The first filter is:

WeblogsEdit.IPNumber is greater equals to the GeoJoinEdit.startIPNum column.

Click the plus sign to add a filter

The second filter is:

WeblogsEdit.IPNumber is less equals to the GeoJoinEdit.endIPNum column.

Click Create Filter at the bottom to finish.

Save the workbook and process it to view the final results.

It is possible to continue to analyze this data by grouping by location and then doing a count to find where the most traffic is coming originating. Try it out!

“Got a question? Have an answer? Join the Datameer Community!

  • No labels