Photo by Sigmund on Unsplash

Alembic in 2021 (part 2)

  • Alembic
  • bash
  • Data Science
  • en
  • english
  • ETL
  • geospatial
  • GIS
  • h3
  • hexagons
  • Jupyter
  • PostGIS
  • uber
  • Par Tarek Baaga, le 23 juin 2021

    In the first part, the basics of Alembic have been presented, and we created a spatial table from a simple command. In this tutorial, we will see how to insert data into tables.

    Alembic multiple insert

    First, we have to create a new version with the command:


    alembic revision -m “bulk_insert_geometry_data”


    As it was presented in the first tutorial, this will create a new version. Before starting to populate upgrade() and downgrade() functions we have to import multiple dependencies. That dependency makes it possible to work with multiple data types:

    from alembic import op
    from sqlalchemy.sql import table, column
    from geoalchemy2.types import Geometry
    from sqlalchemy import String, Integer

    As for the upgrade and downgrade function, we have to specify the parameter of the table before inserting:

    def upgrade():
        op.execute('CREATE SCHEMA IF NOT EXISTS tiger')
        # Create an ad-hoc table to use for the insert statement.
        geomtable = table('geomtable',
            column('id', Integer),
            column('name', String),
            column('description', String),
            column('club', String),
            column('geom', Geometry(geometry_type='POINT', srid=4326))
        )
        op.bulk_insert(geomtable,
            [
                {'id':1, 'name':'Cristiano Ronaldo',
                        'description':'Le meilleur au monde', 
                        'club':'Juventus Turin',
                        "geom": "SRID=4326;POINT(7.742615 45.116177)"},
                {'id':2, 'name':'Zinedine Zidane',
                        'description':'Le meilleur de l"hstoire', 
                        'club':'Real Madrid',
                        "geom": "SRID=4326;POINT(-3.703790 40.416775)"},
                {'id':3, 'name':'Neymar Junior',
                        'description':'Le talent', 
                        'club':'Paris St Germain',
                        "geom": "SRID=4326;POINT(2.3488 48.8534)"},
            ]
        )  
    
    def downgrade():
         op.execute(
            """DELETE FROM geomtable WHERE name IN (
       'Cristiano Ronaldo' ,
       'Zinedine Zidane',
       'Neymar Junior'
                   );"""
        )

    Notice that in the upgrade function, the table is defined first. With the table defined, we can use the op.bulk_insert directive to be able to insert the data after. As for the downgrade function, it will delete all the rows. 

    To get the current version of your alembic:


    alembic current


    For inserting the rows and upgrading to the current version, we can run the command:


    alembic upgrade +1


    You can downgrade the version and delete the rows created in the upgrade by running:


    alembic downgrade -1


    Now let’s use some real-world situations by using some real data. First, let’s download the sociological district and the wifi network terminals data here:

    https://donnees.montreal.ca/ville-de-montreal/couverture-du-reseau-wifi-mtlwifi

    https://donnees.montreal.ca/ville-de-montreal/quartiers-sociologiques

    As we have already set up a PostgreSQL instance with the docker-compose file, let’s add the data of those 2 files in the database. To make the import work, we will use gdal by writing those 2 commands and store them in a bash file with the .sh extension:

    #!/bin/bash
    
    ogr2ogr -f "PostgreSQL" PG:"host=localhost port=5598 user=postgres dbname=postgres password=bidone" ./geodata/gdoqsocio2014shapfile/ShapFile/Quartiers_sociologiques_2014.shp \
      -lco PRECISION=no -nlt PROMOTE_TO_MULTI -nln quartiers_sociologiques -overwrite
    
    ogr2ogr -f "PostgreSQL" PG:"host=localhost port=5598 user=postgres dbname=postgres password=bidone" ./geodata/mtlwifi_bornes/mtlwifi_bornes.shp \
      -lco PRECISION=no -nln wifi_points -overwrite

    As we want the data to have the geometry type we also have to add another service that will add to the database the PostGIS extension and the pgh3 extension (we will talk about the h3 in the last article). As for the PostGIS extension, it’s mandatory to add it, otherwise, the geometry column will have bytea and not geometry as the type. Let’s add the service with theses commands:

    setup_db_ext:
        image: ubuntu/postgres
        depends_on:
          - db
        command: |
          /bin/bash -c
            "sleep 3 && \
            PGPASSWORD=mypassword psql -U postgres -h db postgres -c 'create extension IF NOT EXISTS postgis; create extension IF NOT EXISTS pgh3;'"

    With that method, we don’t need to define a geoalchemy model and insert the data one by one but instead, we just use a simple command that simplifies our work. 

    Transform the data with Alembic

    As you can see the data that we imported don’t have the same projection and as we want to combine those data we should use the same projection. As we want the projection to be EPSG:4326 let’s just modify the table mtlwifi_bornes. Also, by using ogr2ogr the geometry column is named by default wkb_geometry. We like to change the name to geom. After creating a new alembic file with alembic revision let’s add the changes:

    def upgrade():
         # Create an ad-hoc table to use for the insert statement.
        mtlwifi = table('mtlwifi_bornes',
            column('id_0', Integer),
            column('geom', Geometry(geometry_type='POINT', srid=32188)),
            column('id', String),
            column('lieu', String),
            column('latitude', String),
            column('longitude', String),
            column('x', String),
            column('y', String),
            column('type', String),
            column('arrondisse', String),
            column('zone activ', String),
        )
    
        op.execute(
            'alter table "mtlwifi_bornes" rename column wkb_geometry to geom'
        )
    
        op.execute(
            'alter table "mtlwifi_bornes" alter column geom type geometry(POINT, 4326) using st_transform(geom, 4326)'
        )
    
    def downgrade():
        op.execute(
            'alter table "mtlwifi_bornes" alter column geom type geometry(POINT, 32188) using st_transform(geom, 32188)'
        )
    
        op.execute(
            'alter table "mtlwifi_bornes" rename column geom to wkb_geometry'
        )

    As we saw in the previous lines there are multiple ways of inserting data with Alembic but the best is always the fastest right?

    The first one is to insert the data one by one. As this method can work it doesn’t scale very well. Imagine if you would have a million of data to enter it will take forever. If you wanna use this method make sure your dataset is very small to be effective. Also, the process is the same if you have to insert a polygon or a circle, it can become overwhelming. 

    The second method that we saw is to use an insert script using ogr2ogr coming from the GDAL library. This method is highly effective and can help in the development process by saving you tremendous time. The GDAL documentation can be found here:

    https://gdal.org/index.html

    That’s it for this tutorial. Here is the last part of this series on Alembic !!

    https://anagraph.io/en/articles/alembic_in_2021_part3/