rabpro

rabpro (core.py)

Class for running rabpro commands on your data.

class rabpro.core.profiler(coords, da=None, name='unnamed', path_results=None, verbose=True, update_gee_metadata=True)

The profiler class organizes data and methods for using rabpro. This is a parent class to the centerline and point classes which inherit profiler methods and attributes.

Attributes:
coordstuple OR list OR str OR GeoDataFrame

Coordinates of point(s) to delineate. A single point may be provided as a (lat, lon) tuple. Point(s) may also be provided via a shapefile or .csv file. If provided as .csv file, the columns must be labeled ‘latitude’ and ‘longitude’. If a shapefile path is provided, the shapefile may also contain columns for widths and/or along-valley distances. These columns should have “width” and “distance” somewhere in their column names, respectively, in order for rabpro to exploit them.

danumber

Represents the drainage area in square kilometers of the downstream-most point in the provided coords.

namestr

Name of location/river, used for filename saving purposes only.

verbosebool

If True, will print updates as processing progresses.

update_gee_metadatabool

If True, will attempt to download the latest GEE Data Catalog metadata.

Methods

delineate_basin:

Computes the watersheds for each lat/lon pair

elev_profile:

Compute the elevation profile

basin_stats:

Computes watershed statistics

export:

Exports computed data

basin_stats(datasets, gee_feature_path=None, reducer_funcs=None, folder=None, test=False)

Computes watershed statistics.

Parameters:
datasetslist of Dataset

Datasets to compute stats over. See the subbasin_stats.Dataset class

reducer_funcslist of functions

List of functions to apply to each feature over each dataset. Each function should take in an ee.Feature() object. For example, this is how the function and header are applied on a feature: feature.set(f.__name__, function(feature))

folderstr

Google Drive folder to store results in

delineate_basin(search_radius=None, map_only=False, force_merit=False, force_hydrobasins=False)

Computes the watersheds for each lat/lon pair and adds their drainage areas to the self.gdf GeoDataFrame.

There are two methods available for delineating basins: HydroBASINS and MERIT. HydroBASINS is appropriate for large basins (1000 km^2 and larger), and MERIT can provide more detailed basin delineations for smaller basins. The method used depends on the size of the basin, which is interpreted via the provided drainage area. If no drainage area was provided, HydroBASINS will be used. Otherwise, if the provided drainage area is less than 1000 km^2, MERIT will be used. MERIT may also be forced for larger basins using the ‘force_merit’ argument when instantiating the profiler.

Parameters:
search_radiusnumeric, optional

in meters, by default None

map_onlybool, optional

If we only want to map the point and not delineate the basin, by default False

force_meritbool, optional

Forces the use of MERIT to delineate basins.

force_hydrobasinsbool, optional

Forces the use of HydroBASINS to delineate basins.

Warns:
UserWarning

If no drainage area provided

RuntimeWarning

If MERIT or HydroBasins data is not available, or other issues

elev_profile(dist_to_walk_km=None)

Compute the elevation profile. The profile is computed such that the provided coordinate is the centerpoint (check if this is true).

Parameters:
dist_to_walk_kmnumeric

The distance to trace the elevation profile from the provided point. This distance applies to upstream and downstream–i.e. the total profile distance is twice this value. If not specified, will be automatically computed as 10 channel widths from provided DA value if not specified OR 5 km, whichever is larger.

export(what='all')

Exports data computed by rapbro.

Parameters:
whatlist, optional

Which data should be exported? Choose from ‘all’ - all computed data ‘flowline’ - flowline json is exported with elevation and distance attributes ‘watershed’ - watershed polygon The default is ‘all’.

Data utility functions (data_utils.py)

rabpro.data_utils.does_hydrobasins_exist(datapaths)

Checks if level 1 and level 12 HydroBasins data are available.

rabpro.data_utils.does_merit_exist(datapaths)

Checks if any MERIT tiles are in the MERIT data directories. Also checks if the vrts have been built. Returns the number of MERIT layers that have data (maximum of 4).

rabpro.data_utils.download_hydrobasins(datapath=None, proxy=None)

Downloads HydroBASINS to the proper location, then unzips and deletes zip file.

Parameters:
datapathstr or Path, optional

Directory to download and unzip HydroBasins data into; does not include filename. If None, will use the rabpro default.

proxystr, optional

Pass a proxy to requests.get, by default None

Examples

from rabpro import data_utils
data_utils.hydrobasins()
rabpro.data_utils.download_merit_hydro(merit_tile, username, password, proxy=None)

Downloads the four required rabpro layers for a given MERIT-Hydro 30 degree x 30 degree tile. Each tile contains 5 degree x 5 degree geotiffs. Data are downloaded and untarred into the location and filestructure rabpro expects. Virtual rasters are rebuilt after downloading this data.

Parameters:
merit_tilestr

The MERIT-Hydro tile to fetch. e.g. “n45w100”

usernamestr

The MERIT-Hydro username (must request from MERIT-Hydro developers).

passwordstr

The MERIT-Hydro password (must request from MERIT-Hydro developers).

proxystr, optional

A proxy to pass to requests Session.

Elevation profile computation (elev_profile.py)

rabpro.elev_profile.main(gdf, dist_to_walk_km, verbose=False, nrows=50, ncols=50)

Compute the elevation profile. The profile is computed such that the provided coordinate is the centerpoint (check if this is true).

Parameters:
gdfGeoDataFrame

Starting point geometry. Should have a column called ‘DA’ that stores drainage areas. Should be in EPSG 4326 for use of the Haversine formula

dist_to_walk_kmnumeric

Distance in kilometers to walk up- and downstream of the provided, mapped point to resolve the flowline

verbosebool

Defaults to False.

nrowsint

Number of rows in the neighborhood of the point to search. Defaults to 50.

ncolsint

Number of columns in the neighborhood of the point to search. Defaults to 50.

Returns:
gdfGeoDataFrame

The original point layer geometry.

flowlineGeoDataFrame

The elevation profile geometry.

Warns:
DeprecationWarning

If a list of coordinates is passed

Examples

import rabpro
coords = (56.22659, -130.87974)
rpo = rabpro.profiler(coords, name='basic_test')
gdf, flowline = rabpro.elev_profile.main(rpo.gdf, dist_to_walk_km=5)

MERIT utilities (merit_utils.py)

Utility functions for dealing with MERIT datasets

rabpro.merit_utils.idcs_to_geopolygons(idcs, gdobj, buf_amt=0.001)

Given a list of of pixel indices within a raster specified by gdobj, creates georeferenced polygons of the blobs formed by the union of the pixels.

“Wrapping” is also checked - this is to handle cases where the dateline meridian is crossed and return is therefore a set of polygons rather than a continuous one.

Parameters:
idcslist of integers

Pixel indices within the raster specified by gdobj that should be included in the polygon.

gdobjosgeo.gdal.Dataset

Object created by gdal.Open() on a raster or virtual raster.

buf_amtnumeric, optional

Amount by which to buffer pixels before unioning-helps close tiny gaps. By default 0.001.

Returns:
pgonslist of shapely.geometry.Polygon

List of georeferenced polygons; one per blob of indices.

crossingbool

If True, the polygons represent those that cross the dateline meridian (i.e. 180 degrees -> -180 degrees) and have been split.

rabpro.merit_utils.map_cl_pt_to_flowline(lonlat, da_obj, nrows, ncols, da=None, basin_pgon=None, fdr_obj=None, fdr_map=None)

Maps a point of known drainage area to a flowline of a flow accumulation grid. Returns the row, col of the mapped-to pixel. User may provide a basin polygon (in EPSG:4326) if already known. This polygon will be used to ensure the mapped-to-flowline is the correct one. If the basin polygon is provided, a flow directions object and its mapping must also be provided as well as the drainage area.

Parameters:
lonlatlist or tuple

Two-element list/tuple containing (longitude, latitude) coordinates of the point to map to a flowline.

da_objosgeo.gdal.Dataset

Flow accumulation object. Created by gdal.Open() on raster containing flow accumulations.

nrowsint

Number of rows in the neighborhood of the point to search.

ncolsint

Number of columns in the neighborhood of the point to search.

dafloat, optional

Drainage area of the point/gage if known. Units should correspond to those in da_obj, typically km^2. By default None.

basin_pgonshapely.geometry.polygon.Polygon, optional

Polygon of the watershed of the point, if known. By default None.

fdr_objosgeo.gdal.Dataset, optional

Flow direction object. Created by gdal.Open() on raster containing flow directions. Must be specified in order to use the basin_pgon. By default None.

fdr_maplist, optional

8-entry list corresponding to the numeric value for flow directions. The list should take the form [NW, N, NE, W, E, SW, S, SE]. By default None.

Returns:
(c_mapped, r_mapped)tuple or None

x and y coordinates of the mapped points. If no mapping is possible, None is returned. The x and y coordinates are with respect to the fac_obj.

solve_methodint

Indicates the reason why mapping succeeded/failed: 1 - (success) DA provided; a nearby flowline pixel was found within 15%

of the provided DA

2 - (success) DA provided; match was found on a nearby flowline that is

within our DA certainty bounds

3 - (success) basin polygon provided; a mappable flowline was found 4 - (success) DA not provided; mapped to the nearest flowline (>1km^2) 5 - (fail) DA not provided; no nearby flowlines exist 6 - (fail) DA provided; but no nearby DAs were close enough to map to 7 - (fail) basin polygon provided; but no nearby DAs were within the allowable

range

8 - (fail) basin polygon provided; no flowlines were 25% within the provided

basin

Warns:
UserWarning

No drainage area, flow, directions, or flow directions map provided

rabpro.merit_utils.neighborhood_vals_from_raster(cr, shape, vrt_obj, nodataval=<Mock name='mock.nan' id='139955189116144'>, wrap=None)

Queries a (virtual) raster object to return an array of neighbors surrounding a given point specified by cr (column, row). A shape can be provided to return as large of a neighborhood as desired; both dimensions must be odd. This function is almost always unnecessary and could be replaced with a single call to gdal’s ReadAsArray(), except that throws errors when requesting a neighborhood that is beyond the boundaries of the raster. Also note that requests for negative offsets do not throw errors, which is dangerous. This function checks for those cases and handles them.

An option is provided to ‘wrap’ in cases where neighborhoods are requested beyond the bounds of the raster. In these cases, the raster is effectively “grown” by appending copies of itself to ensure no nodata are returned. (This isn’t actually how the code works, just the simplest explanation.)

Parameters:
crtuple

(column, row) indices within the virtual raster specifying the point around which a neighborhood is requested.

shapetuple

Two-element tuple (nrows, ncols) specifying the shape of the neighborhood around cr to query.

vrt_objgdal.Dataset

Dataset object pointing to the raster from which to read; created by gdal.Open(path_to_raster).

nodatavalobject

Value to assign neighbors that are beyond the bounds of the raster. By default np.nan.

wrapstr or None

String of ‘h’, ‘v’, or ‘hv’ denoting if horizontal and/or vertical wrapping is desired. If None, no wrapping is performed. By default None.

Returns:
Ivalsnp.array

Array of same dimensions as shape containing the neighborhood values.

rabpro.merit_utils.trace_flowpath(fdr_obj, da_obj, cr_stpt, dist_to_walk_km=1, fmap=[32, 64, 128, 16, 1, 8, 4, 2])

Returns a flowpath of pixels up- and downstream of cr_stpt a distance of dist_to_walk_km in each direction (unless the end of the flowpath is reached). Flowpath is returned US->DS ordered.

Parameters:
fdr_objosgeo.gdal.Dataset

flow direction object opened with gdal.Open(). Assumes flow direction symbology matches MERIT-Hydro: 32 64 128 16 1 8 4 2

da_objosgeo.gdal.Dataset

drainage area object opened with gdal.Open()

cr_stptnumpy.ndarray

column, row of point to start walk

cr_enptnumpy.ndarray, optional

column, row of point to end walk. By default None

n_stepsint, optional

number of steps (pixels) to walk before halting. By default None

fmaplist, optional

[NW, N, NE, W, E, SW, S, SE], by default [32, 64, 128, 16, 1, 8, 4, 2]

Returns:
tuple

(rows, cols) of flowline.

Basin statistics (basin_stats.py)

Computes basin statistics using Google Earth Engine.

class rabpro.basin_stats.Dataset(data_id, band='None', resolution=None, start=None, end=None, stats=None, time_stats=None, mask=False, mosaic=False, prepend_label='', gee_type=None)

Represents one band of a GEE dataset with parameters specifying how to compute statistics.

Attributes:
data_idstr

Google Earth Engine dataset asset id

bandstr, optional

Google Earth Engine dataset band name

resolutionint, optional

Desired resolution in meters of the calculation over the dataset. Defaults to native resolution of the dataset.

startstr, optional

Desired start date of data in ISO format: YYYY-MM-DD. Defaults to None which GEE interprets as the dataset start.

endstr, optional

Desired end date of data in ISO format: YYYY-MM-DD. Defaults to None which GEE interprets as the dataset end.

statslist, optional

List of desired stats to compute: min, max, range, mean, count, std, sum, and percentiles in the following format: pct1, pct90, pct100, etc. Computes [“count”, “mean”] by default.

time_statslist, optional

List of desired stats to compute through time.

maskbool, optional

Whether or not to mask out water in the dataset using the Global Surface Water occurrence band. By default False.

mosaicbool, optional

If True, the imageCollection in data_id will be mosaiced first. Useful when the imageCollection is a set of tiled images. Do not use for time-varying imageCollections.

prepend_labelstr, optional

Text to prepend to the exported statistics file.

gee_typestr, optional

Either ‘image’ or ‘imagecollection’.

__str__()

Return str(self).

rabpro.basin_stats.compute(dataset_list, gee_feature_path=None, gee_featureCollection=None, basins_gdf=None, reducer_funcs=None, folder=None, filename=None, verbose=False, test=False, validate_dataset_list=True)

Compute subbasin statistics for each dataset and band specified. One of gee_feature_path, gee_featureCollection, or basins_gdf must be specified.

Parameters:
dataset_listlist of Datasets

List of rabpro Dataset objects to compute statistics over.

basins_gdfGeoDataFrame

Table of subbasin geometries.

gee_feature_pathstring

Path to a GEE feature collection

gee_featureCollectionee.FeatureCollection object

FeatureCollection object.

reducer_funcslist of functions, optional

List of functions to apply to each feature over each dataset. Each function should take in an ee.Feature() object. For example, this is how the function and header are applied on a feature: feature.set(f.__name__, function(feature))

verbosebool, optional

By default False.

folderstr, optional

Google Drive folder to store results in, by default top-level root.

filenamestr, optional

Name of the GEE export file.

testbool, optional

Return results to the active python session in addition to GDrive

validate_dataset_list: bool, optional

Validate the dataset_list against a scraped version of the GEE catalog?

Examples

import rabpro
import numpy as np
import geopandas as gpd
from shapely.geometry import box
from rabpro.basin_stats import Dataset

total_bounds = np.array([-85.91331249, 39.42609864, -85.88453019, 39.46429816])
gdf = gpd.GeoDataFrame({"idx": [1], "geometry": [box(*total_bounds)]}, crs="EPSG:4326")

# defaults
data, task = rabpro.basin_stats.compute(
    [
        Dataset(
            "JRC/GSW1_4/MonthlyRecurrence",
            "monthly_recurrence",
        )
    ],
    basins_gdf = gdf,
    test = True,
)

# with time_stats specified
data, task = rabpro.basin_stats.compute(
    [
        Dataset(
            "JRC/GSW1_4/MonthlyRecurrence",
            "monthly_recurrence",
            time_stats = ["median"]
        )
    ],
    basins_gdf = gdf,
    test = True,
)
rabpro.basin_stats.dataset_to_filename(prepend, data_id, band=None)

Examples

import rabpro
rabpro.basin_stats.dataset_to_filename("ECMWF/ERA5_LAND/MONTHLY", "temperature_2m")
rabpro.basin_stats.fetch_gee(url_list, prepend_list, col_drop_list=[], col_protect_list=['id_basin', 'id_outlet', 'idx', 'id', 'vote_id'], col_drop_defaults=['DA', 'count', '.geo', 'da_km2'])

Download and format data from GEE urls

Parameters:
url_listlist

list of urls returned from compute

prepend_listlist

tags to pre-append to corresponding columns, of length url_list

col_drop_listlist, optional

custom columns to drop, by default []

col_protect_listlist, optional

columns to avoid tagging, by default [“id_basin”, “id_outlet”, “idx”, “id”, “vote_id”]

col_drop_defaultslist, optional

built-in columns to drop, by default [“DA”, “count”, “.geo”, “system:index”, “da_km2”]

Returns:
DataFrame

Examples

import numpy as np
import geopandas as gpd
from shapely.geometry import box

import rabpro
from rabpro.basin_stats import Dataset

total_bounds = np.array([-85.91331249, 39.42609864, -85.88453019, 39.46429816])
basin = gpd.GeoDataFrame({"idx": [0], "geometry": [box(*total_bounds)]}, crs="EPSG:4326")

dataset_list = [
    Dataset("ECMWF/ERA5_LAND/MONTHLY", "temperature_2m", stats=["mean"], time_stats = ["median"]),
    Dataset("NASA/GPM_L3/IMERG_MONTHLY_V06", "precipitation", stats=["mean"], time_stats = ["median"]),
]

urls, tasks = rabpro.basin_stats.compute(
    dataset_list, basins_gdf=basin, folder="rabpro"
)

tag_list = ["temperature", "precip"]
data = rabpro.basin_stats.fetch_gee(urls, tag_list, col_drop_list = ["system:index"])
rabpro.basin_stats.format_gee(df_list, prepend_list, col_drop_list=[], col_protect_list=['id_basin', 'id_outlet', 'idx', 'id', 'vote_id'], col_drop_defaults=['DA', 'count', '.geo', 'da_km2'])
Parameters:
df_listlist

list of DataFrames

prepend_listlist

tags to pre-append to corresponding columns, of length url_list

col_drop_listlist, optional

custom columns to drop, by default []

col_protect_listlist, optional

columns to avoid tagging, by default [“id_basin”, “id_outlet”, “idx”, “id”, “vote_id”]

col_drop_defaultslist, optional

built-in columns to drop, by default [“DA”, “count”, “.geo”, “system:index”, “da_km2”]

Returns:
DataFrame
rabpro.basin_stats.image(dataset_list, gee_feature_path=None, basins_gdf=None, categorical=[False], verbose=False)

Download a GEE raster as a GeoTiff clipped to a basins GeoDataFrame

Parameters:
dataset_listlist of Datasets

List of Dataset objects to compute statistics over.

gee_feature_pathstr, optional

Path to a GEE feature collection, by default None

basins_gdfGeoDataFrame, optional

Table of subbasin geometries, by default None

categoricallist, optional

By default [False]

verbosebool, optional

By default False

Returns:
list

of GeoTiff download urls

Examples

import rabpro
from rabpro.basin_stats import Dataset

import numpy as np
from shapely.geometry import box

total_bounds = np.array([-85.91331249, 39.2, -85.5, 39.46429816])
gdf = gpd.GeoDataFrame({"idx": [1], "geometry": [box(*total_bounds)]}, crs="EPSG:4326")

dataset_list = [
    Dataset("ECMWF/ERA5_LAND/MONTHLY", "temperature_2m", time_stats=["median"])
]
urls, tasks = rabpro.basin_stats.image(dataset_list, basins_gdf=gdf)
basin_stats._fetch_raster(urls[0])

Basin delineation (basins.py)

Functions to calculate basin geometries.

rabpro.basins.load_continent_basins(gdf, level_one, level_twelve)

Load a HydroBasins continent polygon

Parameters:
gdfGeoDataFrame

Geometry data. Should be in EPSG:4326.

level_onestr

Path to level 1 HydroBasins data

level_twelvestr

Path to level 12 HydroBasins data

Returns:
GeoDataFrame

HydroBasins

Warns:
UserWarning

If coordinate does not lie in the HydroBasins polygons

rabpro.basins.main_hb(gdf, verbose=False)

Calculates basins using HydroBASINS

Parameters:
gdfGeoDataFrame

Contains geometry and DA (drainage area) columns.

verbosebool, optional

By default False

Returns:
subbasins_gdfGeoDataFrame

Contains subbasin geometries

sb_inc_gdfGeoDataFrame

Contains the polygons of the incremental catchments. The upstream-most basin will be the largest polygon in most cases, but that depends on the input centerline.

cl_dasnumpy.ndarray

Drainage areas

Warns:
RuntimeWarning

If gdf has no CRS defined

Examples

import rabpro
coords = (56.22659, -130.87974)
rpo = rabpro.profiler(coords, name='basic_test')
test = rabpro.basins.main_hb(rpo.gdf)
rabpro.basins.main_merit(gdf, da, nrows=51, ncols=51, map_only=False, verbose=False)

Calculates basins using MERIT

Parameters:
gdfGeoDataFrame

Centerline coordinates

danumeric

Drainage area in km^2

nrowsint

by default 51

ncolsint

by default 51

map_onlybool

If True, will map the coordinate to a MERIT flowline but not delineate the basin. The default is False.

verbosebool

by default False

Returns:
basinsGeoDataFrame

The delineated watershed

mappeddict

Information about the mapping quality.

Utility functions (utils.py)

rabpro.utils.area_4326(pgons_4326)

Given a list of shapely polygons in EPSG:4326, compute the area in km^2. Only returns the area of the perimeter; does not yet account for holes in the polygon. Mutlipolygons are supported.

rabpro.utils.build_gee_vector_asset(basins, out_path='basins.zip')

Create zipped shapefile for uploading as a Google Earth Engine vector asset.

Parameters:
basinsGeoDataFrame
out_pathstr, optional

by default “basins.zip”

Returns:
str

path to zip file

Examples

from rabpro import utils
import geopandas as gpd
basins = gpd.read_file("tests/results/merit_test/subbasins.json")
utils.build_gee_vector_asset(basins)
rabpro.utils.build_virtual_rasters(datapaths, skip_if_exists=False, verbose=True, **kwargs)

Builds virtual rasters on the four MERIT-Hydro tilesets.

Parameters:
datapaths: dict

Contains the paths to the data. Generate with get_datapaths().

skip_if_exists: bool, optional

If True, will not rebuild the virtual raster if one already exists. The default is False.

verbose: bool, optional

If True, will provide updates as virtual rasters are built.

**kwargs

Arguments passed to the build_vrt function.

Warns:
RuntimeWarning

Missing data

Examples

from rabpro import utils
d_paths = utils.get_datapaths()
utils.build_virtual_rasters(d_paths, extents=[-180.00041666666667, 179.99958333333913, 84.99958333333333, -60.000416666669], verbose=False)
rabpro.utils.build_vrt(tilespath, clipper=None, extents=None, outputfile=None, nodataval=None, res=None, sampling='nearest', ftype='tif', separate=False, quiet=False)

Creates a text file for input to gdalbuildvrt, then builds vrt file with same name. If output path is not specified, vrt is given the name of the final folder in the path.

Parameters:
tilespathstr

the path to the file (or folder of files) to be clipped– if tilespath contains an extension (e.g. .tif, .vrt), then that file is used. Otherwise, a virtual raster will be built of all the files in the provided folder. if filespath contains an extension (e.g. .tif, .vrt), filenames of tiffs to be written to vrt. This list can be created by tifflist and should be in the same folder

clipperstr, optional

path to a georeferenced image, vrt, or shapefile that will be used to clip. By default None

extentslist, optional

the extents by which to crop the vrt. Extents should be a 4 element list: [left, right, top, bottom] in the same projection coordinates as the file(s) to be clipped. By default None

outputfilestr, optional

path (including filename w/ext) to output the vrt. If none is provided, the vrt will be saved in the ‘filespath’ path. By default None

nodatavalint, optional

value to be masked as nodata, by default None

resflt, optional

resolution of the output vrt (applied to both x and y directions), by default None

samplingstr, optional

resampling scheme (nearest, bilinear, cubic, cubicspline, lanczos, average, mode), by default “nearest”

ftypestr, optional

“tif” if building from a list of tiffs, or “vrt” if building from a vrt, by default “tif”

separatebool, optional

See separate argument to gdalbuildvrt, by default False

quietbool, optional

Set True to print progress, by default False

Returns:
str

path of the built virtual raster

Raises:
TypeError

Unsupported file type passed in ‘ftype’

RuntimeError

No files found to build raster or raster build fails

rabpro.utils.coords_to_merit_tile(lon, lat)

Identify MERIT-Hydro “tiles” of interest. See http://hydro.iis.u-tokyo.ac.jp/~yamadai/MERIT_Hydro/

Parameters:
lonfloat
latfloat

Examples

from rabpro import utils
utils.coords_to_merit_tile(178, -17)
# > "s30e150"
utils.coords_to_merit_tile(-118, 32)
# > "n30w120"
utils.coords_to_merit_tile(-97.355, 45.8358)
# > "n30w120"
rabpro.utils.crop_binary_coords(coords)

Crops an array of (row, col) coordinates (e.g. blob indices) to the smallest possible array. Taken from RivGraph.im_utils

Parameters:
coords: np.array

N x 2 array. First column are rows, second are columns of pixel coordinates.

Returns:
I: np.array

Image of the cropped coordinates, plus padding if desired.

clipped: list

Number of pixels in [left, top, right, bottom] direction that were clipped. Clipped returns the indices within the original coords image that define where I should be positioned within the original image.

rabpro.utils.dist_from_da(da, nwidths=20)

Returns the along-stream distance of a flowline to resolve for a given DA. An empirical formula provided by https://agupubs.onlinelibrary.wiley.com/doi/full/10.1002/2013WR013916 (equation 15) is used to estimate width.

Parameters:
dafloat or numeric

drainage area in km^2

nwidthsnumeric

number of channel widths to set distance

Returns:
dist: float

Distance in kilometers that represents nwidths*W_bankfull where W_bankfull computed according to Wilkerson et al., 2014.

rabpro.utils.get_datapaths(root_path=None, config_path=None, force=False, update_gee_metadata=False)

Returns a dictionary of paths to all data that rabpro uses. Also builds virtual rasters for MERIT data and downloads latest version of GEE catalog.

Parameters:
root_path: string, optional

Path to rabpro Data folder that contains the HydroBASINS, MERIT-Hydro, and/or gee catalog jsons. Will read from an environment variable “RABPRO_DATA”. If this variable is not set, uses appdirs to create a local data directory. This path is the parent directory for the MERIT and HydroBasins data directories.

config_path: string, optional

Path to rabpro config folder. Will read from an environment variable “RABPRO_CONFIG”. If not set, uses appdirs to create local directory.

force: boolean, optional

Set True to override datapath caching. Otherwise only fetched once per py session.

update_gee_metadata: boolean, optional

If True, will attempt to download the latest GEE dataset metadata.

Returns:
dict

contains paths to all data that rabpro uses

Examples

from rabpro import utils
utils.get_datapaths()
rabpro.utils.get_exportpaths(name, basepath=None, overwrite=False)

Returns a dictionary of paths for exporting rabpro results. Also creates “results” folders when necessary.

Parameters:
namestr

Name of directory to create within “results” directory.

basepathstr, optional

path to put “results” directory. By default None. If None, creates in current working directory.

overwritebool, optional

overwrite “name” directory, by default False

Returns:
dict

contains paths to all output that rabpro generates

rabpro.utils.haversine(lats, lons)

Computes distances between latitude and longitude pairs of points.

Parameters:
latsnumpy array (or interpretable by numpy)

latitude values

lonsnumpy array (or interpretable by numpy)

longitude values

Returns:
numpy.ndarray

Distance in meters between each point defined by lats, lons.

rabpro.utils.lonlat_plus_distance(lon, lat, dist, bearing=0)

Returns the lon, lat coordinates of a point that is dist away (dist in km) and with a given bearing (in degrees, 0 is North). Uses a Haversine approximation.

rabpro.utils.raster_extents(raster_path)

Output raster extents as [xmin, xmax, ymin, ymax]

Parameters:
raster_pathstr

Path to file

Returns:
list

[xmin, xmax, ymin, ymax]

Examples

from rabpro import utils
utils.raster_extents(utils.get_datapaths(rebuild_vrts=False)["DEM_fdr"])
rabpro.utils.union_gdf_polygons(gdf, idcs, buffer=True)

Given an input geodataframe and a list of indices, return a shapely geometry that unions the geometries found at idcs into a single shapely geometry object.

This function also buffers each polygon slightly, then un-buffers the unioned polygon by the same amount. This is to avoid errors associated with floating-point round-off; see here: https://gis.stackexchange.com/questions/277334/shapely-polygon-union-results-in-strange-artifacts-of-tiny-non-overlapping-area

Parameters:
gdfGeoDataFrame

Geometries to combine

idcslist

Indicates which geometris in ‘gdf’ to combine

bufferbool, optional

buffer polygons, by default True

Returns:
shapely.geometry object

Union of passed geometries

rabpro.utils.upload_gee_tif_asset(tif_path, gee_user, gcp_bucket, title, gcp_folder='', gee_folder='', time_start='', epsg='4326', description='', citation='', gcp_upload=True, gee_upload=True, dry_run=False, gee_force=False)

Upload a GeoTIFF file as a GEE raster asset

Parameters:
tif_pathstr

Path to GeoTIFF file

gee_userstr

Google Earth Engine user name

gcp_bucketstr

Google Cloud Platform bucket url

titlestr

GEE asset title

gcp_folderstr, optional

Google Cloud Platform bucket folder, by default “”

gee_folderstr, optional

Google Earth Engine asset folder, by default “”

time_startstr, optional

YYYY-MM-DD, by default “”

epsgstr, optional

EPSG CRS code, by default “4326”

descriptionstr, optional

GEE asset description text, by default “”

citationstr, optional

GEE asset citation text, appended to description, by default “”

gcp_uploadbool, optional

Set False to skip GCP uploading, by default True

gee_uploadbool, optional

Set False to skip GEE uploading, by default True

dry_runbool, optional

Set True to skip GCP and GEE uploading, by default False

gee_forcebool, optional

Set True to overwrite any existing GEE assets, by default False

Returns:
NoneType

None

Warns:
RuntimeWarning

Examples

from rabpro import utils
utils.upload_gee_tif_asset("my.tif", "my_gee_user", "my_gcp_bucket")
rabpro.utils.upload_gee_vector_asset(zip_path, gee_user, gcp_bucket, gcp_folder='', gcp_upload=True, gee_upload=True)

Upload a zipped shapefile as a GEE vector asset

Parameters:
zip_pathstr

Path to zipped shapefile.

gee_userstr

Google Earth Engine user name.

gcp_bucketstr

Google Cloud Platform bucket url (e.g. gs://my_bucket_name).

gcp_folderstr, optional

Google Cloud Platform bucket folder, by default “”

gcp_uploadbool, optional

Set False to skip GCP uploading, by default True

gee_uploadbool, optional

Set False to skip GEE uploading, by default True

Returns:
str

GEE asset path

Raises:
RuntimeError

Throws error if gsutil is not installed or authenticated.

Examples

from rabpro import utils
utils.upload_gee_vector_asset("test.zip", "my_gee_user", "my_gcp_bucket")
rabpro.utils.validify_polygons(polys)

Hacky ways to validify a (multi)polygon. If can’t be validified, returns the original.

Parameters:
geomlist

List of shapely.geometry.Polygon

Returns:
geomsvlist

List of shapely.geometry.Polygon that have been attempted to validify.

rabpro.utils.xy_to_coords(xs, ys, gt)

Transforms a set of x and y coordinates to their corresponding coordinates within a geotiff image.

Parameters:
xsnumpy.ndarray

x coordinates to transform

ysnumpy.ndarray

y coordinates to transform

gttuple

6-element tuple gdal GeoTransform. (uL_x, x_res, rotation, ul_y, rotation, y_res). Automatically created by gdal’s GetGeoTransform() method.

Returns:
cx, cytuple of ints

Column and row indices of the provided coordinates.