| Title: | Access and Analysis of Brazilian CNEFE Address Data |
|---|---|
| Description: | Download, cache and read municipality-level address data from the Cadastro Nacional de Enderecos para Fins Estatisticos (CNEFE) of the 2022 Brazilian Census, published by the Instituto Brasileiro de Geografia e Estatistica (IBGE) <https://ftp.ibge.gov.br/Cadastro_Nacional_de_Enderecos_para_Fins_Estatisticos/>. Beyond data access, provides spatial aggregation of addresses, computation of land-use mix indices, and dasymetric interpolation of census tract variables using CNEFE dwelling points as ancillary data. Results can be produced on 'H3' hexagonal grids or user-supplied polygons, and heavy operations leverage a 'DuckDB' backend with extensions for fast, in-process execution. |
| Authors: | Jorge Ubirajara Pedreira Junior [aut, cre, cph] (ORCID: <https://orcid.org/0000-0002-8243-5395>), Bruno Mioto [aut], Kaio Cunha Pedreira [ctb] |
| Maintainer: | Jorge Ubirajara Pedreira Junior |
| License: | MIT + file LICENSE |
| Version: | 0.3.0.9000 |
| Built: | 2026-06-03 03:07:19 UTC |
| Source: | https://github.com/pedreirajr/cnefetools |
clear_cache_muni() removes CNEFE ZIP files stored in the user cache
directory by cnefe_counts(), compute_lumi(), tracts_to_h3(), and
related functions.
clear_cache_muni(code_muni = "all", verbose = TRUE)
code_muni |
Integer seven-digit IBGE municipality code, or "all" (default) to delete every cached CNEFE ZIP. |
verbose |
Logical; if TRUE (default), prints informative messages. |
Invisibly, the character vector of deleted file paths.
# Delete all cached CNEFE ZIPs
clear_cache_muni()
# Delete only the ZIP for Lauro de Freitas-BA
clear_cache_muni(2919207)
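The behavior described for clear_cache_muni() (delete cached ZIPs, optionally for a single municipality, and return the deleted paths invisibly) can be sketched in base R as below. This is not the package's implementation: the function name clear_zip_cache_sketch() and the assumption that cached files are named "<code_muni>.zip" are hypothetical, and a temporary directory stands in for the real cache.

```r
# Hypothetical sketch, not the package's implementation.
# Assumes cached files are named "<code_muni>.zip".
clear_zip_cache_sketch <- function(cache_dir, code_muni = "all", verbose = TRUE) {
  files <- list.files(cache_dir, pattern = "\\.zip$", full.names = TRUE)
  if (!identical(code_muni, "all")) {
    # Keep only the file for the requested municipality
    files <- files[basename(files) == paste0(code_muni, ".zip")]
  }
  unlink(files)
  if (verbose) message("Deleted ", length(files), " file(s).")
  # Return the deleted paths without printing them
  invisible(files)
}

# Usage in a throwaway directory
demo_dir <- file.path(tempdir(), "cnefe_cache_demo")
dir.create(demo_dir, showWarnings = FALSE)
invisible(file.create(file.path(demo_dir, c("2919207.zip", "2929057.zip"))))
deleted <- clear_zip_cache_sketch(demo_dir, code_muni = 2919207)
```

Returning the paths with invisible() keeps interactive calls quiet while still letting scripts capture what was removed.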
clear_cache_tracts() removes census tract Parquet files stored in the
user cache directory by tracts_to_h3() and tracts_to_polygon().
clear_cache_tracts(uf = "all", verbose = TRUE)
uf |
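Character or integer. Two-letter state abbreviation (e.g., "BA"), two-digit IBGE state code (e.g., 29), seven-digit municipality code (the state is resolved automatically), or "all" (default) to delete every cached census tract Parquet. |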
verbose |
Logical; if TRUE (default), prints informative messages. |
Invisibly, the character vector of deleted file paths.
# Delete all cached census tract Parquets
clear_cache_tracts()
# Delete only the Parquet for Bahia (several equivalent calls)
clear_cache_tracts("BA")
clear_cache_tracts(29)
clear_cache_tracts(2919207) # municipality code → state resolved automatically
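Accepting "BA", 29 or 2919207 interchangeably works because IBGE municipality codes start with the two-digit state (UF) code. A minimal sketch of such a resolver, assuming a hypothetical helper resolve_uf() that handles a single value and leaves the "all" case to the caller:

```r
# Hypothetical helper: normalize a state identifier to its two-digit IBGE code.
# Relies on the IBGE convention that the first two digits of a seven-digit
# municipality code are the state (UF) code.
uf_codes <- c(
  RO = 11, AC = 12, AM = 13, RR = 14, PA = 15, AP = 16, TO = 17,
  MA = 21, PI = 22, CE = 23, RN = 24, PB = 25, PE = 26, AL = 27, SE = 28, BA = 29,
  MG = 31, ES = 32, RJ = 33, SP = 35,
  PR = 41, SC = 42, RS = 43,
  MS = 50, MT = 51, GO = 52, DF = 53
)

resolve_uf <- function(uf) {
  if (is.character(uf)) {
    code <- uf_codes[[toupper(uf)]]   # "BA" -> 29; unknown abbreviations error here
  } else if (uf >= 1000000 && uf <= 9999999) {
    code <- uf %/% 100000             # municipality code 2919207 -> 29
  } else {
    code <- uf                        # already a state code, e.g. 29
  }
  if (!code %in% uf_codes) stop("Unknown state: ", uf)
  as.integer(code)
}

resolve_uf("BA")      # returns 29
resolve_uf(2919207)   # returns 29
```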
cnefe_counts() reads CNEFE records for a given municipality, assigns
each address point to spatial units (either H3 hexagonal cells or user-provided
polygons), and returns per-unit counts of COD_ESPECIE as addr_type1 to
addr_type8.
cnefe_counts( code_muni, year = 2022, polygon_type = c("hex", "user"), polygon = NULL, crs_output = NULL, h3_resolution = 9, verbose = TRUE, cache = TRUE, backend = c("duckdb", "r") )
code_muni |
Integer. Seven-digit IBGE municipality code. |
year |
Integer. The CNEFE data year. Currently only 2022 is supported. Defaults to 2022. |
polygon_type |
Character. Type of polygon aggregation: "hex" (default) for H3 hexagonal cells, or "user" for the polygons supplied in polygon. |
polygon |
An sf object with the polygons used for aggregation. Required when polygon_type = "user". |
crs_output |
The CRS for the output object. Only used when polygon_type = "user"; if NULL (default), the output keeps the CRS of polygon.
|
h3_resolution |
Integer. H3 grid resolution (default: 9). Only used when polygon_type = "hex".
|
verbose |
Logical; if TRUE (default), prints informative messages. |
cache |
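Logical. If TRUE (default), the downloaded CNEFE ZIP file is kept in the user cache directory and reused across sessions; if FALSE, it is stored in a temporary location and removed afterwards. |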
backend |
Character. Processing backend: "duckdb" (default) or "r". |
The counts in the columns addr_type1 to addr_type8 correspond to:
addr_type1: Private household (Domicílio particular)
addr_type2: Collective household (Domicílio coletivo)
addr_type3: Agricultural establishment (Estabelecimento agropecuário)
addr_type4: Educational establishment (Estabelecimento de ensino)
addr_type5: Health establishment (Estabelecimento de saúde)
addr_type6: Establishment for other purposes (Estabelecimento de outras finalidades)
addr_type7: Building under construction or renovation (Edificação em construção ou reforma)
addr_type8: Religious establishment (Estabelecimento religioso)
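Once each address has been assigned to a spatial unit, producing the addr_type1 to addr_type8 columns amounts to a cross-tabulation of unit by COD_ESPECIE. The base-R sketch below shows only that counting step on made-up data; it assumes the point-in-hexagon or point-in-polygon assignment is already done (the unit_id column and count_addr_types() are hypothetical) and does not reflect how the DuckDB backend performs it.

```r
# Made-up addresses already assigned to spatial units (unit_id is hypothetical)
addr <- data.frame(
  unit_id     = c("a", "a", "a", "b", "b"),
  COD_ESPECIE = c(1, 1, 6, 1, 4)
)

count_addr_types <- function(addr) {
  # Fixing the levels to 1:8 keeps all eight columns, even for absent types
  species <- factor(addr$COD_ESPECIE, levels = 1:8)
  mat <- unclass(table(addr$unit_id, species))
  out <- data.frame(unit_id = rownames(mat))
  for (k in 1:8) out[[paste0("addr_type", k)]] <- as.integer(mat[, k])
  out
}

count_addr_types(addr)
```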
An sf::sf object containing:
id_hex (when polygon_type = "hex"): H3 cell identifier
Original columns from polygon (when polygon_type = "user")
addr_type1 ... addr_type8: counts per address type
geometry: polygon geometry
When polygon_type = "user", the output CRS matches the original polygon CRS
(or crs_output if specified).
# Count addresses per H3 hexagon (resolution 9)
hex_counts <- cnefe_counts(code_muni = 2929057, cache = FALSE)
# Count addresses per user-provided polygon (neighborhoods of Lauro de Freitas-BA)
# Using geobr to download neighborhood boundaries
library(geobr)
nei_ldf <- subset(
  read_neighborhood(year = 2022),
  code_muni == 2919207
)
nei_counts <- cnefe_counts(
  code_muni = 2919207,
  polygon_type = "user",
  polygon = nei_ldf,
  cache = FALSE
)
Opens the bundled Excel data dictionary in the system's default spreadsheet viewer (e.g., Excel, LibreOffice).
cnefe_dictionary(year = 2022)
year |
Integer. The CNEFE data year. Currently only 2022 is supported. |
Invisibly, the path to the Excel file inside the installed package.
cnefe_dictionary()
Opens the bundled PDF methodological document in the system's default PDF viewer.
cnefe_doc(year = 2022)
year |
Integer. The CNEFE data year. Currently only 2022 is supported. |
Invisibly, the path to the PDF file inside the installed package.
cnefe_doc()
compute_lumi() reads CNEFE records for a given municipality,
assigns each address point to spatial units (either H3 hexagonal cells or
user-provided polygons), and computes the residential proportion (p_res) and land-use mix
indices, namely the Entropy Index (ei), the Herfindahl-Hirschman Index (hhi),
the Balance Index (bal), the Index of Concentration at Extremes (ice), the adapted HHI (hhi_adp),
and the Bidirectional Global-centered Index (bgbi), following the methodology proposed in
Pedreira Jr. et al. (2025).
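For reference, commonly used forms of two of these indices (see, e.g., Song et al., 2013) are the normalized entropy EI = -sum(p_j * ln p_j) / ln k and HHI = sum(p_j^2), where p_j is the share of category j among k categories. The sketch below computes these two plus a residential share from a vector of category counts. Which address types count as residential (res_idx) and how categories are grouped are assumptions here; the package's exact definitions, and those of bal, ice, hhi_adp and bgbi, follow Pedreira Jr. et al. (2025) and are not reproduced.

```r
# Sketch of two common land-use mix measures from category counts.
# res_idx: position(s) of the counts treated as residential (an assumption).
lumi_sketch <- function(counts, res_idx = 1) {
  p <- counts / sum(counts)        # share of each category
  k <- length(counts)              # number of categories considered
  nz <- p[p > 0]                   # 0 * log(0) is taken as 0
  c(
    p_res = sum(p[res_idx]),
    ei    = -sum(nz * log(nz)) / log(k),  # normalized entropy: 0 = single use, 1 = even mix
    hhi   = sum(p^2)                      # 1/k = even mix, 1 = single use
  )
}

# Example with four made-up categories: residential, education, health, other
lumi_sketch(c(res = 60, edu = 10, health = 5, other = 25))
```

Entropy and HHI move in opposite directions: an even mix maximizes the former and minimizes the latter.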
compute_lumi( code_muni, year = 2022, polygon_type = c("hex", "user"), polygon = NULL, crs_output = NULL, h3_resolution = 9, verbose = TRUE, cache = TRUE, backend = c("duckdb", "r") )
code_muni |
Integer. Seven-digit IBGE municipality code. |
year |
Integer. The CNEFE data year. Currently only 2022 is supported. Defaults to 2022. |
polygon_type |
Character. Type of polygon aggregation: "hex" (default) for H3 hexagonal cells, or "user" for the polygons supplied in polygon. |
polygon |
An sf object with the polygons used for aggregation. Required when polygon_type = "user". |
crs_output |
The CRS for the output object. Only used when polygon_type = "user"; if NULL (default), the output keeps the CRS of polygon.
|
h3_resolution |
Integer. H3 grid resolution (default: 9). Only used when polygon_type = "hex".
|
verbose |
Logical; if TRUE (default), prints informative messages. |
cache |
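Logical. If TRUE (default), the downloaded CNEFE ZIP file is kept in the user cache directory and reused across sessions; if FALSE, it is stored in a temporary location and removed afterwards. |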
backend |
Character. Processing backend: "duckdb" (default) or "r". |
An sf::sf object containing:
When polygon_type = "hex":
id_hex: H3 cell identifier
p_res, ei, hhi, bal, ice, hhi_adp, bgbi: land-use mix indicators
geometry: hexagon geometry (CRS 4326)
When polygon_type = "user":
Original columns from polygon
p_res, ei, hhi, bal, ice, hhi_adp, bgbi: land-use mix indicators
geometry: polygon geometry (in the original or crs_output CRS)
Pedreira Jr., J. U.; Louro, T. V.; Assis, L. B. M.; Brito, P. L. (2025). Measuring land use mix with address-level census data. engrXiv. https://engrxiv.org/preprint/view/5975
Booth, A.; Crouter, A. C. (Eds.). (2001). Does It Take a Village? Community Effects on Children, Adolescents, and Families. Psychology Press.
Song, Y.; Merlin, L.; Rodriguez, D. (2013). Comparing measures of urban land use mix. Computers, Environment and Urban Systems, 42, 1–13. https://doi.org/10.1016/j.compenvurbsys.2013.08.001
# Compute land-use mix indices on H3 hexagons
lumi <- compute_lumi(code_muni = 2929057, cache = FALSE)
# Compute land-use mix indices on user-provided polygons (neighborhoods of Lauro de Freitas-BA)
# Using geobr to download neighborhood boundaries
library(geobr)
nei_ldf <- subset(
  read_neighborhood(year = 2022),
  code_muni == 2919207
)
lumi_poly <- compute_lumi(
  code_muni = 2919207,
  polygon_type = "user",
  polygon = nei_ldf,
  cache = FALSE
)
Downloads and reads the CNEFE CSV file for a given IBGE municipality code, using the official IBGE FTP structure. The function relies on an internal index linking municipality codes to the corresponding ZIP URLs. Data are returned either as an Arrow Table (default) or as an sf object with SIRGAS 2000 coordinates.
read_cnefe( code_muni, year = 2022, verbose = TRUE, cache = TRUE, output = c("arrow", "sf") )
code_muni |
Integer. Seven-digit IBGE municipality code. |
year |
Integer. The CNEFE data year. Currently only 2022 is supported. Defaults to 2022. |
verbose |
Logical; if TRUE (default), prints informative messages. |
cache |
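Logical; if TRUE (default), the downloaded ZIP file is kept in a user-level cache directory; if FALSE, it is stored temporarily and removed when the function exits. |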
output |
Character. Output format: "arrow" (default) or "sf". |
When output = "arrow" (default), the function does not perform any spatial
conversion and simply returns the Arrow table. When output = "sf", the
function converts the result to an sf point object using the
LONGITUDE and LATITUDE columns, with CRS EPSG:4674 (SIRGAS 2000),
keeping these columns in the final object (remove = FALSE).
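The sf conversion described above corresponds to a standard sf::st_as_sf() call. A minimal sketch on a made-up two-row data frame (the coordinates are illustrative, not real CNEFE records); with real data, an Arrow Table returned by read_cnefe() could first be converted with as.data.frame():

```r
library(sf)

# Toy stand-in for CNEFE records (values are made up)
cnefe_df <- data.frame(
  COD_ESPECIE = c(1, 6),
  LONGITUDE   = c(-38.32, -38.33),
  LATITUDE    = c(-12.88, -12.89)
)

cnefe_pts <- st_as_sf(
  cnefe_df,
  coords = c("LONGITUDE", "LATITUDE"),  # x = longitude, y = latitude
  crs    = 4674,                        # SIRGAS 2000
  remove = FALSE                        # keep the coordinate columns
)
```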
If output = "arrow", an arrow::Table containing all CNEFE records for
the given municipality.
If output = "sf", an sf object with point geometry in
EPSG:4674 (SIRGAS 2000), using the LONGITUDE and LATITUDE columns.
When cache = TRUE (the default), the downloaded ZIP file is stored in a
user-level cache directory specific to this package, created via
tools::R_user_dir() with which = "cache". This avoids re-downloading
the same municipality file across sessions.
When cache = FALSE, the ZIP file is stored in a temporary location and
removed when the function exits.
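The cache = TRUE / FALSE behavior can be sketched as follows. This is an illustration, not the package's code: with_cnefe_zip(), the fetch callback standing in for the download, and the "<code_muni>.zip" file name are hypothetical, and the package name passed to tools::R_user_dir() is assumed to be "cnefetools".

```r
# Hypothetical sketch, not the package's implementation.
with_cnefe_zip <- function(code_muni, fetch, cache = TRUE,
                           cache_dir = tools::R_user_dir("cnefetools", which = "cache")) {
  # Cached files live in a persistent per-user directory; otherwise use a fresh temp dir
  dir <- if (cache) cache_dir else tempfile("cnefe_")
  dir.create(dir, recursive = TRUE, showWarnings = FALSE)
  zip_path <- file.path(dir, paste0(code_muni, ".zip"))  # assumed naming scheme
  # Without caching, remove the temporary directory when the function exits (even on error)
  if (!cache) on.exit(unlink(dir, recursive = TRUE), add = TRUE)
  # With caching, an existing file is reused instead of being downloaded again
  if (!file.exists(zip_path)) fetch(zip_path)
  # Placeholder for "read the CSV inside the ZIP": return the file size in bytes
  file.size(zip_path)
}

# Usage with a fake fetch step that just writes a small file
fake_fetch <- function(path) writeLines("placeholder", path)
with_cnefe_zip(2929057, fake_fetch, cache = FALSE)
```

Registering the cleanup with on.exit() means the temporary copy is removed even if reading fails partway through.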
# Read CNEFE data as an Arrow table cnefe <- read_cnefe(code_muni = 2929057, cache = FALSE) # Read as an sf spatial object cnefe_sf <- read_cnefe(code_muni = 2929057, output = "sf", cache = FALSE)# Read CNEFE data as an Arrow table cnefe <- read_cnefe(code_muni = 2929057, cache = FALSE) # Read as an sf spatial object cnefe_sf <- read_cnefe(code_muni = 2929057, output = "sf", cache = FALSE)
tracts_to_h3() performs a dasymetric interpolation with the following steps:
census tract totals are allocated to CNEFE dwelling points inside each tract;
allocated values are aggregated to an H3 grid at a user-defined resolution.
The function uses DuckDB with the spatial and H3 extensions for the heavy work.
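The two steps can be illustrated with a small base-R sketch on made-up data. It assumes each dwelling point is already tagged with its tract and target cell (tract_id, cell_id and dasymetric_sketch() are hypothetical) and splits each tract total equally among its dwelling points; the package's actual allocation rules (listed under the vars argument) are not reproduced here. In this simple version a tract with no dwelling points would lose its total.

```r
# Made-up tract totals and dwelling points tagged with tract and H3 cell
tracts <- data.frame(tract_id = c("T1", "T2"), pop = c(100, 30))
dwellings <- data.frame(
  tract_id = c("T1", "T1", "T1", "T1", "T2", "T2"),
  cell_id  = c("h1", "h1", "h2", "h2", "h2", "h3")
)

dasymetric_sketch <- function(tracts, dwellings, var) {
  # Step 1: split each tract total equally among its dwelling points
  n_pts <- table(dwellings$tract_id)
  dwellings$share <- tracts[[var]][match(dwellings$tract_id, tracts$tract_id)] /
    as.numeric(n_pts[dwellings$tract_id])
  # Step 2: sum the allocated shares within each target cell
  out <- aggregate(share ~ cell_id, data = dwellings, FUN = sum)
  names(out)[2] <- var
  out
}

cells <- dasymetric_sketch(tracts, dwellings, "pop")
```

Summing the allocated values over all cells returns the original tract totals (130 in this toy example), which is the volume-preserving property dasymetric interpolation aims for.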
tracts_to_h3( code_muni, year = 2022, h3_resolution = 9, vars = c("pop_ph", "pop_ch"), cache = TRUE, verbose = TRUE )
code_muni |
Integer. Seven-digit IBGE municipality code. |
year |
Integer. The CNEFE data year. Currently only 2022 is supported. Defaults to 2022. |
h3_resolution |
Integer. H3 resolution (0 to 15). Defaults to 9. |
vars |
Character vector. Names of tract-level variables to interpolate. Supported variables:
For a reference table mapping these variable names to the official IBGE census tract codes and descriptions, see tracts_variables_ref. Allocation rules:
|
cache |
Logical. Whether to use the existing package cache for assets and CNEFE ZIP files. |
verbose |
Logical. Whether to print step messages and timing. |
An sf object (CRS 4326) with an H3 grid and the requested interpolated variables.
# Interpolate population to H3 hexagons
hex_pop <- tracts_to_h3(
  code_muni = 2929057,
  vars = c("pop_ph", "pop_ch"),
  cache = FALSE
)
tracts_to_polygon() performs a dasymetric interpolation with the following steps:
census tract totals are allocated to CNEFE dwelling points inside each tract;
allocated values are aggregated to user-provided polygons (neighborhoods, administrative divisions, custom areas, etc.).
The function uses DuckDB with spatial extensions for the heavy work.
tracts_to_polygon( code_muni, polygon, year = 2022, vars = c("pop_ph", "pop_ch"), crs_output = NULL, cache = TRUE, verbose = TRUE )
code_muni |
Integer. Seven-digit IBGE municipality code. |
polygon |
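An sf object with the target polygons (e.g., neighborhoods, administrative divisions, or custom areas). |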
year |
Integer. The CNEFE data year. Currently only 2022 is supported. Defaults to 2022. |
vars |
Character vector. Names of tract-level variables to interpolate. Supported variables:
For a reference table mapping these variable names to the official IBGE census tract codes and descriptions, see tracts_variables_ref. Allocation rules:
|
crs_output |
The CRS for the output object. Default is NULL, which keeps the CRS of polygon. |
cache |
Logical. Whether to use the existing package cache for assets and CNEFE ZIP files. |
verbose |
Logical. Whether to print step messages and timing. |
An sf object with the user-provided polygons and the requested
interpolated variables. The output CRS matches the original polygon CRS
(or crs_output if specified).
# Interpolate population to user-provided polygons (neighborhoods of Lauro de Freitas-BA)
# Using geobr to download neighborhood boundaries
library(geobr)
nei_ldf <- subset(
  read_neighborhood(year = 2022),
  code_muni == 2919207
)
poly_pop <- tracts_to_polygon(
  code_muni = 2919207,
  polygon = nei_ldf,
  vars = c("pop_ph", "pop_ch"),
  cache = FALSE
)
A data frame that maps variable names used in tracts_to_h3() and
tracts_to_polygon() to the official IBGE census tract dataset codes
and descriptions.
tracts_variables_ref
A data frame with 22 rows and 4 columns:
var_cnefetools: Variable name used in cnefetools functions.
Official IBGE variable code from the census tract aggregates.
Official IBGE variable description in Portuguese.
Name of the IBGE census tract table where the variable is found (Domicilios, Pessoas, or ResponsavelRenda).
IBGE - Censo Demografico 2022, Agregados por Setores Censitarios.
# View the reference table
tracts_variables_ref
# Find the IBGE code for a specific variable
tracts_variables_ref[tracts_variables_ref$var_cnefetools == "pop_ph", ]