Econometric framework
The workflow is outlined in detail in Fig. S10 of the Supplementary Information. To evaluate the economic and environmental outcomes associated with the Brazilian mining sector, we employed a robust econometric framework widely applied to study the socioeconomic implications of resource extraction33. This methodology has also been utilised to explore key drivers of economic growth51,52 and deforestation53,54. We applied a matching procedure to improve the balance between mining and non-mining municipalities, reducing statistical bias and mitigating model dependence55. By combining this balanced dataset with a comprehensive set of covariates in our regression design, we controlled for confounding factors – both observable and unobservable – that influence mining presence and its associated outcomes.
In light of the potential effects of mining activities on neighbouring areas, we aimed to account for the spatial spread of impacts across geographical locations. Combining well with the spatial nature of the mining data at hand, this study employed a spatial econometric approach. Spatial models explicitly consider the non-randomness of observations across space, thus addressing the bias and misleading inference that may result from spatial dependence56. Several applied contributions found strong evidence that socioeconomic and environmental observations were subject to spatial dependence at the regional level, including economic growth51,57,58,59 and deforestation54,60.
We employed panel-structure spatial models, accounting for spatial correlations and, at the same time, offering extended possibilities to consider time- or region-specific idiosyncratic effects61. Incorporating time-specific fixed effects, for example, accounts for factors influencing the dependent variables in specific years of the sample. The models can thereby consider trends affecting municipalities regardless of their exposure to mining, such as national (e.g., macroeconomic conditions, environmental and economic policies) and global (e.g., commodity price fluctuations) factors. We estimated the models using Bayesian methods following the standard Markov Chain Monte Carlo (MCMC) estimation framework as proposed for spatial econometrics62. The exact estimation procedure is presented in Supplementary Notes C. As a measure of uncertainty, we report 95% Bayesian credible intervals.
Economic growth
The underlying principle is to regress growth rates of countries or regions on income (usually GDP) at the initial period of a certain growth window as well as on a number of further determinants of growth. Typically, these include information on population growth, human capital stock and sectoral structure such as gross value added or employment across economic sectors51,58.
Following the literature on economic growth and spatial spillover52,58, and in line with the framework demonstrated for the Brazilian case59, we employed a panel-structure spatial Durbin model (SDM) of the form:
$${{{\bf{y}}}}_{t}=\rho {{\bf{W}}}{{{\bf{y}}}}_{t}+{{{\bf{X}}}}_{t}{{\boldsymbol{\beta }}}+{{{\bf{WX}}}}_{t}{{\boldsymbol{\theta }}}+{{{\boldsymbol{\xi }}}}_{t}+{{{\boldsymbol{\epsilon }}}}_{t},\quad {{{\boldsymbol{\epsilon }}}}_{t} \sim N({{\bf{0}}},{{\mathbf{\Omega }}}),{{\mathbf{\Omega }}}={\sigma }^{2}{{{\bf{I}}}}_{n},$$
(1)
where yt denotes an n × 1 vector of regional economic growth rates at time t. As advocated in earlier literature63, we used five-year periods as growth windows to smooth over short-term business cycle influences and calculated the respective average annual growth rates \({{{\bf{y}}}}_{t}=[\ln ({{{\bf{Y}}}}_{t+5})-\ln ({{{\bf{Y}}}}_{t})]/5 * 100\), with Yt denoting per capita GDP at time t (results were robust against variations in growth windows, see Fig. S7). Xt is an n × k matrix of k exogenous municipality characteristics in the initial period. These include prominent determinants of economic growth such as initial income, population density, education and indicators for industrial structure, but also information on mining activities, land use and land use change. We used interaction terms between binary mining indicators and yearly dummy variables in order to obtain year-specific effects of the presence of industrial and garimpo mining. Additionally, in order to obtain pooled effects for the time before and after 2010, respectively, we ran another model interacting the binary mining indicators with dummy variables of the respective period. We selected 2010 as the separation point based on our yearly coefficient results, which revealed significant pattern shifts, including a marked change in industrial mining’s GDP estimates and the conclusion of a period with particularly high absolute forest loss estimates for garimpos. The error term ϵt was assumed to follow a multivariate Normal distribution with zero mean and a diagonal variance-covariance matrix Ω with constant variance σ2. W is an n × n, non-negative, row-standardised spatial weights matrix. Its elements impose a structure of spatial dependence upon observational units, setting wii = 0 and wij > 0 if regions i and j are defined as neighbours (i, j = 1, …, n). The exact specification of W is presented and illustrated in Supplementary Notes B and Fig. S4. Characteristically for an SDM, the regression equation includes the spatially-lagged dependent variable Wyt as well as the spatially-lagged regional characteristics WXt as explanatory variables. The k × 1 vectors of the unknown parameters β and θ correspond to Xt and WXt respectively, and ρ (where the sufficient stability condition ∣ρ∣ < 1 is satisfied for row-standardised W62) is a scalar, measuring the magnitude of spatial autocorrelation. If ρ = 0, we obtain a growth regression model with spatial lags in X (SLX), where regional growth rates are independent, but WX is still considered. The model collapses into a classical linear model in the case where both ρ = 0 and θ = 0. Finally, the model considers a time-specific constant ξt, capturing year-specific confounding factors such as commodity price dynamics and domestic business cycles.
Forest loss
This model type was designed to assess the effects of mining on forest loss, where again we used municipalities as observation units. Forest loss is expected to be subject to considerable spatial spillover53,54,60, which is why we employed an SDM of the form
$${\tilde{{{\bf{y}}}}}_{t}=\lambda {{\bf{W}}}{\tilde{{{\bf{y}}}}}_{t}+{\tilde{{{\bf{X}}}}}_{t}{{\boldsymbol{\delta }}}+{{\bf{W}}}{\tilde{{{\bf{X}}}}}_{t}{{\boldsymbol{\gamma }}}+{{{\boldsymbol{\nu }}}}_{t}+{{{\boldsymbol{\mu }}}}_{t},\quad {{{\boldsymbol{\mu }}}}_{t} \sim N({{\bf{0}}},\tilde{{{\mathbf{\Omega }}}}),\tilde{{{\mathbf{\Omega }}}}={\tilde{\sigma }}^{2}{{{\bf{I}}}}_{n},$$
(2)
where the dependent variable \({\tilde{{{\bf{y}}}}}_{t}\) denotes a vector of cleared land within each municipality. In the \(n\times \tilde{k}\) matrix \({\tilde{{{\bf{X}}}}}_{t}\) we considered economic growth directly as a control variable instead of including the full set of growth determinants. Other control variables remained the same as in the growth specification, because most determinants of economic growth overlap with indicators used for explaining forest loss53. Mining again entered the model in the form of interaction terms between binary mining indicators and year and period dummy variables, respectively. Similar to the growth model, νt denotes a time-specific constant and we again assumed a normally distributed error term μt with constant variance \({\tilde{\sigma }}^{2}\). The k × 1 vectors δ and γ correspond to \({\tilde{{{\bf{X}}}}}_{t}\) and \({{\bf{W}}}{\tilde{{{\bf{X}}}}}_{t}\) respectively and λ is the spatial coefficient. The spatial weights matrix W and the properties of the spatial model remain the same as in Equation (1).
Direct and spillover impacts
Assuming independence of observations, the estimation coefficients of conventional (non-spatial) linear models can be typically interpreted as marginal changes in the dependent variable due to shifts in one of the explanatory variables. In this regard, spatial models require additional steps because we explicitly impose dependence among observations, implying that the partial derivatives of the dependent variable in region i with respect to an explanatory variable in region j are potentially non-zero and therefore cause feedback effects. Calculating average direct, indirect (i.e., spillover) and total impacts was proposed as a solution to this issue62: First, transforming Equation (1) (without loss of generality, the same derivation holds for Equation (2)) to
$${{{\bf{y}}}}_{t}={({{{\bf{I}}}}_{n}-\rho {{\bf{W}}})}^{-1}({{{\bf{X}}}}_{t}{{\boldsymbol{\beta }}}+{{{\bf{WX}}}}_{t}{{\boldsymbol{\theta }}}+{{{\boldsymbol{\xi }}}}_{t}+{{{\boldsymbol{\epsilon }}}}_{t}),$$
(3)
we derive n2 partial derivatives of a particular explanatory variable k as
$$\frac{\partial {y}_{i}}{\partial {x}_{jk}}={{{\bf{S}}}}_{k}{({{\bf{W}}})}_{ij}={({{{\bf{I}}}}_{n}-\rho {{\bf{W}}})}^{-1}{({{{\bf{I}}}}_{n}{\beta }_{k}+{{\bf{W}}}{\theta }_{k})}_{ij},$$
(4)
where infinite feedback effects are captured through the spatial multiplier \({({{{\bf{I}}}}_{n}-\rho {{\bf{W}}})}^{-1}\). The impact matrix is then summarised by calculating the average total effect as the average over all entries in Sk(W)ij, the average direct effect as the average when only considering its main diagonal, and the average indirect effect as the difference between the two. An interpretation of average direct effects is then given by the average response of the dependent to the independent variables over the sample of observations and hence similar to regression coefficients from classical linear models. The average spillover can be interpreted as the cumulative average response of a region’s dependent variable to a marginal change in an explanatory characteristic across all other regions.
Data
We compiled a balanced panel dataset covering 5262 Brazilian municipalities over the period 2005–2020. Calculating five-year average growth rates, this resulted in 57,882 observations prior to matching. Data were sourced from multiple databases and, where necessary, aggregated to the municipality level. Municipalities, the smallest administrative divisions in Brazil, occasionally undergo boundary changes due to splitting or merging, resulting in a variable total count over time. In order to keep a balanced panel with a constant number of spatial observations, we followed previous research59 and only considered municipalities with unchanged geographical extent over the sample period. Table S1 provides an overview of the variables used in this analysis.
Dependent variables
The dependent variable in the growth models was the five-year average annual growth rate of GDP per capita, which was computed from yearly per capita GDP in BRL at current purchasing power parities as reported by the Brazilian Institute for Geography and Statistics, IBGE34,35. In the last year of the panel, 2015, this measure therefore comprises economic growth between 2015 and 2020. We selected five-year growth windows as a suitable measure for mid-term economic effects63. We were aware that other studies emphasise poverty and distributional effects of mining45,64. However, we needed to resort to GDP growth in our study because alternative socioeconomic indicators that would allow for a broader understanding of human well-being are difficult to obtain for this level of geographical detail, especially for a yearly panel. GDP per capita, therefore, served as a key indicator for economic development, despite its limitations, such as only covering market transactions and not describing income distribution.
The municipalities included in the analysis varied in size, ranging from 3.57 km2 to 159,533 km2 (mean = 1541 km2, sd = 5683 km2). For the forest loss models, we used two measures of the dependent variable to capture the reduction in natural forest formation: (a) annual relative change, expressed in ha per km2, and (b) annual change in absolute ha. The data was calculated from municipality-level land cover statistics as provided by the MapBiomas project18. In contrast to the five-year windows used in the economic growth specification, forest cover changes were examined annually. This decision reflected the assumption that deforestation is typically driven by immediate events, with no meaningful recovery periods, making adjustments for business cycles unnecessary.
Mining indicators
The essential municipality characteristic for this study was the presence of mining activities. Mining entered the models as binary indicators for the presence of industrial and garimpo mining within a municipality in a certain year. Detailed yearly geospatial data on Brazilian land area covered by mining was taken from MapBiomas18. We transformed their continuous metric (ha per municipality) to a binary variable for simpler interpretation of effects. In the Supplementary Information, we show that alternative model specifications, which incorporate the mining area in ha, do not alter our main conclusions (Figs. S5 and S6).
Instead of land cover information about mining, the CFEM tax would have been another indicator for active mining activities. However, we refrained from using the CFEM as an explanatory variable, as the tax income directly enters municipality GDP, i.e., the dependent variable, creating identification issues in the econometric model.
Covariates and potential confounding factors
We accounted for a broad set of covariates in the design matrices X and \(\tilde{{{\bf{X}}}}\) to better isolate the effect of mining on GDP and forest cover by controlling for confounding factors that could simultaneously influence these outcomes and the presence of mining. By addressing such confounding variables, we sought to reduce bias and strengthen the robustness of our findings. The selected covariates reflect key local environmental, economic, and sociodemographic conditions, which are further detailed in the text. A comprehensive list of these covariates, along with the rationale for their inclusion, is provided in Tables S2 and S3.
We considered land use change dynamics as control variables in all models. Using satellite data on the conversion of land, e.g., from natural forest formation to pasture or from grassland to agriculture, is an efficient approach for observing economic activity and environmental transformation at the same time. Such patterns act as useful proxies for underlying factors, including fertile soils, residential development, or conservation, which may confound our analysis by simultaneously influencing mine development (e.g., fertile soils may attract land uses that compete with mining) and the outcome variables of our interest. Our data was obtained from MapBiomas18, providing yearly land transition information from 30 m resolution satellite images aggregated to the level of Brazilian municipalities. We utilised land cover classifications at the second sub-categorical level and considered forest formation and forest plantation for the case of forest, grassland as non-forest natural formation, and agriculture and pasture for farming. Other categories such as wetlands, non-vegetated areas and bodies of water were omitted since they had minor relevance for our analysis. In order to be consistent with the five-year GDP growth horizon, we computed the average change in ha over five years. Land use change from any category to forest formation was not considered as a covariate, because it marks a transformation that is only viable over a longer time horizon.
Initial land cover was considered as a proxy for the land cover conditions at the beginning of either a window of GDP growth or a one-year forest loss period. We again used data from MapBiomas18. In order to reflect the variation in municipality area, this variable entered the models as shares of natural forest, forest plantation, grassland, agriculture and pasture relative to the total municipality area.
The remaining covariates were motivated by economic growth theory (see the respective literature below), and by following a meta-analysis for the case of the forest loss models53. While some of these variables are not obvious confounders – i.e., they may influence the dependent variables without affecting mining expansion, or they may only exert an indirect influence via other channels – their inclusion is expected to improve the precision of effect estimates by accounting for additional variation in the outcomes. We considered initial income in terms of per capita GDP in the initial year of a growth window as a proxy for physical capital, which is a major determinant of economic growth in the neoclassical growth framework65. A negative relationship between the initial stock of physical capital and economic growth, which is explained by diminishing returns to capital accumulation, is a well-established stylised fact in the empirical literature known as the convergence hypothesis. In addition, a number of studies show that the convergence hypothesis holds for direct impacts in spatial econometric growth frameworks, while spillover effects from the flows of capital, goods, knowledge, and people between regions are shown to be positive, implying that relatively poorer regions benefit from having highly capitalised neighbours57.
Endogenous growth theory highlights the role of human capital as a key driver of innovation processes such as technological change66,67. However, whether indirect effects are positive or negative is uncertain because positive economic effects from knowledge spillover and brain drain channels may counteract each other. The role of the labour market in shaping mining activity is similarly ambiguous and likely context-dependent. Higher levels of human capital may attract industrial mining by reducing training costs and increasing productivity, while simultaneously discouraging informal or artisanal mining, as more educated populations are likely to pursue alternative livelihoods. We proxied human capital using the FIRJAN education index36, an index for Brazilian municipalities on a scale from 0 (worst) to 1 (best), measuring both schooling coverage and quality. The education index was only available from 2005, constraining our sample to this starting year.
Population growth was another component taken from the neoclassical growth framework. Following this theory, a positive impact of population growth would hold for absolute income growth at the national scale, but not for the growth of per capita income due to capital dilution. Therefore, unless higher output exceeds population growth, we would expect a negative effect. For subnational entities, this relationship is unclear, because one part of the population dynamics is migration patterns, which may vary across scale levels59. We obtained population counts per municipality from the IBGE and computed population growth again at five-year average rates. Population counts for 2007 and 2010 were interpolated due to missing data.
In line with numerous other studies, we used population density as a proxy for agglomeration externalities59. Population agglomeration effects have been considered in the economic geography literature. Denser populated (i.e., urban) regions are associated with positive effects on productivity growth, because they show higher rates of technological progress68. However, this relationship may not hold for relatively poor districts in countries where strong urbanisation is driven by extensive population growth without substantially affecting labour productivity.
We followed previous research58 and included the gross value added (GVA) in the agriculture, industry and service sectors as control variables in order to proxy the industrial structure of municipalities. These variables help account for heterogeneities in economic growth and forest cover dynamics, as well as underlying processes such as land competition and local material demand, which may either facilitate or constrain the development of mining activities.
The forest loss regressions followed a similar structure as the growth regression, with small adjustments. Instead of initial income, population growth and density, human capital and the sectoral mix variables, we directly used the five-year average annual economic growth rates as a proxy for economic activity. The empirical literature is inconclusive regarding the direction of the effect that income may have on forest cover53, yet this proxy subsumes a set of deforestation drivers that are related to any other anthropogenic activity besides the mining and other land use change effects. Forest cover change accounts were only included in the GDP growth models, as they effectively define the dependent variables in the forest loss models.
Precipitation and elevation were included as final control variables for economic growth and forest loss. These biophysical characteristics capture key spatial constraints that may influence GDP growth and forest cover, as well as factors relevant to mining presence or expansion, such as accessibility, land development potential, and susceptibility to wildfires or erosion. The data was compiled using the dplyr69, tidyr70, readxl71, stringi72, sf73, raster74, geobr75, exactextractr76 and elevatr77 R78 packages.
While we have carefully accounted for a comprehensive set of covariates, it is important to acknowledge that residual confounding may still persist. Unobserved or complex factors, which are difficult to capture in the model, could influence the relationships between mining, GDP, and forest cover. Nevertheless, the inclusion of time-fixed effects and the longitudinal panel design that tracks outcomes over time, helps mitigate the potential for such unobserved heterogeneity61. Moreover, the large sample size and temporal coverage of the data add confidence in the reliability of our estimates.
Statistical matching
To mitigate confounding and improve the robustness of our estimates, we employed coarsened exact matching (CEM), which approximates randomisation by grouping treated and control units based on similar characteristics55. CEM offers several advantages over other matching techniques, including its ability to handle high-dimensional covariates without relying on strong parametric assumptions. It also provides a better balance between the treated and control groups compared to other methods. The matching was implemented in R using the cem package79.
We applied a one-to-many matching approach using variables population density, natural forest cover, precipitation, elevation and municipality area to account for factors influencing the development of mining while preserving a sufficiently large sample. Matching was conducted separately for industrial and garimpo mining municipalities. As a result, the number of industrial mining observations was reduced from 4430 to 3923, while garimpo mining observations decreased from 2753 to 2356. The matched control groups comprised 42,646 and 35,156 observations for industrial and garimpo mining, respectively (Table S4). After further refinement to ensure a balanced panel, the final samples consisted of 33,836 and 23,727 observations for industrial and garimpo mining, respectively, as reported in the regression outputs in Tables S5–S16.
While the matching procedure enhanced balance between treated and control groups by excluding cases poorly suited for comparison, some differences remained – most notably in forest cover, where certain mining municipalities exhibited higher initial levels. This residual imbalance is addressed in the econometric models by incorporating initial forest cover as a covariate. The remaining imbalance is partly due to the necessity of retaining spatial information. Unlike studies that apply one-to-one matching32, we chose a broader approach to preserve neighbouring municipalities and the overall spatial structure. Our matching strategy also follows recommendations for CEM80, prioritising improved balance while maintaining a sufficiently large sample. Despite these limitations, the matching procedure enhanced the comparability between groups, reducing bias and strengthening the reliability of our findings, as shown in Fig. S11.
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.