Pareto Package Vignette

Introduction

The (European) Pareto distribution is probably the most popular distribution for modeling large losses in reinsurance pricing. There are good reasons for this popularity, which are discussed in detail in Fackler (2013). We recommend Philbrick (1985) and Schmutz et.al. (1998) for an impression of how the (European) Pareto distribution is applied in practice.

In cases where the Pareto distribution is not flexible enough, pricing actuaries sometimes use piecewise Pareto distributions. For instance, a Pareto alpha of 1.5 is used to model claim sizes between USD 1M and USD 5M and an alpha of 2.5 is used above USD 5M. A particularly useful and non-trivial application of the piecewise Pareto distribution is that it can be used to match a tower of expected layer losses with a layer independent collective loss model. Details are described in Riegel (2018), who also provides a matching algorithm that works for an arbitrary number of reinsurance layers.

The package provides a tool kit for the Pareto, the piecewise Pareto and the generalized Pareto distribution, which is useful for pricing of reinsurance treaties. In particular, the package provides the matching algorithm for layer losses.

Pareto distribution

Definition: Let t > 0 and α > 0. The Pareto distribution Pareto(t, α) is defined by the distribution function $$ F_{t,\alpha}(x):=\begin{cases} 0 & \text{ for $x\le t$} \\ \displaystyle 1-\left(\frac{t}{x}\right)^{\alpha} & \text{ for $x>t$.} \end{cases} $$ This version of the Pareto distribution is also known as Pareto type I, European Pareto or single-parameter Pareto.

Distribution function and density

The functions pPareto and dPareto provide the distribution function and the density function of the Pareto distribution:

library(Pareto)
x <- c(1:10) * 1000
pPareto(x, 1000, 2)

##  [1] 0.0000000 0.7500000 0.8888889 0.9375000 0.9600000 0.9722222 0.9795918
##  [8] 0.9843750 0.9876543 0.9900000

plot(pPareto(1:5000, 1000, 2), xlab = "x", ylab = "CDF(x)")

dPareto(x, 1000, 2)

##  [1] 2.000000e-03 2.500000e-04 7.407407e-05 3.125000e-05 1.600000e-05
##  [6] 9.259259e-06 5.830904e-06 3.906250e-06 2.743484e-06 2.000000e-06

plot(dPareto(1:5000, 1000, 2), xlab = "x", ylab = "PDF(x)")

The package also provides the quantile function:

qPareto(0:10 / 10, 1000, 2)

##  [1] 1000.000 1054.093 1118.034 1195.229 1290.994 1414.214 1581.139 1825.742
##  [9] 2236.068 3162.278      Inf

Simulation:

rPareto(20, 1000, 2)

##  [1] 1637.111 1104.491 3036.419 1347.433 1335.077 1340.041 1054.705 2760.346
##  [9] 1303.541 1246.236 1326.101 1270.913 1007.516 1309.207 1761.155 3028.445
## [17] 1202.336 1212.207 2160.266 1263.376

Layer mean:

Let X ∼ Pareto(t, α) and a, c ≥ 0. Then E(min [c, max (X − a, 0)]) = ∫_a^c + a(1 − F_t, α(x)) dx = : I_t, α^{c
xs a} is the layer mean of c xs a, i.e. the expected loss to the layer given a single loss X.

Example: t = 500, α = 2, Layer 4000 xs 1000

Pareto_Layer_Mean(4000, 1000, 2, t = 500)

## [1] 200

Layer variance:

Let X ∼ Pareto(t, α) and a, c ≥ 0. Then the variance of the layer loss min [c, max (X − a, 0)] can be calculated with the function Pareto_Layer_Var.

Example: t = 500, α = 2, Layer 4000 xs 1000

Pareto_Layer_Var(4000, 1000, 2, t = 500)

## [1] 364719

Why is the Pareto distribution so popular?

Lemma:

Let X ∼ Pareto(t, α). Then cX ∼ Pareto(ct, α) for all c > 0.
Let X ∼ Pareto(t₁, α). For t₂ > t₁ we then have X|(X > t₂) ∼ Pareto(t₂, α)

Consequences:

The Pareto alpha is invariant wrt scaling (which implies that α does not depend on currencies and inflation)
For Pareto distributed data the Pareto alpha does not depend on the reporting threshold
For layers and thresholds above t the ratio between expected layer losses and/or excess frequencies depends only on α (and not on t)

Pareto extrapolation

Consider two layers c_i xs a_i and a Pareto(t, α) distributed severity with sufficiently small t. What is the expected loss of c₂ xs a₂ given the expected loss of c₁ xs a₁?

Example: Assume α = 2 and the expected loss of 4000 xs 1000 is 500. Calculate the expected loss of the layer 5000 xs 5000.

Pareto_Extrapolation(4000, 1000, 5000, 5000, 2) * 500

## [1] 62.5

Pareto_Extrapolation(4000, 1000, 5000, 5000, 2, ExpLoss_1 = 500)

## [1] 62.5

Pareto alpha between two layers:

Given the expected losses of two layers, there is typically a unique Pareto alpha α which is consistent with the ratio of the expected layer losses.

Example: Expected loss of 4000 xs 1000 is 500. Expected loss of 5000 xs 5000 is 62.5. Alpha between the two layers:

Pareto_Find_Alpha_btw_Layers(4000, 1000, 500, 5000, 5000, 62.5)

## [1] 2

Check: see previous example

Pareto alpha between a frequency and layer:

Given the expected excess frequency at a threshold and the expected loss of a layer, then there is typically a unique Pareto alpha α which is consistent with this data.

Example: Expected frequency in excess of 500 is 2.5. Expected loss of 4000 xs 1000 is 500. Alpha between the frequency and the layer:

Pareto_Find_Alpha_btw_FQ_Layer(500, 2.5, 4000, 1000, 500)

## [1] 2

Check:

Pareto_Layer_Mean(4000, 1000, 2, t = 500) * 2.5

## [1] 500

Matching the expected losses of two layers:

Given the expected losses of two layers, we can use these techniques to obtain a Poisson-Pareto model which matches the expected loss of both layers.

Example: Expected loss of 30 xs 10 is 26.66 (Burning Cost). Expected loss of 60 xs 40 is 15.95 (Exposure model).

Pareto_Find_Alpha_btw_Layers(30, 10, 26.66, 60, 40, 15.95)

## [1] 1.086263

Frequency @ 10:

26.66 / Pareto_Layer_Mean(30, 10, 1.086263)

## [1] 2.040392

A collective model $\sum_{n=1}^NX_n$ with X_N ∼ Pareto(10, 1.09) and N ∼ Poisson(2.04) matches both expected layer losses.

Frequency extrapolation and alpha between frequencies:

Given the frequency f₁ in excess of t₁ the frequency f₂ in excess of t₂ can directly be calculated as follows: $$ f_2 = f_1 \cdot \left(\frac{t_1}{t_2}\right)^\alpha $$ Vice versa, we can calculate the Pareto alpha, if the two excess frequencies f₁ and f₂ are given: $$ \alpha = \frac{\log(f_2/f_1)}{\log(t_1/t_2)}. $$

Example:

Expected frequency excess 1000 is 2. What is the expected frequency excess 4000 if we have a Pareto alpha of 2.5?

t_1 <- 1000
f_1 <- 2
t_2 <- 4000
(f_2 <- f_1 * (t_1 / t_2)^2.5)

## [1] 0.0625

Vice versa:

Pareto_Find_Alpha_btw_FQs(t_1, f_1, t_2, f_2)

## [1] 2.5

Maximum likelihood estimation of the parameter alpha

For i = 1, …, n let X_i ∼ Pareto(t, α) be Pareto distributed observations. Then we have the ML estimator $$ \hat{\alpha}^{ML}=\frac{n}{\sum_{i=1}^n\log(X_i/t)}. $$ Example:

Pareto distributed losses with a reporting threshold of t = 1000 and α = 2:

losses <- rPareto(1000, t = 1000, alpha = 2)
Pareto_ML_Estimator_Alpha(losses, t = 1000)

## [1] 2.087663

In reinsurance, sometimes large loss data from different sources are used for severity fits. Then the losses are typically only available in excess of certain reporting thresholds which may vary by data source. Assume that two data sources each contain 5000 losses in excess of 1000, which are Pareto distributed with an alpha of 2 but from data source 2 we only know the losses exceeding a reporting threshold of 3000. If we apply the standard ML estimator with a threshold of 1000, then we obtain an alpha which is too low, since we ignore that the loss data is not complete in excess of 1000:

losses_1 <- rPareto(5000, t = 1000, alpha = 2)
losses_2 <- rPareto(5000, t = 1000, alpha = 2)
reported <- losses_2 > 3000
losses_2 <- losses_2[reported]
losses <- c(losses_1, losses_2)
Pareto_ML_Estimator_Alpha(losses, t = 1000)

## [1] 1.64314

In the function Pareto_ML_Estimator_Alpha the user can define reporting threshold for each loss in order to handle this situation:

reporting_thresholds_1 <- rep(1000, length(losses_1))
reporting_thresholds_2 <- rep(3000, length(losses_2))
reporting_thresholds <- c(reporting_thresholds_1, reporting_thresholds_2)
Pareto_ML_Estimator_Alpha(losses, t = 1000, reporting_thresholds = reporting_thresholds)

## [1] 1.998982

Now, assume that the underlying policies have limits of 5000 or 10000 and that a loss is censored if it exceeds the respective limit. If the underlying losses are Pareto distributed before they are censored then ML estimation leads to a too large value for alpha:

limits <- sample(c(5000, 10000), length(losses), replace = T)
censored <- losses > limits
losses[censored] <- limits[censored]
reported <- losses > reporting_thresholds
losses <- losses[reported]
reporting_thresholds <- reporting_thresholds[reported]
Pareto_ML_Estimator_Alpha(losses, t = 1000, reporting_thresholds = reporting_thresholds)

## [1] 2.102217

In order to deal with this situation the function allows to specify for each loss if it is censored or not:

Pareto_ML_Estimator_Alpha(losses, t = 1000, reporting_thresholds = reporting_thresholds, 
                          is.censored = censored)

## [1] 2.002165

Truncation

Let X ∼ Pareto(t, α) and T > t. Then X|(X < T) has a truncated Pareto distribution. The Pareto functions mentioned above are also available for the truncated Pareto distribution.

Piecewise Pareto distribution

Definition: Let t := (t₁, …, t_n) be a vector of thresholds with 0 < t₁ < … < t_n < t_n + 1 := +∞ and let α := (α₁, …, α_n) be a vector of Pareto alphas with α_i ≥ 0 and α_n > 0. The piecewise Pareto distribution} PPareto(t, α) is defined by the distribution function $$ F_{\mathbf{t},\boldsymbol\alpha}(x):=\begin{cases} 0 & \text{ for $x<t_1$} \\ \displaystyle 1-\left(\frac{t_{k}}{x}\right)^{\alpha_k}\prod_{i=1}^{k-1}\left(\frac{t_i}{t_{i+1}}\right)^{\alpha_i} & \text{ for $x\in [t_k,t_{k+1}).$} \end{cases} $$

The family of piecewise Pareto distributions is very flexible:

Proposition: The set of Piecewise Pareto distributions is dense in the space of all positive-valued distributions (with respect to the Lévy metric).

This means that we can approximate any positive valued distribution as good as we want with piecewise Pareto. A very good approximation typically comes at the cost of many Pareto pieces. Piecewise Pareto is often a good alternative to a discrete distribution, since it is much better to handle!

The Pareto package also provides functions for the piecewise Pareto distribution. For instance:

Distribution function

x <- c(1:10) * 1000
t <- c(1000, 2000, 3000, 4000)
alpha <- c(2, 1, 3, 20)
pPiecewisePareto(x, t, alpha)

##  [1] 0.0000000 0.7500000 0.8333333 0.9296875 0.9991894 0.9999789 0.9999990
##  [8] 0.9999999 1.0000000 1.0000000

plot(pPiecewisePareto(1:5000, t, alpha), xlab = "x", ylab = "CDF(x)")

Density

dPiecewisePareto(x, t, alpha)

##  [1] 2.000000e-03 1.250000e-04 1.666667e-04 3.515625e-04 3.242592e-06
##  [6] 7.048328e-08 2.768239e-09 1.676381e-10 1.413089e-11 1.546188e-12

plot(dPiecewisePareto(1:5000, t, alpha), xlab = "x", ylab = "PDF(x)")

Simulation

rPiecewisePareto(20, t, alpha)

##  [1] 4388.814 3166.964 1132.868 1131.996 1376.962 1359.590 1434.926 2976.984
##  [9] 1018.649 1074.598 4120.283 3482.416 3379.459 1655.500 4024.778 3055.142
## [17] 4163.687 1146.321 3003.248 4252.189

Layer mean

PiecewisePareto_Layer_Mean(4000, 1000, t, alpha)

## [1] 826.6969

Layer variance

PiecewisePareto_Layer_Var(4000, 1000, t, alpha)

## [1] 922221.2

Maximum likelihood estimation of the alphas

Let t := (t₁, …, t_n) be a vector of thresholds and let α := (α₁, …, α_n) be a vector of Pareto alphas. For i = 1, …, n let X_i ∼ PPareto(t, α). If the vector t is known, then the parameter vector α can be estimated with maximum likelihood.

Example:

Piecewise Pareto distributed losses with t := (1000, 2000, 3000) and α := (1, 2, 3):

losses <- rPiecewisePareto(10000, t = c(1000, 2000, 3000), alpha = c(1, 2, 3))
PiecewisePareto_ML_Estimator_Alpha(losses, c(1000, 2000, 3000))

## [1] 0.9967721 2.0307044 2.9959546

Reporting thresholds and censoring of losses can be taken into account as described for the function Pareto_ML_Estimator_Alpha.

losses_1 <- rPiecewisePareto(5000, t = c(1000, 2000, 3000), alpha = c(1, 2, 3))
losses_2 <- rPiecewisePareto(5000, t = c(1000, 2000, 3000), alpha = c(1, 2, 3))
reported <- losses_2 > 3000
losses_2 <- losses_2[reported]
losses <- c(losses_1, losses_2)
PiecewisePareto_ML_Estimator_Alpha(losses, c(1000, 2000, 3000))

## [1] 0.7692198 1.2312989 2.9916150

reporting_thresholds_1 <- rep(1000, length(losses_1))
reporting_thresholds_2 <- rep(3000, length(losses_2))
reporting_thresholds <- c(reporting_thresholds_1, reporting_thresholds_2)
PiecewisePareto_ML_Estimator_Alpha(losses, c(1000, 2000, 3000), 
                                   reporting_thresholds = reporting_thresholds)

## [1] 1.002148 2.027767 2.991615

limits <- sample(c(2500, 5000, 10000), length(losses), replace = T)
censored <- losses > limits
losses[censored] <- limits[censored]
reported <- losses > reporting_thresholds
losses <- losses[reported]
reporting_thresholds <- reporting_thresholds[reported]
censored <- censored[reported]
PiecewisePareto_ML_Estimator_Alpha(losses, c(1000, 2000, 3000), 
                                   reporting_thresholds = reporting_thresholds)

## [1] 1.002148 2.938977 3.447913

PiecewisePareto_ML_Estimator_Alpha(losses, c(1000, 2000, 3000), 
                                   reporting_thresholds = reporting_thresholds, 
                                   is.censored = censored)

## [1] 1.002148 2.066754 3.057009

Truncation

The package also provides truncated versions of the piecewise Pareto distribution. There are two options available:

truncation_type = 'lp': Below the largest threshold t_n, the distribution function equals the distribution of the piecewise Pareto distribution without truncation. The last Pareto piece, however, is truncated at truncation
truncation_type = 'wd': The whole piecewise Pareto distribution is truncated at `truncation’

Matching a tower of layer losses

The Pareto distribution can be used to build a collective model which matches the expected loss of two layers. We can use piecewise Pareto if we want to match the expected loss of more than two layers.

Consider a sequence of attachment points 0 < a₁ < … < a_n < a_n + 1 := +∞. Let c_i := a_i + 1 − a_i and let e_i be the expected loss of the layer c_i xs a_i. Moreover, let f₁ be the expected frequency in excess of a₁.

The following matching algorithm uses one Pareto piece per layer and is straight forward:

Calculate the Pareto alpha α₁ between the excess frequency f₁ and the layer c₁ xs a₁
Calculate the frequency f₂ in excess of a₂: f₂ := (a₁/a₂)^α₁ ⋅ f₁
Calculate the Pareto alpha α₂ between the excess frequency f₂ and the layer c₂ xs a₂
Calculate the frequency f₃ in excess of a₃: f₃ := (a₂/a₃)^α₂ ⋅ f₃
…
Use a collective model $\sum_{n=1}^NX_n$ with E(N) = f₁ and X_n ∼ PPareto(t, α).

This approach always works for three layers, but it often does not work if we have three or more layers. For instance, Riegel (2018) shows that it does not work for the following example:

i	Cover c_i	Att. Pt. a_i	Exp. Loss e_i	Rate on Line e_i/c_i
1	500	1000	100	0.20
2	500	1500	90	0.18
3	500	2000	50	0.10
4	500	2500	40	0.08

The Pareto package provides a more complex matching approach that uses two Pareto pieces per layer. Riegel (2018) shows that this approach works for an arbitrary number of layers with consistent expected losses.

Example:

attachment_points <- c(1000, 1500, 2000, 2500, 3000)
exp_losses <- c(100, 90, 50, 40, 100)
fit <- PiecewisePareto_Match_Layer_Losses(attachment_points, exp_losses)
fit

## 
## Panjer & Piecewise Pareto model
## 
## Collective model with a Poisson distribution for the claim count and a Piecewise Pareto distributed severity.
## 
## Poisson Distribution:
## Expected Frequency:   0.2136971
## 
## Piecewise Pareto Distribution:
## Thresholds:         1000   1500   1932.059   2000   2147.531   2500   2847.756   3000
## Alphas:              0.3091209   0.1753613   9.685189   3.538534   0.817398   0.7663698   5.086828   2.845488
## The distribution is not truncated.
## 
## Status:               0
## Comments:             OK

The function PiecewisePareto_Match_Layer_Losses returns a PPP_Model object (PPP stands for Panjer & Piecewise Pareto) which contains the information required to specify a collective model with a Panjer distributed claim count and a piecewise Pareto distributed severity. The results can be checked using the attributes FQ, t and alpha of the object:

c(PiecewisePareto_Layer_Mean(500, 1000, fit$t, fit$alpha) * fit$FQ,
  PiecewisePareto_Layer_Mean(500, 1500, fit$t, fit$alpha) * fit$FQ,
  PiecewisePareto_Layer_Mean(500, 2000, fit$t, fit$alpha) * fit$FQ,
  PiecewisePareto_Layer_Mean(500, 2500, fit$t, fit$alpha) * fit$FQ,
  PiecewisePareto_Layer_Mean(Inf, 3000, fit$t, fit$alpha) * fit$FQ)

## [1] 100  90  50  40 100

There are, however, functions which can directly use PPP_Models:

covers <- c(diff(attachment_points), Inf)
Layer_Mean(fit, covers, attachment_points)

## [1] 100  90  50  40 100

Matching reference information

The function PiecewisePareto_Match_Layer_Losses can be used to match the expected losses of a complete tower of layers. If we want to match the expected losses of some reference layers which do not form a complete tower then it is more convenient to use the function Fit_References. Also excess frequencies can be provided as reference information. The function can be seen as a user interface for PiecewisePareto_Match_Layer_Losses:

  covers <- c(1000, 1000, 1000)
  att_points <- c(1000, 2000, 5000)
  exp_losses <- c(100, 50, 10)
  thresholds <- c(4000, 10000)
  fqs <- c(0.04, 0.005)
  fit <- Fit_References(covers, att_points, exp_losses, thresholds, fqs)
  Layer_Mean(fit, covers, att_points)

## [1] 100  50  10

  Excess_Frequency(fit, thresholds)

## [1] 0.040 0.005

If the package lpSolve is installed then the funcion Fit_References can handle ovelapping layers.

Interpolation of PML curves

The function Fit_PML_Curve can be used fit a PPP_Model that reproduces and interpolates the information provided in the PML curve. A PML curve is a table containing return periods and the corresponding loss amounts:

i	Return Period r_i	Amount x_i
1	1	1000
2	5	4000
3	10	7000
4	20	10000
5	50	13000
6	100	14000

The information contained in such a PML curve can be used to create a PPP_Model that has the expected excess frequency 1/r_i at x_i.

Example:

return_periods <- c(1, 5, 10, 20, 50, 100)
amounts <- c(1000, 4000, 7000, 10000, 13000, 14000)
fit <- Fit_PML_Curve(return_periods, amounts)
1 / Excess_Frequency(fit, amounts)

## [1]   1   5  10  20  50 100

PPP_Models (Panjer & Piecewise Pareto Models)

A PPP_Model object contains the information required to specify a collective model with a Panjer distributed claim count and a piecewise Pareto distributed severity.

Claim count distribution: The Panjer class contains the binomial distribution, the Poisson distribution and the negative binomial distribution. The distribution of the claim count N is specified by the expected frequency E(N) (attribute FQ of the object) and the dispersion D(N) := Var(N)/E(N) (attribute dispersion of the object). We have the following cases:

dispersion < 1: binomial distribution
dispersion = 1: Poisson distribution
dispersion > 1: negative binomial distribution.

Severity distribution: The piecewise Pareto distribution is specified by the vectors t, alpha, truncation and truncation_type.

The function PiecewisePareto_Match_Layer_Losses returns PPP_Model object. Such an object can also be directly created using the constructor function:

PPPM <- PPP_Model(FQ = 2, t = c(1000, 2000), alpha = c(1, 2), 
                  truncation = 10000, truncation_type = "wd", dispersion = 1.5)
PPPM

## 
## Panjer & Piecewise Pareto model
## 
## Collective model with a Negative Binomial distribution for the claim count and a Piecewise Pareto distributed severity.
## 
## Negative Binomial Distribution:
## Expected Frequency:   2
## Dispersion:           1.5 (i.e. contagion = 0.25)
## 
## Piecewise Pareto Distribution:
## Thresholds:         1000   2000
## Alphas:              1   2
## Truncation:           10000
## Truncation Type:      'wd'
## 
## Status:               0
## Comments:             OK

Expected Loss, Standard Deviation and Variance for Reinsurance Layers

A PPP_Model can directly be used to calculate the expected loss, the standard deviation or the variance of a reinsurance layer: function:

PPPM <- PPP_Model(FQ = 2, t = c(1000, 2000), alpha = c(1, 2), 
                  truncation = 10000, truncation_type = "wd", dispersion = 1.5)
Layer_Mean(PPPM, 4000, 1000)

## [1] 2475.811

Layer_Sd(PPPM, 4000, 1000)

## [1] 2676.332

Layer_Var(PPPM, 4000, 1000)

## [1] 7162754

Expected Excess Frequency

A PPP_Model can directly be used to calculate the expected frequency in excess of a threshold:

PPPM <- PPP_Model(FQ = 2, t = c(1000, 2000), alpha = c(1, 2), 
                  truncation = 10000, truncation_type = "wd", dispersion = 1.5)
thresholds <- c(0, 1000, 2000, 5000, 10000, Inf)
Excess_Frequency(PPPM, thresholds)

## [1] 2.0000000 2.0000000 0.9795918 0.1224490 0.0000000 0.0000000

Simulation of Losses

A PPP_Model can directly be used to simulate losses with the corresponding collective model:

PPPM <- PPP_Model(FQ = 2, t = c(1000, 2000), alpha = c(1, 2), 
                  truncation = 10000, truncation_type = "wd", dispersion = 1.5)
Simulate_Losses(PPPM, 10)

##           [,1]     [,2]     [,3]     [,4]     [,5]     [,6]
##  [1,] 1130.327 3691.225 1211.200      NaN      NaN      NaN
##  [2,] 1642.075 2249.313 1676.275      NaN      NaN      NaN
##  [3,] 1260.567 1837.291 2627.613 3723.649      NaN      NaN
##  [4,]      NaN      NaN      NaN      NaN      NaN      NaN
##  [5,] 1317.777 1479.792      NaN      NaN      NaN      NaN
##  [6,] 5625.040 1104.465      NaN      NaN      NaN      NaN
##  [7,] 2579.793 1072.609 1251.572 2593.910 3937.548 7350.691
##  [8,] 1530.694 1010.830 4015.490 2122.978 2041.165      NaN
##  [9,] 1152.159 2894.500 1565.685 3404.412 3112.052      NaN
## [10,] 1255.690 4492.307 4820.611 1418.448      NaN      NaN

The function Simulate_Losses returns a matrix where each row contains the losses from one simulation.

Note that for a given expected frequency FQ not every dispersion dispersion < 1 is possible for the binomial distribution. In this case a binomial distribution with the smallest dispersion larger than or equal to dispersion is used for the simulation.

Generalized Pareto Distribution

Definition: Let t > 0 and α_ini, α_tail > 0. The generalized Pareto distribution GenPareto(t, α_ini, α_tail) is defined by the distribution function $$ F_{t,\alpha_\text{ini}, \alpha_\text{tail}}(x):=\begin{cases} 0 & \text{ for $x\le t$} \\ \displaystyle 1-\left(1+\frac{\alpha_\text{ini}}{\alpha_\text{tail}} \left(\frac{x}{t}-1\right)\right)^{-\alpha_\text{tail}} & \text{ for $x>t$.} \end{cases} $$ We do not the standard parameterization from extreme value theory but the parameterization from Riegel (2008) which is useful in a reinsurance context.

Distribution function and density

The functions pGenPareto and dGenPareto provide the distribution function and the density function of the Pareto distribution:

x <- c(1:10) * 1000
pGenPareto(x, t = 1000, alpha_ini = 1, alpha_tail = 2)

##  [1] 0.0000000 0.5555556 0.7500000 0.8400000 0.8888889 0.9183673 0.9375000
##  [8] 0.9506173 0.9600000 0.9669421

plot(pGenPareto(1:5000, 1000, 1, 2), xlab = "x", ylab = "CDF(x)")

dGenPareto(x, t = 1000, alpha_ini = 1, alpha_tail = 2)

##  [1] 1.000000e-03 2.962963e-04 1.250000e-04 6.400000e-05 3.703704e-05
##  [6] 2.332362e-05 1.562500e-05 1.097394e-05 8.000000e-06 6.010518e-06

plot(dGenPareto(1:5000, 1000, 1, 2), xlab = "x", ylab = "PDF(x)")

The package also provides the quantile function:

qGenPareto(0:10 / 10, 1000, 1, 2)

##  [1] 1000.000 1108.185 1236.068 1390.457 1581.989 1828.427 2162.278 2651.484
##  [9] 3472.136 5324.555      Inf

Simulation:

rGenPareto(20, 1000, 1, 2)

##  [1] 1292.057 1472.876 1070.442 5463.614 2544.848 1475.913 4798.516 1200.637
##  [9] 3231.748 1678.502 1660.534 1608.730 1630.249 2108.869 1240.013 1304.851
## [17] 1051.637 1682.737 1056.447 1303.661

Layer mean:

GenPareto_Layer_Mean(4000, 1000, t = 500, alpha_ini = 1, alpha_tail = 2)

## [1] 484.8485

Layer variance:

GenPareto_Layer_Var(4000, 1000, t = 500, alpha_ini = 1, alpha_tail = 2)

## [1] 908942.5

Maximum likelihood estimation of the alpha_ini and alpha_tail

Let t > 0 and α_ini, α_tail > 0 and let X_i ∼ GenPareto(t, α_ini, α_tail). For known t the parameters α_ini, α_tail can be estimated with maximum likelihood.

Example:

Generalized Pareto distributed losses with t := 1000 and α_ini = 1, α_tail = 2:

losses <- rGenPareto(10000, t = 1000, alpha_ini = 1, alpha_tail = 2)
GenPareto_ML_Estimator_Alpha(losses, 1000)

## [1] 0.9824642 2.0823929

Reporting thresholds and censoring of losses can be taken into account as described for the function Pareto_ML_Estimator_Alpha.

losses_1 <- rGenPareto(5000, t = 1000, alpha_ini = 1, alpha_tail = 2)
losses_2 <- rGenPareto(5000, t = 1000, alpha_ini = 1, alpha_tail = 2)
reported <- losses_2 > 3000
losses_2 <- losses_2[reported]
losses <- c(losses_1, losses_2)
GenPareto_ML_Estimator_Alpha(losses, 1000)

## [1] 0.6844721 2.1299510

reporting_thresholds_1 <- rep(1000, length(losses_1))
reporting_thresholds_2 <- rep(3000, length(losses_2))
reporting_thresholds <- c(reporting_thresholds_1, reporting_thresholds_2)
GenPareto_ML_Estimator_Alpha(losses, 1000, 
                             reporting_thresholds = reporting_thresholds)

## [1] 1.032185 1.997527

limits <- sample(c(2500, 5000, 10000), length(losses), replace = T)
censored <- losses > limits
losses[censored] <- limits[censored]
reported <- losses > reporting_thresholds
losses <- losses[reported]
reporting_thresholds <- reporting_thresholds[reported]
censored <- censored[reported]
GenPareto_ML_Estimator_Alpha(losses, 1000, 
                             reporting_thresholds = reporting_thresholds)

## [1] 0.9388486 5.7980265

GenPareto_ML_Estimator_Alpha(losses, 1000, 
                             reporting_thresholds = reporting_thresholds, 
                             is.censored = censored)

## [1] 1.013396 2.152538

Truncation

Let X ∼ GenPareto(t, α_ini, α_tail) and T > t. Then X|(X < T) has a truncated generalized Pareto distribution. The Pareto functions mentioned above are also available for the truncated generalized Pareto distribution.

PGP_Models (Panjer & Generalized Pareto Models)

A PGP_Model object contains the information required to specify a collective model with a Panjer distributed claim count and a generalized Pareto distributed severity.

Claim count distribution: Like in a PPP_Model the claim count distribution from the Panjer class is specified by the expected frequency E(N) (attribute FQ of the object) and the dispersion D(N) := Var(N)/E(N) (attribute dispersion of the object).

Severity distribution: The generalized Pareto distribution is specified by the parameters t, alpha_ini, alpha_tail and truncation.

A PPP_Model object can be created using the constructor function:

PGPM <- PGP_Model(FQ = 2, t = 1000, alpha_ini = 1, alpha_tail = 2, 
                  truncation = 10000, dispersion = 1.5)
PGPM

## 
## Panjer & Generalized Pareto model
## 
## Collective model with a Negative Binomial distribution for the claim count and a generalized Pareto distributed severity.
## 
## Negative Binomial Distribution:
## Expected Frequency:   2
## Dispersion:           1.5 (i.e. contagion = 0.25)
## Generalized Pareto Distribution:
## Threshold:            1000
## alpha_ini:            1
## alpha_tail:           2
## Truncation:           10000
## 
## Status:               0
## Comments:             OK

Methods for PGP_Models

For PGP_Models the same methods are available as for PPP_Models:

PGPM <- PGP_Model(FQ = 2, t = 1000, alpha_ini = 1, alpha_tail = 2, 
                  truncation = 10000, dispersion = 1.5)
Layer_Mean(PGPM, 4000, 1000)

## [1] 2484.33

Layer_Sd(PGPM, 4000, 1000)

## [1] 2756.15

Layer_Var(PGPM, 4000, 1000)

## [1] 7596365

thresholds <- c(0, 1000, 2000, 5000, 10000, Inf)
Excess_Frequency(PGPM, thresholds)

## [1] 2.0000000 2.0000000 0.8509022 0.1614435 0.0000000 0.0000000

Simulate_Losses(PGPM, 10)

##           [,1]     [,2]     [,3]     [,4]     [,5]
##  [1,] 6281.433 2919.929 1240.720      NaN      NaN
##  [2,] 7349.580      NaN      NaN      NaN      NaN
##  [3,]      NaN      NaN      NaN      NaN      NaN
##  [4,] 2035.615 2887.031 4406.759 1090.022      NaN
##  [5,]      NaN      NaN      NaN      NaN      NaN
##  [6,] 1142.625 4996.268 4987.864 1403.999      NaN
##  [7,] 3231.888 2042.610 7329.381 6640.051 1251.552
##  [8,] 1021.384 1335.786      NaN      NaN      NaN
##  [9,]      NaN      NaN      NaN      NaN      NaN
## [10,] 1884.719 1150.568      NaN      NaN      NaN

References

Fackler, M. (2013) Reinventing Pareto: Fits for both small and large losses. ASTIN Colloquium Den Haag

Johnson, N.L., and Kotz, S. (1970) Continuous Univariate Distributions-I. Houghton Mifflin Co

Philbrick, S.W. (1985) A Practical Guide to the Single Parameter Pareto Distribution. PCAS LXXII: 44–84

Riegel, U. (2008) Generalizations of common ILF models. Bl"{a}tter der DGVFM 29: 45–71

Riegel, U. (2018) Matching tower information with piecewise Pareto. European Actuarial Journal 8(2): 437–460

Schmutz, M., and Doerr, R.R. (1998) Das Pareto-Modell in der Sach-Rueckversicherung. Formeln und Anwendungen. Swiss Re Publications, Zuerich