Whatbird Community

Citizen Science Data Quality Conservation and Personal Birding Data Projects


AlexHenry


Okay, so I’ve been thinking about data quality and coverage patterns a lot lately. Felt like starting a discussion though I won’t be surprised if no one responds.

First of all, I want to say that I think eBird is an excellent resource for birders, for researchers, and for those who manage conservation efforts.

Comparing eBird data to iNaturalist data, I think eBird data must be way better. One of the fundamental issues with iNaturalist is that it collects presence/absence data but not relative abundance data. The other issue - and this is a big one - is that it requires observations to be photographed. This leads to major biases in the data, since certain species are much more difficult to photograph than others. The result - small birds flitting around way up in a tree get ignored, but oh! Look at that perched Bald Eagle over there posing for a pic.

But eBird is not without its issues. Mainly, I think that large and easily detectable species are overrepresented in the data, while smaller less detectable species are underrepresented by comparison. A great example of this in my area would be Golden Eagles and Oak Titmice in the Interior Coast Range of California. A Golden Eagle, when soaring, can be visible from a mile away. So if you are anywhere within a one-mile-radius circle of that Golden Eagle, chances are you’ll detect it. Meanwhile, there may be dozens and dozens of Oak Titmice in that same area, but only the ones you pass close by are detected. So, you end up with checklists that have like 5-10 Golden Eagles (probably the same 2 or 3 highly detectable birds seen repeatedly) and only 5-10 Oak Titmice (even though there are probably many more in the general area that went undetected). Obviously Oak Titmice are actually more numerous, but since the data is biased by detectability, the true relative abundance is hard to figure out.
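To put rough numbers on the detectability problem, here is a minimal sketch. Everything in it is an assumption made up for illustration: the densities, the detection radii, the straight 2-mile walk, and the idea that a bird is always detected inside its radius and never outside it. It also ignores the repeat counting of the same eagles mentioned above, which would skew things further.

```python
import math

def surveyed_area(walk_miles: float, detect_radius_miles: float) -> float:
    """Area (sq mi) within detection range of a straight walk: a strip plus two rounded ends."""
    strip = 2 * detect_radius_miles * walk_miles
    end_caps = math.pi * detect_radius_miles ** 2
    return strip + end_caps

def expected_count(density_per_sq_mile: float, walk_miles: float,
                   detect_radius_miles: float) -> float:
    """Expected birds detected, assuming perfect detection inside the radius."""
    return density_per_sq_mile * surveyed_area(walk_miles, detect_radius_miles)

# Hypothetical values, chosen only to illustrate the bias.
WALK_MILES = 2.0
EAGLE_DENSITY, EAGLE_RADIUS = 0.05, 1.0        # birds per sq mi, miles
TITMOUSE_DENSITY, TITMOUSE_RADIUS = 50.0, 0.03

eagles = expected_count(EAGLE_DENSITY, WALK_MILES, EAGLE_RADIUS)
titmice = expected_count(TITMOUSE_DENSITY, WALK_MILES, TITMOUSE_RADIUS)

true_ratio = TITMOUSE_DENSITY / EAGLE_DENSITY   # titmice per eagle on the landscape
counted_ratio = titmice / eagles                # titmice per eagle on the checklist

print(f"True titmice per eagle:    {true_ratio:.0f}")
print(f"Counted titmice per eagle: {counted_ratio:.1f}")
print(f"Checklist understates the ratio by a factor of {true_ratio / counted_ratio:.0f}")
```

With these made-up inputs the checklist understates the titmouse-to-eagle ratio by a factor of roughly 58, which is simply the ratio of the two surveyed areas. The real numbers would differ, but the direction of the bias is the point.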

Perhaps a bigger data quality issue is chasing rare birds. Simply put, the mob mentality and fear of missing out that birders have lead to over-reporting of rare birds relative to common ones.

So how can these issues be rectified? I think part of the responsibility falls on us, the users, and part of the responsibility falls on eBird. As eBird users, I think we should be less conservative when counting or estimating common species, especially those that are less detectable. Perhaps in some cases it is better to err on the side of being liberal than conservative with counts. Be conservative with the big super detectable stuff, but liberal with the smaller less detectable stuff. And then chasing rare birds - I hesitate to condemn it as I’ve done plenty of it. But maybe try to minimize it? At the end of the day your list size doesn’t matter as much as the quality of the data. 

Anyone have anything to add? Agree/disagree? Other issues I didn’t mention? Other methods for improving our data quality as users? Want to defend iNat?

  • Like 7

Agree wholeheartedly. However, I don’t fully grasp wanting to report 10 Oak Titmice if you detected 7 for sure. I see where you’re coming from with that though. eBird wants us to report everything we saw and heard, not everything that we knew was there but didn’t detect. Heck, every checklist would be a lot higher species count wise if that were the case. Bird places that aren’t birded often. Go on eBird and enter a common bird in your area. See where there are gaps in the data where it hasn’t been reported. If house finch hasn’t been reported at a spot in my area, nothing has. Find a park or something in that area and go. Discover new places. Don’t chase every rarity. Don’t only go to a place and bird the best area there. There is a hotspot by me that is one of, if not the top birded hotspots in the state. However, hardly anybody checks the back half of the place. Only the front 7 ponds are checked frequently. The back 4 are often completely ignored. Bird underbirded sites. Spend a longer time at a hotspot, see how many birds you can find. Spend less time at a hotspot, see how many birds you can find. Think outside the box.

  • Like 7

I think you've left out a third group that bears some of the responsibility: the researchers using the data. eBird may not be a good way to calculate the relative number of Golden Eagles and Oak Titmice. But it can be useful for estimating whether there are more or fewer Golden Eagles or Oak Titmice present in a specific time span. A good scientist should thoroughly understand how the data is produced and what its limits are.

  • Like 6

25 minutes ago, aveschapinas said:

I think you've left out a third group that bears some of the responsibility: the researchers using the data. eBird may not be a good way to calculate the relative number of Golden Eagles and Oak Titmice. But it can be useful for estimating whether there are more or fewer Golden Eagles or Oak Titmice present in a specific time span. A good scientist should thoroughly understand how the data is produced and what its limits are.

Great point here


1 hour ago, IKLland said:

Go on eBird and enter a common bird in your area. See where there are gaps in the data where it hasn’t been reported. If house finch hasn’t been reported at a spot in my area, nothing has.

This is a good idea that I hadn’t done before, I think I’ll try it out!

  • Like 1

This is a huge topic, and one I find very interesting. Thanks for the thoughtful post, Alex. 

I’ll be happy to contribute to this thread and have some initial thoughts. First, data quality is an ever evolving topic, one warranting numerous PhD dissertations and endless study, but at a very general level, my feeling is that one big data problem for eBird is simply the system itself. At present, there are over 900,000 eBirders and almost 83 million complete checklists. To the best of my knowledge there are fewer than 4,000 regional reviewers worldwide. The math simply doesn’t check out.

While reviewers don’t review every sighting, the filters do. And the filters are overseen by reviewers. Yet, the task of overseeing all the data is simply Herculean. For example, in my county there are three active reviewers and over 500 species in the filter, plus hybrids, slashes, and spuhs. It would be virtually impossible to go through every single species filter one by one, fine tune it (this alone is something that could be a whole topic), then run it and review all the records that end up in the queue. On top of that, there is daily monitoring of checklists for duration, distance, correct protocol use, and accuracy in location. There is also the task of checking species maps for status and distribution, and monitoring media that is submitted (that’s a female CITE, not a BWTE). These are just some of the issues, plus many smaller ones. These problems are not unique to my county, so multiply that by how many filter regions there are globally, add in some unique issues that others face that my county doesn’t, and voila, you have eBird. In essence, the dataset is just too big. I don’t know much about statistics, but my sense is that some of the large scale data lies outside of acceptable error rates.

As far as users go, anyone with internet access can submit eBird data. One suggestion I have heard thrown around is to require all users to take a short course and answer a survey or questionnaire. I doubt that would really be effective.

Without opening Pandora’s box, I’ll say that I think the answer lies with AI and computer trained models. I can envision one day where reviewers are not needed, and users don’t need to flag incorrect media, because the computer is smart enough to do it for us. I don’t want to go too far down this rabbit hole, but this may be the only way to effectively create really good, accurate data for eBird.

I don’t know iNat, so I won’t comment there, but I’d appreciate hearing what people think are its greatest strengths as well as its shortcomings.

I was just recently asking a well-known reviewer about eBird data and how scientists use it, and they didn’t really know. This is a very valid question and one I would love to know more about.

  • Like 6

Ok, did some quick math. Actually, at last estimate there were about 3,500 reviewers worldwide, not all of whom are active. Even so, take that number and divide 83 million complete checklists by it. That works out to each reviewer being responsible for the data in almost 24,000 complete checklists. Obviously, this is just an exercise, but it gives you a sense of scale for how much data a small number of people is tasked with overseeing.

If one reviewer looked closely at 10 checklists each day and responded to inaccurate data, of which there would be some, it would take 6.5 years to get through that amount of data. And that’s assuming the number is static, which it’s not, as more and more checklists are submitted each day. I need to sit down, I’m starting to feel dizzy.
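For anyone who wants to check the arithmetic, here it is as a small sketch. The checklist total, reviewer count, and 10-per-day pace are the figures from the post. The `new_per_day` value is a hypothetical number added only to show what happens once the backlog keeps growing.

```python
import math

COMPLETE_CHECKLISTS = 83_000_000   # "almost 83 million" from the post
REVIEWERS = 3_500                  # "about 3,500" from the post
REVIEWED_PER_DAY = 10              # pace assumed in the post

def years_to_clear(backlog: float, reviewed_per_day: float,
                   new_per_day: float = 0.0) -> float:
    """Years to work through a backlog, reviewing every day of the year."""
    net_per_day = reviewed_per_day - new_per_day
    if net_per_day <= 0:
        return math.inf            # the backlog never shrinks
    return backlog / net_per_day / 365

per_reviewer = COMPLETE_CHECKLISTS / REVIEWERS
static_years = years_to_clear(per_reviewer, REVIEWED_PER_DAY)

# Hypothetical: 12 new checklists per reviewer per day keep arriving.
growing_years = years_to_clear(per_reviewer, REVIEWED_PER_DAY, new_per_day=12.0)

print(f"Checklists per reviewer: {per_reviewer:,.0f}")
print(f"Years at 10 per day, no new checklists: {static_years:.1f}")
print(f"Years at 10 per day with 12 new per day: {growing_years}")
```

The static case reproduces the figures above (about 23,700 checklists per reviewer, about 6.5 years). As soon as new checklists arrive faster than they are reviewed, the sketch returns infinity, which is the "number is not static" problem in one line.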

  • Like 3

My issue with the rarities is they can drastically skew the percentages in the "Target Birds" tool on eBird when every twitcher in the area wants them on their list, making very rare birds appear more common than birds that are plentiful but difficult to detect (e.g. owls, nightjars, seabirds, and certain warblers in tall trees).

For example, in my home county of Clark, Nevada, the Rufous-backed Robin isn't far off the top 100 target birds for January, and it's fairly easy to see over 100 birds in January here. Naturally, you'd come to the conclusion that you have a decent chance of seeing a bird that is actually incredibly rare this far north. It's even above alpine residents like the Mountain Chickadee that are difficult to observe due to requiring snow chains.

Whenever I'm about to travel, I use that tool a lot to find interesting new birds, and it can paint a different picture than the actual distribution in underbirded areas.
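Here is a rough sketch of how that skew can happen, assuming the tool ranks species by the share of checklists that report them. All the counts below are invented for illustration and are not real Clark County numbers.

```python
def frequency(checklists_with_species: int, total_checklists: int) -> float:
    """Percent of checklists in the region and month that report the species."""
    return 100.0 * checklists_with_species / total_checklists

# Hypothetical January totals for one county.
TOTAL_CHECKLISTS = 2_000

# One staked-out rarity, visited by a crowd of chasers.
robin_checklists = 120

# A common mountain resident, but few people get up there in winter:
# 40 mountain checklists, 36 of which report it.
chickadee_checklists = 36

robin_freq = frequency(robin_checklists, TOTAL_CHECKLISTS)
chickadee_freq = frequency(chickadee_checklists, TOTAL_CHECKLISTS)

print(f"Rufous-backed Robin: {robin_freq:.1f}% of checklists")
print(f"Mountain Chickadee:  {chickadee_freq:.1f}% of checklists")
```

In this made-up case a single rare bird ranks well above a resident species that was found on 90% of the checklists from its own habitat, purely because of where the checklists were submitted.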

Edited by Zoroark
  • Like 5

I think it would be a great help to have each county have like 7 reviewers, but they don’t all do everything typical reviewers do. One or two people would flag incorrect media. One or two can manage locations and checklist duration, distance, etc. One or two can review rare bird reports and high counts. One or two can review species maps for status and distribution. This way, you have more volunteers, but the role each volunteer is playing isn’t very time consuming. I get that in some underbirded areas this will be impossible, but at least for the heavily birded areas like coastal California, it can be done.

  • Like 1

iNaturalist is hugely beneficial for anyone interested in more than just birds.  The species maps, similar species and just overall breadth are a godsend for anyone taking pics of more than birds.  Their AI identification is amazingly accurate, or at least close enough to get you in the ballpark.  You do not have to have a photograph to enter an observation, but it is likely to never get “approved”.  But you can have it for your own records now.  For just birds though, eBird is the place.

 

I am always a bit more liberal with my common species.  I know there are some that will only count what they see at the same time.  I don’t unless it is highly probable to be the same common birds I just saw 5 minutes prior.  Northern Cardinals singing from 10 different locations are probably 10 different birds on territory.  I also assume that I’ve missed some of the common birds at some point.

 

Rare birds are definitely a problem for the Target Species at the county level.  It isn’t just the mass of people, but some historical data can skew things.  Right now Evening Grosbeak is #12 on my Year Needs list for the county.  The only reason it is that high is some historical data entered from 77-78 when the massive irruption happened into the area.  I try to overcome some of this for my own data by using the data from all neighboring counties also.  Evening Grosbeak now moves down to #18, still skewed high, but at least it makes more sense now when compared to the other birds.  When I try to attach rarity codes (1-6) to the birds, my data is really only skewed severely by a Harlequin Duck.  I add in a component of timing (how many periods are they seen in) to come up with a way to identify the rarity.  This HADU is the only record in the 8 county area, but it stuck around for so long that it ends up coding as a Code 2 bird.

 

I think eventually the resources will be there to use “AI” for various functions: review of potentially problematic birds, or a different type of filter setup.  Also more regional definitions rather than just at the county level: Farmland-“X” County, Woodlands-“X” County, Prairies….mountains, swampland, etc.  We’ll probably get Hotspot Target Species at some point sooner than the habitat view.


18 hours ago, DLecy said:

 

I was just recently asking a well-known reviewer about eBird data and how scientists use it, and they didn’t really know. This is a very valid question and one I would love to know more about.

They have a specific tab related to the science of eBird.  

These are some of the various published works that have used eBird data.

https://science.ebird.org/en/research-and-conservation/publications

 

  • Like 1

10 minutes ago, chipperatl said:

They have a specific tab related to the science of eBird.  

These are some of the various published works that have used eBird data.

https://science.ebird.org/en/research-and-conservation/publications

 

Awesome! Thanks for this. Lots of this literature is way above my pay grade, but it’s cool to know how some of the data is being used!


Just now, DLecy said:

Awesome! Thanks for this. Lots of this literature is way above my pay grade, but it’s cool to know how some of the data is being used!

There are some other projects if you go to the Science tab, such as their Shorebird project.  So the data is definitely getting used, even if, as you said, it is way above my pay grade to understand how it was used.

  • Like 1

19 hours ago, DLecy said:

At present, there are over 900,000 eBirders and almost 83 million complete checklists.

You don't cite the source of this number.  I recall from a recent ABA podcast that eBirders break down by regularity of activity, just like any other group of hobbyists, volunteers, and weekend warriors.  A large majority are short-timers who posted a half-dozen lists in a week and are never seen again.  Another slice are infrequent, posting every few weeks or months when they see something out of the ordinary.  I recall less than 20% are birders who use the site regularly (weekly or more frequently), and not all of them report from the same locations on an ongoing basis.

So the ratio of reviewers to ACTIVE eBirders is more favorable than the raw number of members makes it appear.  (But yeah, more reviewers would be helpful!)

Edited by Charlie Spencer
  • Like 1

31 minutes ago, Charlie Spencer said:

You don't cite the source of this number.  I recall from a recent ABA podcast that eBirders break down by regularity of activity, just like any other group of hobbyists, volunteers, and weekend warriors.  A large majority are short-timers who posted a half-dozen lists in a week and are never seen again.  Another slice are infrequent, posting every few weeks or months when they see something out of the ordinary.  I recall less than 20% are birders who use the site regularly (weekly or more frequently), and not all of them report from the same locations on an ongoing basis.

So the ratio of reviewers to ACTIVE eBirders is more favorable than the raw number of members makes it appear.  (But yeah, more reviewers would be helpful!)

The source is from the eBird homepage. It’s posted there permanently and updates in real time. 

I hear your points. I think the more relevant issue is how many checklists there are to review. From a review perspective, the number of unique eBirders is somewhat irrelevant. The checklists that are submitted are what reviewers review, not the eBirders themselves. Some users submit thousands of complete checklists per year, while others may submit three in a calendar year. One scenario tasks the review process much more than the other.

  • Like 2

3 hours ago, chipperatl said:

I am always a bit more liberal with my common species.  I know there are some that will only count what they see at the same time.  I don’t unless it is highly probable to be the same common birds I just saw 5 minutes prior.  Northern Cardinals singing from 10 different locations are probably 10 different birds on territory.  I also assume that I’ve missed some of the common birds at some point.

 

I was taught by birders that I respect a lot that you have to take the birds' habits into consideration. If you see a hummingbird 100 yards down the path from a previous one it's likely to be a different individual. But a flock of Bushy-Crested Jays seen again in that range can often be assumed to be the same group, or some of the same individuals at least. The expert birders I know use this kind of strategy. But probably the biggest source of disagreement, in my opinion, is vultures flying overhead. I count the maximum number seen at one time, but others will make guesstimates as to whether there is a different group. If you can't explain to me why you think that I don't buy it...

  • Like 2

18 hours ago, aveschapinas said:

I was taught by birders that I respect a lot that you have to take the birds' habits into consideration.

This.  If I hear a Pileated and then hear another call a tenth of a mile away, I -assume- it's the same bird.  If I see a Northern Mock and see another a tenth away, I -assume- they're two different birds.  Depending on the day, species, and number I've already seen, I -assume- there's some overlap and may include every other bird in my count.

The less likely the bird, or the more likely it is to hang out in flocks, the more likely I am to go with the 'max seen at one time' count.
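The counting approaches people have described in this thread can give very different totals from the same walk. A minimal sketch, with made-up encounter sizes and hypothetical function names; none of this is an official eBird rule:

```python
import math
from typing import Sequence

def max_at_one_time(encounters: Sequence[int]) -> int:
    """Conservative: report only the largest group seen at a single moment."""
    return max(encounters, default=0)

def sum_of_encounters(encounters: Sequence[int]) -> int:
    """Liberal: treat every encounter as a new set of individuals."""
    return sum(encounters)

def every_other_bird(encounters: Sequence[int]) -> int:
    """Middle ground: assume about half of the birds encountered are repeats."""
    return math.ceil(sum(encounters) / 2)

# Hypothetical vulture groups seen overhead during one checklist.
vulture_groups = [3, 5, 2, 4]

print("Max at one time: ", max_at_one_time(vulture_groups))
print("Sum of encounters:", sum_of_encounters(vulture_groups))
print("Every other bird: ", every_other_bird(vulture_groups))
```

Which one fits depends on the species, as noted above: wide-ranging or flocking birds suit the max-at-one-time count, while birds singing on separate territories are closer to the sum.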

  • Like 2

Was curious what people think about pishing. Those who pish or play calls are often seeing more birds. Do you think this can cause skews in the data as well? If people who don’t pish don’t see as many birds, or people who do pish see more birds, this can cause skewed data in my opinion. I’m not saying don’t pish, and I’m not telling anyone to change what they’re doing. Just throwing out the thought.


5 minutes ago, IKLland said:

Was curious what people think about pishing.

Pishing is a practical joke that long-time birders pull on newbies, so the old timers can watch a kid walk around getting spit on his shirt while they laugh behind his back.  :classic_blink:

Nate Swick on the ABA podcast has wondered about reporting Merlin detections for breeding bird surveys.  Officially Merlin is not acceptable, but what if it detects birds you didn't hear?  What's more important, the data or how it's gathered?  Merlin is only going to get more accurate; for most of us, our hearing isn't.

Edited by Charlie Spencer
  • Haha 4

2 minutes ago, Charlie Spencer said:

Pishing is a practical joke that long-time birders pull on newbies, so the old timers can watch a kid walk around getting spit on his shirt while they laugh behind his back.  :classic_blink:

Nate Swick on the ABA podcast has wondered about reporting Merlin detections for breeding bird counts.  Officially Merlin is not acceptable, but what if it detects birds you didn't hear? 

I submit birds Merlin picks up as long as I heard the sound myself. If Merlin picks up a Yellow Warbler, but I didn’t hear a yellow warbler myself, I don’t submit it. 

  • Like 2